This PLoS ONE paper about stable identifiers for genome projects

When working on an ongoing genome sequencing and assembly project, it is rather inconvenient when gene identifiers change from one build of the assembly to the next. The gene labelling system described here, UniqTag, addresses this common challenge. UniqTag assigns a unique identifier to each gene that is a representative k-mer, a string of length k, selected from the sequence of that gene. Unlike serial numbers, these identifiers are stable between different assemblies and annotations of the same data without requiring that previous annotations be lifted over by sequence alignment. We assign UniqTag identifiers to ten builds of the Ensembl human genome spanning eight years to demonstrate this stability.

The manuscript is completely reproducible (automated) using this repository.
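To make the idea of a "representative k-mer" concrete, here is a minimal sketch in Python. The abstract quoted above does not spell out the selection rule, so the rule used here is an assumption: for each gene, pick the k-mer that occurs in the fewest genes, breaking ties lexicographically. The function names `kmers` and `assign_tags` are hypothetical and are not taken from the UniqTag software.

```python
from collections import Counter


def kmers(seq, k):
    """Return the set of distinct substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def assign_tags(genes, k):
    """Map each gene name to one representative k-mer.

    Assumed rule (not confirmed by the abstract): choose the k-mer
    found in the fewest genes, with ties broken lexicographically.
    """
    # Count in how many genes each k-mer appears (once per gene).
    counts = Counter()
    for seq in genes.values():
        counts.update(kmers(seq, k))

    tags = {}
    for name, seq in genes.items():
        candidates = kmers(seq, k)
        if not candidates:
            raise ValueError(f"sequence for {name!r} is shorter than k={k}")
        tags[name] = min(candidates, key=lambda km: (counts[km], km))
    return tags


# Toy sequences, invented for illustration only.
build_1 = {"gene1": "AAACCC", "gene2": "AAAGGG"}
# A later build that lists and numbers the same genes differently.
build_2 = {"gene7": "AAAGGG", "gene9": "AAACCC"}

print(assign_tags(build_1, 3))
print(assign_tags(build_2, 3))
```

Because the tag is derived from the sequence and not from the order in which genes are listed, the two toy builds yield the same pair of tags even though the serial-number-style names differ.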

This web app for learning the shell in your browser!

The bootcamp tutorial text was adapted from the original by Keith Bradnam.

This checklist of common spreadsheet mistakes

Excel is arguably a horrible tool for data management, but it's a starting place for many scientists. These are great guidelines for making spreadsheets a much less frustrating experience.

This quote on academic salesmanship

Once upon a time, I assumed that excellent work would transcend the need for clever marketing and charismatic presentation to sell it to the proper audience, but the longer I work in science, the more I see examples of necessary hard work labeled as incremental and dismissed and attention lavished on the well constructed story. The veracity of which appears secondary, or is assumed.

This from Genome Biology on self-serving motivations for reproducible research

And so, my fellow scientists: ask not what you can do for reproducibility; ask what reproducibility can do for you! Here, I present five reasons why working reproducibly pays off in the long run and is in the self-interest of every ambitious, career-oriented scientist.

This quote about DOIs and citability

So what do we mean when we say a DOI makes something ‘citable?’ If this is shorthand for the properties we would want in something citable: persistent identifier, archival content, machine-readable metadata, then we should start to recognize other things that share these features. Further innovation requires valuing the features the DOI provides, not simply a “brand name” researchers recognize.

This tweet on software debugging :-)
