Psst! I've posted some details about my notebook setup here.

Streaming data from the SRA with fastq-dump

NCBI's Sequence Read Archive is the go-to repository for published genome-scale sequence data sets. Although there are a variety of ways to download sequence data from SRA, the fastq-dump command from the SRA Toolkit is the most convenient in my opinion. In fact, with a few settings tweaks fastq-dump can stream data directly from the SRA into an analysis pipeline …
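A minimal sketch of the streaming idea in Python, assuming the SRA Toolkit's `fastq-dump` is on the `PATH`. It uses `-Z` (write reads to stdout) and `-X` (stop after N spots), and it parses the standard four-line FASTQ records as they arrive, so nothing is written to disk. The helper names `fastq_records` and `stream_sra` are hypothetical. This sketch does not reproduce the settings tweaks mentioned above.

```python
import subprocess
from typing import Iterable, Iterator, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality)


def fastq_records(lines: Iterable[str]) -> Iterator[Record]:
    """Group a stream of FASTQ lines into (header, sequence, quality) tuples."""
    it = iter(lines)
    for header in it:
        try:
            # Each record is four lines: @header, sequence, '+', quality.
            seq, _plus, qual = next(it), next(it), next(it)
        except StopIteration:
            raise ValueError("truncated FASTQ record") from None
        yield header.rstrip("\n"), seq.rstrip("\n"), qual.rstrip("\n")


def stream_sra(accession: str, max_spots: int = 1000) -> Iterator[Record]:
    """Hypothetical helper: stream reads for one SRA accession via fastq-dump."""
    cmd = ["fastq-dump", "-Z", "-X", str(max_spots), accession]
    # text=True decodes stdout so lines arrive as str, one at a time.
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) as proc:
        yield from fastq_records(proc.stdout)
```

Usage would look like `for header, seq, qual in stream_sra("SRR0000000", max_spots=10): ...`, where the accession is a placeholder. Keeping the parser separate from the subprocess call means `fastq_records` works equally well on a local file handle.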

Composing generator functions in Python

In which I briefly motivate the utility of generator functions and demonstrate that they can be nested to create a data processing stream.
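A small sketch of nesting generator functions into a processing stream. Each stage consumes an iterable and yields transformed items, so values flow through the whole chain one at a time rather than being collected into intermediate lists. The stage names are illustrative, not taken from the post.

```python
from typing import Iterable, Iterator


def strip_lines(lines: Iterable[str]) -> Iterator[str]:
    # Remove surrounding whitespace, including trailing newlines.
    for line in lines:
        yield line.strip()


def skip_blank_and_comments(lines: Iterable[str]) -> Iterator[str]:
    # Drop empty lines and lines starting with '#'.
    for line in lines:
        if line and not line.startswith("#"):
            yield line


def parse_ints(lines: Iterable[str]) -> Iterator[int]:
    for line in lines:
        yield int(line)


def running_total(values: Iterable[int]) -> Iterator[int]:
    # Emit the cumulative sum after each value.
    total = 0
    for value in values:
        total += value
        yield total


raw = ["# header\n", "3\n", "\n", " 4 \n", "5\n"]
pipeline = running_total(parse_ints(skip_blank_and_comments(strip_lines(raw))))
print(list(pipeline))  # [3, 7, 12]
```

Because each stage only asks its upstream neighbor for the next item, `raw` could just as well be an open file or a network stream of arbitrary length.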