Notebook streaming
Psst! I've posted some details about my notebook setup here.
Streaming data from the SRA with fastq-dump
NCBI's Sequence Read Archive is the go-to repository for published genome-scale sequence data sets.
Although there are a variety of ways to download sequence data from the SRA, the fastq-dump command from the SRA Toolkit is the most convenient in my opinion.
In fact, with a few settings tweaks, fastq-dump can stream data directly from the SRA into an analysis pipeline …
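As a rough illustration of the streaming idea, here is a minimal Python sketch that reads a command's standard output line by line instead of waiting for a file on disk. It assumes the SRA Toolkit is installed and relies on fastq-dump's `--stdout` (`-Z`) option; the helper names (`fastq_dump_command`, `stream_lines`, `count_reads`) and the accession in the usage comment are hypothetical, and the settings tweaks mentioned above are not covered here.

```python
import subprocess


def fastq_dump_command(accession):
    """Build a fastq-dump command line that writes reads to standard output."""
    # --stdout (short form -Z) sends FASTQ records to stdout instead of a file.
    return ["fastq-dump", "--stdout", accession]


def stream_lines(command):
    """Yield lines from a subprocess's standard output as they arrive."""
    with subprocess.Popen(command, stdout=subprocess.PIPE, text=True) as proc:
        for line in proc.stdout:
            yield line.rstrip("\n")


def count_reads(lines):
    """Count records in a FASTQ stream, assuming four lines per record."""
    return sum(1 for index, _ in enumerate(lines) if index % 4 == 0)


# Usage (needs fastq-dump on the PATH; the accession is a placeholder):
# total = count_reads(stream_lines(fastq_dump_command("SRRXXXXXXX")))
```

Because `stream_lines` is a generator, downstream code only ever holds one line in memory at a time, which is the point of streaming rather than downloading first.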
Composing generator functions in Python
In which I briefly motivate the utility of generator functions and demonstrate that they can be nested to create a data processing stream.
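To give a flavor of what nesting generators looks like, here is a small sketch, not the code from the post itself. The three stage names (`read_sequences`, `min_length`, `gc_content`) and the toy FASTA-like input are made up for illustration.

```python
def read_sequences(lines):
    """Yield sequence lines, skipping blank lines and '>' header lines."""
    for line in lines:
        line = line.strip()
        if line and not line.startswith(">"):
            yield line


def min_length(sequences, threshold):
    """Pass through only sequences at least `threshold` bases long."""
    for seq in sequences:
        if len(seq) >= threshold:
            yield seq


def gc_content(sequences):
    """Yield (sequence, GC fraction) pairs."""
    for seq in sequences:
        gc = sum(1 for base in seq.upper() if base in "GC")
        yield seq, gc / len(seq)


# Toy input standing in for a file handle or any other iterable of lines.
lines = [">a", "ACGT", ">b", "GG", ">c", "GGCCAT"]

# Each stage wraps the previous one; nothing runs until the result is consumed.
pipeline = gc_content(min_length(read_sequences(lines), 4))
results = list(pipeline)
```

Each stage pulls one item at a time from the stage it wraps, so the same chain works unchanged on a list, an open file, or a stream that is too large to fit in memory.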