Notebook sra
Psst! I've posted some details about my notebook setup here.
Streaming data from the SRA with fastq-dump
NCBI's Sequence Read Archive is the go-to repository for published genome-scale sequence data sets.
Although there are a variety of ways to download sequence data from the SRA, the fastq-dump command from the SRA Toolkit is, in my opinion, the most convenient.
In fact, with a few settings tweaks, fastq-dump can stream data directly from the SRA into an analysis pipeline …
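As a rough illustration of the streaming idea, here is a minimal Python sketch that runs fastq-dump with its --stdout option and reads the FASTQ records from a pipe instead of writing them to disk. The helper names (fastq_dump_stream_command, count_streamed_records) are made up for this example, the accession you pass in is a placeholder, and the sketch assumes fastq-dump is on your PATH and that each record spans four lines. It does not cover the settings tweaks mentioned above.

```python
import subprocess


def fastq_dump_stream_command(accession, max_spots=None):
    """Build a fastq-dump command line that writes FASTQ to stdout.

    Hypothetical helper for illustration only. `--stdout` sends output to
    standard output; `--maxSpotId` optionally caps the number of spots read.
    """
    cmd = ["fastq-dump", "--stdout"]
    if max_spots is not None:
        cmd += ["--maxSpotId", str(max_spots)]
    cmd.append(accession)
    return cmd


def count_streamed_records(accession, max_spots=None):
    """Stream an accession through a pipe and count FASTQ records.

    Assumes fastq-dump is installed and that every record is four lines
    long; nothing is written to a local FASTQ file.
    """
    cmd = fastq_dump_stream_command(accession, max_spots)
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
    n_lines = sum(1 for _ in proc.stdout)  # consume the stream line by line
    proc.wait()
    if proc.returncode != 0:
        raise RuntimeError(f"fastq-dump exited with status {proc.returncode}")
    return n_lines // 4
```

Calling count_streamed_records("SRR000001", max_spots=1000) would, under those assumptions, pull the first 1000 spots over the network and count them. In a real pipeline the counting loop would be replaced by whatever tool consumes the FASTQ stream.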
That darn cache! Configuring the SRA Toolkit
Last night I started a batch job on our group's cluster to download and process nine Illumina libraries from the NCBI SRA.
In the past, I have almost always downloaded such data via direct links to .sra files on the SRA FTP site, and then converted these files to FASTQ format using the fastq-dump command from the SRA Toolkit.
However …