
Loading paired reads from position-sorted BAM files

Permalink: 2018-06-12 by Daniel S. Standage in blog tags: ngs bam

BAM files with sequence alignments sorted by genomic position seem to be the new currency of exchange for large-scale human genome sequencing projects. This is convenient and practical in many ways for many people. But in my current research I work a lot with tools that need only the sequence information and, for whatever reason, support only FASTA or …

more…

Streaming data from the SRA with fastq-dump

NCBI's Sequence Read Archive is the go-to repository for published genome-scale sequence data sets. Although there are a variety of ways to download sequence data from the SRA, the fastq-dump command from the SRA Toolkit is the most convenient in my opinion. In fact, with a few tweaks to its settings, fastq-dump can stream data directly from the SRA into an analysis pipeline …

more…
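
For a rough idea of what streaming can look like on the consuming end, here is a minimal Python sketch. It is an illustration under stated assumptions, not the setup described in the full post: it assumes fastq-dump is on your PATH, that its `--stdout` option is used to write records to standard output rather than to a file, and that the output is plain four-line FASTQ. The function names `read_fastq` and `stream_sra` are hypothetical helpers made up for this example.

```python
import subprocess
from typing import Iterator, TextIO, Tuple


def read_fastq(stream: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) tuples from four-line FASTQ records."""
    while True:
        header = stream.readline()
        if not header:
            return  # end of stream
        sequence = stream.readline().rstrip("\n")
        stream.readline()  # '+' separator line, ignored
        quality = stream.readline().rstrip("\n")
        # Drop the leading '@' from the header line
        yield header.rstrip("\n")[1:], sequence, quality


def stream_sra(accession: str) -> Iterator[Tuple[str, str, str]]:
    """Stream records for one accession without writing a FASTQ file.

    Assumption: the SRA Toolkit's fastq-dump is installed and on PATH.
    """
    cmd = ["fastq-dump", "--stdout", accession]
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) as proc:
        yield from read_fastq(proc.stdout)
```

Looping over `stream_sra(...)` with a real run accession hands records to downstream code as they arrive, so no intermediate file is needed. The parser does no validation, so malformed or multi-line records would need extra handling.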

That darn cache! Configuring the SRA Toolkit

Permalink: 2016-05-18 by Daniel S. Standage in blog tags: sra ngs

Last night I started a batch job on our group's cluster to download and process nine Illumina libraries from the NCBI SRA. In the past, I have almost always downloaded such data via direct links to .sra files on the SRA FTP site, and then converted these files to FASTQ format using the fastq-dump command from the SRA Toolkit. However …

more…