Permalink: 2016-05-18 by Daniel S. Standage in blog tags: sra ngs

Last night I started a batch job on our group's cluster to download and process 9 Illumina libraries from the NCBI SRA. In the past, I have almost always downloaded such data via direct links to .sra files on the SRA FTP site, and then converted these files to Fastq format using the fastq-dump command from the SRA Toolkit. However, for a while I've been aware that the fastq-dump command is capable of downloading data (by accession) directly from NCBI. So last night, I took advantage of this convenience function.

This morning, I woke up to a barrage of warnings on the cluster saying that their was not disk space left. Impossible, I thought, there are terabytes of free space on the scratch mount. It was not the scratch partition that had filled up though, it was the $HOME partition. I'm lucky I was even able to log on to the machine. What the heck is going on?!?!

It turns out that by default, fastq-dump doesn't actually stream .sra files from the FTP site. Instead, it downloads intermediate cache files, which by default are stored in ~/ncbi/sra/. This is a huge issue for several reasons. First of all, this behavior is not at all obvious to a new user. Sure, there may be use cases where this behavior is very useful, but it seems much more appropriate as an opt-in feature than a default. Secondly, this is bound to create problems for cluster/HPC users all over the place that have very limited storage in their $HOME directories.

Fortunately, the SRA Toolkit allows you to configure this behavior. The officially-sanctioned approach is to fire up a config program, which will allow you to disable caching behavior and/or remote network access of any kind. The unofficial unsanctioned approach would be to make the config file yourself with a couple of shell commands.

mkdir -p ~/.ncbi
echo '/repository/user/main/public/root = "/scratch/standage/sra-cache"' > ~/.ncbi/user-settings.mkfg
# Uncomment the next command if you want to disable network access altogether
# echo '/repository/user/cache-disabled = "true"' > ~/.ncbi/user-settings.mkfg