DownloadSRAData.pdf

From Array Suite Wiki

Revision as of 10:52, 23 March 2021 by Joseph (Talk | contribs)

Download SRA Data

The "Download SRA Data" command lets the user specify an SRA ID to download public sequencing data for use in Array Studio. The Sequence Read Archive (SRA) stores raw sequencing data from next-generation sequencing platforms.

In many cases ArrayExpress FASTQ data can also be retrieved using ***ERR/ERX*** identifiers (see the caveat under "Downloading using EBI identifiers" below).

This download method supports only next-generation sequencing (NGS) projects.

SRA download 1.png

SRA Download EBI NCBI Option.png

The user must first specify an output folder for the retrieved data by clicking the "Browse" button and selecting the appropriate folder location.

SRA files are automatically converted to fastq.gz files, which can be imported into Array Studio for further analysis.

Options

Combine Runs Within An Experiment: If the input consists of SRX IDs, this option combines the SRR files within each SRX into a single FASTQ file (SRXID.fastq.gz). When individual SRR IDs, or a combination of SRR and SRX IDs, are used as input, users need to set combineRun=False to download the files individually.

Use Aspera: If checked, files are downloaded using Aspera; if unchecked, wget is used.

Use Cluster: If checked, a job will be submitted to your server's cluster (if available) to retrieve the SRA data. If left unchecked, the server "head node" (where Array Server is running) will process the download. Downloading large numbers of files can use significant resources, so "Use Cluster" is generally recommended if a cluster is available.

Download from EBI: If checked, the download is attempted from EBI first. If a file is not found on EBI, the download is then attempted from NCBI.

Download from NCBI: If checked, files are downloaded from NCBI.
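The "EBI first, then NCBI" behaviour described for the "Download from EBI" option can be pictured with a small sketch. This is not Array Studio's implementation; the function name `download_with_fallback` and the two stand-in fetchers are hypothetical and only illustrate the order in which sources would be tried.

```python
from typing import Callable, Optional, Sequence, Tuple

# A fetcher takes an accession and returns a local file path, or None when
# the file is not found on that source. (Hypothetical interface.)
Fetcher = Callable[[str], Optional[str]]


def download_with_fallback(
    accession: str,
    sources: Sequence[Tuple[str, Fetcher]],
) -> Tuple[str, str]:
    """Try each (name, fetcher) pair in order; return (source_name, path)."""
    for name, fetch in sources:
        path = fetch(accession)
        if path is not None:
            return name, path
    tried = ", ".join(name for name, _ in sources)
    raise FileNotFoundError(f"{accession} not found on: {tried}")


# Stand-in fetchers for illustration only; real ones would transfer data.
def fetch_from_ebi(accession: str) -> Optional[str]:
    return None  # pretend EBI does not have the file


def fetch_from_ncbi(accession: str) -> Optional[str]:
    return f"{accession}.fastq.gz"  # pretend NCBI has it
```

Passing `[("EBI", fetch_from_ebi), ("NCBI", fetch_from_ncbi)]` as `sources` mirrors the "Download from EBI" setting, while passing only the NCBI entry mirrors "Download from NCBI".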

There is also an option to preview the records that will be retrieved prior to running the module.

SRA 2.png

Downloading using EBI identifiers

Users can input ***ERR/ERX*** identifiers such as ERR188407 or ERX162711. This generally works well, but data are sometimes submitted to repositories under non-standard names, in which case the ***Download SRA Data*** mechanism will not work.
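Before submitting a list of identifiers, it can help to check which are run IDs (SRR/ERR) and which are experiment IDs (SRX/ERX), since the "Combine Runs Within An Experiment" option only applies cleanly when every input is an experiment ID. The sketch below assumes the accession shape "three-letter prefix followed by digits"; the function names are hypothetical and not part of Array Studio.

```python
import re
from typing import Iterable

# Assumed shape: S or E, then R, then R (run) or X (experiment), then digits.
_ACCESSION = re.compile(r"^[SE]R([RX])\d+$")


def classify(accession: str) -> str:
    """Return 'run' or 'experiment'; raise ValueError for anything else."""
    match = _ACCESSION.match(accession.strip().upper())
    if match is None:
        raise ValueError(f"unrecognised accession: {accession!r}")
    return "run" if match.group(1) == "R" else "experiment"


def suggest_combine_runs(accessions: Iterable[str]) -> bool:
    """True only when every input is an experiment ID (SRX/ERX)."""
    kinds = {classify(a) for a in accessions}
    return kinds == {"experiment"}
```

For a mixed list such as one SRR and one SRX identifier, `suggest_combine_runs` returns `False`, matching the guidance to set combineRun=False in that case.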

You can use the ***Preview*** function to check whether the files can be found. If ***Preview*** lists your identifiers, the files were properly submitted to the remote server and can be downloaded.

DownloadERR Success.png

In the following example, the files are not in a discoverable location because they were uploaded to the microarray location on the EBI server: ftp://ftp.ebi.ac.uk/pub/databases/microarray/data/experiment/MTAB/E-MTAB-9489/6-Int-Fresh-Sorted_S6_L001_R2_001.fastq.gz. ***Preview*** does not list the files, indicating that the download function will not retrieve the data.

DownloadERR Failure.png
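If you already know a file's URL, a quick check can flag the situation shown in the failed example. The sketch below treats the path prefix from the example URL as the marker for the microarray location; that is an assumption based on this single example, and `in_microarray_location` is a hypothetical helper, not an Array Studio function.

```python
from urllib.parse import urlparse

# Prefix taken from the example URL on this page; using it as a general
# marker for the "microarray location" is an assumption of this sketch.
MICROARRAY_PREFIX = "/pub/databases/microarray/"


def in_microarray_location(url: str) -> bool:
    """True if the URL points below the EBI microarray directory."""
    parsed = urlparse(url)
    return (
        parsed.hostname == "ftp.ebi.ac.uk"
        and parsed.path.startswith(MICROARRAY_PREFIX)
    )
```

A `True` result suggests that ***Preview*** is unlikely to list the file and that the file would need to be retrieved by other means.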

Results

Once the job has completed, the requested files will be available in the specified output folder.
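As an optional sanity check outside Array Studio, the downloaded fastq.gz files can be listed and their reads counted. This is a minimal sketch that assumes the standard four-lines-per-record FASTQ layout; the function names are hypothetical.

```python
import gzip
from pathlib import Path
from typing import Dict


def count_fastq_reads(path: Path) -> int:
    """Count records in a gzipped FASTQ file, assuming 4 lines per record."""
    with gzip.open(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        raise ValueError(f"{path} has {n_lines} lines, not a multiple of 4")
    return n_lines // 4


def summarize_output_folder(folder: str) -> Dict[str, int]:
    """Map each *.fastq.gz file name in `folder` to its read count."""
    files = sorted(Path(folder).glob("*.fastq.gz"))
    return {p.name: count_fastq_reads(p) for p in files}
```

Calling `summarize_output_folder` on the output folder chosen with "Browse" returns one entry per file; a `ValueError` points to a truncated or malformed file that may need to be downloaded again.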