From Array Suite Wiki
Download SRA Data
The "Download SRA Data" command allows the user to specify an SRA ID for downloading public sequencing data for use in Array Studio. The Sequence Read Archive (SRA) stores raw sequencing data from next-generation sequencing platforms.
In many cases (see the caveat below), ArrayExpress FASTQ data can also be retrieved using ***ERR/ERX*** identifiers.
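The identifiers used on this page follow standard accession prefixes: SRR and ERR identify individual runs, while SRX and ERX identify experiments that group one or more runs. The minimal Python sketch below classifies an input identifier by its prefix. The function name `classify` and the archive labels are illustrative assumptions, not part of Array Studio.

```python
import re
from typing import NamedTuple

# Accession prefixes mentioned on this page (assumed mapping, for illustration):
# SRR/SRX records come from NCBI SRA, ERR/ERX records from EBI ENA.
PREFIXES = {
    "SRR": ("NCBI SRA", "run"),
    "SRX": ("NCBI SRA", "experiment"),
    "ERR": ("EBI ENA", "run"),
    "ERX": ("EBI ENA", "experiment"),
}

# Three-letter prefix followed by at least six digits, e.g. ERR188407.
ACCESSION_RE = re.compile(r"^([A-Z]{3})(\d{6,})$")


class Accession(NamedTuple):
    accession: str
    archive: str
    level: str


def classify(identifier: str) -> Accession:
    """Map an SRR/SRX/ERR/ERX identifier to its archive and record level."""
    text = identifier.strip().upper()
    match = ACCESSION_RE.match(text)
    if match is None or match.group(1) not in PREFIXES:
        raise ValueError(f"unsupported identifier: {identifier!r}")
    archive, level = PREFIXES[match.group(1)]
    return Accession(text, archive, level)


if __name__ == "__main__":
    for identifier in ["ERR188407", "ERX162711"]:
        print(classify(identifier))
```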
This download method supports only sequencing (NGS) projects.
The user must first specify an output folder for the retrieved data by clicking the "Browse" button and choosing the appropriate folder location.
SRA files are automatically converted to fastq.gz files, which can be imported into Array Studio for further analysis.
Options
Combine Runs Within An Experiment: If the input consists of SRX IDs, this option combines the SRR files within each SRX into a single fastq file (SRXID.fastq.gz). When individual SRR IDs, or a combination of SRR and SRX IDs, are used as input, set combineRun=False to download the files individually.
Use Aspera: If checked, files are downloaded using Aspera; otherwise, they are downloaded using wget.
Use Cluster: If checked, a job is submitted to your server's cluster (if available) to retrieve the SRA data. If left unchecked, the server "head node" (where Array Server is running) processes the download. Because downloading large numbers of files can use significant resources, it is generally recommended to check "Use Cluster" when a cluster is available.
Download from EBI: If checked, files are downloaded from EBI first. If a file is not found on EBI, the download is attempted from NCBI instead.
Download from NCBI: If checked, files are downloaded from NCBI.
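The "Download from EBI" option describes a simple fallback: try EBI first and go to NCBI only when the file is missing. The Python sketch below models that ordering using hypothetical fetcher callables that return a local path, or `None` when a source lacks the file. It performs no network transfer, and the names `source_order` and `download_with_fallback` are illustrative, not Array Studio's implementation.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical fetcher: given an accession, return the local path of the
# downloaded file, or None if that source does not have the file.
Fetcher = Callable[[str], Optional[str]]


def source_order(download_from_ebi: bool, ebi: Fetcher, ncbi: Fetcher) -> List[Tuple[str, Fetcher]]:
    """EBI first with NCBI as fallback when "Download from EBI" is checked, else NCBI only."""
    if download_from_ebi:
        return [("EBI", ebi), ("NCBI", ncbi)]
    return [("NCBI", ncbi)]


def download_with_fallback(accession: str, sources: List[Tuple[str, Fetcher]]) -> Tuple[str, str]:
    """Try each source in order; return (source name, local path) for the first hit."""
    for name, fetch in sources:
        path = fetch(accession)
        if path is not None:
            return name, path
    raise FileNotFoundError(f"{accession} was not found on any configured source")


if __name__ == "__main__":
    # Stubs standing in for real transfers (Aspera or wget).
    def ebi_stub(accession: str) -> Optional[str]:
        return None  # pretend the file is missing on EBI

    def ncbi_stub(accession: str) -> Optional[str]:
        return f"/data/{accession}.fastq.gz"

    print(download_with_fallback("SRR1234567", source_order(True, ebi_stub, ncbi_stub)))
```

Keeping the transfer step behind a callable keeps the ordering logic separate from the choice between Aspera and wget.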
There is also an option to preview the records that will be retrieved before running the module.
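The "Combine Runs Within An Experiment" option merges the per-run files of one SRX into a single SRXID.fastq.gz. One way to do this, sketched below in Python, is to concatenate the gzip files byte for byte, because concatenated gzip members form a valid multi-member gzip file. The helpers `combine_run_allowed` and `combine_runs` are hypothetical. The sketch assumes single-end data with one fastq.gz per run (paired-end mates would have to be combined per read direction), and Array Studio may combine runs differently.

```python
import gzip
import shutil
import tempfile
from pathlib import Path
from typing import Iterable


def combine_run_allowed(identifiers: Iterable[str]) -> bool:
    """Per the option description, combining applies only when every input is an SRX ID."""
    return all(i.strip().upper().startswith("SRX") for i in identifiers)


def combine_runs(experiment_id: str, run_files: Iterable[Path], out_dir: Path) -> Path:
    """Concatenate per-run fastq.gz files into <experiment_id>.fastq.gz.

    Concatenated gzip members form a valid multi-member gzip file, so the
    runs are copied byte for byte without decompressing them.
    """
    out_path = Path(out_dir) / f"{experiment_id}.fastq.gz"
    with open(out_path, "wb") as out:
        for run_file in run_files:
            with open(run_file, "rb") as src:
                shutil.copyfileobj(src, out)
    return out_path


if __name__ == "__main__":
    # Tiny single-end demo with two made-up runs and one made-up experiment ID.
    work = Path(tempfile.mkdtemp())
    runs = []
    for run, read in [("SRR0000001", b"@r1\nACGT\n+\nIIII\n"), ("SRR0000002", b"@r2\nTTGA\n+\nIIII\n")]:
        path = work / f"{run}.fastq.gz"
        path.write_bytes(gzip.compress(read))
        runs.append(path)
    merged = combine_runs("SRX0000001", runs, work)
    print(merged.name)
    print(gzip.decompress(merged.read_bytes()).decode())
```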
Downloading using EBI identifiers
Users can input **ERR/ERX** identifiers such as ERR188407 or ERX162711. Generally this works well, but data are sometimes submitted to repositories under non-standard names, in which case the ***Download SRA Data*** download mechanism will not work.
You can use the ***Preview*** function to check whether the files can be found. If ***Preview*** lists your identifiers, the files were properly submitted to the remote server and can be downloaded.
For example, the files for E-MTAB-9489 are not in a discoverable location because they were uploaded to the microarray area of the EBI server: ftp://ftp.ebi.ac.uk/pub/databases/microarray/data/experiment/MTAB/E-MTAB-9489/6-Int-Fresh-Sorted_S6_L001_R2_001.fastq.gz. In this case ***Preview*** does not list the files, indicating that the download function will not retrieve the data.
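FASTQ files that ENA serves for a run are published under a predictable FTP layout (ftp.sra.ebi.ac.uk/vol1/fastq/...) whose directories are derived from the run accession, whereas files deposited elsewhere, such as the microarray area in the example above, cannot be located from an identifier alone. The Python sketch below builds the standard ENA directory for a run accession following ENA's published layout rules and checks whether a URL falls inside it. It is an assumption, not documented here, that the Array Studio lookup relies on this layout, and the function names are hypothetical.

```python
import re

# ENA's published layout for run FASTQ files:
#   ftp.sra.ebi.ac.uk/vol1/fastq/<dir1>/[<dir2>/]<run>/
# <dir1> is the first six characters of the run accession. <dir2> is absent for
# six-digit accessions; otherwise it is the last 1, 2 or 3 digits (for 7, 8 or 9
# digits) left-padded with zeros to three characters.
RUN_RE = re.compile(r"([EDS]RR)(\d{6,9})")


def standard_fastq_dir(run: str) -> str:
    """Return the standard ENA FASTQ directory for a run accession."""
    match = RUN_RE.fullmatch(run)
    if match is None:
        raise ValueError(f"not a run accession: {run!r}")
    digits = match.group(2)
    extra = len(digits) - 6
    dir2 = digits[-extra:].zfill(3) + "/" if extra else ""
    return f"ftp.sra.ebi.ac.uk/vol1/fastq/{run[:6]}/{dir2}{run}/"


def in_standard_layout(url: str) -> bool:
    """True if the file sits in the standard directory of the run named in its file name."""
    path = re.sub(r"^[a-z]+://", "", url)
    file_name = path.rsplit("/", 1)[-1]
    match = RUN_RE.match(file_name)
    return match is not None and path.startswith(standard_fastq_dir(match.group(0)))


if __name__ == "__main__":
    print(standard_fastq_dir("ERR188407"))
    print(in_standard_layout(
        "ftp://ftp.ebi.ac.uk/pub/databases/microarray/data/experiment/MTAB/"
        "E-MTAB-9489/6-Int-Fresh-Sorted_S6_L001_R2_001.fastq.gz"
    ))
```

The microarray URL fails the check because its file name does not start with a run accession, so there is no standard directory to derive.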
Results
When the job completes, the requested files are in the specified output folder.