Example of running 100 CCLE samples on cloud

From Array Suite Wiki

Latest revision as of 16:16, 28 November 2017


Overview

Studio on the Cloud is OmicSoft's solution for managing and analyzing large omics data sets in the cloud. ArrayServer with Cloud handles security credentials on the server and submits and manages cloud files and jobs as seamlessly as if they were running on ArrayServer itself. Folders in S3 buckets are mapped into the ArrayServer folder structure, so users work with them the same way as with locally mapped folders.

ArrayServer with Cloud configuration

In this example, 100 CCLE samples were run on an ArrayServer with the following cloud configuration:

[Cloud]
Provider=Amazon
Region=us-east-1
#AvailabilityZone=xxxx
#SecurityGroupID=xxxxxx [optional]
AccessKey=xxxxxxxxxxxxxxxxxxxxxxxxx
SecretKey=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
SubnetID=subnet-82cxxxxx
UseHttp=False
OmicsoftCloudDirectory=s3://east.cloud.xxxx/ArrayServerxxxOmicsoftHome
MaxInstanceCount=20
MaxInstanceCountPerJob=10
UseReducedRedundancy=False
EnableDataEncryption=True
DefaultCloudJobNumber=20
SimulateQueue=False
InstanceProfileArn=arn:aws:iam::851065383862:instance-profile/Test03SGESimple
OAlignInstanceType=m4.xlarge
OSummaryInstanceType=m4.large

[CloudFolder]
CloudFolder=/east.cloud.omicsoft/xxxxxx
GaryCloudFolder=/east.test.omicsoft/xxx

[CloudInstanceTag]
ContactPerson=Gary Ge
Application=ArrayServer
Email=gary.ge@omicsoft.com

[CloudVolumeTag]
ContactPerson=Gary Ge
Application=ArrayServer
Email=gary.ge@omicsoft.com

With this cloud configuration, two server folders, CloudFolder and GaryCloudFolder, show up under the root of the ArrayServer file browser:

ServerCloudFolders.png

Access to the mapped cloud folders can be managed in the same way as for locally mapped folders (details: ArrayServer folder mapping and management).
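As a rough illustration of how the [CloudFolder] entries relate server folders to S3 locations, the sketch below reads a trimmed copy of the configuration with Python's standard configparser. The function name load_cloud_folders is hypothetical, and the assumption that a value such as /east.cloud.omicsoft/xxxxxx corresponds to s3://east.cloud.omicsoft/xxxxxx is ours; ArrayServer's own parsing is not shown on this page.

```python
import configparser

# Trimmed copy of the configuration above (placeholder values kept as-is).
SAMPLE_CFG = """\
[Cloud]
Provider=Amazon
Region=us-east-1
#AvailabilityZone=xxxx
MaxInstanceCount=20
MaxInstanceCountPerJob=10

[CloudFolder]
CloudFolder=/east.cloud.omicsoft/xxxxxx
GaryCloudFolder=/east.test.omicsoft/xxx
"""


def load_cloud_folders(cfg_text):
    """Return (server folder -> assumed S3 URI, MaxInstanceCount)."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep the original key case
    parser.read_string(cfg_text)
    folders = {
        # Assumption: "/bucket/prefix" in the config means "s3://bucket/prefix".
        "/" + name: "s3://" + path.lstrip("/")
        for name, path in parser["CloudFolder"].items()
    }
    return folders, parser.getint("Cloud", "MaxInstanceCount")


folders, max_instances = load_cloud_folders(SAMPLE_CFG)
for server_path, s3_uri in folders.items():
    print(server_path, "->", s3_uri)
print("MaxInstanceCount =", max_instances)
```

Lines starting with # are treated as comments by configparser, so commented-out keys such as AvailabilityZone are simply skipped.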

Cloud logic

ArrayServer submits and manages cloud files and jobs as seamlessly as if they were running on ArrayServer itself. Cloud analysis follows the logic below:

  1. The user selects raw data from a cloud folder or one of its subfolders, such as /CloudFolder/CCLE.
  2. If the input is in a cloud folder, the output folder must also be a cloud folder, such as /CloudFolder/TestOutput or /GaryCloudFolder/TestOutput.
  3. ArrayServer then launches one EC2 machine per sample to run the analysis; alignment-related jobs use OAlignInstanceType, and other jobs use OSummaryInstanceType.
  4. Input files in S3 are copied to the EC2 machines, to which EBS storage is attached (the size is calculated from the input file size).
  5. Cloud instances are launched with Omicsoft software installed; they receive messages from ArrayServer and run the analysis.
  6. If the admin sets MaxInstanceCount=20, at most 20 EC2 machines are started. If there are more than 20 samples, the extra samples are queued.
  7. When a job finishes, all results are uploaded to the S3 output folder.
  8. When a job finishes, the machine waits up to 30 minutes for new samples in the queue to analyze.
  9. An EC2 machine is terminated when the queue is empty and the machine has been idle for more than 30 minutes. No EC2 machines are left running once all samples are finished.
  10. Data objects (NgsData: links to S3 BAM files; OmicData: expression with design/annotation; Table reports, such as the mutation/fusion report) are summarized and saved on the ArrayServer machine.
  11. Cloud and SGE/PBS/LSF clusters can coexist. When input files are selected from non-cloud folders, the analysis runs on the cluster.
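The routing and queueing rules in the steps above can be sketched in a few lines of Python. This is only an illustration of the described logic, not ArrayServer code: the function names, the "alignment" job label, and the hard-coded folder roots are assumptions made for the example.

```python
CLOUD_ROOTS = ("/CloudFolder", "/GaryCloudFolder")  # server folders mapped to S3
MAX_INSTANCE_COUNT = 20                             # MaxInstanceCount
ALIGN_TYPE = "m4.xlarge"                            # OAlignInstanceType
SUMMARY_TYPE = "m4.large"                           # OSummaryInstanceType


def is_cloud_path(path):
    """True if the path is a mapped cloud folder or lies inside one."""
    return any(path == root or path.startswith(root + "/") for root in CLOUD_ROOTS)


def choose_backend(input_folder, output_folder):
    """Steps 1, 2 and 11: cloud input needs cloud output; other input goes to the cluster."""
    if not is_cloud_path(input_folder):
        return "cluster"
    if not is_cloud_path(output_folder):
        raise ValueError("cloud input requires a cloud output folder")
    return "cloud"


def instance_type(job_kind):
    """Step 3: alignment-related jobs get the larger instance type."""
    return ALIGN_TYPE if job_kind == "alignment" else SUMMARY_TYPE


def plan_instances(sample_count, max_instances=MAX_INSTANCE_COUNT):
    """Step 6: return (instances started, samples left waiting in the queue)."""
    started = min(sample_count, max_instances)
    return started, sample_count - started


print(choose_backend("/CloudFolder/CCLE", "/GaryCloudFolder/TestOutput"))
print(plan_instances(100))
```

For the 100-sample test described on this page, plan_instances(100) gives 20 started machines and 80 queued samples.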

Cloud Test using 100 CCLE RNA-Seq samples

In this cloud test, we randomly selected 100 CCLE samples and ran the Omicsoft RnaSeqPipeline function.

Analysis function in Omicsoft

After adding the samples, specify the options in the GUI:

RnaSeqPipelineCloud.png

The analyses of the 100 samples are submitted to ArrayServer, where the cloud jobs are scheduled and managed.

Jobs on the queue and cloud

Server job queue (right click to get real time full log):


ServerJobQueue.png


The admin can see the status of the instances:

CloudInstance.png


Because I set MaxInstanceCount=20, 20 EC2 machines are started; each one analyzes samples from the queue one after another until all 100 samples are done.

Cost explorer

Note: the AWS console provides tools for checking the cost. OmicSoft/ArrayServer provides neither such tools nor integrations with them.

  • The 100-sample job using 20 EC2 machines finished in 2 days and 6 hours.
  • The total was 958 instance hours.
  • It cost $312 for the EC2 computing plus the S3 storage cost.
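As a quick sanity check on these figures, the arithmetic below compares the reported 958 instance hours with the ceiling of 20 machines running for the full 2 days and 6 hours. It is plain division of the numbers quoted above; the per-instance-hour value is an average over the quoted $312 and may include charges other than instance time.

```python
wall_clock_hours = 2 * 24 + 6   # 2 days and 6 hours
instances = 20                  # MaxInstanceCount
instance_hours = 958            # reported total instance hours
cost_usd = 312                  # reported cost
samples = 100

ceiling = wall_clock_hours * instances          # if all 20 machines ran the whole time
utilization = instance_hours / ceiling          # share of that ceiling actually used
hours_per_sample = instance_hours / samples     # average instance hours per sample
usd_per_instance_hour = cost_usd / instance_hours

print(f"{ceiling} instance hours ceiling, {utilization:.0%} used")
print(f"{hours_per_sample:.2f} instance hours per sample")
print(f"${usd_per_instance_hour:.3f} per instance hour on average")
```

The gap between 958 and the ceiling is consistent with machines being terminated once the queue is empty and they have been idle for more than 30 minutes.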

InstanceReport.png

Cost Reports

AWS100SampleCost.png

For these 100 samples, the total number of mapped reads is about 15 billion (101 bp reads):

                       Total nucleotide#    AverageReadLength   Total read#       Uniquely mapped & paired read#   Uniquely mapped paired read%   Mapped read%
Average (per sample)   16,179,978,206       101                 160,197,804       140,571,749                      87.52%                         93.64%
Total (100 samples)    1,617,997,820,602                        16,019,780,402

The cost of running the whole RNA-Seq pipeline (raw data QC, filtering, alignment, aligned QC, quantification on gene/transcript/exon/exon junction levels, mutation and fusion detection, BAM summary) is 2 cents per million mapped reads.
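The per-read cost can be reproduced from the table and the cost figure above. The short calculation below is a sketch that applies the average mapped-read percentage to the total read count, which is an approximation, and uses the quoted $312.

```python
total_reads = 16_019_780_402   # Total read# for the 100 samples
mapped_fraction = 0.9364       # Mapped read% (per-sample average)
cost_usd = 312                 # cost quoted for the 100-sample run

mapped_reads = total_reads * mapped_fraction
cents_per_million = cost_usd * 100 / (mapped_reads / 1e6)

print(f"{mapped_reads / 1e9:.1f} billion mapped reads")
print(f"{cents_per_million:.2f} cents per million mapped reads")
```

This gives roughly 15 billion mapped reads and a cost close to 2 cents per million mapped reads, in line with the figure stated above.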

Detail job log

The full job log has almost 40,000 lines, recording every step of analysis and data transfer, with the EC2 instance ID printed out. You can access the full job log here. Below are the job log entries related to one sample.

# 201504010_CloudCCLE100SampleTest.log (209 hits)
Line 180: /CloudFolder/CCLE/RawData/G25239.MFE-296.1.1.fastq.gz
Line 181: /CloudFolder/CCLE/RawData/G25239.MFE-296.1.2.fastq.gz
Line 310: [00:03:18] Performing alignment for observation  G25239.MFE-296.1...
Line 312: [00:03:18] Sending Job: G25239.MFE-296.1 to cloud. InstanceID=i-5bae8374...
Line 370: [00:03:24] Job: G25239.MFE-296.1 sent to cloud. InstanceID=i-5bae8374...
Line 568: [00:12:04] Job: G25239.MFE-296.1 is being executed. InstanceID=i-5bae8374...
Line 583: [00:13:23] i-5bae8374 - Downloading file: CloudFolder/CCLE/RawData/G25239.MFE-296.1.1.fastq.gz...
Line 634: [00:22:18] Job: G25239.MFE-296.1 is being executed. InstanceID=i-5bae8374...
Line 642: [00:23:12] i-5bae8374 - Downloading file: CloudFolder/CCLE/RawData/G25239.MFE-296.1.2.fastq.gz...
Line 692: [00:32:06] Job: G25239.MFE-296.1 is being executed. InstanceID=i-5bae8374...
Line 1108: [02:02:45] i-5bae8374 - Performing raw data QC for observation: G25239.MFE-296.1.1...
Line 1110: [02:03:22] i-5bae8374 - Performing summarization (Mode=NgsQCWizard) for observation G25239.MFE-296.1.1...
Line 1112: [02:03:22] i-5bae8374 - Summarizing QC metrics for observation: G25239.MFE-296.1.1...
Line 1113: [02:03:22] i-5bae8374 - Format for file: /OmicsoftWorking/i-5bae8374_bd730264b4f983ca/_Input_/G25239.MFE-296.1.1.fastq.gz was set to FASTQ
Line 1114: [02:03:22] i-5bae8374 - Quality encoding for file: /OmicsoftWorking/i-5bae8374_bd730264b4f983ca/_Input_/G25239.MFE-296.1.1.fastq.gz was set to Sanger
Line 1140: [02:10:41] Job: G25239.MFE-296.1 is being executed. InstanceID=i-5bae8374...
Line 1241: [02:33:31] i-5bae8374 - Job: G25239.MFE-296.1.1 was finished at Friday, 10 April 2015 17:42:35
Line 1243: [02:33:31] i-5bae8374 - Performing raw data QC for observation: G25239.MFE-296.1.2...
Line 1245: [02:33:31] i-5bae8374 - Performing summarization (Mode=NgsQCWizard) for observation G25239.MFE-296.1.2...
Line 1247: [02:33:31] i-5bae8374 - Summarizing QC metrics for observation: G25239.MFE-296.1.2...
Line 1248: [02:33:31] i-5bae8374 - Format for file: /OmicsoftWorking/i-5bae8374_bd730264b4f983ca/_Input_/G25239.MFE-296.1.2.fastq.gz was set to FASTQ
Line 1249: [02:33:31] i-5bae8374 - Quality encoding for file: /OmicsoftWorking/i-5bae8374_bd730264b4f983ca/_Input_/G25239.MFE-296.1.2.fastq.gz was set to Sanger
Line 1315: [02:40:33] Job: G25239.MFE-296.1 is being executed. InstanceID=i-5bae8374...
Line 1516: [03:03:26] i-5bae8374 - Job: G25239.MFE-296.1.2 was finished at Friday, 10 April 2015 18:12:21
Line 1519: [03:03:26] i-5bae8374 - Performing summarization (Mode=FilterPairedNgsFile) for observation G25239.MFE-296.1...
Line 1521: [03:03:26] i-5bae8374 - Filtering observation: G25239.MFE-296.1...
Line 1522: [03:03:26] i-5bae8374 - Format for file: /OmicsoftWorking/i-5bae8374_bd730264b4f983ca/_Input_/G25239.MFE-296.1.1.fastq.gz was set to FASTQ
Line 1523: [03:03:26] i-5bae8374 - Quality encoding for file: /OmicsoftWorking/i-5bae8374_bd730264b4f983ca/_Input_/G25239.MFE-296.1.1.fastq.gz was set to Sanger
Line 1638: [03:09:54] Job: G25239.MFE-296.1 is being executed. InstanceID=i-5bae8374...
Line 2221: [04:03:19] i-5bae8374 - Filter file: /OmicsoftWorking/i-5bae8374_bd730264b4f983ca/_Input_/G25239.MFE-296.1_20077933363.ff2 contains 103884147 entries and 4247947 marked as true. Expects 4247947 marked entries.
Line 2263: [04:03:33] i-5bae8374 - Saving filter file: /OmicsoftWorking/i-5bae8374_bd730264b4f983ca/_Input_/G25239.MFE-296.1_20077933363.ff2...
Line 2264: [04:03:33] i-5bae8374 - Job: G25239.MFE-296.1 was finished at Friday, 10 April 2015 19:13:34
Line 2268: [04:03:33] i-5bae8374 - Performing alignment for observation  G25239.MFE-296.1...
Line 2275: [04:03:33] i-5bae8374 - Filter file will be used for G25239.MFE-296.1...
Line 2276: [04:03:33] i-5bae8374 - Format for file: /OmicsoftWorking/i-5bae8374_bd730264b4f983ca/_Input_/G25239.MFE-296.1.1.fastq.gz was set to FASTQ
Line 2277: [04:03:33] i-5bae8374 - Quality encoding for file: /OmicsoftWorking/i-5bae8374_bd730264b4f983ca/_Input_/G25239.MFE-296.1.1.fastq.gz was set to Sanger
Line 2278: [04:03:33] i-5bae8374 - Loading index for the gene model (G25239.MFE-296.1)...
Line 2279: [04:03:33] i-5bae8374 - Building mapping framework between transcriptome and genome (G25239.MFE-296.1)...
Line 2294: [04:05:28] i-5bae8374 - Aligning all reads to transcriptome + genome (G25239.MFE-296.1)...
Line 2395: [04:08:00] Job: G25239.MFE-296.1 is being executed. InstanceID=i-5bae8374...
Line 3539: [06:10:19] i-5bae8374 - Assembling exon junctions and performing additional alignments (G25239.MFE-296.1)...
Line 3553: [06:11:23] i-5bae8374 - Stage 2 alignment: loading novel exon junction library (G25239.MFE-296.1)...
Line 3560: [06:11:23] i-5bae8374 - Stage 2 alignment: aligning unmapped reads to novel exon junctions (G25239.MFE-296.1)...
Line 3588: [06:15:11] Job: G25239.MFE-296.1 is being executed. InstanceID=i-5bae8374...
Line 3796: [06:29:22] i-5bae8374 - Generating sorted bam file: /OmicsoftWorking/i-5bae8374_bd730264b4f983ca/_Output_/G25239.MFE-296.1.bam...
Line 3899: [06:35:09] Job: G25239.MFE-296.1 is being executed. InstanceID=i-5bae8374...
Line 4447: [06:59:19] i-5bae8374 - Generating target bpl file: /OmicsoftWorking/i-5bae8374_bd730264b4f983ca/_Output_/G25239.MFE-296.1.bam.bpl...
Line 4448: [06:59:19] i-5bae8374 - BamFile=/OmicsoftWorking/i-5bae8374_bd730264b4f983ca/_Output_/G25239.MFE-296.1.bam.175141418 reads uniquely paired, 2543196 reads non-uniquely paired, 12248046 reads not mapped.
Line 4449: [06:59:19] i-5bae8374 - Indexing BAM file for G25239.MFE-296.1...
Line 4451: [06:59:19] i-5bae8374 - Performing summarization (Mode=IndexBin) for observation G25239.MFE-296.1...
Line 4453: [06:59:19] i-5bae8374 - Checking index for file: /OmicsoftWorking/i-5bae8374_bd730264b4f983ca/_Output_/G25239.MFE-296.1.bam...
Line 4454: [06:59:19] i-5bae8374 - Building index for /OmicsoftWorking/i-5bae8374_bd730264b4f983ca/_Output_/G25239.MFE-296.1.bam
Line 4525: [07:04:07] i-5bae8374 - Job: G25239.MFE-296.1 was finished at Friday, 10 April 2015 22:14:27
Line 4526: [07:04:18] Job: G25239.MFE-296.1 is being executed. InstanceID=i-5bae8374...
Line 4532: [07:05:20] i-5bae8374 - Job: G25239.MFE-296.1 was finished at Friday, 10 April 2015 22:14:28
Line 4535: [07:05:20] i-5bae8374 - Summarizing mutation+SNP for file: /OmicsoftWorking/i-5bae8374_bd730264b4f983ca/_Output_/G25239.MFE-296.1.bam...
Line 4537: [07:05:20] i-5bae8374 - Performing combined fusion analysis for file: /OmicsoftWorking/i-5bae8374_bd730264b4f983ca/_Output_/G25239.MFE-296.1.bam...
Line 4538: [07:05:20] i-5bae8374 - Performing summarization (Mode=SummarizeMutation2Snp) for observation G25239.MFE-296.1...
Line 4542: [07:05:20] i-5bae8374 - Calculating RNA-Seq QC metrics for file: /OmicsoftWorking/i-5bae8374_bd730264b4f983ca/_Output_/G25239.MFE-296.1.bam...
Line 4543: [07:05:20] i-5bae8374 - Generating BAS for file: /OmicsoftWorking/i-5bae8374_bd730264b4f983ca/_Output_/G25239.MFE-296.1.bam...
Line 4545: [07:05:20] i-5bae8374 - Performing summarization (Mode=RnaSeqQCMetrics) for observation G25239.MFE-296.1...
Line 4547: [07:05:20] i-5bae8374 - Performing summarization (Mode=GenerateLandBas) for observation G25239.MFE-296.1...
Line 4551: [07:05:20] i-5bae8374 - Generating BAS for file: /OmicsoftWorking/i-5bae8374_bd730264b4f983ca/_Output_/G25239.MFE-296.1.bam...
Line 4552: [07:05:20] i-5bae8374 - Summarizing bam file /OmicsoftWorking/i-5bae8374_bd730264b4f983ca/_Output_/G25239.MFE-296.1.bam
Line 4553: [07:05:20] i-5bae8374 - Summarizing RNA-Seq QC metrics for file: /OmicsoftWorking/i-5bae8374_bd730264b4f983ca/_Output_/G25239.MFE-296.1.bam...
Line 4554: [07:05:20] i-5bae8374 - Summarizing mutation+SNP for file: /OmicsoftWorking/i-5bae8374_bd730264b4f983ca/_Output_/G25239.MFE-296.1.bam...
Line 4556: [07:05:20] i-5bae8374 - Calculating RNA-Seq metrics for file: /OmicsoftWorking/i-5bae8374_bd730264b4f983ca/_Output_/G25239.MFE-296.1.bam...
Line 4589: [07:07:26] i-5bae8374 - Performing alignment for observation  G25239.MFE-296.1...
Line 4718: [07:13:54] Job: G25239.MFE-296.1 is being executed. InstanceID=i-5bae8374...
Line 5313: [07:41:25] i-5bae8374 - Aligning reads in fusion mode (G25239.MFE-296.1)...
Line 5376: [07:43:21] Job: G25239.MFE-296.1 is being executed. InstanceID=i-5bae8374...
Line 5748: [07:56:40] i-5bae8374 - Assembling fusion junctions and performing additional alignments (G25239.MFE-296.1)...
Line 5824: [07:57:29] i-5bae8374 - Stage 2 alignment: loading fusion junction library (G25239.MFE-296.1)...
Line 5825: [07:57:29] i-5bae8374 - Stage 2 alignment: aligning unmapped reads to fusion junctions (G25239.MFE-296.1)...
Line 5985: [08:02:43] Job: G25239.MFE-296.1 is being executed. InstanceID=i-5bae8374...
Line 6146: [08:12:27] Job: G25239.MFE-296.1 is being executed. InstanceID=i-5bae8374...
Line 6261: [08:19:20] i-5bae8374 - Performing summarization (Mode=IndexBin) for observation G25239.MFE-296.1.FusionReads...
Line 6263: [08:19:20] i-5bae8374 - Checking index for file: /OmicsoftWorking/i-5bae8374_bd730264b4f983ca/_Output_/Fusion/G25239.MFE-296.1.FusionReads.bam...
Line 6264: [08:19:20] i-5bae8374 - Building index for /OmicsoftWorking/i-5bae8374_bd730264b4f983ca/_Output_/Fusion/G25239.MFE-296.1.FusionReads.bam
Line 6265: [08:19:20] i-5bae8374 - Job: G25239.MFE-296.1.FusionReads was finished at Friday, 10 April 2015 23:27:47
Line 6267: [08:19:20] i-5bae8374 - Job: G25239.MFE-296.1 was finished at Friday, 10 April 2015 23:27:48
Line 6268: [08:19:20] i-5bae8374 - Perform quantification for file: /OmicsoftWorking/i-5bae8374_bd730264b4f983ca/_Output_/G25239.MFE-296.1.bam...
Line 6270: [08:19:20] i-5bae8374 - Performing summarization (Mode=Summarize2Expression) for observation G25239.MFE-296.1...
Line 6272: [08:19:20] i-5bae8374 - Summarizing expression for file: /OmicsoftWorking/i-5bae8374_bd730264b4f983ca/_Output_/G25239.MFE-296.1.bam...
Line 7096: [09:11:19] Job: G25239.MFE-296.1 is being executed. InstanceID=i-5bae8374...
Line 7100: [09:15:07] i-5bae8374 - Job: G25239.MFE-296.1 was finished at Saturday, 11 April 2015 00:25:24
Line 7103: [09:15:22] i-5bae8374 - Summarizing exon junction for file: /OmicsoftWorking/i-5bae8374_bd730264b4f983ca/_Output_/G25239.MFE-296.1.bam...
Line 7105: [09:15:22] i-5bae8374 - Performing summarization (Mode=SummarizeExonJunctions) for observation G25239.MFE-296.1...
Line 7107: [09:15:22] i-5bae8374 - Summarizing exon junctions for file: G25239.MFE-296.1...
Line 7171: [09:21:02] Job: G25239.MFE-296.1 is being executed. InstanceID=i-5bae8374...
Line 7481: [09:55:43] i-5bae8374 - Testing sorting status for file: /OmicsoftWorking/i-5bae8374_bd730264b4f983ca/_Output_/ExonJunction/G25239.MFE-296.1.bam.ngsexj2_...
Line 7486: [09:57:31] i-5bae8374 - Job: G25239.MFE-296.1 was finished at Saturday, 11 April 2015 01:06:00
Line 7512: [10:00:07] Job: G25239.MFE-296.1 is being executed. InstanceID=i-5bae8374...
Line 7574: [10:07:19] i-5bae8374 - Performing summarization (Mode=IndexBasBin) for observation G25239.MFE-296.1...
Line 7576: [10:07:19] i-5bae8374 - Checking index for file: /OmicsoftWorking/i-5bae8374_bd730264b4f983ca/_Output_/G25239.MFE-296.1.bas...
Line 7577: [10:07:19] i-5bae8374 - Building index for /OmicsoftWorking/i-5bae8374_bd730264b4f983ca/_Output_/G25239.MFE-296.1.bas
Line 7578: [10:07:19] i-5bae8374 - Job: G25239.MFE-296.1 was finished at Saturday, 11 April 2015 01:16:34
Line 7580: [10:07:19] i-5bae8374 - Job: G25239.MFE-296.1 was finished at Saturday, 11 April 2015 01:16:35
Line 7591: [10:09:59] Job: G25239.MFE-296.1 is being executed. InstanceID=i-5bae8374...
Line 7843: [10:42:08] i-5bae8374 - Testing sorting status for file: /OmicsoftWorking/i-5bae8374_bd730264b4f983ca/_Output_/Mutation2Snp/G25239.MFE-296.1.bam.ngsm2s_...
Line 7864: [10:43:28] i-5bae8374 - Job: G25239.MFE-296.1 was finished at Saturday, 11 April 2015 01:52:56
Line 7952: [10:48:51] Job: G25239.MFE-296.1 is being executed. InstanceID=i-5bae8374...
Line 8061: [10:53:31] i-5bae8374 - Job: G25239.MFE-296.1 was finished at Saturday, 11 April 2015 02:03:53
Line 8092: [10:55:27] i-5bae8374 - Summarizing 5'->3' trend for file: /OmicsoftWorking/i-5bae8374_bd730264b4f983ca/_Output_/G25239.MFE-296.1.bam...
Line 8094: [10:55:27] i-5bae8374 - Performing summarization (Mode=SummarizeRnaSeqTrend53) for observation G25239.MFE-296.1...
Line 8096: [10:55:27] i-5bae8374 - Summarizing RNA-Seq trend 5'->3' for file: /OmicsoftWorking/i-5bae8374_bd730264b4f983ca/_Output_/G25239.MFE-296.1.bam...
Line 8111: [10:58:31] Job: G25239.MFE-296.1 is being executed. InstanceID=i-5bae8374...
Line 8207: [11:04:42] i-5bae8374 - Job: G25239.MFE-296.1 was finished at Saturday, 11 April 2015 02:14:53
Line 8224: [11:05:28] i-5bae8374 - Uploading file to CloudFolder/CCLE/Output/100Samples.20150410/G25239.MFE-296.1.bam...
Line 8277: [11:08:21] Job: G25239.MFE-296.1 is being executed. InstanceID=i-5bae8374...
Line 8807: [12:08:08] i-5bae8374 - Uploading file to CloudFolder/CCLE/Output/100Samples.20150410/G25239.MFE-296.1.bam.bim...
Line 8809: [12:09:29] i-5bae8374 - Uploading file to CloudFolder/CCLE/Output/100Samples.20150410/G25239.MFE-296.1.bam.bpl...
Line 8865: [12:16:51] Job: G25239.MFE-296.1 is being executed. InstanceID=i-5bae8374...
Line 8900: [12:20:55] i-5bae8374 - Uploading file to CloudFolder/CCLE/Output/100Samples.20150410/G25239.MFE-296.1.bam.summary.txt...
Line 8902: [12:21:29] i-5bae8374 - Uploading file to CloudFolder/CCLE/Output/100Samples.20150410/G25239.MFE-296.1.bas...
Line 8903: [12:21:29] i-5bae8374 - Uploading file to CloudFolder/CCLE/Output/100Samples.20150410/G25239.MFE-296.1.bas.bim...
Line 8919: [12:22:52] i-5bae8374 - Uploading file to CloudFolder/CCLE/Output/100Samples.20150410/AlignedQC/G25239.MFE-296.1.bam.ngsqcm...
Line 8920: [12:22:52] i-5bae8374 - Uploading file to CloudFolder/CCLE/Output/100Samples.20150410/AlignedQC/G25239.MFE-296.1.bam.ngst53...
Line 8921: [12:22:52] i-5bae8374 - Uploading file to CloudFolder/CCLE/Output/100Samples.20150410/Counts/G25239.MFE-296.1.bam.ngs2tex...
Line 8922: [12:22:52] i-5bae8374 - Uploading file to CloudFolder/CCLE/Output/100Samples.20150410/ExonJunction/G25239.MFE-296.1.bam.ngsexj2...
Line 8923: [12:22:52] i-5bae8374 - Uploading file to CloudFolder/CCLE/Output/100Samples.20150410/Filtered/G25239.MFE-296.1.ngsftr...
Line 8924: [12:22:52] i-5bae8374 - Uploading file to CloudFolder/CCLE/Output/100Samples.20150410/Fusion/G25239.MFE-296.1.FusionReads.bam...
Line 8925: [12:22:52] i-5bae8374 - Uploading file to CloudFolder/CCLE/Output/100Samples.20150410/Fusion/G25239.MFE-296.1.FusionReads.bam.bim...
Line 8926: [12:22:52] i-5bae8374 - Uploading file to CloudFolder/CCLE/Output/100Samples.20150410/Fusion/G25239.MFE-296.1.ngsfspe2...
Line 8927: [12:22:52] i-5bae8374 - Uploading file to CloudFolder/CCLE/Output/100Samples.20150410/Mutation2Snp/G25239.MFE-296.1.bam.ngsm2s...
Line 8928: [12:22:52] i-5bae8374 - Uploading file to CloudFolder/CCLE/Output/100Samples.20150410/RawQC/G25239.MFE-296.1.1.ngsdup...
Line 8929: [12:22:52] i-5bae8374 - Uploading file to CloudFolder/CCLE/Output/100Samples.20150410/RawQC/G25239.MFE-296.1.1.ngsmer...
Line 8930: [12:22:52] i-5bae8374 - Uploading file to CloudFolder/CCLE/Output/100Samples.20150410/RawQC/G25239.MFE-296.1.1.ngsqbd...
Line 8931: [12:22:52] i-5bae8374 - Uploading file to CloudFolder/CCLE/Output/100Samples.20150410/RawQC/G25239.MFE-296.1.1.ngsqbm...
Line 8932: [12:22:52] i-5bae8374 - Uploading file to CloudFolder/CCLE/Output/100Samples.20150410/RawQC/G25239.MFE-296.1.1.ngsqqm...
Line 8933: [12:22:52] i-5bae8374 - Uploading file to CloudFolder/CCLE/Output/100Samples.20150410/RawQC/G25239.MFE-296.1.2.ngsdup...
Line 8934: [12:22:52] i-5bae8374 - Uploading file to CloudFolder/CCLE/Output/100Samples.20150410/RawQC/G25239.MFE-296.1.2.ngsmer...
Line 8935: [12:22:52] i-5bae8374 - Uploading file to CloudFolder/CCLE/Output/100Samples.20150410/RawQC/G25239.MFE-296.1.2.ngsqbd...
Line 8936: [12:22:52] i-5bae8374 - Uploading file to CloudFolder/CCLE/Output/100Samples.20150410/RawQC/G25239.MFE-296.1.2.ngsqbm...
Line 8937: [12:22:52] i-5bae8374 - Uploading file to CloudFolder/CCLE/Output/100Samples.20150410/RawQC/G25239.MFE-296.1.2.ngsqqm...
Line 8938: [12:23:14] Cloud finished job: G25239.MFE-296.1. InstanceID=i-5bae8374.
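To see how long this sample took from submission to completion, the bracketed timestamps in the excerpt can be parsed with a small script. The sketch below assumes the "Line N: [HH:MM:SS] message" layout shown above and treats the timestamps as time elapsed since the job started; the helper names are made up for this example.

```python
import re
from datetime import timedelta

# "Line 312: [00:03:18] message" -> hours, minutes, seconds, message
LINE_RE = re.compile(r"^Line \d+: \[(\d+):(\d{2}):(\d{2})\] (.*)$")

# A few entries copied from the excerpt above.
EXCERPT = """\
Line 312: [00:03:18] Sending Job: G25239.MFE-296.1 to cloud. InstanceID=i-5bae8374...
Line 583: [00:13:23] i-5bae8374 - Downloading file: CloudFolder/CCLE/RawData/G25239.MFE-296.1.1.fastq.gz...
Line 2294: [04:05:28] i-5bae8374 - Aligning all reads to transcriptome + genome (G25239.MFE-296.1)...
Line 8224: [11:05:28] i-5bae8374 - Uploading file to CloudFolder/CCLE/Output/100Samples.20150410/G25239.MFE-296.1.bam...
Line 8938: [12:23:14] Cloud finished job: G25239.MFE-296.1. InstanceID=i-5bae8374.
"""


def parse_events(text):
    """Return a list of (elapsed time, message) for lines that match the layout."""
    events = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            hours, minutes, seconds, message = match.groups()
            elapsed = timedelta(hours=int(hours), minutes=int(minutes), seconds=int(seconds))
            events.append((elapsed, message))
    return events


def first_time(events, needle):
    """Elapsed time of the first message containing the given text."""
    for elapsed, message in events:
        if needle in message:
            return elapsed
    raise LookupError(needle)


events = parse_events(EXCERPT)
total = first_time(events, "Cloud finished job") - first_time(events, "Sending Job")
print("Submission to completion:", total)
```

The same first_time helper can be pointed at other messages (for example "Aligning all reads" or "Uploading file") to estimate how long individual stages took.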

Snapshot of ServerProject with results generated from AWS cloud runs:

ProjectAWS100samples.png


Note: the NgsData object contains file links to the actual BAM files in the S3 bucket.

Cloud Test using 931 CCLE RNA-Seq samples

In addition to the 100-sample test, we also ran the RNA-Seq pipeline on all 931 CCLE RNA-Seq samples using MaxInstanceCount=100 (launching 100 cloud instances for 931 samples).

  • The test was run in VPCx by setting SubnetID.
  • The whole job finished in 4 days and 4 hours.
  • One sample needed one analysis retry, and there were two retries for data uploads to S3. The Omicsoft implementation of cloud analysis allows a maximum of three retries for analysis and data upload.
  • AWS S3 may show a storage delay (e.g., a file has been uploaded and can be listed, but cannot yet be downloaded from S3). We only saw this in some VPCx settings; it works fine in AWS without VPCx. We implemented an "Extended Delay" mechanism to deal with storage delays.
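The retry behaviour described above can be pictured with a generic retry wrapper. How Omicsoft actually implements retries and the "Extended Delay" is not described on this page, so the sketch below is only an assumption-laden illustration: the function name, the 60-second base delay, and the growing wait between attempts are all hypothetical.

```python
import time

MAX_RETRIES = 3  # the page states a maximum of three retries


def run_with_retries(action, max_retries=MAX_RETRIES, delay_seconds=60, sleep=time.sleep):
    """Call action(); on failure, retry up to max_retries times, waiting longer each time."""
    for attempt in range(max_retries + 1):
        try:
            return action()
        except Exception:
            if attempt == max_retries:
                raise  # retries exhausted: surface the last error
            # Assumed policy: the wait grows with each retry (not Omicsoft's documented behaviour).
            sleep(delay_seconds * (attempt + 1))


# Usage: a simulated upload that fails twice before succeeding.
attempts = []


def flaky_upload():
    attempts.append(1)
    if len(attempts) < 3:
        raise IOError("simulated S3 failure")
    return "uploaded"


waits = []
result = run_with_retries(flaky_upload, sleep=waits.append)  # record waits instead of sleeping
print(result, "after", len(attempts), "attempts; waits:", waits)
```

Passing sleep as a parameter keeps the example fast to run and makes the waiting policy easy to swap out.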