Goal
Enable customers to run Kallisto on multiple FASTQ samples in parallel, on a server, HPC, or AWS Cloud, producing a pair of OmicData objects in an OmicSoft Suite project.
Kallisto Workflow via escript
A complete Kallisto flow requires the following steps:
Build transcriptome index
When aligning to a transcriptome, an EScript procedure is used to build the transcriptome index first.
Kallisto Index
|
Begin RunEScript /RunOnServer=True;
Files "/VirtualCloudFolder/ArrayServer/Input/Transcripts/transcripts.fasta.gz";
EScriptName KallistoIndex;
Command kallisto index -i %OutputFolder%generated_transcripts.idx %FilePath%;
Options /ParallelJobNumber=1 /ThreadNumberPerJob=8 /Mode=Single /ErrorOnStdErr=False /ErrorOnMissingOutput=True /RunOnDocker=True /ImageName="omicdocker/kallisto:testing" /UseCloud=True /UseDev3=True /OutputFolder="/VirtualCloudFolder/Output/Transcripts";
End;
|
To run the above EScript properly, first download the transcript FASTA file for the gene model you want to use. For example, I used Gencode.V24, which I downloaded from ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_24/gencode.v24.transcripts.fa.gz
The output *.idx file will appear in the output folder when the job finishes.
RNA-seq quantification
For the RNA-seq quantification, create a new external script that uses the transcriptome index built in the previous step, as in the command below.
Kallisto Quantification
|
Begin RunEScript /RunOnServer=True;
Resources
"/VirtualCloudFolder/Output/Transcripts/generated_transcripts.idx";
Files
"/VirtualCloudFolder/ArrayServer/Input/Fastqs/SRR521461_1.fastq.gz"
"/VirtualCloudFolder/ArrayServer/Input/Fastqs/SRR521461_2.fastq.gz"
"/VirtualCloudFolder/ArrayServer/Input/Fastqs/SRR521462_1.fastq.gz"
"/VirtualCloudFolder/ArrayServer/Input/Fastqs/SRR521462_2.fastq.gz";
EScriptName KallistoQuant;
Command kallisto quant -i %Resource1% -o "%OutputFolder%" -b 100 %FilePath1% %FilePath2%;
Options /ParallelJobNumber=2 /ThreadNumberPerJob=8 /Mode=Paired /ErrorOnStdErr=False /ErrorOnMissingOutput=True /RunOnDocker=True /ImageName="omicdocker/kallisto:testing" /UseCloud=True /UseDev3=True /OutputFolder="/VirtualCloudFolder/Output/Abundances/";
Output "/VirtualCloudFolder/Output/Abundances/abundance.tsv => /VirtualCloudFolder/Output/Abundances/%PairName%_abundance.tsv" /Type=tsv;
End;
|
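The quantification script above runs in /Mode=Paired and renames each result with %PairName%. As a rough illustration of how the four input files map onto two jobs, the Python sketch below groups FASTQ paths into mate pairs and derives the renamed output file. It assumes the `<sample>_1.fastq.gz` / `<sample>_2.fastq.gz` naming used in the example; the actual pairing rules applied by the OmicSoft server are not described here, and `group_pairs` and `output_name` are hypothetical helper names.

```python
import re
from collections import defaultdict

# Assumed naming convention (taken from the example file names only):
# <sample>_1.fastq.gz and <sample>_2.fastq.gz
MATE_PATTERN = re.compile(r"^(?P<pair>.+)_(?P<mate>[12])\.fastq\.gz$")


def group_pairs(paths):
    """Group FASTQ paths into {pair_name: (mate1_path, mate2_path)}."""
    mates = defaultdict(dict)
    for path in paths:
        filename = path.rsplit("/", 1)[-1]
        match = MATE_PATTERN.match(filename)
        if match is None:
            raise ValueError(f"Unrecognised FASTQ name: {filename}")
        mates[match["pair"]][match["mate"]] = path

    pairs = {}
    for pair_name, found in sorted(mates.items()):
        # Both mates must be present for a paired-end job.
        if set(found) != {"1", "2"}:
            raise ValueError(f"Incomplete pair for {pair_name}")
        pairs[pair_name] = (found["1"], found["2"])
    return pairs


def output_name(pair_name):
    """Mirror the '%PairName%_abundance.tsv' rename in the Output line."""
    return f"{pair_name}_abundance.tsv"


if __name__ == "__main__":
    fastqs = [
        "/VirtualCloudFolder/ArrayServer/Input/Fastqs/SRR521461_1.fastq.gz",
        "/VirtualCloudFolder/ArrayServer/Input/Fastqs/SRR521461_2.fastq.gz",
        "/VirtualCloudFolder/ArrayServer/Input/Fastqs/SRR521462_1.fastq.gz",
        "/VirtualCloudFolder/ArrayServer/Input/Fastqs/SRR521462_2.fastq.gz",
    ]
    for name, (mate1, mate2) in group_pairs(fastqs).items():
        print(name, "->", output_name(name))
```

With the four example files this yields two pairs, which matches /ParallelJobNumber=2: each pair can be quantified as an independent job.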
Merge Kallisto output
Kallisto output can be further merged so that it can be imported into the ArrayStudio Analyses tab.
A script supporting this step has been built in the following Docker image: omicdocker/pandas:latest. It can be included in the Kallisto pipeline, using the following syntax:
Kallisto Merge
|
Begin RunEScript /RunOnServer=True;
Files
"/VirtualCloudFolder/Output/Abundances/SRR521461_abundance.tsv"
"/VirtualCloudFolder/Output/Abundances/SRR521462_abundance.tsv";
EScriptName KallistoMerge;
Command python Anisto.py -i %FileDirectory% -o %OutputFolder%;
Options /ParallelJobNumber=1 /ThreadNumberPerJob=8 /Mode=Multiple /ErrorOnStdErr=False /ErrorOnMissingOutput=True /RunOnDocker=True /ImageName="omicdocker/pandas:latest" /UseCloud=True /UseDev3=True /OutputFolder="/VirtualCloudFolder/Output/Results";
End;
|
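The internals of Anisto.py are not shown on this page. Judging only from the import step, which reads a tab-delimited merged.txt with target_id, SampleID, and est_counts columns, the merge presumably stacks the per-sample tables into one long table with a sample identifier column. The Python sketch below illustrates that idea using only the standard library; it is not the script shipped in the Docker image, `merge_abundances` is a hypothetical name, and the numbers in the demo are toy values.

```python
import csv
import io


def merge_abundances(tables):
    """Stack per-sample kallisto tables into one long tab-delimited table.

    tables: {sample_id: text content of that sample's abundance.tsv}
    Returns the merged table as text, with a leading SampleID column.
    """
    out = io.StringIO()
    writer = None
    for sample_id, text in sorted(tables.items()):
        reader = csv.DictReader(io.StringIO(text), delimiter="\t")
        for row in reader:
            if writer is None:
                # Reuse the column order of the first table that has rows.
                fields = ["SampleID"] + list(reader.fieldnames)
                writer = csv.DictWriter(
                    out, fieldnames=fields, delimiter="\t", lineterminator="\n"
                )
                writer.writeheader()
            row["SampleID"] = sample_id
            writer.writerow(row)
    return out.getvalue()


if __name__ == "__main__":
    # Standard kallisto abundance.tsv columns; the values are made up.
    header = "\t".join(["target_id", "length", "eff_length", "est_counts", "tpm"])
    demo = {
        "SRR521461": header + "\n" + "\t".join(["T1", "100", "80", "5", "10"]) + "\n",
        "SRR521462": header + "\n" + "\t".join(["T1", "100", "80", "7", "12"]) + "\n",
    }
    print(merge_abundances(demo), end="")
```

A long (stacked) layout keeps one row per transcript per sample, which is the shape the SplitBy option of the import step expects.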
Import in ArrayStudio
The merged table can be imported through ArrayStudio or with a script:
Import
|
Begin UnstackTableFile /Namespace=Table /RunOnServer=True /RunOnCloud=False /RunOnGenomicCloud=False;
File "@OutputFolderName@/merged.txt";
RowID target_id;
SplitBy SampleID;
Value est_counts;
Covariate ;
Options /Format=TabDelimited;
Output mergedCounts;
End;
|
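To make the RowID, SplitBy, and Value options above more concrete, the Python sketch below performs the equivalent reshaping on a small in-memory table: one row per target_id, one column per SampleID, filled with est_counts. This only illustrates the long-to-wide reshaping that the options suggest; it makes no claim about how UnstackTableFile is implemented, `unstack` is a hypothetical helper, and the demo values are toy numbers.

```python
import csv
import io


def unstack(merged_text, row_id="target_id", split_by="SampleID", value="est_counts"):
    """Pivot a long tab-delimited table into a wide one.

    Returns (header, rows): header is [row_id, sample1, sample2, ...] and each
    row is [row_id_value, value_for_sample1, ...]; missing cells are None.
    """
    reader = csv.DictReader(io.StringIO(merged_text), delimiter="\t")
    samples = []  # column order follows first appearance in the file
    matrix = {}   # row_id value -> {sample: numeric value}
    for row in reader:
        sample = row[split_by]
        if sample not in samples:
            samples.append(sample)
        matrix.setdefault(row[row_id], {})[sample] = float(row[value])

    header = [row_id] + samples
    rows = [
        [target] + [cells.get(sample) for sample in samples]
        for target, cells in matrix.items()
    ]
    return header, rows


if __name__ == "__main__":
    merged = "\n".join(
        [
            "\t".join(["SampleID", "target_id", "est_counts"]),
            "\t".join(["SRR521461", "T1", "5"]),
            "\t".join(["SRR521461", "T2", "0"]),
            "\t".join(["SRR521462", "T1", "7"]),
            "\t".join(["SRR521462", "T2", "3"]),
        ]
    )
    header, rows = unstack(merged)
    print(header)
    for row in rows:
        print(row)
```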