STAR on EScript

From Array Suite Wiki

Latest revision as of 15:50, 10 April 2020

Goal

Enable customers to run STAR on multiple fastq samples in parallel, on server, HPC, or AWS Cloud, resulting in a pair of OmicData objects in an OmicSoft Suite project.

Customer Workflow (via escript)

The examples below show a fully functional STAR alignment flow, run on the cloud with Docker.

The following environment preconditions have been set:

  • AMI with Docker 19.03.8: ami-012cb9a6d92521948
  • Docker image: quay.io/biocontainers/star:2.7.3a--0 (Docker images require no further prerequisites)

A complete STAR flow requires the following steps:

Build index

Preconditions:

  • InstanceType: m4.4xlarge
  • Additional Volume: 100GB
STAR Index
Begin Macro;

@NSLOTS@ 20;
@readlength@ 100;
End;
Begin RunEScript /RunOnServer=True;
Resources
"/VirtualCloudFolder/ArrayServer/Input/gencode.v33.annotation.gtf";
Files "/VirtualCloudFolder/ArrayServer/Input/GRCh38.p13.genome.fa";
EScriptName StarTest;
Command STAR --runThreadN @NSLOTS@ --runMode genomeGenerate --genomeDir %OutputFolder% --genomeFastaFiles %FilePath% --sjdbGTFfile %Resource1% --sjdbOverhang @readlength@;
Options /ParallelJobNumber=1 /ThreadNumberPerJob=8 /Mode=Single /ErrorOnStdErr=False /ErrorOnMissingOutput=True /RunOnDocker=True /ImageName="quay.io/biocontainers/star:2.7.3a--0" /UseCloud=True /UseDev2=True /OutputFolder="/VirtualCloudFolder/Output/Results/StarIndex" /InstanceType=m4.4xlarge /VolumeSize=100;
End;
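
To make the placeholders in the Command line easier to read, the sketch below expands the @...@ macros and %...% placeholders into the STAR call they produce. This is an illustration only, not EScript's actual implementation: the helper name expand_command is hypothetical, and the paths seen inside the Docker container on the cloud instance may differ from the ones used here.

```python
import re


def expand_command(template, macros, placeholders):
    """Expand @NAME@ macros and %Name% placeholders in a command template.

    Hypothetical helper for illustration; a missing name raises KeyError.
    """
    # First pass: macros defined in the "Begin Macro" block, e.g. @NSLOTS@.
    expanded = re.sub(r"@(\w+)@", lambda m: str(macros[m.group(1)]), template)
    # Second pass: per-job placeholders, e.g. %OutputFolder%.
    return re.sub(r"%(\w+)%", lambda m: str(placeholders[m.group(1)]), expanded)


INDEX_COMMAND = (
    "STAR --runThreadN @NSLOTS@ --runMode genomeGenerate "
    "--genomeDir %OutputFolder% --genomeFastaFiles %FilePath% "
    "--sjdbGTFfile %Resource1% --sjdbOverhang @readlength@"
)

expanded = expand_command(
    INDEX_COMMAND,
    macros={"NSLOTS": 20, "readlength": 100},
    placeholders={
        # Assumed expansions: the folder and file names come from the script above.
        "OutputFolder": "/VirtualCloudFolder/Output/Results/StarIndex",
        "FilePath": "/VirtualCloudFolder/ArrayServer/Input/GRCh38.p13.genome.fa",
        "Resource1": "/VirtualCloudFolder/ArrayServer/Input/gencode.v33.annotation.gtf",
    },
)
print(expanded)
```

Reading the expanded command shows that the single entry in "Resources" (the GTF annotation) is passed to --sjdbGTFfile, while the entry in "Files" (the genome FASTA) is passed to --genomeFastaFiles.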

Index output folder should contain:
[Image: StarIndex.JPG]

Alignment

Preconditions:

  • InstanceType: m4.4xlarge
  • Additional Volume: 100GB
  • All resources must be located in the same folder
  • The STAR "--genomeDir" parameter must point to the directory that contains all output files from the index build. EScript currently supports a %ResourceFolder% parameter for this purpose; it is replaced with the folder in which the resources are located. Every index file used by the alignment must be listed in the "Resources" section.
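
The "same folder" precondition can be checked before submitting a job. The sketch below derives the single folder that %ResourceFolder% would have to stand for and fails if the resources are spread over several folders. The function name resource_folder is hypothetical, and how EScript itself resolves %ResourceFolder% is not shown here.

```python
from pathlib import PurePosixPath


def resource_folder(resources):
    """Return the one folder shared by all resource paths.

    Raises ValueError when the list is empty or spans more than one folder.
    """
    folders = {str(PurePosixPath(path).parent) for path in resources}
    if len(folders) != 1:
        raise ValueError(f"expected exactly one resource folder, got {sorted(folders)}")
    return next(iter(folders))


# A few of the index files produced by the "Build index" step.
index_resources = [
    "/VirtualCloudFolder/Output/Results/StarIndex/SAindex",
    "/VirtualCloudFolder/Output/Results/StarIndex/Genome",
    "/VirtualCloudFolder/Output/Results/StarIndex/SA",
]

genome_dir = resource_folder(index_resources)
print(genome_dir)
```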
STAR Align
Begin Macro;

@NSLOTS@ 18;
@readlength@ 100;
End;

Begin RunEScript /RunOnServer=True;

Resources
"/VirtualCloudFolder/Output/Results/StarIndex/SAindex"
"/VirtualCloudFolder/Output/Results/StarIndex/chrLength.txt"
"/VirtualCloudFolder/Output/Results/StarIndex/chrName.txt"
"/VirtualCloudFolder/Output/Results/StarIndex/chrNameLength.txt"
"/VirtualCloudFolder/Output/Results/StarIndex/chrStart.txt"
"/VirtualCloudFolder/Output/Results/StarIndex/exonGeTrInfo.tab"
"/VirtualCloudFolder/Output/Results/StarIndex/exonInfo.tab"
"/VirtualCloudFolder/Output/Results/StarIndex/geneInfo.tab"
"/VirtualCloudFolder/Output/Results/StarIndex/Genome"
"/VirtualCloudFolder/Output/Results/StarIndex/genomeParameters.txt"
"/VirtualCloudFolder/Output/Results/StarIndex/sjdbInfo.txt"
"/VirtualCloudFolder/Output/Results/StarIndex/sjdbList.fromGTF.out.tab"
"/VirtualCloudFolder/Output/Results/StarIndex/sjdbList.out.tab"
"/VirtualCloudFolder/Output/Results/StarIndex/transcriptInfo.tab"
"/VirtualCloudFolder/Output/Results/StarIndex/SA";
Files
"/VirtualCloudFolder/ArrayServer/Input/Fastqs/SRX1852924_1.fastq.gz"
"/VirtualCloudFolder/ArrayServer/Input/Fastqs/SRX1852924_2.fastq.gz";
EScriptName StarTest;
Command STAR --runMode alignReads --runThreadN @NSLOTS@ --genomeDir "%ResourceFolder%" --readFilesIn %FilePath1% %FilePath2% --outSAMattrRGline "ID:%PairName%" --outSAMtype BAM SortedByCoordinate --outSAMattributes RG NM NH --outSAMunmapped Within --outSAMorder Paired --outFilterMultimapNmax 10 --outFilterType Normal --outFilterIntronMotifs None --outFileNamePrefix %OutputFolder% --readNameSeparator "." --readFilesCommand zcat;
Options /ParallelJobNumber=1 /ThreadNumberPerJob=8 /Mode=Paired /ErrorOnStdErr=False /ErrorOnMissingOutput=True /RunOnDocker=True /ImageName="quay.io/biocontainers/star:2.7.3a--0" /UseCloud=True /UseDev2=True /OutputFolder="/GhindariuCloudFolder/Output/Results/StarAlign/SRX1852924" /InstanceType=m4.4xlarge /VolumeSize=100;
End;
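
Because every index file has to be listed by hand, the "Resources" section is easy to get wrong. As a convenience, the sketch below generates that section from an index folder and a list of file names, dropping repeated entries so each file is listed once. The helper build_resources_block is hypothetical and only produces text in the layout used above; it is not part of EScript.

```python
def build_resources_block(index_folder, file_names):
    """Format a Resources section: one quoted path per line, ';' after the last."""
    # dict.fromkeys keeps the first occurrence of each name and preserves order.
    unique_names = list(dict.fromkeys(file_names))
    if not unique_names:
        raise ValueError("at least one resource file is required")
    folder = index_folder.rstrip("/")
    lines = [f'"{folder}/{name}"' for name in unique_names]
    return "Resources\n" + "\n".join(lines) + ";"


block = build_resources_block(
    "/VirtualCloudFolder/Output/Results/StarIndex",
    ["SAindex", "Genome", "SA", "Genome"],  # the repeated "Genome" is dropped
)
print(block)
```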

Output

The alignment step produces one folder per sample, containing the following files:
[Image: StarAlign.JPG]
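
A per-sample output folder can be sanity-checked with a short script. The sketch below lists which of the files STAR normally writes for "--outSAMtype BAM SortedByCoordinate" are missing from a given set of file names. It assumes these standard STAR names match the screenshot, and that the value passed to --outFileNamePrefix ends up as a plain folder (so the file names carry no extra prefix); both are assumptions, and the prefix argument is there in case the second one does not hold.

```python
# Files STAR writes for a coordinate-sorted BAM run (names from STAR's defaults).
EXPECTED_STAR_OUTPUTS = (
    "Aligned.sortedByCoord.out.bam",
    "Log.final.out",
    "Log.out",
    "Log.progress.out",
    "SJ.out.tab",
)


def missing_star_outputs(present_names, prefix=""):
    """Return the expected STAR output names that are absent from present_names."""
    present = set(present_names)
    return [name for name in EXPECTED_STAR_OUTPUTS if prefix + name not in present]


# Example: a folder in which the run stopped before the final log was written.
missing = missing_star_outputs(["Aligned.sortedByCoord.out.bam", "Log.out"])
print(missing)
```

On a real folder, the set of names could be collected with os.listdir on the sample's output directory before calling the function.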