Dockerized STAR Index

From Array Suite Wiki

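This example uses a Macro section to set parameters (thread count, read length, instance type, and volume size) that are substituted into the script wherever the matching @Name@ token appears.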

  1. ThreadNum should be adjusted to the number of cores available on the EC2 instance.
  2. STAR is very memory-hungry, especially for human-sized genomes. We recommend an instance with at least 64 GB of RAM (e.g. m4.4xlarge) for index generation.
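
As a rough aid for the two points above, the following Python sketch looks up the nominal vCPU count and memory for a few EC2 m4 instance types and reports a matching ThreadNum value and whether the 64 GB recommendation is met. The `M4_SPECS` table and the `macro_settings` helper are illustrative names, not part of Array Studio; the figures were copied by hand from AWS's published m4 specifications and should be checked against current AWS documentation.

```python
# Nominal vCPU count and memory (GiB) for a few EC2 m4 instance types.
# Assumption: values copied by hand from AWS's published m4 specifications;
# verify against the current AWS documentation before relying on them.
M4_SPECS = {
    "m4.xlarge": (4, 16),
    "m4.2xlarge": (8, 32),
    "m4.4xlarge": (16, 64),
    "m4.10xlarge": (40, 160),
}

MIN_RAM_GB = 64  # recommendation stated above for human-sized genomes


def macro_settings(instance_type, min_ram_gb=MIN_RAM_GB):
    """Return (thread_num, meets_ram_recommendation) for an instance type."""
    vcpus, ram_gb = M4_SPECS[instance_type]
    return vcpus, ram_gb >= min_ram_gb


if __name__ == "__main__":
    for name in M4_SPECS:
        threads, ok = macro_settings(name)
        note = "meets" if ok else "is below"
        print(f"{name}: @ThreadNum@ {threads}; ({note} the 64 GB recommendation)")
```

Under these figures, m4.4xlarge offers 16 vCPUs, so the ThreadNum value of 4 used in the script below could be raised on that instance type.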

STAR Index Generation Oscript

Begin Macro;
@ThreadNum@ 4;
@readlength@ 100;
@InstanceType@ "m4.4xlarge";
@VolumeSize@ 100;
End;
Begin RunEScript /RunOnServer=True;
Resources "/CloudFolderSupport/TestDatasets/ReferenceLibraries/GRCh38/GenCode.Basic.B33/GenCode.Basic.B33.gtf";
Files "/CloudFolderSupport/TestDatasets/ReferenceLibraries/GRCh38/GRCh38.primary_assembly.genome.fa";
EScriptName StarTest;
Command STAR --runThreadN @ThreadNum@ --runMode genomeGenerate --genomeDir %OutputFolder% --genomeFastaFiles %FilePath% --sjdbGTFfile %Resource1% --sjdbOverhang @readlength@;
Options /ParallelJobNumber=1 /ThreadNumberPerJob=@ThreadNum@ /Mode=Single /ErrorOnStdErr=False /ErrorOnMissingOutput=True /RunOnDocker=True /ImageName="quay.io/biocontainers/star:2.7.3a--0" /UseCloud=True /UseDev4=True /OutputFolder="/CloudFolderSupport/Users/joe.pearson/STARIndex" /InstanceType=@InstanceType@ /VolumeSize=@VolumeSize@;
End;
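
To preview what the Macro section does to the Command line, the Python sketch below performs a plain textual replacement of each @Name@ token with its macro value. This is an assumption about how the substitution behaves, not a description of Array Studio's actual implementation, and `expand_macros` is a hypothetical helper. The %OutputFolder%, %FilePath%, and %Resource1% placeholders are left untouched here, since they appear to be filled in by the server at run time.

```python
import re

# Macro values copied from the Begin Macro ... End block above.
MACROS = {
    "ThreadNum": "4",
    "readlength": "100",
}

# Command line copied from the RunEScript block above.
COMMAND = (
    "STAR --runThreadN @ThreadNum@ --runMode genomeGenerate "
    "--genomeDir %OutputFolder% --genomeFastaFiles %FilePath% "
    "--sjdbGTFfile %Resource1% --sjdbOverhang @readlength@"
)


def expand_macros(text, macros):
    """Replace each @Name@ token with its value; unknown names raise KeyError."""
    return re.sub(r"@(\w+)@", lambda m: macros[m.group(1)], text)


if __name__ == "__main__":
    # Prints the command with macro tokens expanded and %...% left in place.
    print(expand_macros(COMMAND, MACROS))
```

With the values above, the expanded command passes --sjdbOverhang 100. The STAR manual describes the ideal value as the read length minus 1 and notes that the default of 100 works about as well in most cases, so for 100-base reads setting the readlength macro to 99 is an option worth considering.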