STAR on EScript

From Array Suite Wiki

Revision as of 15:40, 10 April 2020 by Ana.ghindariu

Goal

Enable customers to run STAR on multiple FASTQ samples in parallel, on a server, HPC, or AWS Cloud, resulting in a pair of OmicData objects in an OmicSoft Suite project.

Customer Workflow (via escript)

The examples below contain a fully functional STAR alignment flow, run on the cloud with Docker.

The following environment preconditions have been set:

  • AMI with Docker v19.03.8: ami-012cb9a6d92521948
  • Docker Image (Docker images don't require prerequisites)

A complete STAR flow requires the following steps:

Build index

Preconditions:

  • InstanceType: m4.4xlarge
  • Additional Volume: 100 GB

STAR Index

Begin Macro;
@NSLOTS@ 20;
@readlength@ 100;
End;

Begin RunEScript /RunOnServer=True;
Resources "/VirtualCloudFolder/ArrayServer/Input/gencode.v33.annotation.gtf";
Files "/VirtualCloudFolder/ArrayServer/Input/GRCh38.p13.genome.fa";
EScriptName StarTest;
Command STAR --runThreadN @NSLOTS@ --runMode genomeGenerate --genomeDir %OutputFolder% --genomeFastaFiles %FilePath% --sjdbGTFfile %Resource1% --sjdbOverhang @readlength@;
Options /ParallelJobNumber=1 /ThreadNumberPerJob=8 /Mode=Single /ErrorOnStdErr=False /ErrorOnMissingOutput=True /RunOnDocker=True /ImageName="quay.io/biocontainers/star:2.7.3a--0" /UseCloud=True /UseDev2=True /OutputFolder="/GhindariuCloudFolder/Output/Results/StarIndex" /InstanceType=m4.4xlarge /VolumeSize=100;
End;
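
To make the Command line of the script above easier to read, the following Python sketch shows how its placeholders might expand into a single STAR invocation. It is an illustration only. The substitution order and the `expand` helper are assumptions made for this example, not OmicSoft's implementation, and the paths that STAR actually sees inside the Docker container may differ from the virtual cloud folder paths used here.

```python
# Illustrative only: mimics how the macro (@name@) and EScript (%Name%)
# placeholders in the script above could expand into the final STAR command.
# The substitution rules are an assumption, not OmicSoft's implementation.

# Values from the "Begin Macro" block.
macros = {"NSLOTS": "20", "readlength": "100"}

# Values from the Files, Resources, and Options lines of the script.
escript_values = {
    "FilePath": "/VirtualCloudFolder/ArrayServer/Input/GRCh38.p13.genome.fa",
    "Resource1": "/VirtualCloudFolder/ArrayServer/Input/gencode.v33.annotation.gtf",
    "OutputFolder": "/GhindariuCloudFolder/Output/Results/StarIndex",
}

# The Command line, copied from the script without the trailing semicolon.
template = (
    "STAR --runThreadN @NSLOTS@ --runMode genomeGenerate "
    "--genomeDir %OutputFolder% --genomeFastaFiles %FilePath% "
    "--sjdbGTFfile %Resource1% --sjdbOverhang @readlength@"
)


def expand(command: str, macros: dict, values: dict) -> str:
    """Replace @name@ macro tokens first, then %Name% EScript tokens."""
    for name, value in macros.items():
        command = command.replace(f"@{name}@", value)
    for name, value in values.items():
        command = command.replace(f"%{name}%", value)
    return command


expanded = expand(template, macros, escript_values)
print(expanded)
```

Reading the expanded command also makes two settings easy to review. The STAR manual describes the ideal --sjdbOverhang as the read length minus 1 (99 for 100-base reads), while noting that the default of 100 works comparably in most cases. In addition, --runThreadN is set to 20 through @NSLOTS@, whereas the Options line requests /ThreadNumberPerJob=8 and an m4.4xlarge instance provides 16 vCPUs.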