OmicsoftDirectory, STAR on EScript

From Array Suite Wiki

(Difference between pages)
{{ArrayServerOption|[My Documents]\Omicsoft (under Windows) or /home/[UserName]/Omicsoft (under Linux)}}

The Omicsoft directory is used to contain folders used in Server-based projects.

This is similar to the standard Omicsoft home directory in Array Studio and contains folders for Annotation and ReferenceLibrary, among others.

This folder could become large if users are running NGS commands.

It is recommended that this folder be contained on a local drive, with up to 100 GB of space available.

A standard Omicsoft home folder will contain:

* [[Affymetrix Folder]]
* [[Annotation Folder]]
* [[Backup Folder]]
* [[DataService Folder]]
* [[Favorites Folder]]
* [[GenomeBrowser Folder]]
* [[Log Folder]]
* [[Mapping Folder]]
* [[Ontology Folder]]
* [[Plugin Folder]]
* [[ReferenceLibrary Folder]]
* [[RemoteSessions Folder]]
* [[RScripts Folder]]
* [[ServerCache Folder]]
* [[ServerProjects Folder]]
* [[Temp Folder]]
* [[Settings files]]

Revision as of 15:40, 10 April 2020

Goal

Enable customers to run STAR on multiple FASTQ samples in parallel on a server, HPC, or AWS Cloud, producing a pair of OmicData objects in an OmicSoft Suite project.

Customer Workflow (via escript)

The examples on this page form a fully functional STAR alignment flow, run on the cloud with Docker.

The following environment preconditions have been set:

  • AMI with docker V19.03.8: ami-012cb9a6d92521948
  • Docker Image (Docker images don't require prerequisites)

A complete STAR flow requires the following steps:

Build index

Preconditions:

  • InstanceType: m4.4xlarge
  • Additional Volume: 100GB

STAR Index:

Begin Macro;
@NSLOTS@ 20;
@readlength@ 100;
End;

Begin RunEScript /RunOnServer=True;
Resources "/VirtualCloudFolder/ArrayServer/Input/gencode.v33.annotation.gtf";
Files "/VirtualCloudFolder/ArrayServer/Input/GRCh38.p13.genome.fa";
EScriptName StarTest;
Command STAR --runThreadN @NSLOTS@ --runMode genomeGenerate --genomeDir %OutputFolder% --genomeFastaFiles %FilePath% --sjdbGTFfile %Resource1% --sjdbOverhang @readlength@;
Options /ParallelJobNumber=1 /ThreadNumberPerJob=8 /Mode=Single /ErrorOnStdErr=False /ErrorOnMissingOutput=True /RunOnDocker=True /ImageName="quay.io/biocontainers/star:2.7.3a--0" /UseCloud=True /UseDev2=True /OutputFolder="/VirtualCloudFolder/Output/Results/StarIndex" /InstanceType=m4.4xlarge /VolumeSize=100;
End;
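
To make the Command line in the script above easier to read, the following Python sketch expands its placeholders into the STAR command that would presumably be executed. It is an illustration only and not part of OmicSoft: the helper expand_command is hypothetical, and it assumes that @NAME@ macros and %NAME% placeholders are resolved by plain text substitution, that %FilePath% maps to the Files entry, and that %Resource1% maps to the first Resources entry. The paths actually seen inside the Docker container may differ.

```python
import re

def expand_command(template, macros, placeholders):
    """Hypothetical helper: replace @NAME@ macros, then %NAME% placeholders.

    A missing name raises KeyError, so an unresolved token never reaches the shell.
    """
    with_macros = re.sub(r"@(\w+)@", lambda m: str(macros[m.group(1)]), template)
    return re.sub(r"%(\w+)%", lambda m: str(placeholders[m.group(1)]), with_macros)

# Command template copied from the RunEScript block.
template = (
    "STAR --runThreadN @NSLOTS@ --runMode genomeGenerate "
    "--genomeDir %OutputFolder% --genomeFastaFiles %FilePath% "
    "--sjdbGTFfile %Resource1% --sjdbOverhang @readlength@"
)

# Values from the Macro block.
macros = {"NSLOTS": 20, "readlength": 100}

# Assumed mapping of placeholders to the Files, Resources, and /OutputFolder values.
placeholders = {
    "OutputFolder": "/VirtualCloudFolder/Output/Results/StarIndex",
    "FilePath": "/VirtualCloudFolder/ArrayServer/Input/GRCh38.p13.genome.fa",
    "Resource1": "/VirtualCloudFolder/ArrayServer/Input/gencode.v33.annotation.gtf",
}

command = expand_command(template, macros, placeholders)
print(command)
```

Printing the expanded command is a quick way to check that every macro and placeholder has a value before submitting the job. Note that the STAR manual describes the ideal --sjdbOverhang as the read length minus 1, while also stating that the default of 100 works well in most cases.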