Server Script

From Array Suite Wiki



Generate a Pipeline OmicScript for Array Server

Server scripts are pipeline scripts defined through Manage Scripts.

A server script must contain three blocks: Info, Input, and Script.

Info

Implemented using the <Info> block.

The <Info> block lets the administrator specify a label, a description, and a category for the script. Use \ to separate multiple category levels, as in the example below:

<Info>
Label=Illumina RNA-Seq Alignment
Description=Import Illumina reads with B37/R62
Category=NGS\RNA-Seq\Illumina
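As a minimal illustration of how the backslash-separated Category value breaks into levels, the Python sketch below reads the Key=Value lines of an <Info> block and splits Category on \. The function name and the CategoryLevels key are hypothetical; this is not how Array Server parses the block internally.

```python
def parse_info_block(text: str) -> dict:
    """Collect Key=Value lines from an <Info> block and split Category into levels."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip lines without '=', such as the <Info> tag itself
            info[key.strip()] = value.strip()
    # Category levels are separated by a backslash, e.g. NGS\RNA-Seq\Illumina
    # (CategoryLevels is a hypothetical key added by this sketch)
    info["CategoryLevels"] = info.get("Category", "").split("\\")
    return info


# Raw string so the backslashes in the category path are kept as-is
INFO = r"""<Info>
Label=Illumina RNA-Seq Alignment
Description=Import Illumina reads with B37/R62
Category=NGS\RNA-Seq\Illumina"""

info = parse_info_block(INFO)
print(info["Label"])           # Illumina RNA-Seq Alignment
print(info["CategoryLevels"])  # ['NGS', 'RNA-Seq', 'Illumina']
```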

Input

Implemented using the <Input> block.

The <Input> block defines the variables used by the script. Each variable is named with the @VariableName@ pattern, and its value is substituted wherever that name appears in the script.

Tip: Although the @VariableName@ syntax resembles the Macro @Parameter@ syntax, it behaves differently: the values (DefaultOption and Levels) generally should not be quoted here. Any quotes you include are treated literally as part of the value and are carried into the substitution. For example:

@ReportHeader@="my special */ report header"

Suppose this value is used in the following Command statement from a RunEScript procedure:

Command echo "@ReportHeader@";

Substitution yields the following command, which is sent to the shell:

echo ""my special */ report header""

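When the shell executes this command, each pair of adjacent double quotes ("") is parsed as an empty string, so */ is left unquoted. The shell expands */ as a wildcard matching the subdirectories of the current directory, and echo prints their names instead of the literal text '*/'.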
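The quoting pitfall can be reproduced with a small Python model. This sketch assumes that substitution is plain text replacement of each @Name@ token (our reading of the behavior described above, not Array Server's actual implementation), and uses Python's shlex module to show how a POSIX shell would split the resulting command into words.

```python
import re
import shlex


def substitute(template: str, values: dict) -> str:
    """Replace each @Name@ token with its value verbatim: no quoting or escaping is added."""
    return re.sub(r"@(\w+)@", lambda m: values.get(m.group(1), m.group(0)), template)


command = 'echo "@ReportHeader@"'

# Value entered WITH quotes: the quotes become part of the substituted text.
quoted = substitute(command, {"ReportHeader": '"my special */ report header"'})
# Value entered WITHOUT quotes: the template's own quotes wrap it correctly.
unquoted = substitute(command, {"ReportHeader": "my special */ report header"})

print(quoted)                 # echo ""my special */ report header""
# '*/' ends up as its own unquoted word, which a real shell would glob-expand
print(shlex.split(quoted))    # ['echo', 'my', 'special', '*/', 'report', 'header']
print(shlex.split(unquoted))  # ['echo', 'my special */ report header']
```

shlex only splits words and does not perform wildcard expansion, so the sketch shows where */ becomes unquoted rather than the directory names a shell would print.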


Each variable should follow this pattern:

Standard Variable

@VariableName@=DefaultOption

~@VariableName@=Variable description

~@VariableName@Levels=Level1, Level2 (optional; levels are separated by commas)

~@VariableName@ExclusiveLevels=True (optional; True or False. True restricts the user to the listed levels, while False also allows free text.)

Example:

<Input>

@PairedSamples@=True
~@PairedSamples@=Data is paired. Options are True or False (True by default)
~@PairedSamples@Levels=True,False
~@PairedSamples@ExclusiveLevels=True 

@ThreadNumberPerJob@=4
~@ThreadNumberPerJob@=Number of threads to run for each of the steps.
~@ThreadNumberPerJob@Levels=1,2,3,4,5,6,7,8
~@ThreadNumberPerJob@ExclusiveLevels=False

@ParallelJobNumber@=8
~@ParallelJobNumber@=Number of parallel jobs to run for each of the steps
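To make the variable pattern concrete, the Python sketch below parses <Input> lines into a default, a description, and named attributes (Levels, ExclusiveLevels, Type, and so on). It also collects any ~@Name@ line whose name was never declared, which catches misspelled variable names. The class, function, and regular expressions are hypothetical illustrations, not Array Server's parser.

```python
import re
from dataclasses import dataclass, field

DECLARATION = re.compile(r"^@(\w+)@=(.*)$")       # @Name@=DefaultOption
ANNOTATION = re.compile(r"^~@(\w+)@(\w*)=(.*)$")  # ~@Name@[Attribute]=Value


@dataclass
class Variable:
    name: str
    default: str = ""
    description: str = ""
    attributes: dict = field(default_factory=dict)  # Levels, ExclusiveLevels, Type, ...

    @property
    def levels(self) -> list:
        raw = self.attributes.get("Levels", "")
        return [level.strip() for level in raw.split(",") if level.strip()]


def parse_input_block(text: str):
    """Return (variables by name, annotation lines that refer to an undeclared name)."""
    variables, orphans = {}, []
    for line in (raw.strip() for raw in text.splitlines()):
        if m := DECLARATION.match(line):
            variables[m.group(1)] = Variable(m.group(1), default=m.group(2).strip())
        elif m := ANNOTATION.match(line):
            name, attribute, value = m.group(1), m.group(2), m.group(3).strip()
            variable = variables.get(name)
            if variable is None:
                orphans.append(line)  # most likely a misspelled variable name
            elif attribute:
                variable.attributes[attribute] = value
            else:
                variable.description = value  # bare ~@Name@= line is the description
    return variables, orphans


# The ThreadNumer line is deliberately misspelled to show orphan detection.
SAMPLE = """
@PairedSamples@=True
~@PairedSamples@=Data is paired. Options are True or False (True by default)
~@PairedSamples@Levels=True,False
~@PairedSamples@ExclusiveLevels=True
@ThreadNumberPerJob@=4
~@ThreadNumberPerJob@=Number of threads to run for each of the steps.
~@ThreadNumerPerJob@Levels=1,2,3,4,5,6,7,8
"""

variables, orphans = parse_input_block(SAMPLE)
print(variables["PairedSamples"].levels)       # ['True', 'False']
print(variables["ThreadNumberPerJob"].levels)  # []
print(orphans)  # ['~@ThreadNumerPerJob@Levels=1,2,3,4,5,6,7,8']
```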

File Variables

Example to select a folder

@OutputFolderName@=
~@OutputFolderName@Type=FilePath
~@OutputFolderName@=Output folder for results and BAM files


Example to select a file

@TestFile@=
~@TestFile@=TestFileName
~@TestFile@Type=FileName
~@TestFile@Filter=fastq files|*.fastq;*.fastq.gz|Fasta Files|*.fasta;*.fasta.gz;*.fa;*.fa.gz

The filter follows the standard Windows file-dialog filter pattern: FilterDescription1|*.ext1|FilterDescription2|*.ext2;*.ext3

  • Use | to separate each description from its filter, and to separate one description/filter pair from the next.
  • Use ; to separate multiple patterns within the same filter.

Warning: For the FilePath and FileName types, the DefaultOption and Levels are assumed to be OServer virtual paths unless otherwise qualified, and are decoded to their corresponding physical paths before substitution.
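The filter pattern described above can be illustrated with a short Python sketch that splits a filter string into description/pattern pairs and reports which filter a file name falls under. It uses Python's fnmatch as a stand-in for the file dialog's wildcard matching; that is an assumption, and Windows matching may differ in details such as case sensitivity. Both function names are hypothetical.

```python
from fnmatch import fnmatch


def parse_filter(spec: str) -> list:
    """Split 'Desc1|*.a|Desc2|*.b;*.c' into [('Desc1', ['*.a']), ('Desc2', ['*.b', '*.c'])]."""
    parts = spec.split("|")
    if len(parts) % 2:
        raise ValueError("filter must alternate Description|patterns")
    return [(parts[i], parts[i + 1].split(";")) for i in range(0, len(parts), 2)]


def matching_filters(filename: str, filters: list) -> list:
    """Descriptions of every filter with at least one pattern matching filename."""
    return [desc for desc, patterns in filters if any(fnmatch(filename, p) for p in patterns)]


filters = parse_filter("fastq files|*.fastq;*.fastq.gz|Fasta Files|*.fasta;*.fasta.gz;*.fa;*.fa.gz")
print(filters[0])                                      # ('fastq files', ['*.fastq', '*.fastq.gz'])
print(matching_filters("reads_R1.fastq.gz", filters))  # ['fastq files']
print(matching_filters("genome.fa", filters))          # ['Fasta Files']
```

Each pattern needs its leading *; a pattern such as .fasta.gz would only match a file named exactly .fasta.gz.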

Special Variables

Special variable types pull their values from the design table, such as the factors needed to run a two-way ANOVA.

Column Name variable

@VariableName@=

~@VariableName@IsColumnName=True (Set to True if the variable's options should come from the sample metadata columns available for the sample set)

~@VariableName@=Description of the variable

Column Level variable

@VariableName@=

~@VariableName@=Description of the variable

~@VariableName@IsColumnLevel=True (Set to True if this variable should pull its levels from a particular column)

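~@VariableName@ColumnName=NameOfColumnVariable (the name, without @ signs, of the column name variable whose selected column supplies the levels; for example, ColumnName=Compare in the two-way ANOVA example below)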
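The relationship between IsColumnName, IsColumnLevel, and ColumnName can be modeled in a few lines of Python. The design table, the functions, and the ordering of levels (first appearance) are illustrative assumptions, not Array Server code; the variable names ForEach and Compare are borrowed from the two-way ANOVA example on this page.

```python
# Hypothetical design table: column name -> one value per sample.
DESIGN = {
    "Time": ["0h", "0h", "24h", "24h"],
    "Treatment": ["Control", "Drug", "Control", "Drug"],
}


def column_name_options(design: dict) -> list:
    """Choices offered for an IsColumnName=True variable: the design column names."""
    return list(design)


def column_level_options(design: dict, selections: dict, column_variable: str) -> list:
    """Choices for an IsColumnLevel=True variable whose ColumnName is column_variable.

    selections maps variable names (without @) to the column the user picked.
    Levels are returned in order of first appearance (an assumption of this sketch).
    """
    column = selections[column_variable]
    return list(dict.fromkeys(design[column]))


selections = {"ForEach": "Time", "Compare": "Treatment"}
print(column_name_options(DESIGN))                          # ['Time', 'Treatment']
print(column_level_options(DESIGN, selections, "Compare"))  # ['Control', 'Drug']
```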

Script

Implemented using the <Script> block.

The <Script> block contains the OmicScript to run for the pipeline. Users must be familiar with OmicScript, which is used for both Oshell scripts and server scripts. The scripts on this page may change as the software is updated; please get the latest script from the ArrayStudio GUI.

Example

NGS Data Example with LockDataName

Begin MapRnaSeqReadsToGenome /Namespace=NgsLib;

Files

"@FileNames@";

Reference Human.B37;

GeneModel Ensembl.R62;

Options /PairedEnd=@PairedSamples@ /FileFormat=FASTQ /AutoPenalty=True /FixedPenalty=2 /Greedy=False /ExcludeNonUniqueMapping=False /ReportCutoff=10 /WriteReadsInSeparateFiles=True /OutputFolder="" /GenerateSamFiles=False /ThreadNumber=@ThreadNumber@ /GenerateAlignmentSummary=True /TrimByQuality=True /ReadTrimSize=1024 /ReadTrimQuality=2 /InsertSizeStandardDeviation=40 /ExpectedInsertSize=300 /InsertOnSameStrand=False /InsertOnDifferentStrand=True /QualityEncoding=Automatic /Gzip=False /ExpressionMeasurement=None /SearchNovelExonJunction=False;

Output @DataName@;

End;


Begin LockDataName;

Data @LastNgsDataName@;

As @NgsData@;

End;

Special Options for Scripting

Some variables are reserved in server scripts for particular data objects or settings. These do not need to be defined in the <Input> block before use.

  1. @DesignFile@
  2. @FileNames@
  3. @FileNames.BAM@
  4. @JobID@
  5. @LastDataName@
  6. @LastNgsDataName@
  7. @LastOmicDataName@
  8. @LastTableDataName@
  9. @LastVcfDataName@
  10. LockDataName proc
  11. @MachineName@
  12. @OutputFolder@
  13. @ProcessorCount@
  14. @ProjectName@
  15. @PScriptLogFolder@
  16. @UserName@
  17. @TempDirectory@
  18. @OmicsoftDirectory@
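Because reserved variables need no <Input> declaration, a script can be checked before submission for @Name@ tokens that are neither declared, reserved, nor produced by a LockDataName procedure. This Python sketch assumes that the As target of LockDataName counts as defined (inferred from the RNA-Seq pipeline on this page, where @NgsData@ is used without an <Input> declaration); the function name and regular expressions are hypothetical.

```python
import re

# Reserved variables from the list above (names without the surrounding @).
RESERVED = {
    "DesignFile", "FileNames", "FileNames.BAM", "JobID", "LastDataName",
    "LastNgsDataName", "LastOmicDataName", "LastTableDataName", "LastVcfDataName",
    "MachineName", "OutputFolder", "ProcessorCount", "ProjectName",
    "PScriptLogFolder", "UserName", "TempDirectory", "OmicsoftDirectory",
}

TOKEN = re.compile(r"@([\w.]+)@")                               # any @Name@ token
DECLARED = re.compile(r"^@([\w.]+)@=", re.MULTILINE)            # @Name@=Default in <Input>
LOCKED = re.compile(r"^\s*As\s+@([\w.]+)@\s*;", re.MULTILINE)   # LockDataName: As @Name@;


def undefined_variables(script: str) -> set:
    """@Name@ tokens that are neither declared, reserved, nor created by LockDataName."""
    used = set(TOKEN.findall(script))
    defined = set(DECLARED.findall(script)) | set(LOCKED.findall(script)) | RESERVED
    return used - defined


SCRIPT = r"""<Input>
@PairedSamples@=True
<Script>
Begin MapRnaSeqReadsToGenome /Namespace=NgsLib;
Files "@FileNames@";
Options /PairedEnd=@PairedSamples@ /ThreadNumberPerJob=@ThreadNumberPerJob@;
End;
Begin LockDataName;
Data @LastNgsDataName@;
As @NgsData@;
End;
Begin RnaSeqQCMetrics /Namespace=NgsLib;
Data @ProjectName@\\@NgsData@;
End;
"""

print(undefined_variables(SCRIPT))  # {'ThreadNumberPerJob'}
```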


Two-way ANOVA Example

Server scripts are primarily used for NGS analysis, but they can also be used for Omic data analysis, such as statistical testing.

In this example, the administrator has set up a ForEach variable that lets the user choose a design column to use as the first factor of a two-way ANOVA, and a Compare variable that lets the user choose a second design column as the second factor. A CompareTo variable, populated from the levels of the column chosen for Compare, lets the user specify the reference level to compare against when running the two-way ANOVA. For example, if the design has Time and Treatment columns and Treatment has a “control” level, the user might specify ForEach: Time, Compare: Treatment, CompareTo: Control.

In the <Script> block, these variables are used in the LinearModel proc (Begin LinearModel…)

@ForEach@=

~@ForEach@IsColumnName=True

~@ForEach@=Specify first factor for ANOVA

@Compare@=

~@Compare@IsColumnName=True

~@Compare@=Specify second factor for ANOVA.

@CompareTo@=

~@CompareTo@=Compare to level

~@CompareTo@IsColumnLevel=True

~@CompareTo@ColumnName=Compare

Begin LinearModel /Namespace=MicroArray;

Data @MicroarrayData@;

Model ~@ForEach@ + @Compare@ + @ForEach@:@Compare@;

Class @ForEach@,@Compare@;

ForEach @ForEach@;

Compare @Compare@;

CompareTo @CompareTo@;

End;
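To see what the LinearModel procedure looks like after substitution, the Python sketch below applies plain @Name@ text replacement (an assumption about how substitution works, consistent with the quoting tip above) using the example values ForEach: Time, Compare: Treatment, CompareTo: Control. @MicroarrayData@ is left untouched because no value is supplied here.

```python
import re

LINEAR_MODEL = """Begin LinearModel /Namespace=MicroArray;
Data @MicroarrayData@;
Model ~@ForEach@ + @Compare@ + @ForEach@:@Compare@;
Class @ForEach@,@Compare@;
ForEach @ForEach@;
Compare @Compare@;
CompareTo @CompareTo@;
End;"""


def substitute(template: str, values: dict) -> str:
    """Replace each @Name@ token with its value; unknown names are left as-is."""
    return re.sub(r"@(\w+)@", lambda m: values.get(m.group(1), m.group(0)), template)


rendered = substitute(
    LINEAR_MODEL, {"ForEach": "Time", "Compare": "Treatment", "CompareTo": "Control"}
)
print(rendered)
```

The Model line becomes `Model ~Time + Treatment + Time:Treatment;`, which reads as a model formula with both main effects and their interaction term.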

Example of a server pipeline script for RNA-Seq analysis

Before using the script example, please read the following notes:

  • The script was last updated on 03/01/2013 and is provided as-is, as an example of a custom script.
  • The following script is for RNA-Seq data.
  • The following script is designed for a server pipeline; an Oshell script will differ slightly.
  • The script may change as the software is updated. Please get the latest script from the ArrayStudio GUI.
  • For easier reading, copy and paste the code into an editor such as Notepad++ and select C# as the language.


<Info>
Label=RNA-Seq Custom Pipeline (with save after each step)
Description=Raw data QC, Filtering NGS files, ERCC, Align to B37.3 with RefGene, Fusion detection with Refgene, Post Alignment QC with Ensembl, 5' to 3' Trend, Generate counts/RPKM at transcript/gene level, Summarize and annotate Mutations, Generate unannotated peaks and putative exons. Save at each step.
Category=NGS\RNA-Seq\Illumina
<Input>
 
@PairedSamples@=True
~@PairedSamples@=Data is paired. Options are True or False (True by default)
~@PairedSamples@Levels=True,False
~@PairedSamples@ExclusiveLevels=True
 
@ThreadNumberPerJob@=4
~@ThreadNumberPerJob@=Number of threads to run for each of the steps.
~@ThreadNumberPerJob@Levels=1,2,3,4,5,6,7,8
~@ThreadNumberPerJob@ExclusiveLevels=False
 
@ParallelJobNumber@=8
~@ParallelJobNumber@=Number of parallel jobs to run for each of the steps
 
@PreviewMode@=True
~@PreviewMode@=Set to true to run raw data QC in preview mode
~@PreviewMode@Levels=True,False
~@PreviewMode@ExclusiveLevels=True
 
@Gzip@=Gzip
~@Gzip@=Set to Gzip if input files are gzipped or None
~@Gzip@Levels=Gzip,None
~@Gzip@ExclusiveLevels=True
 
@ERCC@=False
~@ERCC@=Set to True to filter out ERCC reads
~@ERCC@Levels=True,False
 
@OutputFolderName@=
~@OutputFolderName@Type=FilePath
~@OutputFolderName@=Output folder for results and BAM files
 
<Script>
//Raw data QC section
Begin NgsQCWizard /Namespace=NgsLib;
Files 
"@FileNames@";
Options /FileFormat=AUTO /QualityEncoding=Automatic /CompressionMethod=@Gzip@ /PreviewMode=@PreviewMode@ 
/ParallelJobNumber=@ParallelJobNumber@ /BasicStatistics=True /BaseDistribution=True /QualityBoxPlot=True /KMerAnalysis=True 
/SequenceDuplication=True /IgnoreFF=True /OutputFolder="@OutputFolderName@";
Output ;
End;
 
Begin SaveProject;
End;
 
Begin FilterNgsFiles /Namespace=NgsLib;
Files 
"@FileNames@";
Trimming /Mode=TrimByQuality /ReadTrimQuality=2;
FilterSource IlluminaAdapters,Human.rRNA,Human.tRNA,Ercc /CustomFile="";
Options /FileFormat=AUTO /QualityEncoding=Automatic /EnableLengthCutoff=True 
/LengthCutoff=25 /EnableMaxQualityCutoff=False /MaxQualityCutoff=15 /EnableAverageQualityCutoff=True 
/AverageQualityCutoff=10 /EnablePolyRateCutoff=True /PolyRateCutoff=80 /OutputFolder="@OutputFolderName@" 
/CompressionMethod=@Gzip@  /PairedEnd=@PairedSamples@ /FilterPairByBothEnds=True /ThreadNumberPerJob=@ThreadNumberPerJob@
/WriteFilterFiles=True /FilterErcc=False /ParallelJobNumber=@ParallelJobNumber@;
End;
 
Begin SaveProject;
End;
 
//ERCC section
Begin CountErcc /Namespace=NgsLib /Run=@ERCC@;
Files 
"@FileNames@";
Trimming /Mode=TrimByQuality /ReadTrimQuality=2;
Options /FileFormat=AUTO /QualityEncoding=Automatic /CalculateRpkm=True /CompressionMethod=@Gzip@ 
/PairedEnd=@PairedSamples@ /ThreadNumberPerJob=@ThreadNumberPerJob@ /WriteFilterFiles=True /OutputFolder="@OutputFolderName@" /ParallelJobNumber=@ParallelJobNumber@;
End;
 
Begin SaveProject;
End;
 
//Mapping Section
Begin MapRnaSeqReadsToGenome /Namespace=NgsLib;
Files 
"@FileNames@";
Reference Human.B37.3;
GeneModel RefGene;
Trimming /Mode=TrimByQuality /ReadTrimQuality=2;
Options /ParallelJobNumber=@ParallelJobNumber@ /PairedEnd=@PairedSamples@ /FileFormat=AUTO /AutoPenalty=True 
/FixedPenalty=2 /Greedy=false /IndelPenalty=2 /DetectIndels=False /MaxMiddleInsertionSize=10 
/MaxMiddleDeletionSize=10 /MaxEndInsertionSize=10 /MaxEndDeletionSize=10 /MinDistalEndSize=3 
/ExcludeNonUniqueMapping=False /ReportCutoff=10 /WriteReadsInSeparateFiles=True /OutputFolder="@OutputFolderName@" 
/GenerateSamFiles=False /ThreadNumberPerJob=@ThreadNumberPerJob@ /InsertSizeStandardDeviation=40 /ExpectedInsertSize=300 
/InsertOnSameStrand=False /InsertOnDifferentStrand=True /QualityEncoding=Automatic /CompressionMethod=@Gzip@ 
 /SearchNovelExonJunction=True /ExcludeUnmappedInBam=False;
Output ;
End;
 
Begin SaveProject;
End;
 
//Attach Sample Annotation
Begin AttachDesign;
Target @LastNgsDataName@;
Project @ProjectName@;
File "@DesignFile@";
Options  /Format=Txt;
End;
 
//Lock data name as NgsData
Begin LockDataName;
Data @LastNgsDataName@;
As @NgsData@;
End;
 
//Generate Fusions
Begin MapFusionReads /Namespace=NgsLib;
Data @NgsData@;
Reference Human.B37.3;
GeneModel RefGene;
Trimming /Mode=TrimByQuality /ReadTrimQuality=2;
Options /ParallelJobNumber=@ParallelJobNumber@ /PairedEnd=False /RnaMode=True /FileFormat=BAM /AutoPenalty=True /FixedPenalty=2 
/OutputFolder="@OutputFolderName@" /MaxMiddleInsertionSize= /ThreadNumberPerJob=@ThreadNumberPerJob@ /QualityEncoding=Automatic /CompressionMethod=None 
/Gzip=False /MinimalFusionAlignmentLength=25 /FilterUnlikelyFusionReads=True /FullLengthPenaltyProportion=8 
/OutputFusionReads=True /MinimalHit=4 /MinimalFusionSpan=5000 /FusionReportCutoff=1 /NonCanonicalSpliceJunctionPenalty=2 
/FilterBy=DefaultList /DefaultFilterListVersion=v1 /FilterGeneListFileName="" /FilterGeneFamilyFileName="" /GenerateTableland=True
/FusionVersion=2;
Output ;
End;
 
Begin SaveProject;
End;
 
//Run QC Metrics on dataset
Begin RnaSeqQCMetrics /Namespace=NgsLib;
Project @ProjectName@;
Data @ProjectName@\\@NgsData@;
GeneModel Ensembl.R68;
Metrics Alignment,Flag,Profile,Source,InsertSize,Duplication,Coverage,Strand;
Options  /ExcludeFailedAlignments=True /ExcludeSecondaryAlignments=True /ExcludeMultiReads=False /ExcludeSingletons=False /OutputFolder="@OutputFolderName@" /ParallelJobNumber=@ParallelJobNumber@;
Output ;
End;
 
Begin SaveProject;
End;
 
//Summarize RnaSeq 5' 3' Trend
Begin SummarizeRnaSeqTrend53 /Namespace=NgsLib;
Project @ProjectName@;
Data @ProjectName@\\@NgsData@;
GeneModel RefGene;
Options /BinNumber=100 /TranscriptLengthBins=500, 1000, 2000, 3000, 4000, 5000  
/ExcludeGenesWithMultipleIsoforms=True /ScaleCoverage=True /ReportTranscriptData=False /ExcludeMultiReads=False 
/ExcludeSingletons=False /OutputFolder="@OutputFolderName@" /ParallelJobNumber=@ParallelJobNumber@;
Output ;
End;
 
Begin SaveProject;
End;
 
//Report both counts and RPKM/FPKM at the gene level
Begin ReportGeneTranscriptCounts /Namespace=NgsLib;
Project @ProjectName@;
Data @ProjectName@\\@NgsData@;
GeneModel RefGene;
Options /ExpressionMeasurement=RPKM+Count  /Add1=False /CountFragments=True 
/ExcludeMultiReads=False /OutputFolder="@OutputFolderName@" /ParallelJobNumber=@ParallelJobNumber@;
Output Genes;
End;
 
Begin SaveProject;
End;
 
//Report both counts and RPKM/FPKM at the transcript level
Begin ReportGeneTranscriptCounts /Namespace=NgsLib;
Project @ProjectName@;
Data @ProjectName@\\@NgsData@;
GeneModel RefGene;
Options /ExpressionMeasurement=RPKM+Count_Transcript /Add1=False /CountFragments=True 
/ExcludeMultiReads=False /OutputFolder="@OutputFolderName@" /ParallelJobNumber=@ParallelJobNumber@;
Output Transcripts ;
End;
 
Begin SaveProject;
End;
 
//Summarize mutation
Begin SummarizeMutation /Namespace=NgsLib;
Project @ProjectName@;
Data @ProjectName@\\@NgsData@;
Options /BaseQualityCutoff=6 /MapQualityCutoff=0 /MinimalIndelSize=1 /ExcludeSingletons=False /ExcludeMultiReads=False 
/LeftExclusion=0 /RightExclusion=0  /MinimalTotalHit=5 /MinimalMutationHit=1 /MinimalMutationFrequency=0.05 
/ExcludeNonMutantSites=True /GenerateSummarizedReport=True /GenerateIndividualReport=False /GenerateTableland=True 
/MaxFrequencyCutoff=0.25 /DbsnpVersion=v135 /OutputFolder="@OutputFolderName@" /ParallelJobNumber=@ParallelJobNumber@;
Output ;
End;
 
Begin SaveProject;
End;
 
//Annotate mutations with Refgene and DbSnp
Begin AnnotateMutation /Namespace=NgsLib;
Project @ProjectName@;
Data @ProjectName@\\@LastDataName@;
ID ID;
Chromosome Chromosome;
Position Position;
Mutation Mutation;
Other (Default);
Options /ReferenceLibraryID=Human.B37.3 /GeneModelID=RefGene /DbsnpVersion=v135 /GenerateClusteringFlag=False 
/ClusteringFlagWindowSize=100 /GenerateTableland=True /AnnotateLongestTranscriptOnly=False /OutputFolder="@OutputFolderName@"
/ParallelJobNumber=@ParallelJobNumber@;
Output ;
End;
 
Begin SaveProject;
End;
 
//Putative exons
Begin NgsUnannotatedPeaks /Namespace=NgsLib;
Project @ProjectName@;
Data @ProjectName@\\@NgsData@;
GeneModel RefGene;
Options  /MergeCutoff=100 /ReportPutativeExons=True /AlignmentCutoff=100 /OutputFolder="@OutputFolderName@"
/ParallelJobNumber=@ParallelJobNumber@;
Output PutativeExons;
End;
 
Begin SaveProject;
End;
 
//Unannotated peaks
Begin NgsUnannotatedPeaks /Namespace=NgsLib;
Project @ProjectName@;
Data @ProjectName@\\@NgsData@;
GeneModel RefGene;
Options  /MergeCutoff=100 /SearchCutoff=50000 /AlignmentCutoff=1000 /OutputFolder="@OutputFolderName@"
/ParallelJobNumber=@ParallelJobNumber@;
Output UnannotatedPeaks;
End;
 
Begin SaveProject;
End;
 
//Export all views
Begin ExportView;
Project @ProjectName@;
OutputFolder "@OutputFolderName@/ExportedViewsAndTables";
End;
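Options in these procedures are written as /Name=Value pairs, and in a pipeline this long a dropped slash is easy to miss. The Python sketch below flags tokens in Options statements that contain = but lack the leading /. It is a hypothetical helper that splits statements on ; and words on whitespace, so it assumes option values contain no spaces or semicolons; a flagged token is likely, not certainly, a typo.

```python
def options_missing_slash(script: str) -> list:
    """Return tokens in Options statements that look like Name=Value but lack a leading '/'."""
    suspects = []
    for statement in script.split(";"):  # OmicScript statements end with ';'
        words = statement.split()
        if words and words[0] == "Options":
            suspects += [w for w in words[1:] if "=" in w and not w.startswith("/")]
    return suspects


sample = """Options /OutputFolder="@OutputFolderName@" /CompressionMethod=None
Gzip=False /MinimalFusionAlignmentLength=25;
Output ;"""

print(options_missing_slash(sample))  # ['Gzip=False']
```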

Also Read