OmicScript pipeline for RNA-Seq data analysis

From Array Suite Wiki

Jump to: navigation, search

RNA-Seq pipeline

As we shown in Oshell, you can virtually run any OmicSoft function (e.g., mutation detection, fusion detection, etc.) in Oshell, and even run them in a batch mode in cluster and Amazon Cloud if needed. Below, I will give one example of OmicScript pipeline for RNA-Seq data analysis.

Data analysis modules

RNASeq pipeline in OmicSoft Project Environment.

Description of each pipeline function

OmicSoft Function Description Note
NewProject Create the OmicSoft project enviornment .
OpenProject Open exisiting OmicSoft project enviornment .
NgsQCWizard The QC Wizard is a quick and easy way to run multiple QC commands simultaneously. Available options are Basic statistics, Base Distribution, Quality BoxPlot, K-Mer Analysis and Sequence Duplication. .
MapRnaSeqReadsToGenome Align to genome reference using OmicSoft Aligner .
AddGenomeMappedRnaSeqReads Add genome mapped RNASeq reads if user decides to use other aligners .
RNASeqQCMetrics Report alignment QC Metrics, such as mapping statistics, Duplication rate and other, see RNASeqQCMetricsTable for details .
SummarizeRnaSeqTrend53 Report alignment QC: RNA-Seq positional bias, 5'->3' trend .
ReportGeneTranscriptCounts Quantify expression on gene and transcript level .
ReportExonCounts Quantify: Exon/Jxn based on a gene model (all known junctions/exons) .
ReportExonJunctions Report all exon junctions with annotation telling whether it is novel or known .
MapFusionReads Fusion detecting based on the alignment of fusion junction spanning reads .
ReportPairedEndFusionGenes Report paired end fusion based on inter-transcript read pairs .
SummarizeMutation Mutation detection .
AnnotateMutation Annotate with gene coding info and dbSNP .
ExportView Export all figures and report tables to files .
BamTools /Action=Bas Create BAS file for quick genome browser loading For OmicSoft GenomeBrowser only
SaveProject Save the OmicSoft project enviornment Recommended to save the environment after each step
CloseProject Close the OmicSoft project enviornment .

Script example

Before using the script example, please read the following notes:

  • The script is updated to 03/01/2013
  • The following script is based on a paired-end RNA-Seq data. You should change the file path accordingly in your runs.
  • The following script is designed for Oshell. Server pipeline script will be little bit different. Please contact OmicSoft Support for more information.
  • The script may change when we update the software. Please click the link in the table above for the latest configuration for each module.
  • Annotation for each parameter can be found through the link in the table above.
  • For better code visualization, copy/paste the following code section to editors, such as Notepad++, and select C# language.
  • The code is using a Macro to define some key sample specific parameters before the entire script. You can find a regular OmicScript (without Macro) for this sample here


#region sample specific parameters
// Use quote to protect path separator in Linux
Begin Macro;
 
@ThreadNum@ 6;
@ProjectName@ "PipelineTest";
@ProjectFolder@ "/test/OmicTest/OmicSoft/20130109_OshellScriptTest";
 
@FileNames@
"
/TestDataSets/HumanRNASeqPaired/SRR243575.s.1.fastq.gz
/TestDataSets/HumanRNASeqPaired/SRR243575.s.2.fastq.gz
";
 
@CompressionMethod@ Gzip;
@Gzip@ True;
@PairedEnd@ True;
@ReferenceName@ Human.B37.3;
@GeneModelName@ RefGene;
 
End;
#endregion
 
#region Oshell run
//Create the OmicSoft project enviroment
Begin NewProject;
File "@ProjectFolder@/@ProjectName@.osprj";
Options /Distributed=true;
End;
 
#region Raw data QC
//QC: Raw sequence, using NgsQCWizard
Begin NgsQCWizard /Namespace=NgsLib;
Files
"@FileNames@";
Options /ThreadNumberPerJob=@ThreadNum@ /FileFormat=AUTO /CompressionMethod=@CompressionMethod@ /QualityEncoding=Automatic /PreviewMode=false 
/BasicStatistics=true /BaseDistribution=true /QualityBoxPlot=true /KMerAnalysis=true /SequenceDuplication=true;
Output rawseq_qc;
End;
 
// save the OmicSoft project enviroment after each major step
Begin SaveProject;
Project @ProjectName@;
File "@ProjectFolder@/@ProjectName@.osprj";
End;
#endregion
 
#region RNA-Seq alignment to human reference using OSA
Begin MapRnaSeqReadsToGenome /Namespace=NgsLib;
Files 
"@FileNames@";
Reference @ReferenceName@;
GeneModel @GeneModelName@;
Trimming  /Mode=TrimByQuality /ReadTrimQuality=2;
Options  /BamSubFolder=primary_alignment /ParallelJobNumber=1 /PairedEnd=@PairedEnd@ /FileFormat=AUTO /AutoPenalty=True 
/FixedPenalty=2 /Greedy=false /IndelPenalty=2 /DetectIndels=False /MaxMiddleInsertionSize=10 /MaxMiddleDeletionSize=10 
/MaxEndInsertionSize=10 /MaxEndDeletionSize=10 /MinDistalEndSize=3 /ExcludeNonUniqueMapping=False /ReportCutoff=10 
/WriteReadsInSeparateFiles=True /OutputFolder="" /GenerateSamFiles=False /ThreadNumberPerJob=@ThreadNum@ 
/InsertSizeStandardDeviation=40 /ExpectedInsertSize=300 /InsertOnSameStrand=False /InsertOnDifferentStrand=True 
/QualityEncoding=Automatic /CompressionMethod=@CompressionMethod@ /Gzip=@Gzip@ /SearchNovelExonJunction=True /ExcludeUnmappedInBam=False;
Output primary_alignment;
End;
 
Begin SaveProject;
Project @ProjectName@;
File "@ProjectFolder@/@ProjectName@.osprj";
End;
 
#endregion
 
#region Alignment QC
//Alignment QC Metrics using RnaSeqQCMetrics
Begin RnaSeqQCMetrics /Namespace=NgsLib;
Project @ProjectName@;
Data @ProjectName@\\primary_alignment;
GeneModel @GeneModelName@;
Metrics Alignment,Flag,Profile,Source,InsertSize,Duplication,Coverage,Strand;
Options /ThreadNumberPerJob=@ThreadNum@ /ExcludeFailedAlignments=true /ExcludeSecondaryAlignments=true 
/ExcludeMultiReads=false /ExcludeSingletons=false;
Output alignment_qc;
End;
 
//RNA-Seq 5'->3' trend using SummarizeRnaSeqTrend53
Begin SummarizeRnaSeqTrend53 /Namespace=NgsLib;
Project @ProjectName@;
Data @ProjectName@\\primary_alignment;
GeneModel @GeneModelName@;
Options /OutputFolder="@ProjectFolder@/@ProjectName@/primary_alignment" /ThreadNumberPerJob=@ThreadNum@ 
/BinNumber=100 /TranscriptLengthBins=500, 1000, 2000, 3000, 4000, 5000 /ExcludeGenesWithMultipleIsoforms=true 
/ScaleCoverage=true /ReportTranscriptData=false;
Output qc_trend53;
End;
 
Begin SaveProject;
Project @ProjectName@;
File "@ProjectFolder@/@ProjectName@.osprj";
End;
#endregion
 
#region Quantification at gene, isoform, exon and exon-junction level
//gene level rsem estimation
Begin ReportGeneTranscriptCounts /Namespace=NgsLib;
Project @ProjectName@;
Data @ProjectName@\\primary_alignment;
GeneModel @GeneModelName@;
Options /ExpressionMeasurement=RPKM /ThreadNumberPerJob=@ThreadNum@ /Add1=False /CountFragments=True /ExcludeMultiReads=False 
/UseEffectiveTranscriptLength=True /CountStrandedReads=True /CountReverseStrandedReads=True;
Output quantification_gene;
End;
 
//transcript level rsem estimation
Begin ReportGeneTranscriptCounts /Namespace=NgsLib;
Project @ProjectName@;
Data @ProjectName@\\primary_alignment;
GeneModel @GeneModelName@;
Options /ExpressionMeasurement=RPKM_Transcript /ThreadNumberPerJob=@ThreadNum@ /Add1=False /CountFragments=True /ExcludeMultiReads=False 
/UseEffectiveTranscriptLength=True /CountStrandedReads=True /CountReverseStrandedReads=True;
Output quantification_tx;
End;
 
//quantify the known exon and exon junctions based on a gene model
Begin ReportExonCounts /Namespace=NgsLib;
Project @ProjectName@;
Data @ProjectName@\\primary_alignment;
GeneModel @GeneModelName@;
Options /ThreadNumberPerJob=@ThreadNum@ /ReportExonCounts=True /ReportExonJunctionCounts=True /ExcludeSingletons=False /Rpkm=True 
/CountStrandedReads=True /CountReverseStrandedReads=True;
Output quantification_known;
End;
 
//quantify the each exon junction based on alignment and annotate with gene model info (known or novel exon junctions)
Begin ReportExonJunctions /Namespace=NgsLib;
Project @ProjectName@;
Data @ProjectName@\\primary_alignment;
Options /ExcludeMultiReads=False /ExcludeSingletons=False /NMCutoff=4 /JunctionOverhangCutoff=4 /GenerateReport=True 
/GenerateBedFile=False /BedFileOutputFolder="" /MinimalHit=1 /ThreadNumberPerJob=@ThreadNum@ 
/OutputFolder="@ProjectFolder@/@ProjectName@/primary_alignment";
Output quantification_jxn;
End;
 
Begin SaveProject;
Project @ProjectName@;
File "@ProjectFolder@/@ProjectName@.osprj";
End;
#endregion
 
#region Fusion gene detection
//Detect fusion based on inter-transcript read pairs (PE)
Begin ReportPairedEndFusionGenes /Namespace=NgsLib;
Project @ProjectName@;
Data @ProjectName@\\primary_alignment;
GeneModel @GeneModelName@;
Options /GenerateData=True /OutputFusionReads=True /MinimalHit=2 /FusionReportCutoff=1 /ThreadNumberPerJob=@ThreadNum@ 
/FilterBy=DefaultList /DefaultFilterListVersion=v1 /FilterGeneListFileName="" /FilterGeneFamilyFileName="" 
/GenerateTableland=True /OutputFolder="@ProjectFolder@/@ProjectName@/fusion_PEReads";
Output FusionPE;
End;
 
//Detect fusion based alignment of junction spanning reads (SE)
Begin MapFusionReads /Namespace=NgsLib;
Project @ProjectName@;
Data @ProjectName@\\primary_alignment;
Reference @ReferenceName@;
GeneModel @GeneModelName@;
Trimming /Mode=TrimByQuality /ReadTrimQuality=2;
Options /FusionVersion=2 /ParallelJobNumber=1 /PairedEnd=False /RnaMode=True /FileFormat=BAM /AutoPenalty=True /FixedPenalty=2 
/OutputFolder="@ProjectFolder@/@ProjectName@/fusion_SE_alignment" /MaxMiddleInsertionSize=10 
/ThreadNumberPerJob=@ThreadNum@ /QualityEncoding=Automatic /CompressionMethod=None /Gzip=False /MinimalFusionAlignmentLength=25 
/FilterUnlikelyFusionReads=True /FullLengthPenaltyProportion=8 /OutputFusionReads=True /MinimalHit=4 /MinimalFusionSpan=5000 
/FusionReportCutoff=1 /NonCanonicalSpliceJunctionPenalty=2 /FilterBy=DefaultList /DefaultFilterListVersion=v1 
/FilterGeneListFileName="" /FilterGeneFamilyFileName="" /GenerateTableland=True;
Output FusionSE;
End;
#endregion
 
#region Mutation detection
//detect mutation and annotate with dbSNP; will download dbSNP from omicsoft at first time 
Begin SummarizeMutation /Namespace=NgsLib;
Project @ProjectName@;
Data @ProjectName@\\primary_alignment;
Options /BaseQualityCutoff=20 /MapQualityCutoff=20 /MinimalIndelSize=1 /ExcludeSingletons=False /ExcludeMultiReads=False 
/LeftExclusion=5 /RightExclusion=5 /ThreadNumberPerJob=@ThreadNum@ /MinimalTotalHit=20 /MinimalMutationHit=5 /MinimalMutationFrequency=0.15 
/ExcludeNonMutantSites=True /GenerateSummarizedReport=True /GenerateIndividualReport=False /GenerateTableland=True 
/MaxFrequencyCutoff=0.15 /DbsnpVersion=v135 /OutputFolder="@ProjectFolder@/@ProjectName@/mutation";
Output MutationSummary;
End;
 
//Annotate Mutation 
Begin AnnotateMutation /Namespace=NgsLib;
Project @ProjectName@;
Data @ProjectName@\\MutationSummary.MutationReport;
ID ID;
Chromosome Chromosome;
Position Position;
Mutation Mutation;
Other (default);
Options /ReferenceLibraryID=@ReferenceName@ /GeneModelID=@GeneModelName@ /DbsnpVersion=v135 /GenerateClusteringFlag=True 
/ClusteringFlagWindowSize=10 /GenerateTableland=True /AnnotateLongestTranscriptOnly=False /AnnotateFunctionalMutation=False 
/AnnotateSomaticMutation=False;
Output MutationAnnotated.MutationReport;
End;
 
Begin SaveProject;
Project @ProjectName@;
File "@ProjectFolder@/@ProjectName@.osprj";
End;
 
#endregion
 
Begin ExportView;
Project @ProjectName@;
OutputFolder "@ProjectFolder@/@ProjectName@/ExportedResults";
End;
 
Begin CloseProject;
Project @ProjectName@;
End;
 
#endregion