How to load external BAM files into the OmicSoft RNA-Seq pipeline

From Array Suite Wiki

Sometimes users want to analyze BAM files that were produced outside of OmicSoft. This wiki describes how to load such external BAM files into OmicSoft's RNA-Seq data analysis workflow (including aligned data QC; quantification at the gene, transcript, exon, and exon junction levels; and detection of fusions and mutations).

Usage

Step 1: Register samples [[1]]

There are two things to keep in mind when registering external BAM files.

First, as a best practice, register samples [[2]] so that each SampleID exactly matches its BAM file. This allows the sample metadata to be appended to the project design.

Examle sampleregistration.png
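Before registering, it can help to check the SampleID-to-BAM pairing outside of OmicSoft. The following is a minimal Python sketch, not part of the OmicSoft tooling. It assumes that "exact match" means the SampleID equals the BAM file name without its .bam extension; the function name and the sample rows are hypothetical.

```python
def mismatched_samples(registrations):
    """Return (sample_id, bam_path) pairs whose SampleID differs from the BAM base name.

    Assumption: "exact match" means SampleID == file name without the .bam extension.
    """
    mismatches = []
    for sample_id, bam_path in registrations:
        # Accept both forward-slash and backslash path separators.
        file_name = bam_path.replace("\\", "/").rsplit("/", 1)[-1]
        stem = file_name[:-4] if file_name.lower().endswith(".bam") else file_name
        if stem != sample_id:
            mismatches.append((sample_id, bam_path))
    return mismatches


# Hypothetical registration rows: (SampleID, BAM file path)
rows = [
    ("SampleA", "/data/bams/SampleA.bam"),
    ("SampleB", "/data/bams/sample_b.bam"),
]
print(mismatched_samples(rows))  # [('SampleB', '/data/bams/sample_b.bam')]
```

If your site registers SampleIDs that include the extension, adjust the comparison accordingly.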

Second, this pipeline requires that each BAM file be registered with the prefix BAM= in the File path.

Example SampleRegistration2.png

If a sample is registered with a file path that does not include the BAM= prefix, the pipeline fails shortly after it starts with the following error message:

[00:00:01] Setting job ID...

[00:00:01] Error occurred from OJobProcess.Run. Error message=Invalid virtual path: @FileNames.BAM@. Virtual path must be provided. InputPath=@FileNames.BAM@.@@@
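This error can be caught before the job is submitted by scanning the File path values in the registration sheet. The sketch below is an illustration in Python, assuming each File path value is available as a plain string; the function name and the example paths are hypothetical.

```python
def missing_bam_prefix(file_paths, prefix="BAM="):
    """Return the file path entries that do not start with the required prefix."""
    return [p for p in file_paths if not p.strip().startswith(prefix)]


# Hypothetical File path values taken from a sample registration sheet
paths = [
    "BAM=/data/bams/SampleA.bam",
    "/data/bams/SampleB.bam",  # prefix is missing
]
print(missing_bam_prefix(paths))  # ['/data/bams/SampleB.bam']
```

Any entry reported here should be corrected in the registration before the pipeline is run.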

Step 2: Browse samples and Run Server Pipeline

For more information, see [[3]].

RunFromServer.png

Step 3: Set parameters

The script requires values for several user-defined parameters. Each value can be chosen from a drop-down menu or, for more flexibility, typed freely into the box after clicking on its white space. Help for each parameter is found in the Description field.

ReferenceName and GeneModelName are required, so you need to know this information before running the pipeline. If you do not know it, OmicSoft offers BAMTools, which can be used to extract the BAM header:

BAMTools Extract Header is a server script under the Misc category.

BAMTools.png
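Once the header has been extracted, its @SQ lines list the reference sequences the reads were aligned to, which helps narrow down the reference. The Python sketch below is an illustration under assumptions: it assumes the extracted header is SAM-format text (tab-separated @SQ lines with SN and LN tags), and it uses the published chromosome 1 lengths of GRCh37 and GRCh38 as a hint for human data. The function names are hypothetical, and mapping the result to an OmicSoft ReferenceName (for example Human.B37.3 or Human.B38) is left to the user.

```python
# Chromosome 1 lengths of two human assemblies, used only as a hint.
HUMAN_CHR1_LENGTHS = {249250621: "GRCh37", 248956422: "GRCh38"}


def reference_sequences(header_text):
    """Return [(name, length)] parsed from the @SQ lines of a SAM-format header."""
    sequences = []
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        fields = line.split("\t")[1:]
        tags = dict(f.split(":", 1) for f in fields if ":" in f)
        if "SN" in tags and "LN" in tags:
            sequences.append((tags["SN"], int(tags["LN"])))
    return sequences


def guess_human_build(sequences):
    """Return 'GRCh37', 'GRCh38', or None based on the length of chromosome 1."""
    for name, length in sequences:
        if name in ("chr1", "1"):
            return HUMAN_CHR1_LENGTHS.get(length)
    return None


# Hypothetical header text for illustration
header = (
    "@HD\tVN:1.6\tSO:coordinate\n"
    "@SQ\tSN:chr1\tLN:248956422\n"
    "@SQ\tSN:chr2\tLN:242193529\n"
)
seqs = reference_sequences(header)
print(seqs)                     # [('chr1', 248956422), ('chr2', 242193529)]
print(guess_human_build(seqs))  # GRCh38
```

The header does not identify the gene model, so GeneModelName still has to come from the provider of the BAM files or from the analysis records.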

The RunIndex parameter can be set to True or False. External BAM files are most likely already paired with a .bai index. OmicSoft also uses a .bim index for the alignment summary information, so we recommend leaving RunIndex set to True (the default). This writes the .bim and .bai indexes to the input folder (the same location as the BAM file).
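To see which index files are already present next to a BAM file, a quick check such as the following Python sketch can be used. The index file naming is an assumption: the sketch looks for both `<name>.bam.<ext>` and `<name>.<ext>`, and the .bim naming in particular has not been confirmed against OmicSoft's output. The function name is hypothetical.

```python
import tempfile
from pathlib import Path


def missing_indexes(bam_path, extensions=(".bai", ".bim")):
    """Return the index extensions for which no file exists next to the BAM file.

    Assumption: an index is named either <name>.bam<ext> or <name><ext>.
    """
    bam = Path(bam_path)
    missing = []
    for ext in extensions:
        candidates = [Path(str(bam) + ext), bam.with_suffix(ext)]
        if not any(c.exists() for c in candidates):
            missing.append(ext)
    return missing


# Demonstration in a temporary folder with empty placeholder files
with tempfile.TemporaryDirectory() as folder:
    bam_file = Path(folder) / "SampleA.bam"
    bam_file.touch()
    before = missing_indexes(bam_file)
    (Path(folder) / "SampleA.bam.bai").touch()
    after = missing_indexes(bam_file)

print(before)  # ['.bai', '.bim']
print(after)   # ['.bim']
```

A BAM file that reports a missing .bim index is a typical case where RunIndex should stay set to True.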


ArrayLand Vector (ALV) files store expression data for publishing to OmicSoft Lands. The ALVLogic parameter is set to False by default because most use cases do not require the data to be stored separately as ALV. Set it to True to run this step and generate one ALV file per sample for each data type.


Output Folder path is required.


RunServerPipeline.png

Script Content

The script below can be copied into a text file and saved with the .pscript extension (for example, filename.pscript). Use Manage Server Scripts [[4]] to add the .pscript file to your instance of ArrayServer.

<Info>

Label=RNA-Seq Pipeline From BAM file

Description=Start with BAM files, Post Alignment QC, Summarize Fusion, Exon Junctions, and Mutations, Perform Quantification, Trim UTRs by data, Generate BAS files. Land ALV files optional.

Category=NGS\RNA-Seq\Illumina


<Input>

@DataName@=RNASeq
~@DataName@=The output data name
@ReferenceName@=Human.B38
~@ReferenceName@Levels=Human.B37.3, Human.B38, Mouse.B38, Rat.B6.0, Dog.CanFam3.1, Cyno.WashU2013
~@ReferenceName@=Genome that was used for the alignment. If unknown, use BAMTools to extract the BAM header.

@GeneModelName@=OmicsoftGenCode.V30
~@GeneModelName@Levels=OmicsoftGene20130723, OmicsoftGenCode.V30, Ensembl.R78, Ensembl.R83, Ensembl.R84, Cyno.WashU2013
~@GeneModelName@=Genome annotation to use for quantification. User can reference standard RNAseq platforms for each organism to find standard recommended Gene Model.

@ParallelJobNumber@=10
~@ParallelJobNumber@=Number of parallel jobs to run for each of the steps

@RunIndex@=True
~@RunIndex@=Create OmicSoft BAM .bim Index. Output filepath is defaulted to input BAM filepath.
~@RunIndex@Levels=True,False
~@RunIndex@ExclusiveLevels=True

@ALVLogic@=False
~@ALVLogic@=ArrayLand Vector File (ALV) required for creating a Land from an analysis. Options are True or False (False by default)
~@ALVLogic@Levels=True,False
~@ALVLogic@ExclusiveLevels=True

@OutputFolderName@=
~@OutputFolderName@Type=FilePath
~@OutputFolderName@=Output folder for results


<Script>
//Create .bim and .bai index
//This step is optional and will be skipped if both indexes already exist
Begin BamTools /Namespace=NgsLib /Run=@RunIndex@;
Files
"@FileNames.BAM@";
Options /Action=IndexBin /ForceIndexing=True /ParallelJobNumber=@ParallelJobNumber@ /OutputFolder="";
End;

Begin SaveProject;
End;

Begin RnaSeqPipeline /Namespace=NgsLib;
Files
"@FileNames.BAM@";
Reference @ReferenceName@;
GeneModel @GeneModelName@;
NgsQCWizard /Run=False;
FilterNgsFiles /Run=False;
FilterSource IlluminaAdapters,Ercc,Human.rRNA,Human.tRNA;
Count /Run=True /AutoTrimUtr=True;
RnaSeqQCMetrics /Run=True;
SummarizeExonJunction /Run=True;
SummarizeMutation2Snp /Run=True;
CombinedFusionAnalysis /Run=True;
GenerateBas /Run=True;
GenerateLandAlv /Run=@ALVLogic@;
Options /ParallelJobNumber=@ParallelJobNumber@ /ThreadNumberPerJob=4 /PairedEnd=False /FileFormat=BAM
/OutputFolder="@OutputFolderName@";
Output @DataName@;
End;

Begin SaveProject;
End;

// Attach metadata to design of RNA-Seq NgsData
Begin AttachDesign;
Target @DataName@;
Project @ProjectName@;
File "@DesignFile@";
Options /Format=Txt;
End;

// Attach metadata to design of RNA-Seq Quantification OmicData
Begin AttachDesign;
Target @DataName@.FPKM;
Project @ProjectName@;
File "@DesignFile@";
Options /Format=Txt;
End;

Begin AttachDesign;
Target @DataName@.Count;
Project @ProjectName@;
File "@DesignFile@";
Options /Format=Txt;
End;

Begin AttachDesign;
Target @DataName@.Transcript_Count;
Project @ProjectName@;
File "@DesignFile@";
Options /Format=Txt;
End;

Begin AttachDesign;
Target @DataName@.Transcript_FPKM;
Project @ProjectName@;
File "@DesignFile@";
Options /Format=Txt;
End;

Begin SaveProject;
End;
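The <Input> section of the script above follows a regular pattern: `@Name@=default` sets a default value and `~@Name@Levels=...` lists the drop-down choices. The Python sketch below reads those two line types from a .pscript text so that defaults can be reviewed before the script is added to the server. It is based only on the pattern visible in this script, not on a published grammar, and the function name is hypothetical.

```python
import re

PARAM = re.compile(r"^@(\w+)@=(.*)$")            # e.g. @RunIndex@=True
LEVELS = re.compile(r"^~@(\w+)@Levels=(.*)$")    # e.g. ~@RunIndex@Levels=True,False


def parse_inputs(script_text):
    """Return {parameter: {'default': str, 'levels': [str, ...]}} from the <Input> section."""
    params = {}
    in_input = False
    for raw in script_text.splitlines():
        line = raw.strip()
        if line.startswith("<") and line.endswith(">"):
            in_input = line == "<Input>"   # section markers such as <Info>, <Input>, <Script>
            continue
        if not in_input:
            continue
        match = PARAM.match(line)
        if match:
            params.setdefault(match.group(1), {})["default"] = match.group(2)
            continue
        match = LEVELS.match(line)
        if match:
            levels = [v.strip() for v in match.group(2).split(",")]
            params.setdefault(match.group(1), {})["levels"] = levels
    return params


# Excerpt of the script above
excerpt = """<Input>
@ReferenceName@=Human.B38
~@ReferenceName@Levels=Human.B37.3, Human.B38, Mouse.B38
@RunIndex@=True
~@RunIndex@Levels=True,False
~@RunIndex@ExclusiveLevels=True
<Script>
"@FileNames.BAM@";
"""
inputs = parse_inputs(excerpt)
for name, info in inputs.items():
    print(name, info["default"], info.get("levels"))
```

Description lines (`~@Name@=...`) and `ExclusiveLevels` lines are ignored by this sketch.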
