
From Array Suite Wiki



NGS RNA-seq Pipeline



To process RNA-seq data, users can run individual functions from the Array Studio GUI or follow a pre-defined workflow. Alternatively, they can use the RNA-Seq Pipeline module, which completes the whole RNA-Seq analysis in a single run. Depending on the user's selections, it runs the following pipeline steps (each step is documented on its own function page):

Input: Fastq or BAM file

  • NGS QC Wizard
  • Filter NGS Files
  • Map RNA-seq reads to genome
  • Add Genome-mapped RNA-seq Reads
  • Combined Fusion Analysis
  • Report Gene Transcript Counts
  • Report Exon Junctions
  • RNA-seq QC Metrics
  • Summarize Mutations/Variants
  • Create BAS files from BAM files
  • Generate .alv file for Land

After alignment, the pipeline loads the BAM files once and runs all downstream analyses from them. If the "Generate ALV" option is selected, ALV files are also generated so that users can publish samples into their internal Land.

The pipeline module is especially advantageous over individual modules for cloud-based analysis: data are transferred between S3 storage and EC2 machines only once, which also reduces EC2 machine setup time.



Accepted file formats include FASTQ, FASTA, QSEQ, and BAM. The AUTO option lets the software determine the file format automatically.

If BAM files are used as input, the module uses the Add Genome Mapped RNA-Seq Reads function to add the alignment files directly as NgsData for all downstream analyses. The pipeline has been tested only briefly with external BAM files generated by other aligners (outside of OmicSoft), so we recommend that users start from raw (fastq/fasta) files. The Convert Ngs Files module can be used to convert BAM files back to fastq.gz.
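How AUTO detection works internally is not described on this page. As a rough illustration only, the Python sketch below shows one way format sniffing and the BAM-versus-raw-reads branch could work, assuming detection from leading bytes: gzip/BGZF and bzip2 magic numbers, the `BAM\1` magic inside a decompressed BAM, `@` for FASTQ, `>` for FASTA, and 11 tab-separated fields for QSEQ. All function names are hypothetical and are not part of Array Studio or Oscript.

```python
import bz2
import gzip
import io

GZIP_MAGIC = b"\x1f\x8b"  # plain gzip and BGZF-compressed BAM both start with this
BZIP2_MAGIC = b"BZh"


def sniff_format(head: bytes) -> str:
    """Classify the first decompressed bytes of a read file (hypothetical rules)."""
    if head.startswith(b"BAM\x01"):
        return "BAM"
    first_line = head.split(b"\n", 1)[0].rstrip(b"\r")
    if first_line.startswith(b"@"):
        return "FASTQ"
    if first_line.startswith(b">"):
        return "FASTA"
    if first_line.count(b"\t") == 10:  # a QSEQ record has 11 tab-separated fields
        return "QSEQ"
    raise ValueError("unrecognized read file format")


def detect_format(fileobj, n: int = 4096) -> str:
    """Peek at a binary file object, transparently decompressing gzip or bzip2."""
    start = fileobj.tell()
    magic = fileobj.read(3)
    fileobj.seek(start)  # rewind so the decompressor sees the full header
    if magic[:2] == GZIP_MAGIC:
        stream = gzip.GzipFile(fileobj=fileobj)
    elif magic == BZIP2_MAGIC:
        stream = bz2.BZ2File(fileobj)
    else:
        stream = fileobj
    return sniff_format(stream.read(n))


def detect_format_bytes(data: bytes) -> str:
    """Convenience wrapper for in-memory data."""
    return detect_format(io.BytesIO(data))


def detect_format_path(path: str) -> str:
    """Convenience wrapper for a file on disk."""
    with open(path, "rb") as fh:
        return detect_format(fh)


def entry_step(fmt: str) -> str:
    """BAM input skips alignment; raw reads are aligned first."""
    if fmt == "BAM":
        return "Add Genome-mapped RNA-seq Reads"
    return "Map RNA-seq reads to genome"
```

For example, `entry_step(detect_format_path("sample_1.fastq.gz"))` would route a gzipped FASTQ file (hypothetical name) to the alignment step. Decompressing before sniffing means compressed and uncompressed inputs are classified by the same rules.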

General Options



Add files to menu

  • Add: adds the selected samples
  • Add Folder: adds all samples in the selected folder (local projects only)
  • Search: finds files based on sample registration (server projects only)
  • Add List: adds files from a list (a grouping file can also be added for alignment functions)

In the General tab, the user has a number of options:

  • The user can choose whether this is a paired-end sequencing analysis; if so, reads are paired automatically using a numbering convention (e.g. _1, _2 or .1, .2).
  • Choose a genome and gene model: the list includes references provided by OmicSoft and those built by users on your server. Users can also build their own reference and gene model.
  • File compression format: Gzip, Bzip2, or None
  • Replace existing BAM files:
    • Unchecked (default): the alignment step checks the output folder and skips alignment for samples that already have alignment output files (.bam, .bim and .summary.txt) there. This lets users re-run the analysis on a whole batch without re-aligning existing samples. Note: only the alignment step is skipped; all other steps are re-run.
    • Checked: alignment is re-run for all samples, and any existing alignment files in the output folder are overwritten.
  • Number of threads for each alignment, and number of jobs (samples) run in parallel. The number of jobs can be used to control the total number of jobs in the cluster queue or on the cloud.
  • The alignment uses the default parameters described in Map RNA-Seq Reads to Genome. Additional alignment settings can be added using Oscript syntax; all available alignment parameters are listed under the MapRnaSeqReadsToGenome command.
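The paired-end and "Replace existing BAM files" behaviors described above can be illustrated with a small Python sketch. It is not OmicSoft's implementation: the mate-suffix regular expression, the assumption that alignment outputs are named `<sample>.bam`, `<sample>.bim` and `<sample>.summary.txt`, and all function names are hypothetical.

```python
import os
import re
from collections import defaultdict

# Hypothetical rule: mate number as "_1"/"_2" or ".1"/".2" right before the extension(s).
MATE_RE = re.compile(r"^(?P<sample>.+)[_.](?P<mate>[12])(?P<ext>(?:\.[A-Za-z][A-Za-z0-9]*)+)$")

# Alignment outputs checked when "Replace existing BAM files" is unchecked.
ALIGNMENT_OUTPUTS = (".bam", ".bim", ".summary.txt")


def pair_files(filenames):
    """Group paired-end files into {sample: (mate1_file, mate2_file)}."""
    mates = defaultdict(dict)
    for name in filenames:
        m = MATE_RE.match(name)
        if m is None:
            raise ValueError(f"no _1/_2 or .1/.2 mate suffix in {name!r}")
        mates[m.group("sample")][int(m.group("mate"))] = name
    pairs = {}
    for sample, found in mates.items():
        if set(found) != {1, 2}:
            raise ValueError(f"sample {sample!r} is missing a mate file")
        pairs[sample] = (found[1], found[2])
    return pairs


def list_output_folder(path):
    """Names of files already present in the output folder."""
    return set(os.listdir(path))


def samples_to_align(samples, existing_files, replace_existing=False):
    """Samples whose alignment must (re)run; downstream steps always re-run."""
    if replace_existing:
        return list(samples)
    return [
        s for s in samples
        if not all(f"{s}{ext}" in existing_files for ext in ALIGNMENT_OUTPUTS)
    ]
```

A sample is skipped only when all three output files exist, so a sample with a partial result (for example, a .bam file but no .summary.txt) is aligned again. A typical call would be `samples_to_align(pair_files(names), list_output_folder(out_dir))`.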

Analysis Choices

Check the box for each individual pipeline module to run. Each analysis runs with the default parameters set up in the corresponding individual NGS analysis module.


  • When filter raw data is checked:
    • If the reference genome name contains the keyword "Human", the analysis uses FilterSource IlluminaAdapters,Ercc,Human.rRNA,Human.tRNA.
    • If the reference genome name does NOT contain the keyword "Human", the analysis uses FilterSource IlluminaAdapters,Ercc.
  • Enabling Trim UTRs by data in the RNA-Seq analysis can improve transcript-level quantification and RPKM accuracy.
  • Additional Oscript parameters for individual steps can also be added, e.g. AdapterStripping 3'End /AdapterSequence=AAAAAAAAAAAA /ExcludeUnmatched=False /TrimReadsFirst=True;
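The "Human" keyword rule for filter sources comes down to a simple string check. This Python sketch assumes a case-sensitive substring match on the reference genome name; the function names and the genome names in the usage note are hypothetical, and the comma-joined value simply mirrors how the FilterSource lists are written on this page.

```python
# Filter databases applied when "filter raw data" is checked.
BASE_SOURCES = ["IlluminaAdapters", "Ercc"]
HUMAN_SOURCES = ["Human.rRNA", "Human.tRNA"]


def filter_sources(genome_name: str) -> list:
    """Return the FilterSource list for a reference genome name."""
    sources = list(BASE_SOURCES)
    if "Human" in genome_name:  # assumption: case-sensitive keyword match
        sources.extend(HUMAN_SOURCES)
    return sources


def filter_source_value(genome_name: str) -> str:
    """Comma-separated FilterSource value, as listed in the rules above."""
    return ",".join(filter_sources(genome_name))
```

For instance, `filter_source_value("Human.B37.3")` returns `"IlluminaAdapters,Ercc,Human.rRNA,Human.tRNA"`, while `filter_source_value("Mouse.B38")` returns `"IlluminaAdapters,Ercc"`.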

Users must specify an output name and an output folder.



Project in the solution explorer

The pipeline creates multiple result data objects in the OmicSoft project:

  • NgsData objects for RNA-Seq alignment and Fusion supporting reads
  • OmicData objects for FPKM/Count at gene and transcript level
  • Table reports for raw data QC, alignment, alignment QC, exon junctions, and fusion and mutation detection

For a more detailed explanation of each output, please read the wiki pages for the functions listed under Analysis Choices.

The image below uses "Test" as the output name:


Files in the output folder


