Oshell

From Array Suite Wiki

Overview

Oshell.exe is a .NET application that has also been optimized to run on Linux through Mono. This article introduces Oshell, describes its installation, and links to wiki pages covering typical usage.

Oshell/OmicSoft Project Environment

The Oshell environment is a project-oriented analysis environment that contains popular analysis modules for data generated from sequencing and microarray platforms. Each project in the environment is associated with its own data objects and analysis modules. Comprehensive data analysis pipelines can be constructed as projects in the environment. A pipeline is written and executed in OmicScript format, a brief script that specifies data objects and running parameters. Data objects can be passed directly to their corresponding downstream analysis modules.

An OmicSoft project is:

  • A collection of data objects (NGS objects, Omics objects, tables, and lists)
    • NGS data is a collection of links to BAM files. A BAM file is loaded into the software only when necessary, and multiple projects can share the same BAM file.
    • Omics data can be any result table combined with a sample design and feature (e.g. gene) annotation, such as gene expression or CNV results.
    • A table is anything resembling an Excel table, such as a sequence alignment report.
    • A list is a list of IDs (e.g. gene IDs). It can be used to filter results in Omics data and tables.
  • An environment for analysis
    • Analysis runs on one, multiple, or a subset of objects
    • Analysis steps and scripts are tracked
  • An entity that can be shared on the server

Installation

Because Oshell implements all of its analysis modules directly, the Oshell environment can be installed and run without depending on other bioinformatics software.

Install Oshell on Windows

Oshell is written in C#, and .NET on Windows is its native platform. To install Oshell:

  1. Create a folder named "Oshell"
  2. Download OmicsoftUpdater and save it to the "Oshell" folder
  3. In the "Oshell" folder, create an empty file named oshell.exe (note that the file extension is .exe)
  4. Double-click OmicsoftUpdater.exe; all software binaries will be downloaded automatically into the "Oshell" folder

Install Oshell on Linux

Oshell has also been optimized for Linux using Mono. The following libraries are required to use the full functionality of Oshell.

Libgdiplus

The libgdiplus package must be installed, either with yum or apt-get or by building it from source:

cd /opt
wget http://download.mono-project.com/sources/libgdiplus/libgdiplus-2.10.tar.bz2
tar  jxvf   libgdiplus-2.10.tar.bz2
cd  /opt/libgdiplus-2.10
./configure  --prefix=/opt/libgdiplus-2.10
make
make  install

If installing from source, you may need to add "/libgdiplusPrefix/lib" to the shared library search path. To check that the libgdiplus library is on the search path, type:

 $ ldconfig -p | grep libgdiplus
     libgdiplus.so (libc6,x86-64) => /opt/libgdiplus-2.10/lib/libgdiplus.so

Here libgdiplus is installed under "/opt/libgdiplus-2.10" and the search path has been set correctly. If the library is not listed, one way to add it to the shared library search path (with root privileges) is:

 echo "/opt/libgdiplus-2.10/lib" > /etc/ld.so.conf.d/libgdiplus.conf
 ldconfig
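
For scripted installs, the ldconfig check can be wrapped in a small helper. The following is a minimal sketch, assuming bash; the function name lib_in_cache is hypothetical and is not part of Oshell or Mono. It reads `ldconfig -p` output from standard input, so it can also be tried on saved output.

```shell
# lib_in_cache NAME: read `ldconfig -p` output on stdin and print the path
# that NAME resolves to. Returns non-zero when NAME is not listed.
# (Hypothetical helper name; not part of Oshell or Mono.)
lib_in_cache() {
    local path
    # Cache lines look like: "libfoo.so (libc6,x86-64) => /path/to/libfoo.so"
    path=$(awk -v n="$1" '$1 == n && $(NF-1) == "=>" { print $NF; exit }')
    [ -n "$path" ] && printf '%s\n' "$path"
}

# Sample line copied from the ldconfig output shown in this section.
sample='libgdiplus.so (libc6,x86-64) => /opt/libgdiplus-2.10/lib/libgdiplus.so'
printf '%s\n' "$sample" | lib_in_cache libgdiplus.so
```

On a real system the same helper would be fed live data, for example `ldconfig -p | lib_in_cache libgdiplus.so`.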

If the Unix administrator does not want to add libgdiplus to the shared library search path, an alternative is to modify the Mono config file /MonoPrefix/etc/mono/config. Add the following line at the end of the file, before </configuration>:

<dllmap dll="gdiplus.dll" target="/opt/libgdiplus-2.10/lib/libgdiplus.so"/>

If installing from yum, it may be necessary to additionally install the following:

yum install libungif libungif-devel

zlib

The zlib-devel package must be installed (either using yum or apt-get, or by installing zlib from source).

Other requirements

Make sure the ulimit values for "max user processes" and "open files" are set to the maximum value, 65536. You can check the current values by typing ulimit -a:

$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 515184
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 65536
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 65536
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

To change these limits, modify the two config files described in the ulimit setup wiki article.
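
The two limits can also be checked from a script before a long run. This is a minimal sketch, assuming bash (the -u option of ulimit is not available in every shell); the helper name limit_ok is hypothetical, and the threshold is the 65536 recommended in this section.

```shell
# limit_ok VALUE REQUIRED: succeed when VALUE is "unlimited" or a number
# that is at least REQUIRED. (Hypothetical helper name.)
limit_ok() {
    case $1 in
        unlimited) return 0 ;;
        ''|*[!0-9]*) return 2 ;;   # empty or not a number: cannot judge
    esac
    [ "$1" -ge "$2" ]
}

required=65536
for flag in n u; do                # -n: open files, -u: max user processes
    value=$(ulimit -"$flag" 2>/dev/null) || value=
    if limit_ok "$value" "$required"; then
        echo "ulimit -$flag: $value (ok)"
    else
        echo "ulimit -$flag: ${value:-unreadable} (needs $required)"
    fi
done
```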



Mono

We recommend using Mono 2.10.9 for NGS alignment on Linux.

  • Download Mono 2.10.9
    cd /opt
    wget -c  http://origin-download.mono-project.com/sources/mono/mono-2.10.9.tar.bz2

The bz2 file can be saved to a temporary location, e.g. ~/temp/

  • Extract and modify certificate if necessary
    tar  jxvf   mono-2.10.9.tar.bz2
    cd  /opt/mono-2.10.9

For mono 2.10.9, it is recommended to modify X509Certificate to the latest standard.

  • Compile and install. On the command line, type
    cd  /opt/mono-2.10.9
    ./configure  --prefix=/opt/mono-2.10.9 --with-large-heap=yes
    make
    make  install

Note:

  • The install location of Mono is set by the "--prefix" option in the configure step and can be changed to another location.
  • The option --with-large-heap=yes enables support for GC heaps larger than 3 GB, which is required for NGS alignment as well as for some Array Server functions.

Double-check the Mono installation and version:

ls -al /opt/mono-2.10.9/bin/mono*
/opt/mono-2.10.9/bin/mono --version
/opt/mono-2.10.9/bin/mono-sgen --version

Install Oshell

Assume Mono (version 2.10.9) has been installed in /opt/mono-2.10.9/. Here we install Oshell in the home directory:

mkdir ~/oshell
cd ~/oshell
wget -c  http://omicsoft.com/software_update/OmicsoftUpdater.exe
touch oshell.exe
/opt/mono-2.10.9/bin/mono OmicsoftUpdater.exe

Install Oshell on Mac

Oshell is not officially supported on Mac. It may have issues when running heavy jobs such as alignment, but it should work well for the Land R API.

Oshell on Mac also relies on Mono. In addition to the Linux steps above, here are some tips from one of our users (thanks, Mike).

sqlite3 and zlib through homebrew

Install homebrew (http://brew.sh), then:

brew install sqlite3
brew install zlib

Mono and Oshell

Same as Linux installation.

Please also set ulimit as suggested above.

Add dylib links to ~/oshell directory

touch ~/oshell/ocorelib.dll.config

Paste this into ~/oshell/ocorelib.dll.config:

<configuration>
 <dllmap dll="z" target="/usr/local/opt/zlib/lib/libz.dylib" />
 <dllmap dll="sqlite3" target="/usr/local/opt/sqlite/lib/libsqlite3.dylib" />
</configuration>
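
Homebrew does not always install under /usr/local (on Apple Silicon Macs its default prefix is /opt/homebrew), so the two target paths may need adjusting. The sketch below, assuming bash, prints the same config with the prefixes reported by `brew --prefix`; the function name write_dllmap is hypothetical, and the fallback paths are the ones used in this section.

```shell
# write_dllmap ZLIB_PREFIX SQLITE_PREFIX: print the ocorelib.dll.config body
# from this section with the library locations filled in from two prefixes.
# (Hypothetical helper name.)
write_dllmap() {
    cat <<EOF
<configuration>
 <dllmap dll="z" target="$1/lib/libz.dylib" />
 <dllmap dll="sqlite3" target="$2/lib/libsqlite3.dylib" />
</configuration>
EOF
}

# Ask Homebrew where the formulae live; fall back to the paths used in this
# section when brew is not on PATH.
zlib_prefix=$(brew --prefix zlib 2>/dev/null || echo /usr/local/opt/zlib)
sqlite_prefix=$(brew --prefix sqlite 2>/dev/null || echo /usr/local/opt/sqlite)
write_dllmap "$zlib_prefix" "$sqlite_prefix"
```

Redirect the output to ~/oshell/ocorelib.dll.config once the printed paths look right.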

Getting Started

Check Oshell Version

Run Oshell without arguments to print its version:

mono oshell.exe

You will get something like:

--------------------------------------------------------------------------------
Version: 9.0.0.1
Analysis mode not specified
--------------------------------------------------------------------------------

Keep updated

Users can update Oshell to the latest development version at any time using OmicsoftUpdater.

mono OmicsoftUpdater.exe

Run OmicScript in Oshell

If you have an OmicScript ready, it can be executed with:

mono oshell.exe --runscript Base_Dir Script_path Temp_Dir Mono_Path > PathToRun.log

where

  • Base_Dir is the path to the Oshell base directory, where the ReferenceLibrary folder should be located, e.g. /opt/omicsoft
  • Script_path is the path to the Oshell script, e.g. /opt/omicsoft/test/run.oscript
  • Temp_Dir is the path to a directory for storing temporary files, e.g. /scratch
  • Mono_Path is the path to the mono executable, which Oshell remembers for use during the run, e.g. /opt/omicsoft/mono/mono
  • PathToRun.log is the path to the log file that records all log output, e.g. /opt/omicsoft/test/run.oscript.log

Note: The mono command is not required on Windows.

If running on a machine with Array Studio or ArrayServer, Base_Dir and Temp_Dir can point to the existing directories (i.e. there is no need to specify a second base directory for Oshell to hold separate genome references, gene models, etc.).
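
Because Mono_Path appears twice in the command and the argument order matters, a small wrapper can help avoid mistakes. This is a minimal sketch, assuming bash; the function name run_oscript and the variables OSHELL_EXE and DRY_RUN are made up for illustration. By default it only prints the command it would run.

```shell
# run_oscript BASE_DIR SCRIPT TEMP_DIR MONO: print, or run, the --runscript
# command described in this section. The log is written next to the script as
# SCRIPT.log. (Hypothetical wrapper; OSHELL_EXE and DRY_RUN are made-up names.)
run_oscript() {
    local base_dir script temp_dir mono oshell log
    base_dir=$1; script=$2; temp_dir=$3; mono=$4
    oshell=${OSHELL_EXE:-oshell.exe}
    log=$script.log
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '%s %s --runscript %s %s %s %s > %s\n' \
            "$mono" "$oshell" "$base_dir" "$script" "$temp_dir" "$mono" "$log"
        return 0
    fi
    [ -f "$script" ] || { echo "script not found: $script" >&2; return 1; }
    [ -d "$base_dir" ] || { echo "Base_Dir not found: $base_dir" >&2; return 1; }
    "$mono" "$oshell" --runscript "$base_dir" "$script" "$temp_dir" "$mono" > "$log"
}

# Dry run with the example paths from the list in this section.
run_oscript /opt/omicsoft /opt/omicsoft/test/run.oscript /scratch /opt/omicsoft/mono/mono
```

Set DRY_RUN=0 to execute the command instead of printing it.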

The sections below give more details on how to write OmicScript.

Build genome reference index and gene model

For most NGS functions, Oshell requires a reference genome and a gene model to be built before the actual functions are run. The index needs to be generated only once for each reference. By default, the first time a job uses a given reference and gene model, the program automatically downloads a compiled genome and gene model.

Users have to specify the correct names for the reference genome and gene model. See A list of compiled genome and gene model from OmicSoft. For example, if we run alignment with Human.B37.3 and the RefGene model using the OmicScript for Alignment, Oshell will download Human.B37.3 and the RefGene model into your local folder. You will find the following folders under Base_Dir:

Base_Dir
--ReferenceLibrary
---- Human.B37.3.dreflib1
---- Human.B37.3.gindex1
---- Human.B37.3_GeneModels
---- ---- RefGene.gmodel2
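
To see whether a reference and gene model are already present before starting a job, the layout above can be tested from the shell. This is a minimal sketch, assuming bash and names without spaces; the function name have_reference is hypothetical. The numeric suffixes (dreflib1, gindex1, gmodel2) are taken from the listing and are matched with wildcards on the assumption that they may vary.

```shell
# have_reference BASE_DIR REFERENCE GENE_MODEL: succeed when the entries from
# the listing exist under BASE_DIR/ReferenceLibrary. (Hypothetical helper.)
have_reference() {
    local lib pattern
    lib="$1/ReferenceLibrary"
    for pattern in "$2.dreflib*" "$2.gindex*" "$2_GeneModels/$3.gmodel*"; do
        # $pattern is left unquoted so the wildcard expands;
        # ls fails when nothing matches.
        ls -d "$lib"/$pattern >/dev/null 2>&1 || return 1
    done
}

if have_reference /opt/omicsoft Human.B37.3 RefGene; then
    echo "Human.B37.3 / RefGene already present"
else
    echo "Human.B37.3 / RefGene not found; Oshell will download it on first use"
fi
```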

Users can choose to build their own reference library. The recommended way is to use Oshell --runscript with the OmicScript functions BuildReferenceLibrary and BuildGeneModel; see the example below.

If users want to use the command line directly, please read Build Reference Library and Gene Model through Oshell subcommand.

OmicScript

If you have ArrayStudio software, please read Generate and run OmicScript in ArrayStudio GUI. Other users can write OmicScript based on our OmicScript Collection. We will provide some examples below.

OmicScript to build reference index and gene model

 Begin BuildReferenceLibrary /Namespace=NgsLib;
 Reference Reference_library_id;
 Files "/pathToFile/reference.fa";
 Options /cDNA=False /ReverseComplement=False /Build64BitIndex=True /Build32BitIndex=False /Species=Unspecified /NcbiBuild=1.0;
 End;
 
 Begin BuildGeneModel /Namespace=NgsLib;
 Reference Reference_library_id;
 GeneModel Gene_model_id;
 Files "/pathToFile/genemodel.gtf";
 Options /AppendChr=False /BuildGeneLevelAnnotation=True /BuildTranscriptLevelAnnotation=True;
 End;

Save the above script as buildIndex.oscript and run it using:

mono oshell.exe --runscript Base_Dir Script_path/buildIndex.oscript Temp_Dir Mono_Path
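
When several references have to be built, the script can be generated from a template instead of edited by hand. This is a minimal sketch, assuming bash; the function name write_build_script and the IDs MyGenome and MyGenes are hypothetical. The option values are copied unchanged from the example above.

```shell
# write_build_script REF_ID FASTA GENE_MODEL_ID GTF: print an OmicScript that
# builds a reference library and a gene model. Only the IDs and file paths
# vary; all options are those of the example. (Hypothetical helper name.)
write_build_script() {
    cat <<EOF
Begin BuildReferenceLibrary /Namespace=NgsLib;
Reference $1;
Files "$2";
Options /cDNA=False /ReverseComplement=False /Build64BitIndex=True /Build32BitIndex=False /Species=Unspecified /NcbiBuild=1.0;
End;

Begin BuildGeneModel /Namespace=NgsLib;
Reference $1;
GeneModel $3;
Files "$4";
Options /AppendChr=False /BuildGeneLevelAnnotation=True /BuildTranscriptLevelAnnotation=True;
End;
EOF
}

# MyGenome and MyGenes are placeholder IDs for this sketch.
write_build_script MyGenome /pathToFile/reference.fa MyGenes /pathToFile/genemodel.gtf
```

Redirect the output to buildIndex.oscript to obtain a script file that can be passed to --runscript.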

OmicScript for OmicSoft Alignment

Details about OmicSoft Aligner (OSA) are in the following publication:

Hu, Jun, et al. "OSA: a fast and accurate alignment tool for RNA-Seq." Bioinformatics 28.14 (2012): 1933-1934.

We have migrated OSA to the Oshell environment. The RNA-Seq alignment function MapRnaSeqReadsToGenome has to be wrapped by NewProject (which creates the environment) and by SaveProject and CloseProject (which closes the environment):

Begin NewProject;
File "/test/omicsoft/AlignmentProject.osprj";
Options /Distributed=True;
End;
 
Begin MapRnaSeqReadsToGenome /Namespace=NgsLib;
Files 
"/pathToFile/SampleA_1.fastq.gz
/pathToFile/SampleA_2.fastq.gz
/pathToFile/SampleB_1.fastq.gz
/pathToFile/SampleB_2.fastq.gz";
Reference Human.B37.3;
GeneModel RefGene;
Trimming /Mode=TrimByQuality /ReadTrimQuality=2;
Options /ParallelJobNumber=2 /PairedEnd=True /FileFormat=AUTO /AutoPenalty=True /FixedPenalty=2 /Greedy=false /IndelPenalty=2 
/DetectIndels=False /MaxMiddleInsertionSize=10 /MaxMiddleDeletionSize=10 /MaxEndInsertionSize=10 /MaxEndDeletionSize=10 /MinDistalEndSize=3 
/ExcludeNonUniqueMapping=False /ReportCutoff=10 /WriteReadsInSeparateFiles=True /OutputFolder="/test/omicsoft/AlignmentProject/BAMOutput" 
/GenerateSamFiles=False /ThreadNumber=6 /InsertSizeStandardDeviation=40 /ExpectedInsertSize=300 /InsertOnSameStrand=False 
/InsertOnDifferentStrand=True /QualityEncoding=Automatic /CompressionMethod=Gzip /Gzip=True /SearchNovelExonJunction=True /ExcludeUnmappedInBam=False;
Output Alignment;
End;
 
Begin SaveProject;
Project AlignmentProject;
File "/test/omicsoft/AlignmentProject.osprj";
End;
 
Begin CloseProject;
Project AlignmentProject;
End;

Save the above script as Alignment.oscript and run it using:

mono oshell.exe --runscript Base_Dir Script_path/Alignment.oscript Temp_Dir Mono_Path

When Oshell is run in standalone mode on a single workstation, multiple alignment or summary jobs are spawned automatically so that each job occupies one process using multiple threads. Here, with /ParallelJobNumber=2 /ThreadNumber=6, two samples run simultaneously and each uses 6 threads.

For details about each parameter, please read the articles MapRnaSeqReadsToGenome, NewProject, SaveProject and CloseProject.
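
The total thread demand is the product of the two settings, which can be compared with the number of cores on the workstation. This is a minimal sketch, assuming bash; that the total should not exceed the core count is a rule of thumb assumed here, not a requirement stated by Oshell.

```shell
# Values from the alignment example in this section.
parallel_jobs=2      # /ParallelJobNumber
threads_per_job=6    # /ThreadNumber
total=$((parallel_jobs * threads_per_job))

# getconf _NPROCESSORS_ONLN reports the online cores on Linux and macOS.
cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo unknown)
echo "threads requested: $total, cores available: $cores"

case $cores in
    ''|*[!0-9]*) ;;   # core count unknown: nothing to compare
    *) [ "$total" -le "$cores" ] || echo "warning: more threads than cores" ;;
esac
```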

OmicScript for FusionMap

Details about FusionMap are in the following publication:

Ge, H, et al. "FusionMap: detecting fusion genes from next-generation sequencing data at base-pair resolution." Bioinformatics 27.14 (2011): 1922-1928.

We have migrated FusionMap to the Oshell environment as the MapFusionReads function.

Begin NewProject;
File "/test/omicsoft/FusionDetection.osprj";
Options /Distributed=True;
End;
 
Begin MapFusionReads /Namespace=NgsLib;
Files 
"/pathToData/Illumina.Paired.1.fastq.gz
/pathToData/Illumina.Paired.2.fastq.gz";
Reference Human.B37.3;
GeneModel RefGene;
Trimming /Mode=TrimByQuality /ReadTrimQuality=2;
Options /FusionVersion=2 /ParallelJobNumber=4 /PairedEnd=False /RnaMode=True /FileFormat=BAM /AutoPenalty=True 
/FixedPenalty=2 /OutputFolder="/ouput/xxxx" /MaxMiddleInsertionSize= /ThreadNumber=2 
/QualityEncoding=Automatic /CompressionMethod=None /Gzip=False /FilterUnlikelyFusionReads=False 
/FullLengthPenaltyProportion=8 /OutputFusionReads=True /MinimalHit=4 /MinimalFusionAlignmentLength=0 
/MinimalFusionSpan=0 /FusionReportCutoff=1 /ReportUnannotatedFusion=False 
/NonCanonicalSpliceJunctionPenalty=2 /RealignToGenome=True;
Output FusionDetection;
End;
 
Begin ExportView;
Project FusionDetection;
OutputFolder "/test/omicsoft/FusionDetection/Results";
End;
 
Begin SaveProject;
Project FusionDetection;
File "/test/omicsoft/FusionDetection.osprj";
End;
 
Begin CloseProject;
Project FusionDetection;
End;
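
Long scripts like the ones above are easy to break while editing. A quick sanity check is to compare the number of Begin lines with the number of End; lines. This is a minimal sketch, assuming bash and assuming, as in every example in this article, that each procedure starts with a line beginning "Begin " and ends with a line "End;". The function name check_blocks is hypothetical, and this is not a full OmicScript validator.

```shell
# check_blocks FILE: print the number of "Begin ..." and "End;" lines in FILE
# and succeed only when the two counts match. (Hypothetical helper name.)
check_blocks() {
    local begins ends
    # grep -c exits non-zero on a zero count, hence "|| true".
    begins=$(grep -c '^[[:space:]]*Begin ' "$1" || true)
    ends=$(grep -c '^[[:space:]]*End;[[:space:]]*$' "$1" || true)
    echo "Begin: $begins, End: $ends"
    [ "$begins" -eq "$ends" ]
}

if [ -f Alignment.oscript ]; then
    check_blocks Alignment.oscript || echo "Alignment.oscript: Begin/End count differs"
fi
```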

Also Read:

OmicScript pipeline for RNA-Seq data analysis

Please read OmicScript pipeline for RNA-Seq data analysis. The pipeline includes alignment, fusion detection, mutation detection and many other steps.

OmicScript pipeline for DNA-Seq data analysis

Please read OmicScript pipeline for DNA-Seq data analysis.

Deploy Oshell in Cluster

Use built-in scheduler

When Oshell is run in cluster mode on a grid engine, each job occupies one spot (one or more slots, depending on the thread number setting and the cluster queue setting). The built-in scheduling system supports both SGE and PBS, which can accelerate the analysis of large amounts of RNA-Seq data.

Oshell uses the SetEnvironment function to set up the cluster for Oshell jobs. Here is an example OmicScript that schedules jobs to the cluster, monitors the progress of each job, handles the running logs from multiple jobs, and summarizes the job outputs into one Oshell project.

Example OmicScript running on SGE

#Enable cluster
Begin SetEnvironment;
Cluster /EnableCluster=True /ClusterAlignmentPath="/Oshell/ClusterAlignment.sh" /ClusterSummaryPath="/Oshell/ClusterSummary.sh" 
/ClusterParallelEnvironment=peomics /ClusterParallelRatioFactor=1 /ClusterQueueName=all.q /ClusterGridEngine=SGE 
/DefaultClusterJobNumber=12;
End;
 
#Create the Oshell project environment
Begin NewProject;
File "/test/AlignmentTest/OshellClusterTest.osprj";
Options /Distributed=true;
End;
 
#Alignment
Begin MapRnaSeqReadsToGenome /Namespace=NgsLib;
Files 
"
/TestDataSets/HumanRNASeqPaired/SRR327893.subset.1.fastq.gz
/TestDataSets/HumanRNASeqPaired/SRR327893.subset.2.fastq.gz
/TestDataSets/HumanRNASeqPaired/SRR065521.subset.1.fastq.gz
/TestDataSets/HumanRNASeqPaired/SRR065521.subset.2.fastq.gz
/TestDataSets/HumanRNASeqPaired/simulationread200PE_1.fastq.gz
/TestDataSets/HumanRNASeqPaired/simulationread200PE_2.fastq.gz
/TestDataSets/HumanRNASeqPaired/simulationread400PE_1.fastq.gz
/TestDataSets/HumanRNASeqPaired/simulationread400PE_2.fastq.gz
";
Reference Human.B37.3;
GeneModel RefGene;
Trimming  /Mode=TrimByQuality /ReadTrimQuality=2;
Options  /ParallelJobNumber=4 /PairedEnd=True /FileFormat=AUTO /AutoPenalty=True
/FixedPenalty=2 /Greedy=false /IndelPenalty=2 /DetectIndels=False /MaxMiddleInsertionSize=10 /MaxMiddleDeletionSize=10
/MaxEndInsertionSize=10 /MaxEndDeletionSize=10 /MinDistalEndSize=3 /ExcludeNonUniqueMapping=False /ReportCutoff=10 
/WriteReadsInSeparateFiles=True /OutputFolder="/test/AlignmentTest/OshellClusterTest/BAMFiles" /GenerateSamFiles=False 
/ThreadNumberPerJob=4 /InsertSizeStandardDeviation=40 /ExpectedInsertSize=300 /InsertOnSameStrand=False 
/InsertOnDifferentStrand=True /QualityEncoding=Automatic /CompressionMethod=Gzip /Gzip=True /SearchNovelExonJunction=True /ExcludeUnmappedInBam=False;
Output primary_alignment;
End;
 
# save OmicSoft project environment
Begin SaveProject;
Project OshellClusterTest;
File "/test/AlignmentTest/OshellClusterTest.osprj";
End;
 
# close Oshell project environment
Begin CloseProject;
Project OshellClusterTest;
End;

Figure: SGE cluster jobs scheduled by Oshell.

Also read: SetEnvironment, ClusterAlignmentPath and ClusterSummaryPath.

Wrap Oshell to cluster jobs

Users can also wrap Oshell jobs in a qsub script, such as the one below for SGE. This gives users greater control over job submission, since the default job scheduler configured through SetEnvironment has limited options. With this method, users do not need to call SetEnvironment in the OmicScript.

#!/bin/bash
#
# SGE submission options
#$ -q all.q                   # Select the queue
#$ -o /home/ge/job.o
#$ -e /home/ge/job.e
#$ -N test                    # A name for the job
#$ -pe smp 1                  # Select the parallel environment
 
# Run Oshell projects
MONO=/App/mono-2.10.9/mono
OSHELL=/App/omicsoft/Oshell/oshell.exe
BASEDIR=/App/omicsoft
TMP=/scratch
OSCRIPT=/App/Oscirpt/runpipeline.oscript
LOG=/App/Oscirpt/runpipeline.log
"$MONO" "$OSHELL" --runscript "$BASEDIR" "$OSCRIPT" "$TMP" "$MONO" > "$LOG"
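
When many OmicScripts need to be submitted, the wrapper above can be generated per script. This is a minimal sketch, assuming bash; the function name make_sge_job is hypothetical. The queue, parallel environment and install paths are those of the example above, and requesting as many slots as the script's thread number is an assumption to adapt to your cluster.

```shell
# make_sge_job OSCRIPT SLOTS: print an SGE submission script for one
# OmicScript. Job name, stdout/stderr files and the log are derived from the
# script path. (Hypothetical helper name.)
make_sge_job() {
    local oscript slots name
    oscript=$1
    slots=$2
    name=$(basename "$oscript" .oscript)
    cat <<EOF
#!/bin/bash
#\$ -q all.q
#\$ -N $name
#\$ -o ${oscript%.oscript}.o
#\$ -e ${oscript%.oscript}.e
#\$ -pe smp $slots
MONO=/App/mono-2.10.9/mono
OSHELL=/App/omicsoft/Oshell/oshell.exe
"\$MONO" "\$OSHELL" --runscript /App/omicsoft "$oscript" /scratch "\$MONO" > "${oscript%.oscript}.log"
EOF
}

# Print a submission script for the pipeline used in the example above.
make_sge_job /App/Oscirpt/runpipeline.oscript 4
```

The printed script can be saved to a file and submitted with qsub.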

Oshell subcommand

In previous versions, Oshell provided an individual subcommand to run each function, such as:

  • oshell.exe --buildref to build reference
  • oshell.exe --buildgm to build gene model
  • oshell.exe --alignrna to do RNA-Seq alignment
  • oshell.exe --semap to do fusion alignment
  • For more, please read Oshell subcommand

We have completely migrated Oshell to work in the environment setting described in this article. Development of these subcommands has been discontinued, and they are supported only through the end of 2013.

License

Commercial users: please contact OmicSoft to get a license.

Publication

RNA-Seq Analysis Pipeline Based on Oshell Environment

Citation

@article{6808521,
author={Li, J. and Hu, J. and Newman, M. and Liu, K. and Ge, H.}, 
journal={Computational Biology and Bioinformatics, IEEE/ACM Transactions on}, 
title={RNA-Seq Analysis Pipeline Based on Oshell Environment}, 
year={2014}, 
month={}, 
volume={PP}, 
number={99}, 
pages={1-1}, 
doi={10.1109/TCBB.2014.2321156}, 
ISSN={1545-5963},}