OSA

From Array Suite Wiki


WARNING: OSA has not been actively maintained since 2015. OSA is now part of the Oshell toolkit, which is actively maintained. Please read Oshell for RNA-Seq Alignment.


What is OSA

OSA (Omicsoft Sequence Aligner) is a fast and accurate alignment tool for RNA-Seq data. Benchmarked against existing methods, OSA improves mapping speed 4- to 10-fold, with better sensitivity and fewer false positives.

OSA is now part of the Omicsoft Oshell environment. If you have an ArrayStudio license, you can run RNA-Seq alignment through its GUI or with Oshell scripts; in that case, please read the ArrayStudio guide and the Oshell guide instead. This tutorial focuses on the free version of OSA, although most of the options and implementations are the same as in the commercial version.

System Requirements

OSA is written in C#. It runs on both Windows and Linux (Linux requires Mono), in both 32-bit and 64-bit modes. 64-bit mode is much faster, but it requires 6 GB of RAM or more.

For Windows machines

OSA runs natively under the Windows .NET environment. Launch the executable from the command line:

C:\test\OSA\bin>osa.exe

For Linux machines

OSA requires Mono to run on Linux. Mono version 2.10.8 or higher is required, and it must be built with the --with-large-heap=yes configure option, or OSA will not work.

Configure and install Mono as follows:

./configure --with-large-heap=yes
make
make install    # use "sudo make install" if permission is denied

Run OSA from the command line:

machine$ mono osa.exe
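Because OSA needs Mono 2.10.8 or higher, it can help to check the installed version before starting a long run. The following is a minimal Python sketch, not part of OSA. It assumes that `mono --version` prints a dotted version number after the word "version" (the helper names are hypothetical), and it cannot tell whether Mono was built with the large-heap option.

```python
import re
import subprocess

MIN_MONO = (2, 10, 8)  # minimum Mono version stated in this guide


def parse_mono_version(text):
    """Extract a version tuple from `mono --version` output, or None."""
    # Assumption: the output contains "version X.Y.Z" (possibly with more parts).
    match = re.search(r"version\s+(\d+(?:\.\d+)+)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def mono_is_recent_enough(text, minimum=MIN_MONO):
    """Return True if the parsed version is at least `minimum`."""
    version = parse_mono_version(text)
    return version is not None and version >= minimum


if __name__ == "__main__":
    try:
        output = subprocess.run(
            ["mono", "--version"], capture_output=True, text=True, check=True
        ).stdout
    except (OSError, subprocess.CalledProcessError):
        print("mono was not found or failed to run")
    else:
        ok = mono_is_recent_enough(output)
        print("Mono version OK" if ok else "Mono is too old or was not recognized")
```

The version is compared as a tuple of integers, so 2.10.8 correctly ranks above 2.6.7, which a plain string comparison would get wrong.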

For Mac OS X

OSA also works on Mac OS X, using the same Mono configuration as on Linux.

Getting Started

Download the latest version of OSA (v4.1.1.1) from the link below:

OSA

v4.1.1.1 fixes some bugs from v4.1.0.1 and improves memory usage.

Building a reference library and gene model

OSA requires a reference genome and a gene model to be built before running the actual alignment. The index needs to be generated only once for each reference. By default, the first time OSA is run with a given reference and gene model, the program automatically downloads a compiled genome and gene model.

The user has to specify the correct names for the reference genome and gene model. See the complete list of compiled genomes and gene models that we provide.

For example, suppose we run OSA alignment with Human.B37.3 (latest genome build 37.3) and the RefGene model using the following command:

osa.exe --alignrna C:\test\OSA\Base_Dir Human.B37.3 RefGene C:\test\OSA\TestData1\secontrol_win.ini > C:\test\OSA\TestData1\secontrol_win.log

OSA will download the Human.B37.3 genome and the RefGene model into your local folder. You will find two folders under Base_Dir:

-Temp
---- sdf231g3654a23sd1f6 (randomly named folder to store temp files)
-ReferenceLibrary
---- Human.B37.3.dreflib1
---- Human.B37.3.gindex1
---- Human.B37.3_GeneModels
---- ---- RefGene.gmodel2

It takes only a few minutes to download the files and build an index in your local cache.

Note: some users have problems letting the software download these files directly because of their proxy. In that case, download all required files from the web and put them in the folder structure shown above. When downloading a *.gmodel2 file, make sure to remove the genome name from the file name (for example, rename Human.B37.3_RefGene.gmodel2 to RefGene.gmodel2).
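When the files are placed by hand, it is easy to put one in the wrong folder or to forget the rename. The Python sketch below is an illustration only, not part of OSA. It lists the paths implied by the folder structure above and reports which ones are missing. The function names are hypothetical, and the layout is taken from the Human.B37.3 example rather than from a formal specification.

```python
from pathlib import Path


def expected_reference_files(base_dir, genome, gene_model):
    """List the paths implied by the folder structure shown in this guide."""
    lib = Path(base_dir) / "ReferenceLibrary"
    return [
        lib / f"{genome}.dreflib1",
        lib / f"{genome}.gindex1",
        lib / f"{genome}_GeneModels" / f"{gene_model}.gmodel2",
    ]


def local_gmodel_name(downloaded_name, genome):
    """Remove the leading '<genome>_' prefix from a downloaded .gmodel2 name."""
    prefix = genome + "_"
    if downloaded_name.startswith(prefix):
        return downloaded_name[len(prefix):]
    return downloaded_name


def missing_reference_files(base_dir, genome, gene_model):
    """Return the expected paths that do not exist yet."""
    expected = expected_reference_files(base_dir, genome, gene_model)
    return [path for path in expected if not path.exists()]


if __name__ == "__main__":
    for path in missing_reference_files("Base_Dir", "Human.B37.3", "RefGene"):
        print("missing:", path)
```

For instance, `local_gmodel_name("Human.B37.3_RefGene.gmodel2", "Human.B37.3")` gives the local file name `RefGene.gmodel2`.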

Users can also build their own reference library and gene model with the following commands:

  • Building a Reference Library
osa.exe --buildref Base_Dir fasta_file_name ref_lib_name
  • Building a Gene Model
osa.exe --buildgm Base_Dir gtf_file_name ref_lib_name gene_model_name
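The two build steps can be scripted so that the same reference library name is passed to both. The Python sketch below only assembles the argument lists from the syntax above. The helper name and the example file names are hypothetical. It assumes osa.exe is in the current directory, and that the reference library must be built before the gene model that refers to it.

```python
import sys


def build_commands(base_dir, fasta_file, gtf_file, ref_lib_name,
                   gene_model_name, platform=sys.platform):
    """Return the --buildref and --buildgm argument lists, in run order."""
    # On Windows osa.exe runs directly; elsewhere it is launched through mono.
    launcher = ["osa.exe"] if platform.startswith("win") else ["mono", "osa.exe"]
    buildref = launcher + ["--buildref", base_dir, fasta_file, ref_lib_name]
    buildgm = launcher + ["--buildgm", base_dir, gtf_file, ref_lib_name, gene_model_name]
    # Assumption: the gene model is built against an existing reference
    # library, so --buildref comes first.
    return [buildref, buildgm]


if __name__ == "__main__":
    # Hypothetical file and library names, for illustration only.
    for command in build_commands("Base_Dir", "genome.fa", "genes.gtf",
                                  "MyGenome", "MyGenes"):
        print(" ".join(command))
```

Each list can be passed to `subprocess.run` as is, which avoids quoting problems with paths that contain spaces.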

RNA-Seq alignment

OSA is a very fast alignment tool. It can align RNA-Seq reads to a genome with or without the help of an existing gene model.

Command to run OSA

mono osa.exe --alignrna Base_Dir ref_lib_name gene_model_name control_file_name

When gene_model_name=none, OSA aligns reads to the genome without the help of a gene model (GTF/GFF).

Example of alignment with gene model

mono osa.exe --alignrna /test/OSA/Base_Dir Human.B37 RefGene /test/OSA/TestData1/secontrol_linux.ini > /test/OSA/TestData1/secontrol_linux.log

Example of alignment without gene model

mono osa.exe --alignrna /test/OSA/Base_Dir Human.B37 none /test/OSA/TestData1/secontrol_linux.ini > /test/OSA/TestData1/secontrol_linux_nogm.log
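The two examples above differ only in the gene model argument, so a small wrapper can cover both. The Python sketch below is not part of OSA, and the function names are hypothetical. It assembles the --alignrna command, substitutes "none" when no gene model is given, and writes standard output to a log file, as the shell redirection `>` does.

```python
import subprocess
import sys


def alignrna_command(base_dir, ref_lib_name, control_file,
                     gene_model_name=None, platform=sys.platform):
    """Return the argument list for an OSA --alignrna run."""
    launcher = ["osa.exe"] if platform.startswith("win") else ["mono", "osa.exe"]
    # "none" tells OSA to align without a gene model, as described above.
    gene_model = gene_model_name if gene_model_name else "none"
    return launcher + ["--alignrna", base_dir, ref_lib_name, gene_model, control_file]


def run_alignment(command, log_path):
    """Run the command, write its standard output to log_path, return the exit code."""
    with open(log_path, "w") as log:
        return subprocess.run(command, stdout=log).returncode


if __name__ == "__main__":
    command = alignrna_command(
        "/test/OSA/Base_Dir",
        "Human.B37",
        "/test/OSA/TestData1/secontrol_linux.ini",
        gene_model_name="RefGene",
    )
    print(" ".join(command))
```

Calling `run_alignment(command, "/test/OSA/TestData1/secontrol_linux.log")` would then start the alignment, provided osa.exe and the test data are in place.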

Alignment options

Apart from the reference and gene model, which are given on the command line, all other parameters are specified in a control file:

<Files>
/home/omicsoft-root/Illumina.Paired.1.fastq
/home/omicsoft-root/Illumina.Paired.2.fastq

<Options>
PairedEnd=True // Possible values: True, False. Default value=False
FileFormat=FASTQ // Possible values: FASTQ, FASTA, QSEQ. Default value=FASTQ
AutoPenalty=True // Possible values: True, False. Default value=True
FixedPenalty=2 // Possible values: 0-100. Default value=2
Greedy=False // Possible values: True, False. Default value=False
Use32BitMode=False // Possible values: True, False. Default value=False for 64-bit OS and True for 32-bit OS
ExcludeNonUniqueMapping=False // Possible values: True, False. Default value=False
ReportCutoff=10 // Default value=10
WriteReadsInSeparateFiles=True // Possible values: True, False. Default value=True
GenerateSamFiles=False // Possible values: True, False. Default value=False
ThreadNumber=4 // Possible values: 1-100. Default depends on machine
GenerateAlignmentSummary=True // Possible values: True, False. Default value=True
TrimByQuality=True // Possible values: True, False. Default value=True
ReadTrimSize=1024 // Default value=1024
ReadTrimQuality=2 // Possible values: 0-66. Default value=2
InsertSizeStandardDeviation=40 // Default value=40
ExpectedInsertSize=300 // Default value=300
InsertOnSameStrand=False // Possible values: True, False. Default value=False
InsertOnDifferentStrand=True // Possible values: True, False. Default value=True
QualityEncoding=Automatic // Possible values: Automatic, Illumina, Sanger. Default value=Automatic
Gzip=False // Possible values: True, False. Default value=False
ExpressionMeasurement=None // Possible values: None, TPM, RPKM, TPM_Transcript, RPKM_Transcript. Default value=None
SearchNovelExonJunction=False // Possible values: True, False. Default value=False

<Output>
OutputName=alignRNA
OutputPath=/home/omicsoft-root/Output/alignRNA_Command
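When many samples are aligned with the same settings, the control file can be generated rather than edited by hand. The Python sketch below renders the three sections shown above from plain Python values. The function name is hypothetical. It assumes that options left out fall back to the defaults listed above, and it writes no `//` comments, since their handling by OSA is not described here.

```python
def render_control_file(files, options, output_name, output_path):
    """Return control-file text with <Files>, <Options>, and <Output> sections."""
    lines = ["<Files>"]
    lines.extend(files)
    lines.append("")
    lines.append("<Options>")
    # Python's True/False render as "True"/"False", matching the values above.
    lines.extend(f"{key}={value}" for key, value in options.items())
    lines.append("")
    lines.append("<Output>")
    lines.append(f"OutputName={output_name}")
    lines.append(f"OutputPath={output_path}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    text = render_control_file(
        files=[
            "/home/omicsoft-root/Illumina.Paired.1.fastq",
            "/home/omicsoft-root/Illumina.Paired.2.fastq",
        ],
        options={"PairedEnd": True, "FileFormat": "FASTQ", "ThreadNumber": 4},
        output_name="alignRNA",
        output_path="/home/omicsoft-root/Output/alignRNA_Command",
    )
    print(text, end="")
```

The returned text can be written to an .ini file and passed to OSA as control_file_name.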

License

Commercial users: please contact OmicSoft to get a license.

Reference