OSA
From Array Suite Wiki
WARNING: OSA has not been actively maintained since 2015. OSA is now part of the Oshell toolkit, which is actively maintained. Please read Oshell for RNA-Seq Alignment.
What is OSA
OSA (Omicsoft Sequence Aligner) is a fast and accurate alignment tool for RNA-Seq data. Benchmarked against existing methods, OSA improves mapping speed 4-10 fold with better sensitivity and fewer false positives.
OSA is now part of the Omicsoft Oshell environment. If you have an ArrayStudio license, you can run RNA-Seq alignment easily through its GUI or with Oshell scripts. Please read the ArrayStudio guide and the Oshell guide instead. This tutorial mainly focuses on the free version of OSA, although most of the options and implementations are the same as in the commercial version.
System Requirements
OSA is written in C#. It runs on both Windows and Linux (Linux requires Mono), in both 32- and 64-bit modes. The 64-bit mode is much faster, but it requires 6 GB of RAM or more.
For Windows machine
OSA runs in the native Windows .NET environment. Simply launch the executable from the command line:
C:\test\OSA\bin>osa.exe
For Linux machine
OSA requires Mono to run on Linux. Mono version 2.10.8 or higher is required. Mono must be compiled with the "--with-large-heap=yes" configure option, or OSA will not work.
Configure and install Mono as follows:
./configure --with-large-heap=yes
make
make install (use sudo make install if permission is denied)
Run OSA from the command line:
machine$ mono osa.exe
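Before launching OSA on Linux, it can help to confirm that the installed Mono meets the 2.10.8 minimum. The sketch below is one way to do that in Python. It assumes the banner printed by `mono --version` contains the word "version" followed by a dotted number; the function names and the sample banner are illustrative, not part of OSA or Mono. It does not check whether Mono was built with the large-heap option.

```python
import re

MIN_MONO = (2, 10, 8)  # minimum Mono version stated in this section


def parse_mono_version(version_output):
    """Extract a version tuple from text assumed to come from `mono --version`."""
    match = re.search(r"version\s+(\d+(?:\.\d+)*)", version_output)
    if match is None:
        raise ValueError("no version number found in mono output")
    return tuple(int(part) for part in match.group(1).split("."))


def mono_is_new_enough(version_output, minimum=MIN_MONO):
    """Return True if the reported Mono version is at least `minimum`."""
    return parse_mono_version(version_output) >= minimum


# Illustrative banner line, not captured from a real machine.
sample = "Mono JIT compiler version 2.10.8 (tarball)"
print(mono_is_new_enough(sample))
```

To check a real installation, the banner text could be obtained with `subprocess.run(["mono", "--version"], capture_output=True, text=True).stdout` and passed to `mono_is_new_enough`.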
For Mac OS X
OSA also works on Mac OS X, using the same Mono configuration as on Linux.
Getting Started
Download the latest version of OSA (v4.1.1.1) from the link below
v4.1.1.1 fixes some bugs from v4.1.0.1 and improves memory usage.
Building a reference library and gene model
OSA requires a reference genome index and a gene model to be built before running the actual alignment. The index needs to be generated only once for each reference. By default, the first time OSA is run with a given reference and gene model, the program automatically downloads a precompiled genome and gene model.
The user must specify the correct names for the reference genome and gene model. See the complete list of compiled genomes and gene models available from Omicsoft.
For example, if we run OSA alignment with Human.B37.3 (latest genome build 37.3) and the RefGene model using the following command:
osa.exe --alignrna C:\test\OSA\Base_Dir Human.B37.3 RefGene C:\test\OSA\TestData1\secontrol_win.ini > C:\test\OSA\TestData1\secontrol_win.log
OSA will download the Human.B37.3 reference and the RefGene model to your local folder. You will find two folders under the Base_Dir:
-Temp
----sdf231g3654a23sd1f6 (randomly named folder to store temp files)
-ReferenceLibrary
---- Human.B37.3.dreflib1
---- Human.B37.3.gindex1
---- Human.B37.3_GeneModels
---- ---- RefGene.gmodel2
It only takes a few minutes to download the files and build an index in your local cache.
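To confirm that the cache was populated, the expected file locations can be derived from the reference and gene model names. The following Python sketch builds those paths from the folder layout listed above. The function name is hypothetical, and the file extensions are taken from the Human.B37.3 listing, so they are an assumption for other references.

```python
from pathlib import PurePosixPath


def expected_library_files(base_dir, ref_lib_name, gene_model_name):
    """List the cached files described above for one reference and gene model."""
    lib_dir = PurePosixPath(base_dir) / "ReferenceLibrary"
    return [
        lib_dir / f"{ref_lib_name}.dreflib1",
        lib_dir / f"{ref_lib_name}.gindex1",
        lib_dir / f"{ref_lib_name}_GeneModels" / f"{gene_model_name}.gmodel2",
    ]


# Print the paths expected for the example used in this section.
for path in expected_library_files("/test/OSA/Base_Dir", "Human.B37.3", "RefGene"):
    print(path)
```

`PurePosixPath` only builds path strings. To test for the files on disk, the same paths could be wrapped in `pathlib.Path` and checked with `.exists()`.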
Note: some users have had problems letting the software download these files directly because of their proxy. Users can instead download all required files from the web and place them in the folder structure shown above. When downloading a *.gmodel2 file, make sure to remove the genome name from the file name (for example, rename Human.B37.3_RefGene.gmodel2 to RefGene.gmodel2).
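The renaming rule for manually downloaded gene model files can be sketched in Python as follows. The helper name is hypothetical. It assumes the downloaded file name is the reference name, an underscore, and then the gene model file name, as in the example above.

```python
def local_gene_model_filename(downloaded_name, ref_lib_name):
    """Strip the leading '<genome>_' prefix from a downloaded gene model file name."""
    prefix = ref_lib_name + "_"
    if downloaded_name.startswith(prefix):
        return downloaded_name[len(prefix):]
    return downloaded_name  # no prefix found, so the name is left unchanged


print(local_gene_model_filename("Human.B37.3_RefGene.gmodel2", "Human.B37.3"))
```

The renamed file would then go into the `<ref_lib_name>_GeneModels` folder under ReferenceLibrary.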
Users can also build their own reference library and gene model with the following commands:
- Building a Reference Library
osa.exe --buildref Base_Dir fasta_file_name ref_lib_name
- Building a Gene Model
osa.exe --buildgm Base_Dir gtf_file_name ref_lib_name gene_model_name
RNA-Seq alignment
OSA is a fast alignment tool. It can align RNA-Seq reads to the genome with or without the help of an existing gene model.
Command to run OSA
mono osa.exe --alignrna Base_Dir ref_lib_name gene_model_name control_file_name
When gene_model_name=none, OSA aligns reads to the genome without the help of a gene model (GTF/GFF).
Example of alignment with gene model
mono osa.exe --alignrna /test/OSA/Base_Dir Human.B37 RefGene /test/OSA/TestData1/secontrol_linux.ini > /test/OSA/TestData1/secontrol_linux.log
Example of alignment without gene model
mono osa.exe --alignrna /test/OSA/Base_Dir Human.B37 none /test/OSA/TestData1/secontrol_linux.ini > /test/OSA/TestData1/secontrol_linux_nogm.log
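When alignments are launched from a script, the command line can be assembled programmatically. The Python sketch below builds the `--alignrna` argument list shown above, prefixing `mono` on non-Windows platforms and substituting `none` when no gene model is given. The function name and its parameters are illustrative. It uses only the arguments documented in this section.

```python
import sys


def osa_align_command(base_dir, ref_lib_name, control_file,
                      gene_model_name=None, platform=sys.platform):
    """Build the argument list for an OSA --alignrna run."""
    # On Windows the executable runs natively; elsewhere it is launched through mono.
    launcher = ["osa.exe"] if platform.startswith("win") else ["mono", "osa.exe"]
    # "none" asks OSA to align without a gene model.
    gene_model = gene_model_name if gene_model_name else "none"
    return launcher + ["--alignrna", base_dir, ref_lib_name, gene_model, control_file]


cmd = osa_align_command(
    "/test/OSA/Base_Dir",
    "Human.B37",
    "/test/OSA/TestData1/secontrol_linux.ini",
    platform="linux",
)
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run`. Setting `stdout` to an open log file reproduces the `>` redirection used in the examples.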
Alignment options
Besides the options for reference and gene model, all the other parameters are specified in a control file:
<Files>
/home/omicsoft-root/Illumina.Paired.1.fastq
/home/omicsoft-root/Illumina.Paired.2.fastq
<Options>
PairedEnd=True // Possible values: True, False. Default value=False
FileFormat=FASTQ // Possible values: FASTQ, FASTA, QSEQ. Default value=FASTQ
AutoPenalty=True // Possible values: True, False. Default value=True
FixedPenalty=2 // Possible values: 0-100. Default value=2
Greedy=False // Possible values: True, False. Default value=False
Use32BitMode=False // Possible values: True, False. Default value = True for 64-bit OS and False for 32-bit OS
ExcludeNonUniqueMapping=False // Possible values: True, False. Default value=False
ReportCutoff=10 // Default value=10
WriteReadsInSeparateFiles=True // Possible values: True, False. Default value=True
GenerateSamFiles=False // Possible values: True, False. Default value=False
ThreadNumber=4 // Possible values: 1-100. Default depends on machine
GenerateAlignmentSummary=True // Possible values: True, False. Default value=True
TrimByQuality=True // Possible values: True, False. Default value=True
ReadTrimSize=1024 // Default value=1024
ReadTrimQuality=2 // Possible values: 0-66. Default value=2
InsertSizeStandardDeviation=40 // Default value=40
ExpectedInsertSize=300 // Default value=300
InsertOnSameStrand=False // Possible values: True, False. Default value=False
InsertOnDifferentStrand=True // Possible values: True, False. Default value=True
QualityEncoding=Automatic // Possible values: Automatic, Illumina, Sanger. Default value=Automatic
Gzip=False // Possible values: True, False. Default value=False
ExpressionMeasurement=None // Possible values: None, TPM, RPKM, TPM_Transcript, RPKM_Transcript. Default value=None
SearchNovelExonJunction=False // Possible values: True, False. Default value=False
<Output>
OutputName=alignRNA
OutputPath=/home/omicsoft-root/Output/alignRNA_Command
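For batches of samples, it may be convenient to generate the control file rather than edit it by hand. The following Python sketch renders the three sections (`<Files>`, `<Options>`, `<Output>`) from plain Python values. It assumes one entry per line and omits the `//` comments. The function name and its parameters are hypothetical, and only option names listed above are used.

```python
def render_control_file(files, options, output_name, output_path):
    """Render an OSA control file as text, one entry per line (assumed layout)."""
    lines = ["<Files>"]
    lines.extend(files)
    lines.append("<Options>")
    # str(True) / str(False) give "True" / "False", matching the values listed above.
    lines.extend(f"{key}={value}" for key, value in options.items())
    lines.append("<Output>")
    lines.append(f"OutputName={output_name}")
    lines.append(f"OutputPath={output_path}")
    return "\n".join(lines) + "\n"


text = render_control_file(
    files=[
        "/home/omicsoft-root/Illumina.Paired.1.fastq",
        "/home/omicsoft-root/Illumina.Paired.2.fastq",
    ],
    options={"PairedEnd": True, "FileFormat": "FASTQ", "ThreadNumber": 4},
    output_name="alignRNA",
    output_path="/home/omicsoft-root/Output/alignRNA_Command",
)
print(text)
```

Options left out of the dictionary fall back to the defaults documented above, so a generated file only needs the settings that differ per sample.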
License
Commercial users: please contact OmicSoft to get a license.