FusionMap

From Array Suite Wiki

Overview

FusionMap is an efficient fusion aligner that aligns reads spanning fusion junctions directly to the genome without prior knowledge of potential fusion regions. It detects and characterizes fusion junctions at base-pair resolution. FusionMap can be applied to detect fusion junctions in both single- and paired-end datasets from either gDNA-Seq or RNA-Seq studies.

If you have an ArrayStudio license, you do not need this article: fusion detection can be run easily through the ArrayStudio GUI. Please read the ArrayStudio guide and the Oshell guide.

Also check the FusionMap Change Log.

Installation

FusionMap runs on both Windows and Linux (Linux requires Mono). Although it can run on a 32-bit machine, a 64-bit machine with 12 GB of RAM or more is recommended.

Install FusionMap on Windows

FusionMap is written in C#, and .NET on Windows is its native platform, so installation on Windows is straightforward. Simply download the zipped file:

FusionMap

Once unzipped, you can run FusionMap directly in CMD:

C:\App\FusionMap.exe

Install FusionMap on Linux

Install Mono

We recommend Mono 2.10.9 for NGS alignment on Linux.

  • Download Mono 2.10.9
    cd /opt
    wget -c  http://origin-download.mono-project.com/sources/mono/mono-2.10.9.tar.bz2

The bz2 file can be saved to a temporary location, e.g. ~/temp/

  • Extract and modify certificate if necessary
    tar  jxvf   mono-2.10.9.tar.bz2
    cd  /opt/mono-2.10.9

For mono 2.10.9, it is recommended to modify X509Certificate to the latest standard.

  • Compile and install. On the command line, type
    cd  /opt/mono-2.10.9
    ./configure  --prefix=/opt/mono-2.10.9 --with-large-heap=yes
    make
    make  install

Note:

  • The install location of Mono is set by the --prefix option in the configure step and can be changed to another location.
  • The option --with-large-heap=yes enables support for GC heaps larger than 3 GB, which is required for NGS alignment as well as for some Array Server functions.

Double check mono installation and version

ls -al /opt/mono-2.10.9/bin/mono*
/opt/mono-2.10.9/bin/mono --version
/opt/mono-2.10.9/bin/mono-sgen --version

FusionMap in Linux

Assume mono (version 2.10.9) has been installed in /opt/mono-2.10.9/.

Download the zipped file:

FusionMap

Once unzipped, you can run FusionMap using mono, such as:

/opt/mono-2.10.9/bin/mono /bin/FusionMap.exe

Getting started

FusionMap is designed to detect fusion junction-spanning reads and align them directly to the genome. It can be applied to both paired-end and single-end NGS datasets, starting from either raw read files or aligned BAM files (BAM files that retain unmapped reads).

Command to execute FusionMap

FusionMap.exe --semap FusionMap_Base_Dir ref_lib_name gene_model_name control_file_name

Example in Windows:

FusionMap.exe --semap C:\FusionMap Human.B37.3 RefGene C:\test\secontrol.txt > C:\test\log.txt

Example in Linux:

/bin/mono FusionMap.exe --semap /App/FusionMap Human.B37.3 RefGene /tmp/secontrol.txt > /tmp/log.txt

For brevity, all following examples use Linux.
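The command line above can also be assembled from a script. The following is a minimal sketch, not part of FusionMap: the helper names build_semap_command and run_with_log are hypothetical, and the paths are simply those of the Linux example.

```python
import subprocess


def build_semap_command(mono, fusionmap_exe, base_dir, ref_lib, gene_model, control_file):
    """Return the argument list for a FusionMap --semap run under Mono."""
    return [mono, fusionmap_exe, "--semap", base_dir, ref_lib, gene_model, control_file]


def run_with_log(cmd, log_path):
    """Run cmd and redirect its standard output to log_path (like '> log.txt')."""
    with open(log_path, "w") as log:
        return subprocess.call(cmd, stdout=log)


if __name__ == "__main__":
    # Paths copied from the Linux example; adjust them to your installation.
    cmd = build_semap_command(
        "/bin/mono", "FusionMap.exe", "/App/FusionMap",
        "Human.B37.3", "RefGene", "/tmp/secontrol.txt",
    )
    print(" ".join(cmd))
    # Uncomment to actually launch the run:
    # run_with_log(cmd, "/tmp/log.txt")
```

Passing the arguments as a list avoids shell quoting problems when a path contains spaces.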

Genome reference library and gene model

FusionMap requires a reference genome library and a gene model to be built before running the fusion alignment. The index needs to be generated only once for each reference. By default, the first time FusionMap is run with a given reference and gene model, the program automatically downloads a compiled genome and gene model.

The user has to specify the correct names for the reference genome and gene model. See the complete list of compiled genomes and gene models that we provide.

For example, if we run FusionMap detection with Human.B37.3 (latest genome build 37.3) and RefGene model using the following command:

FusionMap.exe --semap /pathTo/FusionMap_Base_Dir Human.B37.3 RefGene /test/secontrol.txt > /test/run.log

FusionMap will download the Human.B37.3 reference and the RefGene model to your local folder. You will find the following folders under FusionMap_Base_Dir:

-Temp
----sdf231g3654a23sd1f6 (randomly named folder to store temp files)
-ReferenceLibrary
---- Human.B37.3.dreflib1
---- Human.B37.3.gindex1
---- Human.B37.3_GeneModels
---- ---- RefGene.gmodel2
-Fusion
---- *BlackList*.txt

It takes only a few minutes to download the files and build an index in your local cache.

Note: some users have had problems letting the software download these files directly because of their proxy. In that case, download all required files from the web and put them in the folder structure shown above. When downloading a *.gmodel2 file, make sure to remove the genome name from the filename (for example, rename Human.B37.3_RefGene.gmodel2 to RefGene.gmodel2).

Users can also build their own reference library and gene model with the following commands:
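For manual downloads, the renaming rule and the target folder can be sketched as follows. This is only an illustration based on the folder listing above; the function names are hypothetical and FusionMap itself does not provide them.

```python
import os


def local_gmodel_name(downloaded_name, ref_lib_name):
    """Strip the leading '<ref_lib_name>_' from a downloaded gene model filename."""
    prefix = ref_lib_name + "_"
    if downloaded_name.startswith(prefix):
        return downloaded_name[len(prefix):]
    return downloaded_name


def gmodel_destination(base_dir, ref_lib_name, downloaded_name):
    """Build the path where the gene model is expected, following the folder listing."""
    return os.path.join(
        base_dir,
        "ReferenceLibrary",
        ref_lib_name + "_GeneModels",
        local_gmodel_name(downloaded_name, ref_lib_name),
    )


if __name__ == "__main__":
    # /App/FusionMap is the base directory used in the Linux example.
    print(gmodel_destination("/App/FusionMap", "Human.B37.3", "Human.B37.3_RefGene.gmodel2"))
```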

  • Building a Reference Library
FusionMap.exe --buildref FusionMap_Base_Dir fasta_file_name ref_lib_name
  • Building a Gene Model
FusionMap.exe --buildgm FusionMap_Base_Dir gtf_file_name ref_lib_name gene_model_name

Control file example

All example control files are available in the TestDataset folder of the downloaded zip file.

For raw reads file

<Files>
/IData/App/FusionMap/FusionMap_2015-03-31/TestDataset/input/DatasetP2_SimulatedReads_1.fastq.gz
/IData/App/FusionMap/FusionMap_2015-03-31/TestDataset/input/DatasetP2_SimulatedReads_2.fastq.gz

<Options>
//MonoPath option is required when the path to mono is not in PATH and spawned jobs cannot start
MonoPath=/IData/App/mono/mono-2.10.9/bin/mono
PairedEnd=True			//Automatically pair two fastq files as one sample to run fusion analysis
RnaMode=True			//Detect fusion results 
ThreadNumber=8			//Possible values: 1-100. Default value=1
FileFormat=FASTQ		//Possible values: FASTQ, QSEQ, FASTA, BAM. Default value=FASTQ
CompressionMethod=Gzip		//Gzip formatted input files
Gzip=True			//Gzip
OutputFusionReads=True		//Output fusion reads as BAM files for genome browser. Default value=False

<Output>
TempPath=/IData/temp/FusionMapTemp
OutputPath=/IData/App/FusionMap/FusionMap_2015-03-31/TestDataset/output
OutputName=01_TestDataset_InputFastq

The MonoPath option is not required on Windows, since Mono is not needed to run FusionMap there.
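When many samples are processed, control files with this layout can be generated from a script. The sketch below assumes only the three-section layout shown above (<Files>, <Options>, <Output>); render_control_file is a hypothetical helper, and the file names in the usage part are placeholders.

```python
def render_control_file(files, options, output):
    """Render a FusionMap-style control file as text.

    files   -- list of input file paths
    options -- list of (name, value) pairs for the <Options> section
    output  -- list of (name, value) pairs for the <Output> section
    """
    lines = ["<Files>"]
    lines.extend(files)
    lines.append("")
    lines.append("<Options>")
    lines.extend("%s=%s" % (name, value) for name, value in options)
    lines.append("")
    lines.append("<Output>")
    lines.extend("%s=%s" % (name, value) for name, value in output)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    text = render_control_file(
        ["sample_1.fastq.gz", "sample_2.fastq.gz"],  # placeholder file names
        [("PairedEnd", "True"), ("RnaMode", "True"), ("FileFormat", "FASTQ")],
        [("TempPath", "/tmp/FusionMapTemp"), ("OutputPath", "/tmp/out"), ("OutputName", "sample")],
    )
    print(text)
```

Pairs are kept as ordered lists rather than a dictionary so that the options appear in the file in the order they were given.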

For BAM file

<Files>
/IData/App/FusionMap/FusionMap_2015-03-31/TestDataset/input/DatasetP2_SimulatedReads.bam

<Options>
//MonoPath option is required when the path to mono is not in PATH and spawned jobs cannot start
MonoPath=/IData/App/mono/mono-2.10.9/bin/mono
RnaMode=True			//Detect fusion results 
ThreadNumber=8			//Possible values: 1-100. Default value=1
FileFormat=BAM			//Possible values: BAM or SAM
OutputFusionReads=True		//Output fusion reads as BAM files for genome browser. Default value=False

<Output>
TempPath=/IData/temp/FusionMapTemp
OutputPath=/IData/App/FusionMap/FusionMap_2015-03-31/TestDataset/output
OutputName=04_TestDataset_InputBAM

With all available options

<Files>
/IData/App/FusionMap/FusionMap_2015-03-31/TestDataset/input/DatasetP2_SimulatedReads_1.fastq.gz
/IData/App/FusionMap/FusionMap_2015-03-31/TestDataset/input/DatasetP2_SimulatedReads_2.fastq.gz

<Options>
//MonoPath option is required when the path to mono is not in PATH and spawned jobs cannot start
MonoPath=/IData/App/mono/mono-2.10.9/bin/mono
PairedEnd=True			//Automatically pair two fastq files as one sample to run fusion analysis
RnaMode=True			//Detect fusion results 
ThreadNumber=8			//Possible values: 1-100. Default value=1
FileFormat=FASTQ		//Possible values: FASTQ, QSEQ, FASTA. Default value=FASTQ
CompressionMethod=Gzip		//Gzip formatted input files
Gzip=True			//Gzip
QualityEncoding=Automatic	//Auto detect quality coding in the fastq file or specify with Illumina or Sanger
AutoPenalty=True		//Set alignment penalty cutoff to automatic based on read length: Max (2,(read length-31)/15)
FixedPenalty=2			//If AutoPenalty=False, Fixed Penalty will be used
FilterUnlikelyFusionReads=True	//Enable filtering step
FullLengthPenaltyProportion=8	//Filtering normal reads allowing 8% of alignment mismatches of the reads
MinimalFusionAlignmentLength=0	//Default (alpha in the paper) value=0 and the program will automatically set minimal Seed Read end length to Min(20, Max(17,floor(ReadLength/3))). The program will use the specified value if user sets any > 0.
FusionReportCutoff=1		//# of allowed multiple hits of read ends; Possible values: 1-5. Default value=1 (beta in paper); 
NonCanonicalSpliceJunctionPenalty=4 //Possible values: 0-10. Default value= 2 (G); 
MinimalHit=4			//Minimal distinct fusion read; Possible values: 1-10000, Default value=2 
MinimalRescuedReadNumber=1	//Minimal rescued read number. Default value= 1
MinimalFusionSpan=5000		//Minimal distance (bp) between two fusion breakpoints
RealignToGenome=True		//If True, seed read ends are re-aligned to genome to see if it is <= FusionReportCutoff in RNA-Seq.
OutputFusionReads=True		//Output fusion reads as BAM files for genome browser. Default value=True

<Output>
TempPath=/IData/temp/FusionMapTemp
OutputPath=/IData/App/FusionMap/FusionMap_2015-03-31/TestDataset/output
OutputName=01_TestDataset_InputFastq
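The two length-dependent defaults described in the option comments (AutoPenalty and MinimalFusionAlignmentLength) can be worked through with a small calculation. This is a sketch of the formulas as written in the comments, not FusionMap's own code; in particular, the comments do not say how a fractional penalty is rounded, so the value is returned unrounded here.

```python
import math


def auto_penalty(read_length):
    """Penalty cutoff from the AutoPenalty comment: Max(2, (read length - 31) / 15).

    Rounding of fractional values is not specified in the source, so none is applied.
    """
    return max(2, (read_length - 31) / 15)


def minimal_seed_end_length(read_length, minimal_fusion_alignment_length=0):
    """Seed read end length: a user value > 0 wins, else Min(20, Max(17, floor(L / 3)))."""
    if minimal_fusion_alignment_length > 0:
        return minimal_fusion_alignment_length
    return min(20, max(17, math.floor(read_length / 3)))


if __name__ == "__main__":
    for length in (36, 50, 75, 100):
        print(length, auto_penalty(length), minimal_seed_end_length(length))
```

With these formulas the seed end length stays between 17 and 20 for any read length unless the user sets MinimalFusionAlignmentLength explicitly.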

Notice for old users

Black list

In older versions, fusion genes in the black list were removed from the final report using the FilterBy option. The latest version does not remove them. Instead, the fusion report contains a Filter column that labels fusion candidates using our accumulated black list at the gene and gene-pair level. Possible values are empty, InBlackList, InFamilyList, InParalogueList, or SameEnsemblGene. It is up to the user to filter them.
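Since filtering is left to the user, a report can be post-processed with a few lines of code. The sketch below assumes the report is a tab-delimited text table with a header row containing the Filter column; the FusionID column and the three rows are made-up placeholders used only to make the example self-contained.

```python
import csv
import io


def keep_unflagged(report_text, filter_column="Filter"):
    """Return the rows of a tab-delimited report whose Filter value is empty."""
    reader = csv.DictReader(io.StringIO(report_text), delimiter="\t")
    return [row for row in reader if not (row.get(filter_column) or "").strip()]


if __name__ == "__main__":
    # Placeholder data, not real FusionMap output.
    sample = "FusionID\tFilter\nA\t\nB\tInBlackList\nC\tInFamilyList\n"
    for row in keep_unflagged(sample):
        print(row["FusionID"])
```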

FusionMap results

FusionMap generates a fusion report and a BAM file of aligned fusion reads for viewing in a genome browser.

Fusion report table

Example fusion report can be found in the TestDataset/output folder in the downloaded zip file. Here is the description of each column of fusion report.

FusionMap reports as many fusion candidates as possible and provides multiple features for users to filter out false positives. To reduce false positives for REAL data, the recommended filtering sets are

To be more stringent, user can further filter using

Please be aware of these filtering columns when you are benchmarking FusionMap against other fusion detection algorithms.

Fusion reads in BAM files

Each fusion read is cut into two ends that align to different locations. In the BAM/SAM output, two entries (in paired-end fashion) describe the alignment of each fusion read. Each line also contains tags for the fusion read type and the fusion junction ID. See the description of each element in the FusionMap SAM output.

Example of a fusion read in the BAM/SAM file:

R_1782:2	67	1	40776789	255	30M45S	X	65824263	0	
ACCCAGAATCCCGCGTTTGCCCGCATGCCCATTGAACCTCAGAGGTGGGGGTCTGCTTCGTGCACGGGATGCACT
283348985863448157653149274159715381432428925324118868345815774332128375427	
ZF:Z:FUS_40776817_2946874630(++)	ZT:Z:Seed

R_1782:2	131	X	65824263	255	30S45M	1	40776789	0	
ACCCAGAATCCCGCGTTTGCCCGCATGCCCATTGAACCTCAGAGGTGGGGGTCTGCTTCGTGCACGGGATGCACT
283348985863448157653149274159715381432428925324118868345815774332128375427	
ZF:Z:FUS_40776817_2946874630(++)	ZT:Z:Seed

Read "R_1782:2" has been cut into two partial reads, one of 30 nt and the other of 45 nt. The first line represents the alignment of the 30 nt end using CIGAR 30M45S; the second line represents the alignment of the 45 nt end using CIGAR 30S45M.
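The two CIGAR strings and the flag values 67 and 131 in the example can be decoded with a short script. This is a minimal sketch using the standard SAM conventions (M = aligned, S = soft-clipped, flag bit 0x40 = first segment, 0x80 = last segment); the function names are hypothetical.

```python
import re


def parse_cigar(cigar):
    """Split a CIGAR string such as '30M45S' into (length, operation) pairs."""
    return [(int(n), op) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)]


def aligned_segment(seq, cigar):
    """Return the bases of seq covered by M operations, skipping soft-clipped bases."""
    pos = 0
    parts = []
    for length, op in parse_cigar(cigar):
        if op == "M":
            parts.append(seq[pos:pos + length])
        if op in "MIS=X":  # operations that consume bases of the read
            pos += length
    return "".join(parts)


def mate_number(flag):
    """Return 1 or 2 for the first/last segment of a pair, 0 if neither bit is set."""
    if flag & 0x40:
        return 1
    if flag & 0x80:
        return 2
    return 0


if __name__ == "__main__":
    seq = "ACCCAGAATCCCGCGTTTGCCCGCATGCCCATTGAACCTCAGAGGTGGGGGTCTGCTTCGTGCACGGGATGCACT"
    for flag, cigar in ((67, "30M45S"), (131, "30S45M")):
        print(mate_number(flag), cigar, aligned_segment(seq, cigar))
```

Concatenating the two aligned segments gives back the full read, which is how the two SAM entries together describe one fusion read.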

Fusion PE Utility

FusionMap is developed to detect fusion junction-spanning reads and align them directly to the genome. It also provides a utility to extract possible fusions based on discordant read pairs.

In a paired-end NGS dataset, a discordant read pair is one that is not aligned to the reference genome with the expected distance and orientation. If a set of discordant read pairs is mapped to two different genes, a fusion gene is suggested.
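The definition of a discordant pair can be illustrated with a simple check. This is a sketch under stated assumptions, not the logic FusionMap uses: it assumes the usual forward/reverse orientation of a paired-end library, and the max_distance threshold of 1000 bp is a placeholder (for RNA-Seq, introns make the expected genomic distance much less predictable).

```python
def is_discordant(chrom1, pos1, reverse1, chrom2, pos2, reverse2, max_distance=1000):
    """Return True if a read pair violates the expected distance or orientation.

    reverse1/reverse2 tell whether each mate aligned to the reverse strand.
    max_distance is a hypothetical cutoff chosen only for this illustration.
    """
    if chrom1 != chrom2:
        return True  # mates on different chromosomes
    if abs(pos2 - pos1) > max_distance:
        return True  # mates too far apart
    # Expected orientation: leftmost mate forward, rightmost mate reverse.
    if pos1 <= pos2:
        left_reverse, right_reverse = reverse1, reverse2
    else:
        left_reverse, right_reverse = reverse2, reverse1
    return left_reverse or not right_reverse


if __name__ == "__main__":
    print(is_discordant("1", 100, False, "1", 400, True))
    print(is_discordant("1", 40776789, False, "X", 65824263, True))
```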

The utility is a simple module that takes a set of aligned BAM or SAM files and detects potential fusions based on discordant read pairs, using the following command:

FusionMap.exe --pereport FusionMap_Base_Dir ref_lib_name gene_model_name control_file_name > pathTolog.txt

Here is one example control file:

<Files>
/IData/App/FusionMap/FusionMap_2015-03-31/TestDataset/input/DatasetP2_SimulatedReads.bam 

<Options>
//MonoPath option is required when the path to mono is not in PATH and spawned jobs cannot start
MonoPath=/IData/App/mono/mono-2.10.9/bin/mono
FileFormat=BAM		// possible values: SAM, BAM. Default value=SAM 
RnaMode=True		//Possible values: True, False. Default value=True 
MinimalHit=2		//Possible values: 1-5, Default value =2, minimal read pairs
ReportUnannotatedFusion=False //Possible values: True, False. Default value = False
FusionReportCutoff=1	//Possible values: 1-1000. Default value=1, require unique mapping of each read
OutputFusionReads=True	//Possible values: True, False. Default value = True 

<Output>
TempPath=/IData/temp/FusionMapTemp
OutputPath=/IData/App/FusionMap/FusionMap_2015-03-31/TestDataset/output
OutputName=PEFusion


Example fusion report can be found in the TestDataset/output folder in the downloaded zip file. Here is the description of each column of fusion report.

Change log

Check the FusionMap Change Log.

License

Commercial users: please contact OmicSoft to get a license.

Citation

Please cite FusionMap as:

FusionMap: detecting fusion genes from next-generation sequencing data at base-pair resolution
Huanying Ge; Kejun Liu; Todd Juan; Fang Fang; Matthew Newman; Wolfgang Hoeck
Bioinformatics (2011) 27 (14): 1922-1928. doi: 10.1093/bioinformatics/btr310

BibTeX format

@article{ge2011fusionmap,
title={FusionMap: detecting fusion genes from next-generation sequencing data at base-pair resolution},
author={Ge, H. and Liu, K. and Juan, T. and Fang, F. and Newman, M. and Hoeck, W.},
journal={Bioinformatics},
year={2011},
volume={27},
number={14},
pages={1922-1928},
publisher={Oxford Univ Press}
}

Contact

Please email fusionmapauthors AT gmail DOT com to report any issues with FusionMap.
