Fusion strand and strands of fusion partners
From Array Suite Wiki
The ArrayStudio Map Fusion Reads function detects fusion junction-spanning reads and aligns them to the reference, characterizing fusion genes comprehensively at base-pair resolution. Based on these alignments, several filters are applied to reduce false positives, including the strand filter. The fusion report contains Strand, KnownTranscriptStrand1 and KnownTranscriptStrand2 columns. In the fusion report, gene1/gene2 and transcript1/transcript2 are assigned based on the predicted fusion orientation: Gene1->Gene2 in the FusionGene column is the predicted 5'->3' fusion direction.
In the fusion report, there are only 8 valid types of fusion genes from RNA-Seq data. Assuming the fusion is joined by the canonical splice pattern GT-AG, the 8 possible true fusions are:
Because read-through fusions are relatively prevalent in fusion detection, a larger share of detected fusions belong to types (1) and (3).
The following sections describe some of the logic used in fusion detection, for readers interested in the technical details.
Strand indicates the strand of the fusion junction; its value gives the transcriptional strand of the fusion from gene1 to gene2.
In RNA-Seq fusion detection, the strand filter is based on the principle that the transcriptional orientation of the fusion transcript before and after the fusion breakpoint should match the orientation of the two partner genes. Based on the illustration above, the corresponding fusion report is as follows (rows with FilterStatus=Filtered are invalid fusions and are filtered out):
If fusion (3) were a true fusion, its 3' sequence would be the antisense sequence of gene B3, so it is not a fusion between A and B3. Fusion (6) is filtered for the same reason.
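The strand-consistency principle can be sketched as a simple check. This is a minimal illustration, not ArrayStudio's implementation: it assumes that Strand is a two-character string such as "+-" (gene1 strand, then gene2 strand) and that KnownTranscriptStrand1/2 are single "+"/"-" characters. The "Passed" label is a hypothetical placeholder; only "Filtered" comes from the report described above.

```python
# Hypothetical sketch of the strand filter principle (not the product's code).
# Assumption: Strand is two characters, e.g. "+-" = gene1 on "+", gene2 on "-".

def passes_strand_filter(strand: str, known_strand1: str, known_strand2: str) -> bool:
    """True if the fusion's transcriptional strand agrees with both partner genes."""
    if len(strand) != 2 or any(s not in "+-" for s in strand):
        raise ValueError(f"unexpected Strand value: {strand!r}")
    # Before the breakpoint the fusion must run in gene1's transcriptional direction,
    # and after the breakpoint it must run in gene2's transcriptional direction.
    return strand[0] == known_strand1 and strand[1] == known_strand2


def filter_status(strand: str, known_strand1: str, known_strand2: str) -> str:
    """Map the check to a FilterStatus-style label ("Passed" is a placeholder)."""
    if passes_strand_filter(strand, known_strand1, known_strand2):
        return "Passed"
    return "Filtered"
```

A fusion whose 3' part is antisense to gene2, as in fusion (3) above, fails because its second Strand character disagrees with KnownTranscriptStrand2.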
Fusion Gene Prediction
The 5'->3' fusion gene prediction assumes that reads follow the transcriptional orientation of the 5' gene and the 3' gene, and that the direction linking the two read ends runs from the 5' gene to the 3' gene. This is the same logic used to design the Strand filter above.
The SplicePattern sequence is taken from the two nucleotides after the mapping location of the gene1 breakpoint and the two nucleotides before the mapping location of the gene2 breakpoint. Nucleotides are read following the orientation specified in Strand.
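The rule above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the actual implementation: breakpoints are 1-based genomic coordinates, bp1 is the last aligned base of gene1 and bp2 the first aligned base of gene2, Strand is a two-character string such as "++", and the "donor-acceptor" output format (e.g. "GT-AG") is chosen to match the notation used on this page.

```python
# Hypothetical sketch of SplicePattern extraction (not the product's code).
# Assumptions: 1-based breakpoints; bp1 = last base of gene1 segment,
# bp2 = first base of gene2 segment; Strand like "+-" (gene1, gene2).

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def splice_pattern(chrom1_seq: str, bp1: int, chrom2_seq: str, bp2: int, strand: str) -> str:
    # gene1: two nucleotides AFTER the breakpoint, read in gene1's orientation
    if strand[0] == "+":
        donor = chrom1_seq[bp1:bp1 + 2]               # positions bp1+1, bp1+2
    else:
        donor = revcomp(chrom1_seq[bp1 - 3:bp1 - 1])  # positions bp1-2, bp1-1
    # gene2: two nucleotides BEFORE the breakpoint, read in gene2's orientation
    if strand[1] == "+":
        acceptor = chrom2_seq[bp2 - 3:bp2 - 1]        # positions bp2-2, bp2-1
    else:
        acceptor = revcomp(chrom2_seq[bp2:bp2 + 2])   # positions bp2+1, bp2+2
    return f"{donor}-{acceptor}"
```

On a toy sequence "AAAAGTCCCCCCAGTTTT" with bp1=4, bp2=15 and Strand "++", the sketch returns "GT-AG"; on the reverse complement of the same sequence with the mirrored breakpoints (15 and 4) and Strand "--" it also returns "GT-AG", which illustrates why reading in the Strand orientation makes the pattern independent of which genomic strand is used.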
FusionID contains the breakpoint1, breakpoint2 and Strand information, but it is always based on Cumulative Position (CP). To calculate cumulative positions, chromosomes are sorted alphabetically and concatenated; in the human genome the order is 1, 10, 11, 12, ..., 18, 19, 2, 20, 21, 22, 3, 4, 5, .... If gene1's CP is smaller than gene2's, FusionID is FUS_CP1_CP2_(Strand). If gene1's CP is larger than gene2's, FusionID is FUS_CP2_CP1_(RevComp(Strand)). There are both historical and technical reasons for this definition. The technical reason is that it makes the inferred Strand and SplicePattern the same regardless of whether a read comes from the forward or the reverse-complement strand of the genome/transcript.
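The CP calculation and FusionID rule above can be sketched as follows. This is a minimal sketch, not ArrayStudio's code: it assumes CP = (sum of lengths of all chromosomes sorted before this one) + 1-based position, that Strand is a two-character string, that RevComp of a strand string means reversing its order and flipping each sign, and that the ID text follows the FUS_CP1_CP2_(Strand) template exactly as written here. The tie case (equal CPs) is not described on this page; the sketch treats it like the first case.

```python
# Hypothetical sketch of Cumulative Position (CP) and FusionID construction.


def cumulative_offsets(chrom_lengths: dict[str, int]) -> dict[str, int]:
    """Offset of each chromosome after sorting names alphabetically and concatenating."""
    offsets: dict[str, int] = {}
    total = 0
    for chrom in sorted(chrom_lengths):  # string sort: "1", "10", ..., "19", "2", ...
        offsets[chrom] = total
        total += chrom_lengths[chrom]
    return offsets


def cumulative_position(offsets: dict[str, int], chrom: str, pos: int) -> int:
    """CP of a 1-based position on a chromosome."""
    return offsets[chrom] + pos


def revcomp_strand(strand: str) -> str:
    """Reverse the strand string and flip each sign, e.g. "++" -> "--"."""
    flip = {"+": "-", "-": "+"}
    return "".join(flip[s] for s in reversed(strand))


def fusion_id(cp1: int, cp2: int, strand: str) -> str:
    """cp1/cp2 are gene1/gene2 breakpoint CPs; strand is the gene1->gene2 Strand."""
    if cp1 <= cp2:
        return f"FUS_{cp1}_{cp2}_({strand})"
    return f"FUS_{cp2}_{cp1}_({revcomp_strand(strand)})"
```

With toy lengths {"1": 100, "2": 50, "10": 30}, the sorted order is "1", "10", "2", so chromosome 2 starts at offset 130 and chr2:10 has CP 140. A gene1 breakpoint at CP 140 fused to a gene2 breakpoint at CP 7 with Strand "++" yields "FUS_7_140_(--)".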
The strand information in FusionID may therefore differ from the value in the Strand column. In newer versions, we have tried to report gene1 as the 5' fusion gene and gene2 as the 3' fusion gene, and we swap Strand when this order differs from the Cumulative Position (CP) order.
Because the same rules are always used to determine strand, splice pattern and fusion gene orientation, the whole fusion annotation can be inferred purely from FusionID.
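Decoding a FusionID back into genomic loci can be sketched as below. This is an illustrative sketch only: it assumes the FUS_CP1_CP2_(Strand) text template as written on this page, a two-character Strand, 1-based positions, and a user-supplied dictionary of chromosome lengths. It recovers the two breakpoint loci and the CP-ordered strand; deciding which locus is gene1 and which is gene2 would additionally require the gene annotation.

```python
# Hypothetical sketch: parse a FusionID and map CPs back to (chromosome, position).
import bisect
import re

_FUSION_ID = re.compile(r"^FUS_(\d+)_(\d+)_\(([+-]{2})\)$")


def parse_fusion_id(fid: str) -> tuple[int, int, str]:
    """Split a FusionID into (lower CP, higher CP, CP-ordered strand)."""
    m = _FUSION_ID.match(fid)
    if m is None:
        raise ValueError(f"not a FusionID in the assumed format: {fid!r}")
    return int(m.group(1)), int(m.group(2)), m.group(3)


def cp_to_locus(chrom_lengths: dict[str, int], cp: int) -> tuple[str, int]:
    """Invert CP = offset + 1-based position, using alphabetical chromosome order."""
    chroms = sorted(chrom_lengths)
    starts = []
    total = 0
    for chrom in chroms:
        starts.append(total)
        total += chrom_lengths[chrom]
    if not 0 < cp <= total:
        raise ValueError(f"CP {cp} is outside the genome")
    # Last chromosome whose start offset is strictly below cp.
    i = bisect.bisect_left(starts, cp) - 1
    return chroms[i], cp - starts[i]
```

For example, with the toy lengths {"1": 100, "2": 50, "10": 30}, "FUS_7_140_(+-)" decodes to chr1:7 and chr2:10 with CP-ordered strand "+-".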
Below is an example from a real fusion report: