Alternative splicing detection

From Array Suite Wiki

Many methods have been proposed to detect alternative splicing from NGS data. Most of these methods are quite complicated, involving the EM algorithm or hidden Markov models. Here we present a simple, intuitive way to detect alternative splicing.

You can use the Differentially Expressed Isoforms module to infer alternative splicing.

Suppose you have RNA-seq data for two groups, tumor vs. normal. The basic algorithm behind the module is this: 1) Select the effective samples from the tumor group according to the preset parameters. By default, a tumor sample is effective for a transcript if its value for that transcript is > 120% or < 80% of the average in the normal samples. 2) Compare transcript expression between the selected tumor samples and the normal samples. The statistical comparison generates the raw p-value.
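Step 1 can be sketched in a few lines of Python. This is only an illustration of the default rule as described above, not the module's actual code; the function name `select_effective_samples`, its parameters, and the expression values are made up for the example.

```python
from statistics import mean

def select_effective_samples(case_values, control_values, lower=0.8, upper=1.2):
    """Return the indices of case samples whose value is below `lower` times
    or above `upper` times the mean of the control samples."""
    control_mean = mean(control_values)
    effective = []
    for i, value in enumerate(case_values):
        if value < lower * control_mean or value > upper * control_mean:
            effective.append(i)
    return effective

# Hypothetical values for one transcript (not taken from the example data).
normal = [10.0, 11.0, 9.0]          # control mean = 10.0, so the band is 8.0 to 12.0
tumor = [13.0, 10.5, 7.0, 11.9]     # only 13.0 and 7.0 fall outside the band
effective_idx = select_effective_samples(tumor, normal)
print(effective_idx)                # [0, 2]
```

Only the tumor samples at the returned indices would go into the statistical comparison in step 2.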

Here is an example with 6 observations belonging to 2 groups: hESC (can be treated as normal) and N2 (can be treated as tumor).


Using NGS Report Gene Transcript Counts, we can get the Count or FPKM at the transcript level. Gene MYL6 has 7 transcripts:


Then we compute the gene-level expression and normalize by taking the ratio of each transcript's expression to the total expression.
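The normalization can be sketched as follows, assuming the gene-level expression of a sample is the sum over that gene's transcripts. The function name `transcript_ratios`, the transcript IDs, and the FPKM values are hypothetical, and returning a ratio of 0 when a sample has no expression for the gene is an assumption of this sketch, not documented behavior of the module.

```python
def transcript_ratios(expression):
    """expression maps transcript ID -> list of per-sample values.
    Returns (gene_totals, ratios) where ratios has the same shape as expression."""
    n_samples = len(next(iter(expression.values())))
    # Gene-level expression per sample: sum over all transcripts of the gene.
    gene_totals = [sum(values[i] for values in expression.values())
                   for i in range(n_samples)]
    ratios = {}
    for transcript_id, values in expression.items():
        # Assumption: a sample with zero gene-level expression gets ratio 0.0.
        ratios[transcript_id] = [v / total if total > 0 else 0.0
                                 for v, total in zip(values, gene_totals)]
    return gene_totals, ratios

# Hypothetical FPKM values for a gene with two transcripts in three samples.
fpkm = {"tx_A": [30.0, 10.0, 0.0], "tx_B": [10.0, 30.0, 0.0]}
totals, ratios = transcript_ratios(fpkm)
print(totals)           # [40.0, 40.0, 0.0]
print(ratios["tx_A"])   # [0.75, 0.25, 0.0]
```

Working with ratios instead of raw expression separates a change in transcript usage from a change in overall gene expression.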


According to this table, for transcript uc001sjx.2, all three observations in the N2 group differ by more than 20% from the average of the hESC group. Thus the effective sample size for uc001sjx.2 is 3. The algorithm then compares the ratio values of the effective N2 observations with those of the hESC observations using a t-test to get the raw p-value.
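A minimal sketch of this comparison is shown below. The page does not say which variant of the t-test is used, so the pooled-variance (Student's) two-sample form is an assumption here, and the ratio values are invented for illustration rather than read from the MYL6 table.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t_statistic(group_a, group_b):
    """Two-sample Student's t statistic with pooled variance.
    Returns (t, degrees_of_freedom)."""
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two observations")
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / df
    if pooled_var == 0:
        raise ValueError("both groups have zero variance")
    t = (mean(group_a) - mean(group_b)) / sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return t, df

# Hypothetical transcript ratios (transcript expression / gene expression).
hesc = [0.20, 0.22, 0.18]
n2 = [0.50, 0.52, 0.48]

# Keep only the N2 observations that differ by more than 20% from the hESC mean.
hesc_mean = mean(hesc)
effective_n2 = [r for r in n2 if r < 0.8 * hesc_mean or r > 1.2 * hesc_mean]

t_stat, df = pooled_t_statistic(effective_n2, hesc)
print(len(effective_n2), df, round(t_stat, 2))
```

Turning the statistic into a raw p-value requires the Student's t distribution with `df` degrees of freedom. If SciPy is available, `scipy.stats.ttest_ind(effective_n2, hesc)` performs the same pooled-variance test and returns the p-value directly.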

For gene MYL6, the result table from Differentially Expressed Isoforms shows that both transcripts uc001sjx.2 and uc001sjw.2 have significant p-values. The gene expression ratio between the two groups, N2 and hESC, is not large (only 1.38). This may indicate that one transcript (uc001sjx.2) dominates in one group (N2), while the other transcript (uc001sjw.2) dominates in the other group (hESC).
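The pattern described here (a modest gene-level change together with significant transcript-level changes in opposite directions) can be written as a simple filter. This is a sketch only: the function name `is_splicing_candidate`, the cutoffs `p_cutoff` and `max_gene_fold`, and the per-transcript numbers are hypothetical and are not the module's filter settings. Only the gene ratio 1.38 comes from the example.

```python
def is_splicing_candidate(gene_fold_change, transcripts,
                          p_cutoff=0.05, max_gene_fold=2.0):
    """transcripts is a list of (transcript_id, ratio_difference, p_value),
    where ratio_difference = mean case ratio - mean control ratio.
    A gene is flagged when its overall expression is fairly stable but at least
    one transcript goes significantly up and another significantly down."""
    gene_stable = 1.0 / max_gene_fold <= gene_fold_change <= max_gene_fold
    up = [tid for tid, diff, p in transcripts if p < p_cutoff and diff > 0]
    down = [tid for tid, diff, p in transcripts if p < p_cutoff and diff < 0]
    return gene_stable and bool(up) and bool(down)

# Gene ratio 1.38 is from the MYL6 example; the differences and p-values
# below are placeholders chosen only to show the opposite-direction pattern.
myl6 = [
    ("uc001sjx.2", +0.30, 0.001),   # higher share in N2
    ("uc001sjw.2", -0.30, 0.001),   # higher share in hESC
]
print(is_splicing_candidate(1.38, myl6))   # True
```

A gene whose significant transcripts all move in the same direction, or whose overall expression changes strongly, is not flagged by this sketch.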


Based on this idea, we can filter for potential genes with alternative splicing between the two groups by setting: (In the picture below, N2 is the tumor group and hESC is the normal group.)


Once you find a candidate gene, it is always a good idea to double-check it in the genome browser. First select the gene/transcripts you are interested in, then open them in the genome browser.


For a better view in the genome browser, you can combine each group into one track, which makes the comparison between groups easier.


By checking the genome browser, we can clearly see the different expression levels of the two groups in the two transcripts.