TPM and FPKM
From Array Suite Wiki
The RSEM software described in the published paper can only quantify gene expression from transcriptome-mapped reads. Our reimplementation allows quantification directly from genome-mapped BAM files. It is described in more detail in our published Oshell paper and on the wiki page Omicsoft RPKM/FPKM/Count values.
RSEM is one way to calculate TPM, and RPKM is proportional to TPM within any given sample. Both have transcript length in the denominator. TPM is simply RPKM scaled by a per-sample constant so that the sum of all values is 1 million.
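The rescaling described above can be sketched in a few lines of Python. This is a minimal illustration, not OmicSoft's implementation: the function name `rpkm_to_tpm` and the sample values are hypothetical, and the input is assumed to be the complete set of RPKM/FPKM values for a single sample.

```python
def rpkm_to_tpm(rpkm):
    """Rescale one sample's RPKM/FPKM values so that they sum to 1,000,000.

    Hypothetical helper for illustration; `rpkm` must hold every
    transcript's value for a single sample.
    """
    total = sum(rpkm)
    if total <= 0:
        raise ValueError("RPKM values must have a positive sum")
    # Dividing by the sample total and multiplying by one million keeps
    # the relative proportions and fixes the sum at 1,000,000.
    return [value * 1_000_000 / total for value in rpkm]


sample_rpkm = [10.0, 30.0, 60.0]  # made-up values for one sample
sample_tpm = rpkm_to_tpm(sample_rpkm)
```

Because the scaling constant depends on the sample total, it has to be recomputed for each sample; values from a partial transcript list would give a different constant.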
In our OmicSoft Land results, we scale the FPKM/RPKM values one more time using the Land Normalization logic; empirically, this additional normalization allows for better cross-sample and cross-project comparison of gene expression.
Certain OmicLands support "on-the-fly" TPM scaling of displayed FPKM data.
If a user does want TPM values, they can be computed from the following facts:
- RSEM estimates the θ values with an EM algorithm. θ represents relative expression level in a measure called “the probability of nucleotides”: θi is the probability that a mapped read nucleotide belongs to isoform i.
- RPKM_[i] = (1,000,000 * 1,000 * θi * TotalNumberOfMappedReads) / (ℓi * TotalNumberOfMappedReads) = (1,000,000,000 * θi) / ℓi, where ℓi is the length, in nucleotides, of isoform i.
- TPM_[i] (transcripts per million) = 1,000,000 * θi / (ℓi * c), where c is the constant sum_[j](θj/ℓj), so that sum_[i](TPM_[i]) = 1,000,000.
- TotalNumberOfMappedReads is only the sum of reads mapped to exon or exon-junction regions on the chromosome. It is not the total number of alignments in the BAM file, nor the total number of aligned reads in the alignment report.
- The ratio RPKM_[i]/TPM_[i] is c*1,000, the same constant for every transcript i in the sample.
- If sum_[i](RPKM_[i]) = Z, then because sum_[i](TPM_[i]) = 1,000,000, c*1,000 = Z/1,000,000, and therefore TPM_[i] = RPKM_[i] * 1,000,000 / Z.
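The relationships in the list above can be checked numerically. The following is a minimal sketch that applies the θ-based formulas as written; the function name `rpkm_and_tpm_from_theta` and the θ and length values are made up for illustration, and θ is simply taken as given rather than estimated by an EM algorithm.

```python
def rpkm_and_tpm_from_theta(theta, lengths):
    """Apply the formulas from the list above (hypothetical helper).

    theta   -- relative expression values θi, one per isoform
    lengths -- isoform lengths ℓi in nucleotides
    Returns (rpkm, tpm, c) where c = sum_j(θj / ℓj).
    """
    c = sum(t / l for t, l in zip(theta, lengths))
    rpkm = [1_000_000_000 * t / l for t, l in zip(theta, lengths)]
    tpm = [1_000_000 * t / (l * c) for t, l in zip(theta, lengths)]
    return rpkm, tpm, c


# Made-up θ values (summing to 1) and isoform lengths.
theta = [0.5, 0.3, 0.2]
lengths = [2000, 1000, 4000]

rpkm, tpm, c = rpkm_and_tpm_from_theta(theta, lengths)

# RPKM/TPM should be the same constant, c * 1,000, for every isoform.
ratios = [r / t for r, t in zip(rpkm, tpm)]

# Z is the sum of all RPKM values; Z / 1,000,000 should equal c * 1,000.
z = sum(rpkm)
```

With these inputs, every entry of `ratios` and the quantity `z / 1_000_000` agree with `c * 1000` up to floating-point rounding, which is what allows TPM to be recovered from RPKM alone without access to θ.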