How Omicsoft calculates percentage values in the alteration distribution plot in Land

From Array Suite Wiki

In the Land Alteration.Distribution Plot, the length of the bars may not match the percentage of altered samples shown in Alteration.Omicprint.

One example is TXK in TCGA Land:

Alteration plot TXK.png

Alteration plot TXK2.png

The difference comes from how the total number of samples is counted.

For the alteration distribution plot, we count the total number of samples as follows:

  • Count number of samples with DNA mutation data only: T1
  • Count number of samples with CNV data only: T2
  • Count number of samples with both DNA mutation and CNV data: T3
  • T1+T2+T3=total number of samples with DNA measurements

Then, we count the samples with alterations as follows:

  • For Amp/Del status, count number of samples with Amp/Del status only: A (amplification) or D (deletion)
  • For mutation status, count number of samples with mutation only: M
  • For multiple alteration status, count number of samples with both mutation and Amp/Del status: MDA

We use the following percentage values:

  • M/(T1+T3): percentage of mutation
  • A/(T2+T3): percentage of amplification
  • D/(T2+T3): percentage of deletion
  • MDA/T3: percentage of multiple alteration

We adopted this approach because not all samples have both CNV and mutation data measured. As a result, the bars do not share the same denominator. The sum of the four values can also exceed 1 when the number of samples with both DNA mutation and CNV data (T3) is low, because the small denominator in MDA/T3 inflates that percentage. If all samples have both DNA mutation and CNV data, then T1=T2=0 and all denominators are the same.
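The four ratios can be sketched in a few lines of Python. This is only an illustration of the formulas listed above, not Omicsoft's implementation; the function name `distribution_plot_fractions` and both sets of counts are hypothetical and chosen only to show how a small T3 inflates the total.

```python
def distribution_plot_fractions(t1, t2, t3, m, a, d, mda):
    """Return the four bar values as fractions (hypothetical helper).

    t1: samples with DNA mutation data only
    t2: samples with CNV data only
    t3: samples with both DNA mutation and CNV data (assumed > 0 here)
    m, a, d: samples with mutation only, amplification only, deletion only
    mda: samples with both a mutation and an Amp/Del status
    """
    return {
        "mutation": m / (t1 + t3),
        "amplification": a / (t2 + t3),
        "deletion": d / (t2 + t3),
        "multiple": mda / t3,
    }


# Hypothetical case 1: every sample has both data types (T1 = T2 = 0),
# so all four bars share the denominator T3.
balanced = distribution_plot_fractions(t1=0, t2=0, t3=100, m=10, a=20, d=5, mda=5)

# Hypothetical case 2: only 2 samples have both data types, so MDA/T3
# alone reaches 1.0 and the four bars sum to more than 1.
sparse = distribution_plot_fractions(t1=50, t2=50, t3=2, m=40, a=30, d=10, mda=2)

print(round(sum(balanced.values()), 3))  # 0.4
print(sum(sparse.values()) > 1)          # True
```

The sketch does not guard against T3 = 0; a real implementation would need to decide what to display when no sample has both data types.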

For the example in the screenshots above, the sample counts for the alteration distribution plot are:

  • Count number of samples with DNA mutation data only: T1=0
  • Count number of samples with CNV data only: T2=312
  • Count number of samples with both DNA mutation and CNV data: T3=178
  • T1+T2+T3=490: total number of samples with DNA measurements

In the Alteration Distribution Plot:

  • For Amp/Del status, count number of samples with Amp/Del status only: A=20-1=19 (20 amplified samples minus the 1 that also has a mutation), D=3
  • For mutation status, count number of samples with mutation only: M=2-1=1 (2 mutated samples minus the 1 that also has an amplification)
  • For multiple alteration status, count number of samples with both mutation and Amp/Del status: MDA=1
  • Total 24 samples with alteration.

Percentages:

  • Mutation: M/(T1+T3) = 1/178 = 0.00561797752808989
  • Amplification: A/(T2+T3) = 19/(312+178) = 0.0387755102040816
  • Homozygous Deletion: D/(T2+T3) = 3/(312+178) = 0.00612244897959184
  • Multiple Alteration: MDA/T3 = 1/178 = 0.00561797752808989
  • The sum of the four percentages is 5.6%
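The arithmetic above can be checked with a short Python sketch. The counts are the TXK values from this example; the variable names are only illustrative.

```python
# Sample counts from the TXK example
t1, t2, t3 = 0, 312, 178

# Altered-sample counts: one sample has both a mutation and an amplification,
# so it is removed from A and M and counted once in MDA.
mda = 1
a = 20 - mda   # amplification only
d = 3          # deletion only
m = 2 - mda    # mutation only

bars = {
    "Mutation": m / (t1 + t3),
    "Amplification": a / (t2 + t3),
    "Homozygous Deletion": d / (t2 + t3),
    "Multiple Alteration": mda / t3,
}

for name, value in bars.items():
    print(f"{name}: {value:.4%}")

total = sum(bars.values())
print(f"Total: {total:.1%}")  # Total: 5.6%
```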

In OmicPrint:

  • The percentage of altered samples is 24/490=4.9%
  • The percentage of mutated samples is 2/178=1.1%
  • The percentage of Amp/Del samples is 23/490=4.7%

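Thus, 1.1% and 4.7% do not sum up to 4.9%: the two percentages use different denominators (178 vs. 490), and the one sample with both a mutation and an amplification is counted in each of them.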
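A short Python check of the OmicPrint numbers makes the mismatch explicit. The counts are taken from the example; the helper `pct` and the variable names are hypothetical.

```python
def pct(count, denominator):
    """Percentage rounded to one decimal place (hypothetical helper)."""
    return round(100 * count / denominator, 1)


samples_with_dna_data = 490       # T1 + T2 + T3
samples_with_mutation_data = 178  # T1 + T3

altered = 24   # samples with any alteration
mutated = 2    # samples with a mutation
amp_del = 23   # samples with an Amp/Del status

print(pct(altered, samples_with_dna_data))       # 4.9
print(pct(mutated, samples_with_mutation_data))  # 1.1
print(pct(amp_del, samples_with_dna_data))       # 4.7

# The sample carrying both a mutation and an amplification appears in
# both the mutated and the Amp/Del counts.
overlap = mutated + amp_del - altered
print(overlap)  # 1
```

Even with a common denominator the two categories would not add up to the altered total, because the overlapping sample would still be counted twice (2 + 23 = 25, not 24).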

Note: in both views, the user can click "Open in Excel" or "Open in Editor" to get the underlying values. In the OmicPrint, this gives the WT/MUT and Amp/Del status for each sample.