CCLE GTEx TCGA Virtual Land

From Array Suite Wiki

Revision as of 15:36, 27 September 2020 by Joseph (Talk | contribs)

Build a Virtual Land combining GTEx, CCLE, and TCGA

GTEx, CCLE, and TCGA are three of the most popular Lands: GTEx provides thousands of normal tissue samples, CCLE provides over 1000 cell line datasets, and TCGA provides tumor data from a premier oncology consortium. A Virtual Land combining these three Lands enables quick interrogation of expression, mutation, fusion, and other data across normal, tumor, and cell line samples.

In the Sample Distribution View, you will see all available samples for CCLE (cancer cell lines), GTEx (normal tissue), and TCGA (tumor samples).

CCLE.GTEx.TCGA.Distr.png

Recommended Parameters

Description=TCGA + GTEx + CCLE Human B38 genome virtual land for cross land searching.

//Use TissueCategory to specify the default vertical grouping of samples
PrimaryGrouping=TissueCategory
PrimaryGroupingName=TissueCategory
//Secondary Grouping defines the subsetting of data across a Primary Grouping. Sample Type will be used as the "SampleTypeColumn" to pre-color. DiseaseCategory can also be useful here
SecondaryGrouping=Sample Type
SecondaryGroupingName=Sample Type
//Sample Type is a special column that automatically colors samples by Tumor or Normal status, especially useful for oncology databases. Normal Samples are defined by "ControlSampleLevels"
SampleTypeColumn=Sample Type
ControlSampleLevels=Blood Derived Normal,Bone Marrow Normal,Normal,Solid Tissue Normal,Cord blood,Venous blood,Control Analyte,Buccal Cell Normal,Other Normal,Cell Lines Normal,EBV Immortalized Normal

//For each Source Land, specify the mappings of the source columns to Primary and Secondary Grouping 
CCLE_B38.PrimaryGrouping=TissueCategory
//In CCLE, the best column to map to SampleType is "OncoSampleType"
CCLE_B38.SecondaryGrouping=OncoSampleType
CCLE_B38.TissueColumn=Tissue
//In addition to TissueCategory and OncoSampleType, the columns Tumor Or Normal, DiseaseState, and Tissue should be included from CCLE
CCLE_B38.VirtualColumns=Tumor Or Normal<-Tumor Or Normal,DiseaseState,Tissue
 
GTEx_B38.PrimaryGrouping=TissueCategory
//In GTEx, the best column to map to SampleType is "LandSampleType". The different column naming in each source Land does not matter, because each source column is remapped to SecondaryGrouping, and SecondaryGroupingName labels it Sample Type
GTEx_B38.SecondaryGrouping=LandSampleType
GTEx_B38.TissueColumn=Tissue
//Notice that TumorOrNormal from GTEx is remapped to Tumor Or Normal to match the naming in the other two Lands. In addition to TissueCategory and the sample type column, the columns Tumor Or Normal, DiseaseState, and Tissue should be included from GTEx
GTEx_B38.VirtualColumns=Tumor Or Normal<-TumorOrNormal,DiseaseState,Tissue

TCGA_B38.PrimaryGrouping=TissueCategory
TCGA_B38.SecondaryGrouping=Land Sample Type
TCGA_B38.TissueColumn=Tissue
//In addition to TissueCategory and Land Sample Type, the columns Tumor Or Normal, DiseaseState, and Tissue should be included from TCGA
TCGA_B38.VirtualColumns=Tumor Or Normal<-Tumor Or Normal,DiseaseState,Tissue
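The VirtualColumns entries above can be read as a per-Land column mapping. The following Python sketch illustrates that reading only; it is not how Array Studio implements Virtual Lands. It assumes that entries are comma-separated, that a bare name keeps its name, and that "VirtualName<-SourceName" renames a source column. The function names and the sample row are hypothetical.

```python
def parse_virtual_columns(spec):
    """Parse a VirtualColumns value into {virtual_name: source_name}.

    Assumed syntax (inferred from the config above, not from a specification):
    comma-separated entries, each either "Name" or "VirtualName<-SourceName".
    """
    mapping = {}
    for entry in spec.split(","):
        entry = entry.strip()
        if not entry:
            continue
        target, sep, source = entry.partition("<-")
        target = target.strip()
        # A bare name maps to itself; "A<-B" maps virtual column A to source column B.
        mapping[target] = source.strip() if sep else target
    return mapping


def to_virtual_row(source_row, mapping):
    """Build the Virtual Land view of one sample's metadata row."""
    return {virtual: source_row.get(source) for virtual, source in mapping.items()}


gtex_mapping = parse_virtual_columns("Tumor Or Normal<-TumorOrNormal,DiseaseState,Tissue")

# Hypothetical GTEx sample row; the values are placeholders, not real metadata.
gtex_row = {"TumorOrNormal": "Normal", "DiseaseState": "normal control", "Tissue": "lung"}

virtual_row = to_virtual_row(gtex_row, gtex_mapping)
print(virtual_row)
```

Under these assumptions, the GTEx column TumorOrNormal appears in the Virtual Land as Tumor Or Normal, so it lines up with the identically named column from CCLE and TCGA.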

Virtual Land use case

The key value of combining these Lands is the ability to quickly compare a gene's expression in normal tissue, tumor samples, and cell lines for the tissues of interest.

For example, after searching for EGFR, you can switch to the Gene FPKM View and filter for Tissue Categories of interest (e.g. respiratory system, central nervous system, and breast). You can view the data from all three Lands together, or profile the columns by TissueCategory+SourceLand to reveal the differences between normal and tumor EGFR expression, and to check whether common cancer cell lines reflect tumor expression of the gene in your tissue of interest.

VirtualLand.GeneExpr.png
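Outside the GUI, the same profiling idea (filter by TissueCategory, then group by TissueCategory+SourceLand) could be sketched in Python as below. This is an illustration only: the record layout, the function name, and the FPKM numbers are made-up placeholders, not values exported from any Land, and the median is just one possible summary.

```python
from collections import defaultdict
from statistics import median

# Placeholder records for illustration only; not real expression values.
samples = [
    {"SourceLand": "GTEx_B38", "TissueCategory": "respiratory system", "FPKM": 10.0},
    {"SourceLand": "GTEx_B38", "TissueCategory": "respiratory system", "FPKM": 14.0},
    {"SourceLand": "TCGA_B38", "TissueCategory": "respiratory system", "FPKM": 30.0},
    {"SourceLand": "TCGA_B38", "TissueCategory": "respiratory system", "FPKM": 50.0},
    {"SourceLand": "CCLE_B38", "TissueCategory": "respiratory system", "FPKM": 40.0},
    {"SourceLand": "TCGA_B38", "TissueCategory": "breast", "FPKM": 8.0},
]


def profile_by_tissue_and_land(records, tissues):
    """Return the median FPKM per (TissueCategory, SourceLand) for the chosen tissues."""
    groups = defaultdict(list)
    for rec in records:
        if rec["TissueCategory"] in tissues:
            groups[(rec["TissueCategory"], rec["SourceLand"])].append(rec["FPKM"])
    # Sort the keys so the three Lands appear in a stable order within each tissue.
    return {key: median(values) for key, values in sorted(groups.items())}


profile = profile_by_tissue_and_land(samples, {"respiratory system"})
for (tissue, land), value in profile.items():
    print(f"{tissue:20s} {land:10s} median FPKM = {value}")
```

Each output row corresponds to one TissueCategory+SourceLand column in the profiled view, which makes the normal, tumor, and cell line groups directly comparable for a given tissue.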