Introduction to GTEx Land Content
From Array Suite Wiki
The Genotype-Tissue Expression project (GTEx) aims to create a comprehensive public atlas of gene expression and regulation across multiple human tissues. The GTEx project can help researchers understand the relationship between tissue-specific gene expression and human disease. According to the GTEx Portal, "GTEx will help researchers to understand inherited susceptibility to disease and will be a resource database and tissue bank for many studies in the future." The Land contains RNA-Seq and Affymetrix expression data, all from normal tissue samples. It provides high-quality normal control samples against which researchers can benchmark their own patient or drug response samples.
It can be used together with other Lands (for example, TCGA_B37) to create virtual Lands. It also supports comparisons across datasets, because we use controlled vocabularies and process all expression and RNA-Seq data with our standard pipelines.
- GTEx_B38_GC33: GTEx-v8 data, aligned to Human.B38 and OmicsoftGenCode.V33. This is the latest version of the Land and will continue to be updated with new content and metadata.
- GTEx_B37: GTEx-v8 data, aligned to Human.B37.3 and OmicsoftGene20130723.
- GTEx_B38: GTEx-v6 data, aligned to Human.B38 and OmicsoftGenCode.V24. This Land is superseded by GTEx_B38_GC33 and will not be updated.
Data source: GTEx Portal, GTEx v8.
- 819 samples with Affymetrix Expression data (HuGene-1_1-st-v1)
- 16963 samples with RNA-Seq data; based on SRA files
- 418 GTEx samples that were flagged as poor quality or derived from cell lines are excluded from this Land.
- Affymetrix Expression Array
- Illumina TruSeq RNA sequencing
- Expression Data: Omicsoft Affymetrix Microarray Preprocessing
- Virus data: View viral sequence counts in Land RNA-seq Data
- 16S Microbial data: Bacterial counts from 16S rRNA
- HLA Class I typing from RNA-Seq aligned reads. GTEx classifies this information as restricted access. The OptiType program aligns RNA-Seq reads to the HLA reference genome and then performs an optimization to determine the most likely HLA Class I alleles. See "OptiType: precision HLA typing from next-generation sequencing data" (PDF) for a description of the algorithm.
52 statistical comparisons were performed with DESeq2 v1.30, one for each of the 52 sub-tissues described in TissueDetail_GTEx. In each comparison, all samples in the Case group (one TissueDetail, e.g. Liver) were compared to an aggregated control group composed of carefully selected representative samples from each of the other TissueDetail categories.
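The comparison layout described above (one Case TissueDetail against pooled representatives of every other TissueDetail) can be sketched as a simple sample-to-group table. This is a minimal illustration, not the pipeline actually used for the Land: the function names and the dictionaries of sample IDs are hypothetical, and the representative samples are assumed to have been selected beforehand. DESeq2 itself is an R package; a table like this would typically be supplied to it as `colData` with a design such as `~ group`.

```python
from typing import Dict, List, Tuple


def build_comparison(
    case_detail: str,
    samples_by_detail: Dict[str, List[str]],
    representatives: Dict[str, List[str]],
) -> List[Tuple[str, str]]:
    """Return (sample_id, group) rows for one Case vs. aggregated Control comparison."""
    # Case group: every sample of the TissueDetail being tested.
    rows = [(sample, "Case") for sample in samples_by_detail[case_detail]]
    # Control group: the pre-selected representatives of every other TissueDetail.
    for detail, reps in sorted(representatives.items()):
        if detail == case_detail:
            continue  # a TissueDetail never contributes to its own control group
        rows.extend((sample, "Control") for sample in reps)
    return rows


def build_all_comparisons(
    samples_by_detail: Dict[str, List[str]],
    representatives: Dict[str, List[str]],
) -> Dict[str, List[Tuple[str, str]]]:
    """One comparison per TissueDetail (52 for the GTEx Land)."""
    return {
        detail: build_comparison(detail, samples_by_detail, representatives)
        for detail in sorted(samples_by_detail)
    }
```

Because the control pool is reused across comparisons and only the Case TissueDetail is excluded, every comparison sees a control group that spans all of the remaining tissues.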
The control samples were chosen by:
- Calculating the mean expression per gene within a TissueDetail_GTEx category
- Creating subgroups of 8 samples (or 2 samples for Tissue_GTEx: Brain) and comparing the mean per-gene expression of each subgroup to the mean per-gene expression of all samples in the category
- Selecting the most representative subgroup, defined as the one whose expression profile has the highest cosine similarity to the expression profile of the entire tissue
- Sampling brain tissue: because there are many TissueDetail_GTEx categories for different brain regions, only two samples were selected from each brain TissueDetail
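The selection steps above can be sketched as follows. This is a minimal illustration under stated assumptions: the source does not say how candidate subgroups are generated, so this sketch scores randomly drawn subgroups, and the `n_candidates` and `seed` parameters, like all function names here, are hypothetical. Each expression profile is assumed to be a list with one value per gene, in the same gene order for every sample.

```python
import math
import random
from typing import Dict, List, Sequence


def mean_profile(profiles: Sequence[Sequence[float]]) -> List[float]:
    """Per-gene mean across samples; each profile holds one value per gene."""
    n = len(profiles)
    return [sum(gene_values) / n for gene_values in zip(*profiles)]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two expression profiles (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_representatives(
    expression: Dict[str, List[float]],
    group_size: int = 8,
    n_candidates: int = 1000,  # hypothetical: number of random subgroups to score
    seed: int = 0,
) -> List[str]:
    """Return the sample IDs of the subgroup whose mean profile best matches the whole category."""
    sample_ids = sorted(expression)
    if len(sample_ids) <= group_size:
        return sample_ids  # too few samples to subsample, so keep them all
    category_mean = mean_profile([expression[s] for s in sample_ids])
    rng = random.Random(seed)  # fixed seed keeps the selection reproducible
    best_group: List[str] = []
    best_score = -math.inf
    for _ in range(n_candidates):
        group = rng.sample(sample_ids, group_size)
        group_mean = mean_profile([expression[s] for s in group])
        score = cosine_similarity(group_mean, category_mean)
        if score > best_score:
            best_group, best_score = sorted(group), score
    return best_group
```

Following the text, the call for a brain sub-tissue would pass `group_size=2`, and every other TissueDetail would use the default of 8. Cosine similarity compares the shape of the expression profiles rather than their overall magnitude, so the chosen subgroup is the one whose relative gene expression pattern most resembles the category as a whole.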
Key Metadata Columns
- Tissue: Tissues such as brain, blood, heart, lung, kidney etc., using OmicSoft controlled vocabulary
- Tissue_GTEx: Tissues, using GTEx controlled vocabulary
- Tissue Detail Type: sub-category within a tissue, such as Brain - Amygdala, Brain - Cortex, Brain - Hippocampus, Brain - Spinal cord (cervical c-1), etc., using GTEx terminology
- Land Sample Type: curated by the OmicSoft Land curation team using OmicSoft controlled vocabularies, which allows users to easily merge the data with other Lands
- Tumor or Normal: indicates whether a sample is from tumor or normal tissue. All GTEx samples are normal.
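Because these columns use shared controlled vocabularies, samples from different Lands can be pooled on the same terms. The sketch below illustrates the idea only: the Land names are taken from the text, but the sample records, the `SampleId` key, the tissue values, and the `virtual_land` function are hypothetical and do not reflect how virtual Lands are actually built in the software.

```python
from typing import Dict, List

# Hypothetical sample record: metadata column name -> value.
Sample = Dict[str, str]


def virtual_land(lands: Dict[str, List[Sample]], tissue: str) -> Dict[str, List[str]]:
    """Pool samples from several Lands for one Tissue term, keyed by Land and Tumor or Normal."""
    groups: Dict[str, List[str]] = {}
    for land_name, samples in lands.items():
        for sample in samples:
            # Matching works only because every Land uses the same Tissue vocabulary.
            if sample["Tissue"] == tissue:
                key = f"{land_name}:{sample['Tumor or Normal']}"
                groups.setdefault(key, []).append(sample["SampleId"])
    return groups
```

Grouping by both the Land and the Tumor or Normal value keeps GTEx normal samples, TCGA tumor samples, and any TCGA adjacent-normal samples distinguishable after pooling.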
GTEx data are organized by tissue, so one of the most common ways to visualize them is to group the data by Tissue.
If the user is interested in more detailed information within a tissue type, the data can be filtered to one or a few tissue types and then grouped by Tissue Detail Type.
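The group-then-filter workflow above can be sketched outside the GUI as a small summary function. This is an illustration under assumptions: the sample records, the `SampleId` key, the per-sample expression values for a single gene, and the `group_expression` function are hypothetical, and the median is chosen here only as a simple per-group summary.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, List, Optional


def group_expression(
    samples: List[Dict[str, str]],
    expression: Dict[str, float],
    by: str,
    keep_tissues: Optional[Iterable[str]] = None,
) -> Dict[str, float]:
    """Median expression of one gene per metadata group, optionally restricted to some Tissues."""
    allowed = set(keep_tissues) if keep_tissues is not None else None
    groups: Dict[str, List[float]] = defaultdict(list)
    for sample in samples:
        # Filter step: skip samples outside the requested tissue types.
        if allowed is not None and sample["Tissue"] not in allowed:
            continue
        # Group step: collect values under the chosen metadata column.
        groups[sample[by]].append(expression[sample["SampleId"]])
    return {name: median(values) for name, values in sorted(groups.items())}
```

Calling it with `by="Tissue"` gives the tissue-level overview; calling it with `by="Tissue Detail Type"` and `keep_tissues=["Brain"]` gives the per-region breakdown within brain.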