Introduction to GTEx Land Content


GTEx Lands

The Genotype-Tissue Expression project (GTEx) aims to create a comprehensive public atlas of gene expression and regulation across multiple human tissues. The GTEx project helps researchers understand the correlation between tissue-specific gene expression and human diseases. According to the GTEx Portal, “GTEx will help researchers to understand inherited susceptibility to disease and will be a resource database and tissue bank for many studies in the future.” The GTEx Land contains RNA-Seq and Affymetrix expression data from normal tissues, providing high-quality normal control samples against which researchers can benchmark their own patient or drug-response sample data.

The GTEx Land can be used in conjunction with other Lands (such as TCGA_B37) to create virtual Lands. Comparisons across datasets are possible because all Lands use controlled vocabularies and their expression and RNA-Seq data are processed with our standard pipelines.

Land Versions

  • GTEx_B38_GC33: GTEx-v8 data, aligned to Human.B38 and OmicsoftGenCode.V33. This is the latest version of the Land and will continue to be updated with new content and metadata.
  • GTEx_B37: GTEx-v8 data, aligned to Human.B37.3 and OmicsoftGene20130723.
  • GTEx_B38: GTEx-v6 data, aligned to Human.B38 and OmicsoftGenCode.V24. This Land is superseded by GTEx_B38_GC33 and will not be updated.

Data Source

GTEx Portal GTEx v8.

Data Types

  • 819 samples with Affymetrix Expression data (HuGene-1_1-st-v1)
  • 16963 samples with RNA-Seq data; based on SRA files
  • 418 samples in GTEx, flagged as poor quality or from cell lines, are excluded from this Land.

Laboratory Methods

  • Affymetrix Expression Array
  • Illumina TruSeq RNA sequencing

Processing Methods

RNA-Seq data:

  • HLA (Class I) identification using the aligned RNA-Seq reads. GTEx has classified this information as restricted access. The OptiType HLA typing program aligns RNA-Seq reads to the HLA reference and then performs an optimization to determine the most likely HLA Class I alleles. See OptiType - precision HLA typing from next-generation sequencing data.pdf for a description of the algorithm.

Statistical analyses

52 statistical comparisons were performed, corresponding to the 52 sub-tissues described in TissueDetail_GTEx, using DESeq2 v1.30. For each comparison, all samples in the Case group (one TissueDetail, e.g. Liver) were compared to an aggregated control group composed of carefully selected samples representing each of the other TissueDetail categories.

The control samples were chosen by:

  • Calculating the mean expression per gene within a TissueDetail_GTEx category
  • Creating groups of 8 samples (or 2 in the case of Tissue_GTEx: Brain) and comparing the mean per-gene expression of each subgroup to the mean per-gene expression of all the samples in the category
  • Selecting the most representative group, defined by the cosine similarity between the expression profile of the subgroup and that of the entire tissue
    • Sampling brain tissue: Because of the large number of TissueDetail_GTEx categories from different brain regions, only two samples from each brain TissueDetail were selected
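
The selection steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the pipeline actually used for the Land: all function names are hypothetical, candidate subgroups are assumed to be enumerated exhaustively when few combinations exist and drawn at random otherwise (this page does not say how the groups are formed), and brain categories are recognized by the "Brain" prefix of the GTEx TissueDetail name.

```python
import itertools
import math
import random
from statistics import mean


def mean_profile(vectors):
    """Per-gene mean across a list of equal-length expression vectors."""
    return [mean(values) for values in zip(*vectors)]


def cosine_similarity(a, b):
    """Cosine of the angle between two expression profiles."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def pick_representative_group(samples, group_size=8, n_candidates=200, seed=0):
    """Return the IDs of the subgroup whose mean profile best matches the tissue.

    samples maps sample ID -> expression vector for one TissueDetail category.
    Assumption: candidates are all combinations when there are at most
    n_candidates of them, otherwise n_candidates random draws.
    """
    ids = sorted(samples)
    if len(ids) <= group_size:
        return ids  # too few samples to subsample, so keep them all
    tissue_mean = mean_profile([samples[i] for i in ids])
    if math.comb(len(ids), group_size) <= n_candidates:
        candidates = itertools.combinations(ids, group_size)
    else:
        rng = random.Random(seed)
        candidates = (sorted(rng.sample(ids, group_size))
                      for _ in range(n_candidates))
    best_ids, best_score = None, -2.0  # cosine similarity is never below -1
    for candidate in candidates:
        group_mean = mean_profile([samples[i] for i in candidate])
        score = cosine_similarity(group_mean, tissue_mean)
        if score > best_score:
            best_ids, best_score = list(candidate), score
    return best_ids


def build_control_group(by_tissue_detail, case):
    """Aggregate representative samples from every TissueDetail except the case."""
    control = []
    for detail, samples in by_tissue_detail.items():
        if detail == case:
            continue
        # 2 samples per brain region, 8 per other category, as described above
        size = 2 if detail.startswith("Brain") else 8
        control.extend(pick_representative_group(samples, group_size=size))
    return control
```

For a comparison with Liver as the Case group, `build_control_group(by_tissue_detail, case="Liver")` would return the IDs of the aggregated control samples, which could then be passed to the differential expression step.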

Key Meta Data Columns

  • Tissue: Tissues such as brain, blood, heart, lung, kidney etc., using OmicSoft controlled vocabulary
  • Tissue_GTEx: Tissues, using GTEx controlled vocabulary
  • Tissue Detail Type: Sub-category within a tissue, such as Brain - Amygdala, Brain - Cortex, Brain - Hippocampus, Brain - Spinal cord (cervical c-1) etc., using GTEx terminology
  • Land Sample Type: Curated by the OmicSoft Land curation team using OmicSoft's controlled vocabularies, allowing users to easily merge the data with other Lands
  • Tumor or Normal: Indicates whether a sample is a tumor sample or a normal sample. All GTEx samples are normal samples.

Key Views

GTEx data are tissue-specific. One of the most common ways to visualize the data is to group it by Tissue:

GTEx GeneFPKMforEGFR.png

For more detailed information within a tissue type, the data can be filtered to one or a few tissue types and then grouped by Tissue Detail Type:

GTEx GeneFPKMforEGFRbyTissueDetail.png
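
Outside the Land views, the same two groupings can be reproduced on exported sample-level data. The sketch below is only an illustration: the record keys (`Tissue`, `TissueDetail`, `FPKM`), the helper name `group_values`, and the toy values are all assumptions made for the example, not actual GTEx measurements or an OmicSoft export format.

```python
from collections import defaultdict

# Toy records with made-up FPKM values for a single gene (assumed layout).
records = [
    {"Tissue": "Brain", "TissueDetail": "Brain - Amygdala", "FPKM": 1.0},
    {"Tissue": "Brain", "TissueDetail": "Brain - Cortex", "FPKM": 2.0},
    {"Tissue": "Liver", "TissueDetail": "Liver", "FPKM": 5.0},
]


def group_values(rows, group_key, value_key="FPKM", keep=None):
    """Collect values per group, optionally keeping only rows that pass a filter."""
    groups = defaultdict(list)
    for row in rows:
        if keep is not None and not keep(row):
            continue
        groups[row[group_key]].append(row[value_key])
    return dict(groups)


# First view: group every sample by Tissue.
by_tissue = group_values(records, "Tissue")

# Second view: filter to one tissue, then group by Tissue Detail Type.
brain_by_detail = group_values(
    records, "TissueDetail", keep=lambda row: row["Tissue"] == "Brain"
)
```

Each resulting list of values corresponds to one group in the plot, so it can be summarized or drawn with whatever plotting tool is at hand.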
