Gene sets curated by OmicSoft from Land

From Array Suite Wiki

Gene sets from OmicSoft Lands

Overview

The Gene Set Analysis function allows you to compare your lists of significant genes (up- and down-regulated) to databases of curated Gene Sets from multiple sources, including curated datasets from Land collections, signatures derived from The Broad Institute's Molecular Signatures collection, and internal Gene Set databases.

This page describes how Land-derived Gene Sets are generated for each project, tissue, etc.

Gene set databases based on comparisons

Some Gene Set databases are based on comparisons, including those from DiseaseLand, OncoGeo, etc.

The following criteria are applied to the inference result of each comparison:

  1. p-value < 0.05
  2. Linear fold change > 1.25
  3. Maximum gene number = 2000

If the comparison has p-values, genes are sorted by p-value and the top 2000 genes are selected.

If the comparison has no p-values, genes are sorted by the absolute value of the fold change and the top 2000 genes are selected.
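The selection rules above can be sketched in Python. This is a minimal illustration, not OmicSoft's implementation: the `GeneResult` record and the `select_comparison_genes` function are hypothetical names, and the sketch assumes that fold change is stored as a signed linear value (e.g. -2.0 for a 2-fold decrease) so that the 1.25 cutoff is applied to its absolute value.

```python
from typing import List, NamedTuple, Optional


class GeneResult(NamedTuple):
    """One row of a comparison's inference result (hypothetical layout)."""
    gene: str
    fold_change: float        # signed linear fold change (assumed convention)
    p_value: Optional[float]  # None when the comparison reports no p-values


def select_comparison_genes(
    results: List[GeneResult],
    p_cutoff: float = 0.05,
    fc_cutoff: float = 1.25,
    max_genes: int = 2000,
) -> List[str]:
    """Return the gene names kept for a comparison-based gene set."""
    # Criterion 2: fold change above the cutoff (absolute value is an assumption).
    kept = [r for r in results if abs(r.fold_change) > fc_cutoff]

    has_p_values = bool(results) and all(r.p_value is not None for r in results)
    if has_p_values:
        # Criterion 1, then rank by p-value (smallest first).
        kept = [r for r in kept if r.p_value < p_cutoff]
        kept.sort(key=lambda r: r.p_value)
    else:
        # No p-values: rank by absolute fold change (largest first).
        kept.sort(key=lambda r: abs(r.fold_change), reverse=True)

    # Criterion 3: cap the gene set size.
    return [r.gene for r in kept[:max_genes]]
```

For example, calling `select_comparison_genes(results, max_genes=1)` on a comparison with p-values keeps only the qualifying gene with the smallest p-value.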

Gene set databases based on group-specific expression or CNV

Some gene set databases are based on group-specific expression or CNV, including, among others:

  1. Cancer-specific copy number in TCGA (OncoLand)
  2. Cancer-specific gene expression in TCGA (OncoLand)
  3. Cell type-specific gene expression in Blueprint (Reference)
  4. Tissue-specific gene expression in GTEx (Reference)
  5. Tissue/Cell/BiologicalGroup-specific gene expression in SingleCellLand (SingleCellLand)

Gene set databases are generated as follows:

  1. For each gene, a global mean and a group mean are calculated (all values are in log2 scale). The global mean is the mean of the gene across all samples (e.g. all non-normal samples in TCGA). The group mean is the mean of the gene within a specific group (e.g. BLCA samples for BLCA-specific gene expression).
  2. For each group, the fold change of a gene is defined as "group mean - global mean". Because both means are in log2 scale, this difference is a log2 fold change.
  3. Genes are sorted by the absolute value of the fold change, and the top 2000 genes are selected.
  4. For CNV only, a minimum value cutoff (0.01) is applied.

Note: in the gene set analysis report, the median fold change has already been transformed to linear scale (it is not in log2 scale).
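The group-specific procedure can be illustrated with a short Python sketch. The function names `group_specific_genes` and `to_linear` and the input layout are hypothetical and do not reflect OmicSoft's actual code. The caller is assumed to pass only the samples that should count toward the global mean. The page does not state which quantity the CNV cutoff of 0.01 applies to, so the optional `min_abs_log2_fc` parameter, which filters on the absolute log2 fold change, is an assumption made purely for illustration.

```python
from statistics import mean
from typing import Dict, List, Optional, Tuple


def group_specific_genes(
    log2_values: Dict[str, List[float]],  # gene -> one log2 value per sample
    sample_groups: List[str],             # group label of each sample, same order
    group: str,
    max_genes: int = 2000,
    min_abs_log2_fc: Optional[float] = None,  # assumed meaning of the CNV cutoff
) -> List[Tuple[str, float]]:
    """Return (gene, log2 fold change) pairs ranked by absolute fold change."""
    in_group = [i for i, label in enumerate(sample_groups) if label == group]
    if not in_group:
        raise ValueError(f"no samples found for group {group!r}")

    scored = []
    for gene, values in log2_values.items():
        global_mean = mean(values)                      # mean over all samples
        group_mean = mean(values[i] for i in in_group)  # mean within the group
        log2_fc = group_mean - global_mean              # difference of log2 means
        if min_abs_log2_fc is not None and abs(log2_fc) < min_abs_log2_fc:
            continue
        scored.append((gene, log2_fc))

    # Largest absolute fold change first, then cap the gene set size.
    scored.sort(key=lambda item: abs(item[1]), reverse=True)
    return scored[:max_genes]


def to_linear(log2_fc: float) -> float:
    """Convert a log2 fold change to a linear-scale ratio."""
    return 2.0 ** log2_fc
```

Here `to_linear` is the standard conversion of a log2 difference to a ratio, so a log2 fold change of 1.0 corresponds to a linear ratio of 2.0. Whether the report presents decreases as ratios below 1 or as negative fold changes is not specified on this page.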
