Introduction to DiseaseLand Content

From Array Suite Wiki


DiseaseLand

DiseaseLand is an integrated disease genomics database and visualization software that helps users explore public and private genomics datasets using OmicSoft's Land technology. DiseaseLand provides a user-friendly interface to functional genomics data for thousands of normal and disease samples, accelerating discovery of new connections in disease research. OmicSoft officially upgraded ImmunoLand and CVMLand into a single DiseaseLand in 2016. DiseaseLand is accessible via tiered subscriptions; users can choose to subscribe to immunological diseases, metabolic & cardiovascular diseases, or both. DiseaseLand focuses on datasets including, but not limited to, immunological diseases, metabolic diseases and cardiovascular diseases. Inherited from ImmunoLand, DiseaseLand includes immune-related diseases such as Asthma/Respiratory Diseases, Arthritis, Allergies, COPD, IBD, Psoriasis, SLE (systemic lupus erythematosus), Multiple Sclerosis, and Infectious Diseases. Projects from the former CVMLand provide data from cardiovascular diseases, diabetes mellitus, liver disease, lipid metabolism disorders and nutrition disorders.


DiseaseLandDiseases.png


DiseaseLand contains datasets retrieved from a variety of public sources, including GEO (Gene Expression Omnibus), SRA (Sequence Read Archive), ArrayExpress, dbGaP (the Database of Genotypes and Phenotypes), and other large data repositories such as ImmGen (the Immunological Genome Project). In addition, DiseaseLand subscribers have access to the Body Map Collection, including BluePrint, GTEx, and HPA. Most data in DiseaseLand are collected from individual studies. Integrating these diverse studies requires careful management of data processing pipelines and well-controlled metadata curation.


DiseaselandDataSource.png


With a heavy focus on publicly available expression microarray and RNA-Seq data, DiseaseLand makes it possible to examine gene expression across many different projects, all processed through the same pipeline. DiseaseLand adds further value by providing exon-level expression and alternative splicing metrics, visualizations and functions. OmicSoft's experienced data curation and processing team follows a systematic method for data curation. Refer to our Curation Pipeline for details.

DiseaseLand also features Comparison Views, which allow users to easily search and visualize statistical contrasts between groups of samples using common queries: Treated vs Control, Disease vs Normal, Responder vs Non-Responder, etc. By searching for a gene, a user can visualize its association with comparisons across thousands of projects, and interactively narrow the results down to projects of interest. (Additional reading: ComparisonLand)

Data Sources

GEO

SRA

dbGAP

ArrayExpress

ImmGen

Data Types

Expression Data (as of the 2016 Q2 release)

HumanDisease:

  • 57610 samples from microarray platforms
  • 2662 RNA-Seq samples
  • 2870 comparisons
  • 950 projects from GEO


MouseDisease:

  • 14410 samples from microarray platforms
  • 1438 RNA-Seq samples
  • 1767 comparisons
  • 515 projects from GEO


Laboratory Methods

Refer to each project's clinical metadata for details of how the data were generated.

Processing Methods

Expression Data: OmicSoft Affymetrix Microarray Preprocessing

RNA-Seq data: OmicScript Pipeline and Building Land From RNA-Seq Data

OmicSoft does not reprocess other genomic data, but extracts data directly from original datasets.

Key Metadata Columns:

DiseaseLand is curated at the comparison, sample and project level, with hundreds of metadata columns available.

Comparison level:

  • Comparison Cutoffs: Sample size, fold change, p value and expression cutoffs for each comparison.
  • Comparison details: Comparison Category, Contrast, case and control sample IDs.

Sample level:

  • DiseaseCategory (controlled vocabulary): Disease category of the sample, based on the detailed disease state.
  • TissueCategory (controlled vocabulary): Tissue category, such as skin, muscle, heart, kidney, etc.
  • DiseaseState (controlled vocabulary) : Curated at sample level from each project.
  • SampleSource (controlled vocabulary) : Either cell type or tissue information. When a sample has cell type information, cell type is used. Otherwise, tissue category is used.
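
The SampleSource rule above can be illustrated with a minimal Python sketch. This is not OmicSoft's curation code; the function name `sample_source`, the dictionary keys, and the example values are assumptions chosen for illustration only.

```python
from typing import Optional


def sample_source(cell_type: Optional[str], tissue_category: str) -> str:
    """Hypothetical helper: prefer cell type, fall back to tissue category."""
    # Treat None and blank strings as "no cell type information".
    if cell_type and cell_type.strip():
        return cell_type.strip()
    return tissue_category


# Illustrative sample records (made-up values, not DiseaseLand data).
samples = [
    {"CellType": "CD4+ T cell", "TissueCategory": "blood"},
    {"CellType": None, "TissueCategory": "skin"},
]

sources = [sample_source(s["CellType"], s["TissueCategory"]) for s in samples]
print(sources)
```

The first sample carries cell type information, so its cell type is used; the second does not, so its tissue category is used instead.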

Project level:

  • ProjectName: The name of the individual project from which the data originate.

Primary Grouping

DiseaseCategory

Sample Distribution by DiseaseCategory:

DiseaseLandSampleDistribution.png

Examples from ImmunoLand:

  • Arthritis
  • Asthma, COPD
  • IBD (Ulcerative Colitis, Crohn’s disease)
  • Lupus
  • Psoriasis and other skin diseases
  • Infectious diseases and vaccine
  • Neuroimmuno-diseases

Examples from CVMLand:

  • Heart Disease
  • Vascular diseases
  • Diabetes mellitus
  • Diabetic retinopathy
  • Prediabetes
  • Glucose intolerance
  • Insulin resistance
  • Hyperglycemia
  • Islet autoantibody positive
  • Lipid metabolism disorder
  • Nutrition disorders
  • Liver disease
  • Pathological conditions

DiseaseStates within DiseaseCategories

Within most DiseaseCategories, multiple diseases will be described in DiseaseState, providing even finer control when searching for projects of interest.

Examples for DiseaseCategory arthritis:

  • Septic arthritis
  • Juvenile idiopathic arthritis
  • Psoriatic arthritis
  • Osteoarthritis
  • Rheumatoid arthritis
  • Enthesitis related arthritis

Examples for DiseaseCategory diabetes:

  • Type 1 diabetes mellitus
  • Type 2 diabetes mellitus
  • Gestational diabetes mellitus

Secondary grouping

Similar to DiseaseCategory==>DiseaseState, samples are categorized by cell identity at multiple levels, including TissueCategory, Tissue, and CellType.

By default, TissueCategory is used as the secondary grouping.
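
As a rough illustration of the primary and secondary grouping described above, the following Python sketch counts samples by DiseaseCategory and then by TissueCategory. The sample records are invented for the example, and the helper `count_by` is hypothetical; it is not part of any OmicSoft API.

```python
from collections import Counter

# Illustrative sample metadata (made-up records, not DiseaseLand data).
samples = [
    {"DiseaseCategory": "arthritis", "DiseaseState": "rheumatoid arthritis", "TissueCategory": "blood"},
    {"DiseaseCategory": "arthritis", "DiseaseState": "osteoarthritis", "TissueCategory": "blood"},
    {"DiseaseCategory": "arthritis", "DiseaseState": "psoriatic arthritis", "TissueCategory": "skin"},
    {"DiseaseCategory": "diabetes", "DiseaseState": "type 2 diabetes mellitus", "TissueCategory": "muscle"},
]


def count_by(records, primary="DiseaseCategory", secondary="TissueCategory"):
    """Hypothetical helper: count samples per (primary, secondary) group."""
    return Counter((r[primary], r[secondary]) for r in records)


counts = count_by(samples)
for (primary, secondary), n in sorted(counts.items()):
    print(f"{primary} / {secondary}: {n}")
```

Passing `secondary="DiseaseState"` to the same helper would show the finer DiseaseCategory==>DiseaseState breakdown instead of the default tissue-based grouping.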

Key Views

Comparison View:

Nearly all projects in DiseaseLand include at least one comparison between subsets of samples. These comparisons are usually modeled after comparisons in the source publication. All comparison datasets are curated as belonging to different "Comparison Types": Treated vs Control, Disease vs Normal, Responder vs Non-Responder, etc.

In DiseaseLand, you can search for a gene and view its expression in all samples or a single project, or you can visualize which comparisons detected up- or down-regulation of the gene. This way, you can identify projects of interest, and discover trends in your favorite gene's regulation.
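
The idea of checking which comparisons detected up- or down-regulation of a gene can be sketched in a few lines of Python. Everything here is an assumption for illustration: the field names (`log2_fc`, `p_value`), the default cutoffs, and the example values are hypothetical. In DiseaseLand itself, the cutoffs are curated per comparison.

```python
def direction(log2_fc, p_value, min_abs_log2_fc=1.0, max_p=0.05):
    """Hypothetical helper: label one comparison result for a single gene."""
    # A result counts as regulated only if it passes both cutoffs.
    if p_value > max_p or abs(log2_fc) < min_abs_log2_fc:
        return "not significant"
    return "up" if log2_fc > 0 else "down"


# Illustrative results for one gene across three comparisons (made-up values).
comparisons = [
    {"ComparisonID": "C1", "log2_fc": 2.1, "p_value": 0.001},
    {"ComparisonID": "C2", "log2_fc": -1.5, "p_value": 0.01},
    {"ComparisonID": "C3", "log2_fc": 0.3, "p_value": 0.2},
]

calls = {c["ComparisonID"]: direction(c["log2_fc"], c["p_value"]) for c in comparisons}
print(calls)
```

Grouping the resulting labels by project or comparison type is one way to spot consistent trends in a gene's regulation.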

DiseaseLandComparisonView.png

Project View:

Experimental designs in projects within DiseaseLand are quite different, and batch effects in microarray projects are difficult to remove. OmicSoft created project-specific views to display expression values based on experimental design within each project.

DiseaseLandProjectView.png

Comparison Details Views:

OmicSoft uses manually curated metadata to generate statistical tests (called comparisons) for each project/study included in DiseaseLand, generally following the comparisons in the original paper. The Comparison collection is useful for finding common differential expression patterns/signatures between studies, such as between a microarray and an NGS study, or for finding links between a gene knockout experiment and a compound treatment study.

ComparisonDetailViews.png

Example views include Volcano Plot (upper left), Venn Diagram (upper right), Comparison Heatmap (bottom left) and Significant Genes (bottom right).

Clinical Details View

The OmicSoft curation team carefully curates sample, comparison, and project metadata, including clinical details. As each project has its own key clinical variables, we recommend that users always look at Clinical Details for any specific project/comparison of interest.

ClinicalDetails.png

Example Clinical Details view for project GSE45734, excluding non-clinical variables.
