CuratedGWAS

GxL.Associations_B37: Public GWAS results curated by OmicSoft for GeneticsLand data subscription

As part of the GeneticsLand data subscription, OmicSoft curates publicly available genome-wide association studies (GWAS).

This Land contains:

  • 11,382 Association Result Sets
  • >1,000 Phenotypes from 155 different trait Categories
  • Each variant in the Land is annotated using the following databases (additional annotation sources are available): dbSNP, 1000 Genomes, ClinVar, gnomAD, SIFT, PolyPhen2, MutationTaster, LRT, GTEx, GWAS catalog, GRASP 2.0, conservation scores from dbNSFP, GWAVA, HaploReg, RegulomeDB, HGNC, OMIM, InterPro, The Drug Gene Interaction Database (DGIdb)


Content

  • A list of the phenotype Categories included in the Land



Addiction | Cardiomyopathy | Glaucoma | Multiple sclerosis (MS) | Sickle cell anemia
Adipose-related | Cardiovascular disease (CVD) | Graft-versus-host | Muscle-related | Skin cancer
Age-related macular degeneration (ARMD) | Celiac disease | Grave's disease | Musculoskeletal | Skin-related
Aging | Cervical cancer | GxE | Myasthenia gravis | Sleep
Alcohol | Chronic kidney disease | Hair | Myocardial infarction (MI) | Smallpox
Allergy | Chronic lung disease | Hearing | Narcotics | Smoking
Alzheimer's disease | Chronic obstructive pulmonary disease (COPD) | Heart | Nasal | Stone
Amyotrophic lateral sclerosis (ALS) | Cognition | Height | Nasal cancer | Stroke
Anemia | Colorectal cancer | Hepatic | Neuro | Subclinical CVD
Aneurysm | Congenital | Hepatitis | Obsessive-compulsive disorder (OCD) | Surgery
Anthrax | Coronary heart disease (CHD) | HIV/AIDS | Oral cancer | Systemic lupus erythematosus (SLE)
Anthropometric | C-reactive protein (CRP) | Hormonal | Oral-related | Testicular cancer
Arterial | Crohn's disease | Huntington's disease | Ovarian | Thrombosis
Arthritis | CVD risk factor (CVD RF) | Imaging | Ovarian cancer | Thyroid
Asthma | Cystic fibrosis | Immune measures/Inflammatory disease | Pancreas | Thyroid cancer
Atrial fibrillation | Depression | Infection | Pancreatic cancer | Treatment response
Attention-deficit/hyperactivity disorder (ADHD) | Developmental | Influenza | Parkinson's disease | Tuberculosis
Autism | Diet-related | Kidney cancer | Physical activity | Type 1 diabetes (T1D)
Behavior/Social | Drug response | Leukemia | Platelet | Type 2 diabetes (T2D)
Bipolar disorder | Emphysema | Lipids | Pregnancy-related | Ulcerative colitis
Bladder cancer | Endometrial cancer | Liver cancer | Prostate | Upper airway tract cancer
Blood cancer | Environment | Lung cancer | Prostate cancer | Urinary
Blood measure | Epilepsy | Lymphoma | Pulmonary | Uterine cancer
Blood pressure | Esophageal cancer | Male | Radiation | Uterine fibroids
Body mass index | Eye-related | Melanoma | Rectal cancer | Vaccine
Bone cancer | Female | Menarche | Renal | Valve
Bone-related | Gallbladder cancer | Menopause | Renal cancer | Vasculitis
Brain cancer | Gallstones | Methylation | Reproductive | Venous
Breast cancer | Gastric cancer | Mood disorder | Rheumatoid arthritis | Vitamin
Cancer | Gastrointestinal | Mortality | Salmonella | Weight
Cancer-related | General health | Movement-related | Schizophrenia | Wound




Curation Process

Data Processing

Publicly available GWAS result sets were obtained and processed into GTT format using an internally developed pipeline. This pipeline performs allele standardization, a key feature of GeneticsLand, ensuring that all genetic associations are reported on the forward strand of the same genome build and that the effects (e.g. betas, ORs, HRs) are always given as the alternative allele versus the genome reference, regardless of the original input format. This makes cross-study comparisons much easier.

Pipeline

We've developed a pipeline for allele standardization that minimizes the loss of variants due to strand ambiguities. An overview of the processing steps is below:

1) First, determine whether the results are reported on the forward strand
  • Use informative SNPs (allele pairs other than A/T and C/G)
  • Calculate the percentage of informative SNPs reported on the forward strand
  • If this is > 97%, assume all variants are reported on the forward strand
2) Flag (and keep) ambiguous alleles using the "Uncertain" column
  • If the forward assumption is met, for informative SNPs on the reverse strand: Uncertain = EffectDirection
  • If the forward assumption is not met, for non-informative (A/T, C/G) SNPs: Uncertain = EffectDirection
  • In both cases, if the alleles do not match the genome reference: Uncertain = TwoNonRefs
3) Standardize the effects to always compare the ALT allele vs the REF allele
  • Set model reference = REF and model effect allele = ALT
  • Flip the effect (beta, OR, HR) and other relevant columns when necessary so results are always shown in the same orientation in the Land
4) Perform additional calculations
  • If the standard error is provided, calculate confidence intervals
  • Calculate the OR for binary outcomes
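
The steps above can be illustrated with a minimal Python sketch. This is not the OmicSoft pipeline itself: the function names, the tuple layout, the 95% interval (beta ± 1.96 × SE), and the treatment of beta as a log odds ratio are all assumptions made for the example. Only the 97% threshold, the "TwoNonRefs" flag, and the ALT-vs-REF orientation come from the description above.

```python
import math

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}
FORWARD_THRESHOLD = 0.97  # step 1: "> 97%" of informative SNPs on forward


def is_informative(a1, a2):
    """True unless the two alleles are a complementary pair (A/T or C/G)."""
    return COMPLEMENT.get(a1) != a2


def forward_fraction(snps):
    """Fraction of informative SNPs whose alleles match REF/ALT as given.

    `snps` is an iterable of (allele1, allele2, ref, alt) tuples; this
    layout is an assumption made for the example.
    """
    informative = [s for s in snps if is_informative(s[0], s[1])]
    if not informative:
        return 0.0
    forward = sum(1 for a1, a2, ref, alt in informative if {a1, a2} == {ref, alt})
    return forward / len(informative)


def standardize_beta(effect_allele, other_allele, ref, alt, beta, se=None):
    """Orient a beta as ALT vs REF and add derived values (steps 3 and 4)."""
    uncertain = None
    if ref not in (effect_allele, other_allele):
        uncertain = "TwoNonRefs"      # neither allele is the reference base
    elif effect_allele == ref:
        beta = -beta                  # effect was REF vs ALT, so flip the sign
    ci95 = None
    if se is not None:
        ci95 = (beta - 1.96 * se, beta + 1.96 * se)  # assumed 95% interval
    return {"ref": ref, "alt": alt, "beta": beta, "ci95": ci95,
            "odds_ratio": math.exp(beta),  # assumes beta is a log odds ratio
            "uncertain": uncertain}
```

For an odds ratio or hazard ratio, the equivalent flip would be the reciprocal (1/OR) rather than a sign change, and an effect allele frequency would become 1 minus the frequency.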


For studies where only an RS ID is provided, we use the following method to determine the alleles:

1) If the RS ID is found in dbSNP (including any merged RS IDs)
  • Use the REF and ALT alleles given in dbSNP
  • Multiallelic RS IDs are written as allele1/allele2
  • Set Uncertain = EffectAllele to indicate the alleles are being inferred
2) If the RS ID is not found in dbSNP
  • REF = genome reference base and ALT = N
  • Set Uncertain = EffectAllele to indicate the alleles are unknown
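
As a rough illustration of this lookup, the sketch below uses two plain dictionaries as stand-ins for dbSNP and its merged-ID history. The function name, the dictionary layouts, and the toy RS IDs are hypothetical, and joining multiple alternative alleles with "/" is one reading of the allele1/allele2 convention described above.

```python
def resolve_alleles(rs_id, dbsnp, merged, genome_ref_base):
    """Infer REF/ALT for a result that only reports an RS ID.

    dbsnp:  dict mapping RS ID -> (ref, [alt, ...])   (hypothetical stand-in)
    merged: dict mapping retired RS ID -> current RS ID (hypothetical stand-in)
    genome_ref_base: reference base at the variant position
    """
    current = merged.get(rs_id, rs_id)  # follow a merged RS ID if there is one
    if current in dbsnp:
        ref, alts = dbsnp[current]
        # Alleles are inferred from dbSNP, so they are flagged as uncertain.
        return {"ref": ref, "alt": "/".join(alts), "uncertain": "EffectAllele"}
    # Not in dbSNP: only the reference base is known, ALT is set to N.
    return {"ref": genome_ref_base, "alt": "N", "uncertain": "EffectAllele"}


# Toy data for illustration only; these are not real dbSNP records.
toy_dbsnp = {"rs100": ("C", ["T"]), "rs200": ("A", ["G", "T"])}
toy_merged = {"rs50": "rs100"}
print(resolve_alleles("rs50", toy_dbsnp, toy_merged, "C"))
```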

Metadata Curation

The OmicSoft GeneticsLand curation team, which has expertise in GWAS, works to generate accurately curated project-level and sample-level metadata (e.g. GWAS modeling information, dataset stratification, sample sizes, outcome unit, genotyping array, etc.). Additionally, the GWAS outcomes are classified into broad trait categories, allowing for quick phenotype-based filtering and searching.

How To Use

Metadata Data Dictionary

Column | Description | Controlled Vocab or an example
AssociationID | A unique ID for each independent analysis. A publication (grouped by projects in Project Name) may perform more than one association (GWAS) analysis. Each is given a unique ID composed of the phenotype, any subsets (e.g. Males only or Stage 1), and the PMID (where available) | Dementia Late Onset Alzheimers Disease Stage 1 PMID24162737
Ancestry | Ancestry of the individuals used in the association result set | European
Categories | Broad phenotype category | Cardiovascular
Consortia | Consortia that generated the association result set | CHARGE
Date of Publication | Year in which the journal article was published | 2010
Discovery Sample Description | Description of the individuals used in the study's discovery phase | 4,275 Korean ancestry individuals
Effect Type | Defines the type of value given for the "Effect Size" column in GeneticsLand | OR, Beta, HR, Z-score, NA
First Author | Last name of the first author of the journal article | Speliotes
Imputed | Indicates if the study was imputed. NA is used for missing values and could mean either that the study was not imputed or that imputation status has not been determined yet | Imputed, NA
Inclusion Threshold | Defines if the association result set has an inclusion threshold or if all results are provided. Full = full summary statistics available; Top Hits = top hits summary statistics only, a p-value threshold was applied | Full, Top Hits
Journal | Title of the journal where the analysis was published | Nat Genet
Number of Variants | Total number of SNPs and/or indels (N) used in the source GWAS. For Top Hits only studies, this will not represent the number of results in the Land |
Outcome Unit | The measurement unit of the phenotype. Generally available only for studies that contain full summary statistics | g/dL
Phenotype | The phenotype/outcome tested in the association result set | Myocardial Infarction
Platform | Genotyping array used | Custom Illumina iSelect
PMID | PubMed ID for the journal article | PMID28588231
Replication Sample Description | Description of the individuals used in the study's replication phase | 7579 EA cases, 8236 controls
Source | Source from where the association result set was obtained | dbGaP
Subset | Indicates whether a study was restricted to men, women or children | Men, Women, Children, NA
Title | Title of the journal article or consortia | Genetic variants associated with disordered eating
Total Discovery Samples | Total number of individuals (N) used in the study's discovery phase |
Total Replication Samples | Total number of individuals (N) used in the study's replication phase |
Total Sample Size | Total number of individuals (N) in the study |

Gene or SNP-centric workflow

1) Start by searching the Land for your Gene(s) or SNP(s) of interest. You can search by:

SNP: rs7412
Gene: APOE
Coordinate: 19:45412079
Region: 19:45412070-45412080
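
To make the four query formats concrete, here is a small Python sketch that classifies a search term by its shape. It is only an illustration of the formats listed above, not the Land's own search parser; the patterns, the optional "chr" prefix, and the rule that anything else is treated as a gene symbol are assumptions.

```python
import re

# Chromosome names accepted by this sketch (an assumption): 1-22, X, Y, M/MT.
CHROM = r"(?:chr)?(?:[0-9]{1,2}|X|Y|MT?)"

# Region is tested before Coordinate because it is the more specific pattern.
PATTERNS = [
    ("SNP", re.compile(r"rs\d+", re.IGNORECASE)),
    ("Region", re.compile(CHROM + r":\d+-\d+", re.IGNORECASE)),
    ("Coordinate", re.compile(CHROM + r":\d+", re.IGNORECASE)),
]


def classify_term(term):
    """Return 'SNP', 'Region', 'Coordinate', or 'Gene' for one search term."""
    term = term.strip()
    for kind, pattern in PATTERNS:
        if pattern.fullmatch(term):
            return kind
    return "Gene"  # fallback: treat any other term as a gene symbol


for query in ["rs7412", "APOE", "19:45412079", "19:45412070-45412080"]:
    print(query, "->", classify_term(query))
```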



Or search multiple SNPs, genes, coordinates or regions at once


2) Open the annotated association results under Select View | Curated Studies (Table)



  • Note: multi-variant searches (any search other than a single SNP search) include an All SNPs table


3) Browse, modify, filter, or export results using the filter panel, task bar, and export buttons.


Phenotype-centric workflow

There are several ways to open a specific association result set in the Land. In GxL.Associations_B37, you may want to browse all studies related to a specific trait category. For example:


1) Select studies of interest by filtering on phenotype categories (or other metadata columns)


2) Highlight the category and specific studies of interest. Then select "Browse Selected Associations"


3) Select from the available associations views (top hits table, genome plots, etc.) under "Select View"


  • Note: you can also browse association results by searching for the AssociationID in the search box or under Search Multiple Associations | Add From Land

Views

Association views can be reached by searching for one or more associations. Views include interactive genome plots and region plots.

Data Source

All GWAS result sets are open access. Please cite the original source when using this resource.

