GxL.PheWAS Land

From Array Suite Wiki


GxL.PheWAS_B37: PheWAS results curated by OmicSoft for GeneticsLand data subscription

As part of the GeneticsLand data subscription, OmicSoft has curated publicly available PheWAS data from the UK Biobank, generated by the Neale lab, and the PheWAS Catalog.

This Land contains:

  • >26 billion association summary statistics
  • 3,777 Phenotypes
  • Each variant in the Land is annotated using the following databases (additional annotation sources are available):
dbSNP, 1000 Genomes, gnomAD, ClinVar, GTEx, GWAS catalog, GWAVA, dbNSFP, Conservation, RegulomeDB, OMIM, HGNC, Interpro, DGIdb



Curation Process

Data Processing

Publicly available PheWAS result sets were obtained and processed into GTT format using an internally developed pipeline. This pipeline performs allele standardization, a key feature of GeneticsLand, ensuring that all genetic associations are reported on the forward strand of the same genome build and that the effects (e.g. betas, ORs, HRs) are always given as the alternative allele versus the genome reference, regardless of the original input format. This makes cross-study comparisons much easier.


The PheWAS analysis of the UK Biobank data was performed by the Neale lab using these methods. We then standardized the output summary statistics using our internal pipeline (below).

1) First, determine whether the results are reported on the forward strand
  • Use informative SNPs (those whose alleles are not A/T or C/G)
  • Calculate the percentage of informative SNPs reported on the forward strand
  • If > 97%, assume all variants are reported on the forward strand
2) Flag (and keep) ambiguous alleles using the "Uncertain" column
  • If the forward assumption is met, informative SNPs on the reverse strand: Uncertain = EffectDirection
  • If the forward assumption is not met, non-informative (A/T, C/G) SNPs: Uncertain = EffectDirection
  • In both cases, alleles that do not match the genome reference: Uncertain = TwoNonRefs
3) Standardize the effects to always compare the ALT allele vs the REF allele
  • Set model reference = REF and model effect allele = ALT
  • Flip the effect (beta, OR, HR) and other relevant columns when necessary so results are always shown in the same orientation in Land
4) Perform additional calculations
  • If the standard error is provided, calculate confidence intervals
  • Calculate OR for binary outcomes
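
The four steps above can be sketched in Python as follows. This is a minimal illustration, not OmicSoft's internal pipeline: the record fields (ref, effect, other, beta, se, binary), the function names, the 95% interval (z = 1.96), the choice to complement reverse-strand alleles onto the forward strand, and the treatment of beta as a log odds ratio for binary outcomes are all assumptions made for the example. It handles single-base SNPs only.

```python
import math

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
FORWARD_THRESHOLD = 0.97  # from the text: > 97% forward => assume all forward


def is_informative(a1, a2):
    """A SNP is strand-informative unless its alleles are A/T or C/G."""
    return COMPLEMENT[a1] != a2


def strand_of(ref, a1, a2):
    """Infer the strand of an allele pair from the genome reference base."""
    if ref in (a1, a2):
        return "forward"
    if ref in (COMPLEMENT[a1], COMPLEMENT[a2]):
        return "reverse"
    return None  # neither allele matches the reference on either strand


def forward_fraction(records):
    """Step 1: share of informative SNPs reported on the forward strand."""
    strands = [strand_of(r["ref"], r["effect"], r["other"])
               for r in records if is_informative(r["effect"], r["other"])]
    strands = [s for s in strands if s is not None]
    if not strands:
        return 0.0
    return sum(s == "forward" for s in strands) / len(strands)


def standardize(rec, forward_assumed, z=1.96):
    """Steps 2-4 for one record; returns a copy oriented as ALT vs REF."""
    ref, effect, other, beta = rec["ref"], rec["effect"], rec["other"], rec["beta"]
    out = dict(rec, uncertain="")
    strand = strand_of(ref, effect, other)
    if strand is None:
        out["uncertain"] = "TwoNonRefs"  # flagged and kept, not re-oriented
        return out
    informative = is_informative(effect, other)
    # Step 2: flag records whose effect direction cannot be fully trusted.
    if forward_assumed and informative and strand == "reverse":
        out["uncertain"] = "EffectDirection"
    elif not forward_assumed and not informative:
        out["uncertain"] = "EffectDirection"
    if strand == "reverse":  # assumption: report alleles on the forward strand
        effect, other = COMPLEMENT[effect], COMPLEMENT[other]
    # Step 3: the effect allele must be ALT; flip the sign if it was REF.
    if effect == ref:
        effect, other, beta = other, effect, -beta
    out.update(effect=effect, other=other, beta=beta)
    # Step 4: derived quantities.
    if rec.get("binary"):
        out["or"] = math.exp(beta)  # assumes beta is a log odds ratio
    se = rec.get("se")
    if se is not None:
        out["ci"] = (beta - z * se, beta + z * se)
    return out
```

A caller would first compute `forward_fraction(records) > FORWARD_THRESHOLD` for a result set and then pass that boolean as `forward_assumed` to `standardize` for each record. Note that flipping a beta negates it, while an OR or HR on the ratio scale would be inverted (1/OR) instead.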

For studies where only an RS ID is provided (Source=PheWAS Catalog), we used the following method to determine the alleles:

1) If the RS ID is found in dbSNP (including any merged RS IDs)
  • Used the REF and ALT alleles given in dbSNP
  • Multiallelic RS IDs are written as allele1/allele2
  • Set Uncertain = EffectAllele to indicate the alleles are being inferred
2) If RS ID not found in dbSNP
  • REF = genome reference base and ALT = N
  • Set Uncertain = EffectAllele to indicate the alleles are unknown
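
The RS ID rules above can be sketched as a small lookup function. This is an illustrative sketch only: `dbsnp` and `merged` are hypothetical in-memory dictionaries standing in for a dbSNP lookup and its merge history, the function name is invented for the example, and joining multiple ALT alleles with "/" is one reading of the "allele1/allele2" convention.

```python
def resolve_alleles(rs_id, dbsnp, merged, genome_ref_base):
    """Return (ref, alt, uncertain) for a result that only carries an RS ID.

    dbsnp:  hypothetical dict mapping RS ID -> (ref, [alt, ...])
    merged: hypothetical dict mapping a retired RS ID -> the RS ID it merged into
    """
    # Follow merged RS IDs to the current one, guarding against cycles.
    seen = set()
    while rs_id in merged and rs_id not in seen:
        seen.add(rs_id)
        rs_id = merged[rs_id]

    entry = dbsnp.get(rs_id)
    if entry is not None:
        ref, alts = entry
        # Alleles are inferred from dbSNP; multiallelic ALTs are joined with "/".
        return ref, "/".join(alts), "EffectAllele"
    # Not in dbSNP: alleles are unknown, so ALT is the placeholder N.
    return genome_ref_base, "N", "EffectAllele"


# Toy data with placeholder IDs (not real dbSNP records).
toy_dbsnp = {"rsTOY1": ("C", ["T"]), "rsTOY2": ("A", ["G", "T"])}
toy_merged = {"rsOLD1": "rsTOY1"}

print(resolve_alleles("rsOLD1", toy_dbsnp, toy_merged, "C"))
print(resolve_alleles("rsTOY2", toy_dbsnp, toy_merged, "A"))
print(resolve_alleles("rsMISSING", toy_dbsnp, toy_merged, "G"))
```

In both branches the Uncertain column is set to EffectAllele, so downstream users can tell that the alleles were inferred or are unknown rather than taken from the original study.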

Metadata Curation

How To Use

Metadata Data Dictionary

Column | Description | Example Entries
AssociationID | Unique association result set ID, generally a phenotype description | e.g. Pseudoexfoliation glaucoma, Abdominal aortic aneurysm
ICD-10 Diagnosis Code | ICD-10 diagnosis code where available; UK Biobank data only | e.g. G44, H43
N Cases | Number of cases | e.g. 211
N Controls | Number of controls | e.g. 337077
N Missing | Number of missing | e.g. 4146
N Non-missing | Total number analyzed | e.g. 18695
Phenotype | Phenotype analyzed | e.g. Corneal edema, Treatment/medication code: bumetanide
PHESANT Notes | PHESANT output from the Neale lab's UK Biobank analysis |
PHESANT Reassignments | PHESANT output from the Neale lab's UK Biobank analysis |
PheWAS Catalog PheCode | Custom phenotype grouping used by the PheWAS Catalog | e.g. 395.3
Project Name | Data source | e.g. PheWAS Catalog or UK Biobank
Questionnaire Notes | UK Biobank questionnaire notes | e.g. ACE touchscreen question How many children have you fathered? ...
Source | Data source and location | e.g. PheWAS Catalog (https://phewascatalog.org/)
UK Biobank Field.code | UK Biobank Data-Field | e.g. 90088
Warning for Case/Control | PHESANT output from the Neale lab's UK Biobank analysis | e.g. YES

Gene or SNP-centric workflow

1) Start by searching the Land for your Gene(s) or SNP(s) of interest. You can search by:

SNP: rs7412
Gene: APOE
Coordinate: 19:45412079
Region: 19:45412070-45412080


Or search multiple SNPs, genes, coordinates or regions at once
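
When preparing a long list of search terms, it can help to check which of the four forms each term takes before pasting it into the Land. The sketch below is a hypothetical helper for that purpose; it does not reflect how the Land itself parses queries, and the patterns (for example, treating anything that is not an RS ID, coordinate, or region as a gene symbol) are assumptions.

```python
import re

# Order matters: a region also contains a colon, so test it before coordinate.
PATTERNS = [
    ("SNP", re.compile(r"^rs\d+$", re.IGNORECASE)),
    ("Region", re.compile(r"^\w+:\d+-\d+$")),
    ("Coordinate", re.compile(r"^\w+:\d+$")),
]


def classify_query(term):
    """Label a search term as SNP, Region, Coordinate, or (by default) Gene."""
    term = term.strip()
    for kind, pattern in PATTERNS:
        if pattern.match(term):
            return kind
    return "Gene"


for term in ["rs7412", "APOE", "19:45412079", "19:45412070-45412080"]:
    print(term, "->", classify_query(term))
```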


2) Open the annotated association results under Select View | Curated Studies (Table)


  • Note: multi-variant searches (any search other than a single-SNP search) include an All SNPs table

3) Browse, modify, filter, or export results using the filter panel, task bar, and export buttons.


Phenotype-centric workflow

There are several ways to open a specific association result set in Land. In GxL.Associations_B37, you may want to browse all studies related to a specific phenotype.


Association views can be reached easily by searching for one or more associations. Views include interactive genome plots [figure: GenomePlot.png] and region plots [figure: RegionPlot.png].

Data Source

All PheWAS result sets are open access. Please cite the original source when using this resource.

