GeneticsLand Overview



What is GeneticsLand?

Schematic overview of data flow for GeneticsLand

GeneticsLand is a solution for storing, integrating, querying, and visualizing large genetic data sets. It handles both individual-level data (genotype or allele dose data) and summary statistics such as allele frequencies and genotype-phenotype association results for binary and quantitative traits, including eQTLs.

GeneticsLand also includes a subscription to curated data content, including genotypes or frequencies from reference populations such as the 1000 Genomes Project and genotype-phenotype associations from GWAS such as GTEx.

Big Data Storage

The core component of a GeneticsLand instance is its repository of genotype (or allele dose) data. You can think of the data in GeneticsLand as a two-dimensional matrix or table. Along one axis are all the genetic variants and along the other axis are the DNA samples, so that each cell contains a genotype or dose. The same repository also stores the genetic association results. These share the genetic variant axis, but the other axis holds association analyses instead of DNA samples, so each cell contains the statistics for one variant in one analysis.

Variant DNA1 DNA2 ... Analysis1 Analysis2
chr1:10505:A>T A/T ... P=0.01; stderr=0.17; beta=1.2
chr1:15903:G>GC G/G 0.95 ... P=0.001; stderr=0.23; beta=1.01 P=0.0436; stderr=0.05; OR=0.9
... ... ... ... ... ...
chrY:28765024:CG>C 0.02 ...
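To make the matrix picture concrete, here is a minimal Python sketch of one row of such a table. It is an illustration of the layout only, not GeneticsLand's actual storage format: the `VariantRow` class and the `significant_analyses` helper are hypothetical names, and the values are copied from the chr1:15903:G>GC row above.

```python
from dataclasses import dataclass, field


@dataclass
class VariantRow:
    """One row of the conceptual matrix (hypothetical structure)."""
    variant: str                                      # e.g. "chr1:15903:G>GC"
    genotypes: dict = field(default_factory=dict)     # sample ID -> genotype or dose
    associations: dict = field(default_factory=dict)  # analysis ID -> statistics


# Values taken from the chr1:15903:G>GC row of the example table.
row = VariantRow(
    variant="chr1:15903:G>GC",
    genotypes={"DNA1": "G/G", "DNA2": 0.95},
    associations={
        "Analysis1": {"P": 0.001, "stderr": 0.23, "beta": 1.01},
        "Analysis2": {"P": 0.0436, "stderr": 0.05, "OR": 0.9},
    },
)


def significant_analyses(row, threshold):
    """Return the names of analyses whose P-value is below the threshold."""
    return sorted(
        name for name, stats in row.associations.items()
        if stats["P"] < threshold
    )


print(significant_analyses(row, 0.01))
```

Because both kinds of cell hang off the same variant key, a single variant lookup returns the sample-level calls and the association statistics together.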

Data Integration

In addition to the genetic data repository, GeneticsLand dynamically joins additional information at both the variant and sample level for each query. You get the most out of GeneticsLand by adding your own data so that it can be queried centrally alongside the public / reference data curated by OmicSoft.

  • Variant annotations from over twenty different annotation sources. If you are aware of additional annotations in the public domain that would be useful, please let us know and we will work to build new classifiers (annotations). You also have the option of building your own custom/proprietary classifiers that you can easily integrate with your specific sample data.


  • Store, group, and query samples across projects with user-level and project-level access controls:

Manage | Show Land Statistics / Manage User Access / Manage Project Access
Here, you can control user-level and project-level access and data subsets. See Array Land Manage for more information.
Manage | Samples | Manage Sample Sets
Manage | Samples | Sample Meta Data
Manage | Samples | Manage Sample Clinical Data
Manage | Samples | Manage Project Meta Data
Manage | Genes | Manage Gene Sets
See also GeneSetAnalysis
Manage | Associations | Manage Association Meta Data
Manage | Measurement/Screening | Add Measurement Data
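The per-query join described at the start of this section can be pictured with a small Python sketch. This is an assumption about the general idea, not a description of how GeneticsLand implements it: `join_query` is a hypothetical function, and the annotation and metadata values (`GENE_A`, `ProjectX`, `ProjectY`) are made-up placeholders.

```python
def join_query(variant_ids, genotypes, variant_annotations, sample_metadata):
    """Yield one flat record per (variant, sample) cell.

    Variant-level annotations and sample-level metadata are attached at
    query time rather than being stored alongside the genotypes.
    """
    for vid in variant_ids:
        annotation = variant_annotations.get(vid, {})
        for sample, call in genotypes.get(vid, {}).items():
            record = {"variant": vid, "sample": sample, "call": call}
            record.update(annotation)                        # variant-level join
            record.update(sample_metadata.get(sample, {}))   # sample-level join
            yield record


# Genotype cells for one variant from the example table.
genotypes = {"chr1:15903:G>GC": {"DNA1": "G/G", "DNA2": 0.95}}
# Hypothetical annotation and metadata values, for illustration only.
variant_annotations = {"chr1:15903:G>GC": {"gene": "GENE_A"}}
sample_metadata = {"DNA1": {"project": "ProjectX"}, "DNA2": {"project": "ProjectY"}}

records = list(
    join_query(["chr1:15903:G>GC"], genotypes, variant_annotations, sample_metadata)
)
for record in records:
    print(record)
```

Keeping annotations and metadata in separate tables means a newly added annotation source or sample attribute becomes visible in every later query without rewriting the genotype data.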

Query / Search

Data is stored in GeneticsLand as chr:position:ref:alt, and all searches are converted into position-based searches. From the user's perspective, there are four main ways to search a GeneticsLand instance:

  1. Single variant (examples: rs73211978, 19:45404377, 19:45404749:C:T)
  2. Single gene (example: TOMM40)
  3. List of variants, genes, or a region
  4. Phenotype (association result endpoint)
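As a rough illustration of the first step in turning a search term into a position-based search, the Python sketch below classifies a single term by its shape. The patterns and the `classify_query` function are assumptions made for this example; how GeneticsLand actually resolves rsIDs, gene names, or phenotypes to positions is not covered here.

```python
import re

# Hypothetical patterns for the single-variant search forms listed above.
RSID = re.compile(r"^rs\d+$", re.IGNORECASE)
POSITION = re.compile(r"^(?:chr)?(\d{1,2}|X|Y|MT?):(\d+)$", re.IGNORECASE)
ALLELIC = re.compile(
    r"^(?:chr)?(\d{1,2}|X|Y|MT?):(\d+):([ACGT]+):([ACGT]+)$", re.IGNORECASE
)


def classify_query(term):
    """Classify a search term as an rsID, a position, a full variant, or other."""
    term = term.strip()
    if RSID.match(term):
        # An rsID would still need a lookup to obtain its position.
        return ("rsid", term.lower())
    match = ALLELIC.match(term)
    if match:
        chrom, pos, ref, alt = match.groups()
        return ("variant", (chrom.upper(), int(pos), ref.upper(), alt.upper()))
    match = POSITION.match(term)
    if match:
        chrom, pos = match.groups()
        return ("position", (chrom.upper(), int(pos)))
    # Gene symbols and phenotype names cannot be told apart by shape alone.
    return ("gene_or_phenotype", term)


for example in ["rs73211978", "19:45404377", "19:45404749:C:T", "TOMM40"]:
    print(example, "->", classify_query(example))
```

Terms that already carry a chromosome and position need no lookup, while rsIDs, gene symbols, and phenotypes would first have to be mapped to one or more positions.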



There are several context-specific predefined views for each search type, as well as “Land-level” views that provide an overview of the contents of the Land and views of sample data independent of any genetic context. In addition to tabular views of the data, there are many graphical views. All of them are fully interactive, so users can select individual samples, and fully customizable, including grouping and filtering by any variable (the data itself, variant annotations, or sample metadata).


An example of one of the many customizable views in GeneticsLand



Analytics | Principal Component Analysis
Analytics | Variant Association Analysis
Analytics | Gene Burden Test
Analytics | Draw Pedigree
See Manage Result Sets in ArrayLand for information on result sets.

Curated Association Results

As part of the GeneticsLand data subscription, OmicSoft curates results from publicly available Genome-Wide Association Studies (GWAS). GeneticsLand contains curated data from over 7,000 association result sets. These curated datasets can be used to compare your results with previous studies or to help with study design.

As with users' own genome-wide data, several customizable views are available for displaying the results of the curated datasets.

For more detailed instructions, please see Using GeneticsLand and the GeneticsLand Tutorial.