Land GxlGeneBurdenTest.pdf

From Array Suite Wiki




This function performs the SNP-set (Sequence) Kernel Association Test (SKAT) and gene burden testing. The result is saved as a result set.

Example workflow

1) Define the SampleSet to be analyzed
a. When defining the SampleSet, consider the availability of phenotype data and covariates so that the SampleSet will be representative of the final sample list to be analyzed
b. Decide what data type you will be analyzing (array genotypes, sequence, or imputed data) and include this in your SampleSet definition
There are several ways to create SampleSets in Land. For example, use the filter pane to remove samples that are missing phenotype data and samples that have not been imputed, then use Create SampleSet.
GeneBurden SampleSet.png

2) Define the variant set to be analyzed
a. Search a gene, gene list, variant list, or region in the search box
b. Open the Allele Frequency view for the data type you plan to analyze (e.g. Select View | Imputed Allele Frequency)
GeneBurden VariantSet.png
c. Filter the variants using allele frequency, classifiers of interest or the GAIT Mask. Then save the variant set as a text file.
GeneBurden VariantSet2.png
The GAIT Mask is designed to mirror the variant filters available on the Genetic Analysis Interactive Tool (GAIT). Currently 3 filters are available, defined as:
  • Protein-truncating + non-synonymous with MAF < 1% (PTV+NS 1%)
  • Protein-truncating + possibly deleterious non-synonymous with MAF < 1% (PTV+NSbroad 1%*)
  • Protein-truncating only (PTV)
Note: PTV+NSbroad 1%* does not match the GAIT criteria exactly because GeneticsLand currently has only one of the functional predictors (SIFT). Others will be added in the future, and the mask will be updated to account for those additional predictions.

3) Run the Gene Burden/SKAT Test as outlined using the appropriate options

4) View the results in Land

Perform Gene Burden Test

General Options


Land – Select the instance of GeneticsLand from which you want to export data

Data type – Select the data type that you wish to use in your analysis. The exported variants depend on how the data was published. For example, selecting “Genotyped Data” will export variants that were originally published as genotype data (PLINK or VCF (Genotyped)) and will exclude variants from sequencing and imputation (VCF (Sequenced), VCF (Imputed), and Impute2).

Output folder – Designate the location for the output logs

Select sample set – A sample set is a collection of samples within GeneticsLand. To create a SampleSet, select View | Samples | Create SampleSet

Select variant list file – Restrict the analysis to the subset of variants provided in the list file.

Gene sets:

  • Single set test – If selected, all variants in the variant set will be treated as one set (and processed together) in the SKAT and Burden tests. Enter a name for the gene set (e.g. BRCA_region) in "Set name".
  • Multiple set test – If selected, variants will be divided into sets based on their gene annotation. All variants that do not map to a gene (e.g. intergenic regions) will be dropped. Use this option when testing regions or multiple genes at once.
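
As an illustration of the difference between the two options, the following is a minimal sketch, not the GeneticsLand implementation. The variant IDs, the `group_variants_by_gene` helper, and the use of `None` for an unmapped variant are all hypothetical.

```python
from collections import defaultdict

def group_variants_by_gene(annotations):
    """Group variant IDs by gene; variants without a gene annotation are dropped."""
    sets = defaultdict(list)
    for variant_id, gene in annotations:
        if gene:  # None or empty string means the variant does not map to a gene
            sets[gene].append(variant_id)
    return dict(sets)

# Hypothetical (variant ID, gene annotation) pairs; rs3 is intergenic.
annotations = [("rs1", "BRCA1"), ("rs2", "BRCA1"), ("rs3", None), ("rs4", "TP53")]

# Multiple set test: one set per gene, unmapped variants dropped.
multiple_sets = group_variants_by_gene(annotations)

# Single set test: every variant goes into one named set.
single_set = {"BRCA_region": [variant_id for variant_id, _ in annotations]}
```

In this sketch the single set keeps the intergenic variant, while the per-gene grouping does not.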

Exporting options:

  • R2 cutoff (derived) – The imputation quality (R2) threshold, calculated using the selected samples. Set to 0 to export everything.
  • Dose to genotype threshold – Threshold used to convert imputation dosage values to genotypes. If the imputation dosage is within ± the selected threshold of a genotype value (0, 1, or 2), then the dosage will be converted to that genotype and exported. The default threshold (0.5) will convert all dosage values to a genotype. A dose to genotype threshold of 0.3 is often used.
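
The dose to genotype rule described above can be sketched as follows. This is an illustrative example only: the `dosage_to_genotype` function is hypothetical, and returning `None` (a missing call) for dosages that fall outside the threshold is an assumption about how unconverted values are handled.

```python
from typing import Optional

def dosage_to_genotype(dosage: float, threshold: float = 0.5) -> Optional[int]:
    """Return 0, 1, or 2 if the dosage is within +/- threshold of that value, else None."""
    nearest = min((0, 1, 2), key=lambda genotype: abs(dosage - genotype))
    if abs(dosage - nearest) <= threshold:
        return nearest
    return None  # assumption: unconverted dosages are treated as missing

dosages = [0.05, 0.4, 0.95, 1.45, 1.8]

# Default threshold (0.5): every dosage is converted.
calls_default = [dosage_to_genotype(d) for d in dosages]
# -> [0, 0, 1, 1, 2]

# Stricter threshold (0.3): ambiguous dosages are left uncalled.
calls_strict = [dosage_to_genotype(d, threshold=0.3) for d in dosages]
# -> [0, None, 1, None, 2]
```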

Result set name – Name to be used for the output logs and the returned result set in Land.

Model Options



  • Select Continuous trait (linear model) for a quantitative outcome (phenotype) (e.g. BMI)
  • Select Binary trait (logistic model) for a binary outcome (phenotype) (e.g. Case/Control)

Specify Model (Phenotype and Covariates)

  • Select the phenotype to be tested using the dynamic search box, which shows clinical and sample metadata

  • Include covariates in the SKAT/burden test models by scrolling through the metadata variables and selecting "Add"

  • Specify control (binary trait) – For binary traits only, select the referent group for the models

NOTE: The SKAT implementation does not currently handle crossed (interaction) or nested models, so selecting these while building your model will have no effect.

SKAT options

  • Marker missing rate threshold – Use to remove variants with missing call rates greater than the selected threshold. Note that any remaining missing genotypes will be imputed using SKAT's methodology (impute.method="fixed").
  • Max MAF (<=0.5) – Use to set a maximum minor allele frequency (MAF) threshold. Any variants with a MAF greater than this threshold will be dropped. Use to select only rare variants for SKAT/Burden testing. The default (0.5) means no MAF threshold will be applied.
  • Combine common/rare variants – If selected, will run SKAT's Sequence Kernel Association Test for the combined effect of common and rare variants (SKAT_CommonRare). Unselect if you have restricted your variant set to rare variants only (for example, using the Max MAF threshold above).
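
To make the two filtering thresholds concrete, here is a minimal sketch of missing-rate and MAF filtering. It is not the code Land runs: the `filter_variants` helper, the variant names, and the genotype values are hypothetical, genotypes are coded as 0/1/2 allele counts, and `None` marks a missing call.

```python
def filter_variants(genotypes, max_missing_rate, max_maf=0.5):
    """Keep variants whose missing rate and MAF do not exceed the thresholds."""
    kept = {}
    for variant_id, calls in genotypes.items():
        observed = [g for g in calls if g is not None]
        missing_rate = 1 - len(observed) / len(calls)
        if not observed or missing_rate > max_missing_rate:
            continue  # too many missing calls
        allele_freq = sum(observed) / (2 * len(observed))
        maf = min(allele_freq, 1 - allele_freq)
        if maf > max_maf:
            continue  # too common for a rare-variant analysis
        kept[variant_id] = calls
    return kept

# Hypothetical genotypes for 10 samples.
genotypes = {
    "var_rare": [0, 0, 0, 1, 0, 0, 0, 0, 0, 0],                # MAF 0.05
    "var_common": [1, 2, 1, 0, 1, 2, 0, 1, 1, 1],              # MAF 0.5
    "var_sparse": [0, None, None, None, 1, 0, None, 0, 0, 0],  # 40% missing
}

rare_only = filter_variants(genotypes, max_missing_rate=0.1, max_maf=0.1)
no_maf_filter = filter_variants(genotypes, max_missing_rate=0.1)  # default 0.5 keeps all MAFs
```

With these inputs, `rare_only` retains only `var_rare`, while `no_maf_filter` also retains `var_common`; `var_sparse` fails the missing-rate threshold in both cases.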


In Land

A table and plot of the results are returned to Land. To view them, select Analytics | Open Result Set


The results will be labeled with the name given in Result set name.

1) Plot

Scatter plot (for quantitative phenotypes)


This shows a scatter plot of the burden test results: a linear regression model of your selected phenotype (adjusted for any covariates included in your model) against a count of the minor alleles present in a given SNP set. The burden test PValue, Beta, and confidence intervals (CIs) are shown in the plot title.
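
The idea behind the burden count and its regression slope can be sketched as below. This is a simplified illustration with hypothetical data and helper names: it fits an unadjusted least-squares line only, and omits the covariate adjustment, PValue, and CIs that the actual burden test reports.

```python
def burden_counts(genotype_rows):
    """Sum the minor-allele counts (0/1/2) across all variants in the set, per sample."""
    return [sum(row) for row in genotype_rows]

def simple_linear_fit(x, y):
    """Ordinary least squares for y = intercept + beta * x (no covariates)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta = sxy / sxx
    intercept = mean_y - beta * mean_x
    return intercept, beta

# Hypothetical data: 4 samples, 3 variants in the SNP set, BMI as the phenotype.
genotype_rows = [[0, 0, 0], [1, 0, 0], [0, 1, 1], [2, 0, 1]]
bmi = [22.0, 24.0, 26.0, 28.0]

counts = burden_counts(genotype_rows)           # [0, 1, 2, 3]
intercept, beta = simple_linear_fit(counts, bmi)
```

In this toy data set each additional minor allele corresponds to a 2-unit increase in BMI, so `beta` is 2.0 and `intercept` is 22.0.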

Frequency plot (for binary phenotypes)


This shows a frequency plot of the number of cases and controls that have at least one minor allele from the SNP set (i.e. burden counts are converted to binary yes/no). "Controls" are defined by Specify control (binary trait) above. The plot title shows the burden test results (PValue, odds ratio (OR), and CIs) from a logistic regression model of your selected phenotype (adjusted for any covariates included in your model) against a count of the minor alleles present in a given SNP set.
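
The conversion of burden counts to carrier status for the frequency plot can be sketched as follows. The `carrier_table` helper, the status labels, and the data are hypothetical; the logistic regression itself is not reproduced here.

```python
def carrier_table(burden, status, control_label):
    """Count carriers (at least one minor allele) and non-carriers among cases and controls."""
    table = {
        "case": {"carrier": 0, "non_carrier": 0},
        "control": {"carrier": 0, "non_carrier": 0},
    }
    for count, label in zip(burden, status):
        group = "control" if label == control_label else "case"
        key = "carrier" if count >= 1 else "non_carrier"
        table[group][key] += 1
    return table

# Hypothetical burden counts and phenotype labels for 6 samples.
burden = [0, 2, 1, 0, 0, 3]
status = ["Control", "Case", "Case", "Control", "Case", "Control"]

table = carrier_table(burden, status, control_label="Control")
```

Here `control_label` plays the role of the referent group chosen in Specify control (binary trait).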

Note: if you've used the Multiple set test option, you can scroll through the plots on the right (red arrow).

2) GTT Table Report

The table can be published to Land.