Land GxlGeneBurdenTest.pdf

From Array Suite Wiki



Perform Gene Burden Test


This function performs the SNP-set (Sequence) Kernel Association Test (SKAT) and gene burden testing. The result is saved as a result set.

General Options


Land – Select the instance of GeneticsLand where you want to export data from

Data type – Select the data type that you wish to use in your analysis. The exported variants will depend on how the data was published. For example, selecting “Genotyped Data” will export variants that were originally published as genotype data (PLINK or VCF (Genotyped)) and will exclude variants from sequencing and imputation (VCF (Sequenced), VCF (Imputed), and Impute2).

Output folder – Designate the location for the output logs

Select sample set – A sample set is a collection of samples within GeneticsLand. To create a SampleSet, select View | Samples | Create SampleSet

Select variant list file – Select a subset of variants provided in the List file.

Gene sets:

  • Single set test – If selected, all variants in the variant set will be treated as one set (and processed together) in the SKAT and Burden tests. Enter a name for the gene set (e.g. BRCA_region) in the "Set name" field.
  • Multiple set test – If selected, variants will be divided into sets based on their gene annotation. All variants that do not map to a gene (e.g. intergenic regions) will be dropped. Use this option when testing several regions or genes at once.
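The difference between the two gene set modes can be illustrated with a minimal sketch. It is not Land's implementation: the function name `build_variant_sets`, the `(variant_id, gene)` input format, and the use of `None` for an unannotated variant are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def build_variant_sets(
    variants: Iterable[Tuple[str, Optional[str]]],
    single_set_name: Optional[str] = None,
) -> Dict[str, List[str]]:
    """Group variant IDs into named sets (hypothetical helper).

    variants: (variant_id, gene) pairs; gene is None when the variant
    does not map to a gene.
    single_set_name: if given, mimic "Single set test" and put every
    variant in one set; otherwise mimic "Multiple set test".
    """
    if single_set_name is not None:
        # Single set: all variants are processed together under one name.
        return {single_set_name: [variant_id for variant_id, _ in variants]}

    sets: Dict[str, List[str]] = defaultdict(list)
    for variant_id, gene in variants:
        if gene is None:
            # Multiple sets: variants without a gene annotation are dropped.
            continue
        sets[gene].append(variant_id)
    return dict(sets)


example = [("rs1", "BRCA1"), ("rs2", None), ("rs3", "BRCA1"), ("rs4", "TP53")]
by_gene = build_variant_sets(example)
one_set = build_variant_sets(example, single_set_name="BRCA_region")
```

In this sketch `rs2` appears in the single set but is absent from the per-gene sets, which mirrors the behavior described for the two options.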

Exporting options:

  • R2 cutoff (derived) – The imputation quality threshold, calculated using the selected samples. Set to 0 to export everything.
  • Dose to genotype threshold – Threshold used to convert imputation dosage values to genotypes. If the imputation dosage is within ± the selected threshold value of a genotype value (0, 1, or 2), then the dosage will be converted to that genotype and exported. The default threshold (0.5) will convert all dosage values to a genotype, because every dosage between 0 and 2 lies within 0.5 of 0, 1, or 2. A dose to genotype threshold of 0.3 is often used.
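The dose to genotype rule described above can be sketched as follows. This is an illustration only: the function name `dosage_to_genotype` is hypothetical, and the handling of dosages that fall exactly on the threshold boundary (treated as inclusive here) or exactly halfway between two genotypes is an assumption, since the page does not specify it.

```python
from typing import Optional


def dosage_to_genotype(dosage: float, threshold: float = 0.5) -> Optional[int]:
    """Convert an imputation dosage to a genotype call (hypothetical helper).

    Returns 0, 1, or 2 when the dosage is within `threshold` of that
    genotype value, and None when no call can be made.
    """
    # Find the closest genotype value; ties resolve to the smaller value
    # (an assumption, not documented behavior).
    nearest = min((0, 1, 2), key=lambda g: abs(dosage - g))
    if abs(dosage - nearest) <= threshold:
        return nearest
    return None


strict_call = dosage_to_genotype(1.4, threshold=0.3)   # too far from 1 and 2
default_call = dosage_to_genotype(1.4)                  # default 0.5 always calls
```

With a threshold of 0.3 a dosage of 1.4 is left uncalled, while the default of 0.5 assigns it to genotype 1.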

Result set name – Name to be used for the output logs and the returned result set in Land.

Model Options



  • Select Continuous trait (linear model) for a quantitative outcome (phenotype) (e.g. BMI)
  • Select Binary trait (logistic model) for a binary outcome (phenotype) (e.g. Case/Control)

Specify Model (Phenotype and Covariates)

  • Select the phenotype to be tested using the dynamic search box, which shows clinical and sample metadata

  • Include covariates in the SKAT/burden test models by scrolling through the metadata variables and selecting "Add"

  • Specify control (binary trait) – For binary traits only, select the referent group for the models

SKAT options

  • Marker missing rate threshold – Use to remove variants with missing call rates greater than the selected threshold. Note that any remaining missing genotypes will be imputed using SKAT's methodology (impute.method="fixed").
  • Max MAF (<=0.5) – Use to set a maximum minor allele frequency (MAF) threshold. Any variants with a MAF greater than this threshold will be dropped. Use to select only rare variants for SKAT/Burden testing. The default (0.5) means no MAF threshold will be applied.
  • Combine common/rare variants – If selected, runs SKAT's sequence kernel association test for the combined effect of common and rare variants (SKAT_CommonRare). Deselect if you have restricted your variant set to rare variants only (for example, using the Max MAF threshold above).
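The two variant filters above can be sketched in plain Python. This is an assumption-laden illustration, not the code Land or SKAT runs: the helper names are hypothetical, genotypes are assumed to be coded as 0, 1, or 2 copies of one allele with `None` for a missing call, and MAF is computed from the non-missing calls only.

```python
from typing import Dict, List, Optional, Sequence

Genotype = Optional[int]  # 0, 1, or 2 allele copies; None means missing


def missing_rate(calls: Sequence[Genotype]) -> float:
    """Fraction of samples with no genotype call."""
    return sum(call is None for call in calls) / len(calls)


def minor_allele_frequency(calls: Sequence[Genotype]) -> float:
    """MAF from non-missing calls; 0.0 if every call is missing."""
    observed = [call for call in calls if call is not None]
    if not observed:
        return 0.0
    freq = sum(observed) / (2 * len(observed))
    # The minor allele is whichever allele is less frequent.
    return min(freq, 1 - freq)


def filter_variants(
    genotypes: Dict[str, Sequence[Genotype]],
    max_missing_rate: float,
    max_maf: float = 0.5,
) -> List[str]:
    """Keep variants whose missing rate and MAF do not exceed the thresholds."""
    return [
        variant_id
        for variant_id, calls in genotypes.items()
        if missing_rate(calls) <= max_missing_rate
        and minor_allele_frequency(calls) <= max_maf
    ]


genotypes = {
    "rare": [0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
    "common": [1, 1, 2, 0, 1, 1, 0, 2, 1, 1],
    "sparse": [None, None, None, 0, 0, 0, 0, 0, 1, 0],
}
rare_only = filter_variants(genotypes, max_missing_rate=0.1, max_maf=0.1)
no_maf_filter = filter_variants(genotypes, max_missing_rate=0.1)
```

Because MAF can never exceed 0.5, leaving `max_maf` at its default keeps every variant that passes the missing rate filter, which matches the note that the default applies no MAF threshold.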


In Land

A table and plot of the results are returned to Land. To view them, select Analytics | Open Result Set


The results will be labeled with the name given by the Result set name

1) Plot

Scatter plot (for quantitative phenotypes)


This scatter plot shows the burden test results: a linear regression model of your selected phenotype (adjusted for any covariates included in your model) against a count of the minor alleles present in a given SNP set. The burden test PValue, Beta, and confidence intervals (CIs) are shown in the plot title.
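As a rough illustration of what the scatter plot summarizes, the sketch below computes a per-sample burden count and fits an ordinary least squares line to it. It is a simplification under stated assumptions: the helper names are hypothetical, genotypes are assumed to be coded as minor allele counts (0, 1, or 2), and covariate adjustment, the p-value, and the confidence intervals reported by Land are all omitted.

```python
from typing import List, Sequence, Tuple


def burden_scores(genotype_rows: Sequence[Sequence[int]]) -> List[int]:
    """Sum minor allele counts across the SNP set, one total per sample."""
    return [sum(row) for row in genotype_rows]


def simple_linear_fit(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Least squares fit of y = intercept + slope * x, without covariates."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Rows are samples, columns are variants in one SNP set (made-up values).
genotype_rows = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [2, 1, 0]]
phenotype = [20.0, 22.0, 24.0, 26.0]  # e.g. a quantitative trait such as BMI

scores = burden_scores(genotype_rows)
beta, intercept = simple_linear_fit(scores, phenotype)
```

Here `beta` plays the role of the Beta shown in the plot title: the estimated change in the phenotype per additional minor allele in the set.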

Frequency plot (for binary phenotypes)


This shows a frequency plot of the number of cases and controls that have at least one minor allele from the SNP set (i.e., burden counts are converted to binary yes/no). "Controls" are defined by Specify control (binary trait) above. The plot title shows the burden test results (PValue, odds ratio (OR), and CIs) from a logistic regression model of your selected phenotype (adjusted for any covariates included in your model) against a count of the minor alleles present in a given SNP set.
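The conversion of burden counts to a binary carrier status for the frequency plot can be sketched as below. The function name `carrier_counts`, the group labels, and the example data are hypothetical, and the logistic regression behind the plot title is not reproduced here.

```python
from typing import Dict, Sequence


def carrier_counts(
    burden: Sequence[int],
    status: Sequence[str],
    control_label: str,
) -> Dict[str, Dict[str, int]]:
    """Tabulate carriers and non-carriers among cases and controls.

    A sample is a carrier if it has at least one minor allele in the SNP
    set (burden count > 0). Samples whose status equals `control_label`
    are controls; all others are treated as cases.
    """
    table = {
        "case": {"carrier": 0, "non_carrier": 0},
        "control": {"carrier": 0, "non_carrier": 0},
    }
    for count, label in zip(burden, status):
        group = "control" if label == control_label else "case"
        key = "carrier" if count > 0 else "non_carrier"
        table[group][key] += 1
    return table


burden = [0, 2, 1, 0, 0, 3]
status = ["Control", "Case", "Case", "Control", "Case", "Control"]
table = carrier_counts(burden, status, control_label="Control")
```

The `control_label` argument corresponds to the referent group chosen in Specify control (binary trait).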

Note: if you've used the Multiple set test option, you can scroll through the plots on the right (red arrow).

2) GTT table

The table is publishable to Land.

Example workflow