Land GwasAssociationAnalysis.pdf

From Array Suite Wiki

Overview

This function performs variant association testing for either all variants in a Land (genome-wide association) or a pre-defined set of variants. The result is saved as a result set.

Example workflow

1) Define the SampleSet to be analyzed
a. When defining the SampleSet, consider the availability of phenotype data and covariates so that the SampleSet will be representative of the final sample list to be analyzed
b. Decide what data type you will be analyzing (array genotypes, sequence, imputed data, or combined)
There are several ways to create SampleSets in Land. For example, use the filter pane to remove samples that are missing phenotype data or that have not been imputed, then use Create SampleSet.
2) Run PCA if you would like to use PCs as covariates in your association model. Note: if you run a new PCA, you will need to log off Array Server and log back in after the PCA completes to be able to choose the newly created sample set with the PCs.

3) Run the Variant Association as outlined using the appropriate options
4) View the results in Land, or publish to see full association views.
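
The sample filtering described in step 1 is done in the Land filter pane, but the logic can be sketched in a few lines of Python. This is only an illustration: the record layout, the field names (id, bmi, imputed) and the function build_sample_set are hypothetical and are not part of Array Suite.

```python
# Hypothetical sample records; field names are assumptions for illustration only.
samples = [
    {"id": "S1", "bmi": 24.1, "imputed": True},
    {"id": "S2", "bmi": None, "imputed": True},   # missing phenotype
    {"id": "S3", "bmi": 30.5, "imputed": False},  # not imputed
    {"id": "S4", "bmi": 27.8, "imputed": True},
]


def build_sample_set(records, phenotype, covariates=(), require_imputed=True):
    """Return the IDs of samples that have complete phenotype and covariate data."""
    needed = (phenotype, *covariates)
    keep = []
    for rec in records:
        # Drop samples missing the phenotype or any covariate.
        if any(rec.get(field) is None for field in needed):
            continue
        # Optionally drop samples that have not been imputed.
        if require_imputed and not rec.get("imputed", False):
            continue
        keep.append(rec["id"])
    return keep


sample_set = build_sample_set(samples, phenotype="bmi")
print(sample_set)
```

Passing the covariates you plan to use (for example, covariates=("age",)) keeps the SampleSet representative of the samples that will actually enter the model.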

Perform Variant Association Analysis

General Options

PerformVariantAssociationAnalysis.png

Land – Select the instance of GeneticsLand that you wish to use in your analysis

Data type – Select the data type that you wish to use in your analysis. The exported variants will depend on how the data was published. For example, selecting “Genotyped Data” will export variants that were originally published as genotype data (PLINK or VCF (Genotyped)) and will exclude variants from sequencing and imputation (VCF (Sequenced), VCF (Imputed), and Impute2).

Sample set – A sample set is a collection of samples within GeneticsLand. To create a SampleSet, select View | Samples | Create SampleSet

Select variant list file – Select a subset of variants provided in the List file.

Model Options

  • Select Continuous trait (linear model) for a quantitative outcome (phenotype) (e.g. BMI)
  • Select Binary trait (logistic model) for a binary outcome (phenotype) (e.g. Case/Control)
  • Select Survival trait (Cox model) for survival outcomes
  • Check the EMMAX option to run the Efficient Mixed-Model Association eXpedited (EMMAX) method to adjust for population stratification. EMMAX is only available for continuous outcomes. Alternatively, PCs from PCA can be used as covariates for all outcome types.
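
To make the continuous-trait option more concrete, the following is a minimal sketch of a per-variant linear model: the trait is regressed on the allele count (0, 1 or 2) of a single variant. It assumes an additive coding and no covariates, and it is not the Array Suite implementation; the function name linear_snp_test and the example data are hypothetical.

```python
import math


def linear_snp_test(genotypes, trait):
    """Ordinary least squares of trait on allele count; returns (beta, se, t)."""
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_y = sum(trait) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    if sxx == 0:
        raise ValueError("variant is monomorphic in these samples")
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(genotypes, trait))
    beta = sxy / sxx                      # effect per copy of the counted allele
    intercept = mean_y - beta * mean_g
    rss = sum((y - intercept - beta * g) ** 2 for g, y in zip(genotypes, trait))
    se = math.sqrt(rss / (n - 2) / sxx)   # standard error of beta
    return beta, se, beta / se


# Made-up allele counts and BMI values for six samples.
genotypes = [0, 0, 1, 1, 2, 2]
bmi = [20.0, 22.0, 23.0, 25.0, 27.0, 27.0]
beta, se, t = linear_snp_test(genotypes, bmi)
print(beta, se, t)
```

In a genome-wide run this test is repeated for every variant that passes the filters, and the SNP term is joined by whatever covariates (for example, PCs) were added to the model.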

Specify Model (Phenotype and Covariates)

SpecifyModel.png
  • Select the phenotype to be tested using the dynamic search box, which shows clinical and sample metadata
SpecifyModelPhenotype.png


  • Include covariates in the association model by scrolling through the metadata variables and clicking Add (SNP is included in the model by default and represents the genetic variant)
SpecifyModelCovariates.png
  • Use the Class tick box to indicate which variables should be treated as categorical (ticked) or numeric (un-ticked).
  • You may include a single SNP interaction term in the model as follows:
    1. Add the interacting variable to the model as a main effect per above
    2. Making sure to de-select any terms in the model on the right, use the Ctrl key to multi-select SNP and the interacting variable then click the Cross button.
Cross.png
  • The following interaction conditions will not work:
    • Interaction with more than 2 variables or with 2 non-SNP variables
    • Failing to include the main effect of the interacting variable
    • More than one interaction term
  • Click OK to return to the analysis setup
  • Set Control – For binary traits only, select the referent group for the model
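
The interaction restrictions listed above can be summarized as a small check. This is a sketch only: the function validate_interaction and the way model terms are represented (variable names and tuples) are assumptions made for illustration, not an Array Suite API.

```python
def validate_interaction(main_effects, interactions):
    """Check model terms against the interaction restrictions listed above.

    main_effects: covariate names added to the model (SNP is included by default).
    interactions: list of tuples of variable names, e.g. [("SNP", "Age")].
    Returns a list of problems; an empty list means the terms look valid.
    """
    effects = set(main_effects) | {"SNP"}
    problems = []
    if len(interactions) > 1:
        problems.append("more than one interaction term")
    for term in interactions:
        if len(term) != 2:
            problems.append(f"{term}: interaction must involve exactly 2 variables")
            continue
        if "SNP" not in term:
            problems.append(f"{term}: one of the two variables must be SNP")
            continue
        other = term[0] if term[1] == "SNP" else term[1]
        if other not in effects:
            problems.append(f"{term}: main effect for {other} is missing")
    return problems


print(validate_interaction(["Age"], [("SNP", "Age")]))         # valid model
print(validate_interaction(["Age", "Sex"], [("Age", "Sex")]))  # no SNP in the term
```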


Options:

  • HWE p-value cutoff – Filter genetic markers by Hardy-Weinberg Equilibrium (HWE) p-value
  • R2 cutoff (derived) – Imputation quality threshold, calculated using the selected samples (set to 0 to export everything)
  • Allele count cutoff – Minimum minor allele count in the selected sample set. Markers with fewer copies of the minor allele than the cutoff are excluded.
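
As an illustration of the HWE and allele count filters, the sketch below computes both quantities from the genotype counts of one biallelic marker. It makes several assumptions that are not stated on this page: HWE is tested with a 1-degree-of-freedom chi-square test (the software may use an exact test), markers with an HWE p-value below the cutoff are excluded, and the default cutoffs shown are arbitrary. The R2 filter is not sketched because its formula is not given here.

```python
import math


def minor_allele_count(n_aa, n_ab, n_bb):
    """Number of copies of the less frequent allele among the selected samples."""
    count_a = 2 * n_aa + n_ab
    count_b = 2 * n_bb + n_ab
    return min(count_a, count_b)


def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """Chi-square (1 df) p-value for departure from Hardy-Weinberg proportions."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


def passes_filters(n_aa, n_ab, n_bb, hwe_cutoff=1e-6, mac_cutoff=5):
    """Hypothetical default cutoffs; choose values appropriate for your study."""
    if minor_allele_count(n_aa, n_ab, n_bb) < mac_cutoff:
        return False
    return hwe_chi2_pvalue(n_aa, n_ab, n_bb) >= hwe_cutoff


print(passes_filters(25, 50, 25))  # genotype counts match HWE proportions
print(passes_filters(50, 0, 50))   # no heterozygotes at all
print(passes_filters(98, 2, 0))    # only 2 copies of the minor allele
```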


Result set name – Name to be used for the output logs and the result set returned to Land.

Output folder – Designate the output location


Output

In Land

Results are returned to Land. To open them, select Analytics | Open Result Set

GeneticsLandResultSet.png


The results will be labeled with the name entered under Result set name.