Land GxlGwasPCA.pdf

From Array Suite Wiki

Genetic Principal Component Analysis

PCAflow.png

Overview

This method runs PLINK2's implementation of GCTA's Principal Component Analysis (PCA) on genotypes from a GeneticsLand. The phase 3 genotypes of the 1000 Genomes continental reference populations are analyzed together with your samples so that the ancestry of each sample can be inferred.

To run this module from a GeneticsLand, click Analytics | Principal Component Analysis.

PCAmenu.png
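As a rough illustration of the calculation described in this overview, the sketch below standardizes a small genotype matrix by allele frequency, builds a genetic relationship matrix, and extracts its leading eigenvector. It is a toy sketch of the general GCTA-style approach, not the code that PLINK2 or Array Server runs. The function names and the genotype values are invented for this example.

```python
import math
import random


def standardize(genotypes):
    """Center and scale each variant of a samples x variants matrix of 0/1/2 allele counts."""
    n_samples = len(genotypes)
    columns = []
    for j in range(len(genotypes[0])):
        counts = [row[j] for row in genotypes]
        p = sum(counts) / (2 * n_samples)  # allele frequency of variant j
        if p == 0.0 or p == 1.0:
            continue  # monomorphic variants carry no information
        sd = math.sqrt(2 * p * (1 - p))
        columns.append([(c - 2 * p) / sd for c in counts])
    return [list(row) for row in zip(*columns)]  # back to samples x variants


def relationship_matrix(z):
    """Genetic relationship matrix: average product of standardized genotypes."""
    n_variants = len(z[0])
    return [[sum(a * b for a, b in zip(zi, zk)) / n_variants for zk in z] for zi in z]


def top_eigenvector(matrix, iterations=200, seed=0):
    """Leading eigenvector by power iteration (sufficient for a toy example)."""
    rng = random.Random(seed)
    v = [rng.random() - 0.5 for _ in matrix]
    for _ in range(iterations):
        w = [sum(m_ik * v_k for m_ik, v_k in zip(row, v)) for row in matrix]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


# Invented toy data: three samples from each of two hypothetical groups.
toy_genotypes = [
    [0, 0, 1, 2, 2, 1],
    [0, 1, 0, 2, 1, 2],
    [1, 0, 0, 2, 2, 2],
    [2, 2, 1, 0, 0, 1],
    [2, 1, 2, 0, 1, 0],
    [1, 2, 2, 0, 0, 0],
]
pc1 = top_eigenvector(relationship_matrix(standardize(toy_genotypes)))
```

In this toy data the first three samples fall on one side of zero along `pc1` and the last three on the other, which is the kind of separation the anchored plots rely on.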

Input Data Requirements

This method requires the user to choose a Sample Set specifying which samples to analyze.

General Options

PCA.png


Land - Be sure to select the GeneticsLand in which you are working (the one containing the genotyped samples you wish to analyze)

Options

Sample set - Choose the Sample Set listing the samples you wish to analyze

Job number - Specify how many export jobs should be submitted simultaneously. If your Array Server has a cluster enabled, these jobs will be submitted to the cluster. Currently, the export is split into 700 jobs, so setting this value to 700 submits all jobs to the cluster queue immediately.
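To get a feel for how the Job number setting affects submission, the sketch below splits the 700 export jobs into successive batches. It assumes, purely for illustration, that jobs are submitted in strict waves of Job number at a time; the actual scheduling behavior of Array Server is not described on this page, and `submission_waves` is a hypothetical helper.

```python
TOTAL_EXPORT_JOBS = 700  # number of export jobs stated on this page


def submission_waves(job_number, total_jobs=TOTAL_EXPORT_JOBS):
    """Return batch sizes if jobs were submitted job_number at a time (assumed behavior)."""
    if job_number < 1:
        raise ValueError("job_number must be at least 1")
    waves = []
    remaining = total_jobs
    while remaining > 0:
        batch = min(job_number, remaining)
        waves.append(batch)
        remaining -= batch
    return waves
```

For example, `submission_waves(700)` gives a single batch containing every job, while `submission_waves(300)` gives batches of 300, 300, and 100.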

Output Results

1000 Genomes anchored plots

After the job has completed, view the Result Set by clicking Analytics | Open Result Set

OpenResultSet.png

Then select the Sample set name under the Principal Component Analysis tag and click OK

SelectPCAResultSet.png

The Result Set contains three views. The first is the table of eigenvectors calculated from the PCA that includes the 1000 Genomes reference samples. The samples from your Sample Set appear at the bottom of the table with a Data Source value of Study. This table also includes the Inferred Population, which is based on the first 5 eigenvectors.

PCAtable.png
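This page does not state how the Inferred Population is derived from the first 5 eigenvectors. One common approach, shown here only as an assumption, is to assign each study sample to the reference population whose centroid is nearest in eigenvector space. The function names and the eigenvector values below are invented for this sketch.

```python
import math


def centroids(reference):
    """Mean eigenvector per population from (label, [ev1..ev5]) reference samples."""
    sums, counts = {}, {}
    for label, vec in reference:
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {label: [x / counts[label] for x in acc] for label, acc in sums.items()}


def infer_population(vec, centers):
    """Label of the centroid closest to vec by Euclidean distance (assumed rule)."""
    return min(centers, key=lambda label: math.dist(vec, centers[label]))


# Invented eigenvector values for two reference populations.
toy_reference = [
    ("EUR", [0.10, 0.02, 0.0, 0.0, 0.0]),
    ("EUR", [0.12, 0.04, 0.0, 0.0, 0.0]),
    ("EAS", [-0.10, 0.05, 0.0, 0.0, 0.0]),
    ("EAS", [-0.12, 0.03, 0.0, 0.0, 0.0]),
]
toy_centers = centroids(toy_reference)
```

With these toy values, a study sample at `[0.09, 0.01, 0.0, 0.0, 0.0]` would be assigned to the "EUR" centroid.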

The second view is a scatter plot with eigenvector 1 on the X axis and eigenvector 2 on the Y axis. Samples are colored by Inferred Population and shaped by Data Source, as indicated in the Legend on the right.

PCAscatter.png

The third view is the same as the second except using eigenvectors 3 and 4.


PCs for use as covariates in association analyses

The first 10 PCs are also calculated from the sample genotypes alone (without the 1000 Genomes reference samples), giving values appropriate for use as covariates in genetic association analyses to adjust for population structure. These are reported in a new Sample Set, which you can access by clicking Manage | Samples | Manage Sample Sets.

ManageSampleSets.png

Then select the Sample set_PCA under the Principal Component Analysis tag to see this table of PCs, with the InferredPopulation column joined from the 1000 Genomes-based analysis displayed in the Result Set above.

PCAsampleset.png
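The join described here can be pictured as a lookup by sample identifier. The sketch below attaches the InferredPopulation from the 1000 Genomes-anchored table to the table of study-only PCs. The `SampleID` and `DataSource` keys and the example rows are assumptions made for illustration; the real tables may use different column names.

```python
def attach_inferred_population(pc_rows, result_rows):
    """Add InferredPopulation to each study-only PC row, matched on SampleID."""
    inferred = {
        row["SampleID"]: row["InferredPopulation"]
        for row in result_rows
        if row["DataSource"] == "Study"  # skip the 1000 Genomes reference samples
    }
    return [
        dict(row, InferredPopulation=inferred.get(row["SampleID"]))
        for row in pc_rows
    ]


# Invented example rows.
toy_pcs = [
    {"SampleID": "S1", "PC1": 0.03, "PC2": -0.01},
    {"SampleID": "S2", "PC1": -0.02, "PC2": 0.04},
]
toy_results = [
    {"SampleID": "REF1", "DataSource": "1000 Genomes", "InferredPopulation": "AFR"},
    {"SampleID": "S1", "DataSource": "Study", "InferredPopulation": "EUR"},
    {"SampleID": "S2", "DataSource": "Study", "InferredPopulation": "EAS"},
]
toy_covariates = attach_inferred_population(toy_pcs, toy_results)
```

Each row of `toy_covariates` keeps its PC values and gains an InferredPopulation entry, which mirrors the layout of the covariate table described above.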

OmicScript

GxlGwasPCA
