Land GxlGwasPCA.pdf

From Array Suite Wiki



Genetic Principal Component Analysis



This method runs PLINK2's implementation of GCTA's Principal Component Analysis (PCA) on genotypes from a GeneticsLand. The phase 3 genotypes of the 1000 Genomes continental reference populations are included as a reference so that the ancestry of each sample can be inferred.

To run this module, open a GeneticsLand and click Analytics | Principal Component Analysis.



Input Data Requirements

This method requires the user to choose a Sample Set specifying which samples to analyze.


General Options


Land - Be sure to select the GeneticsLand in which you are working (the one containing the genotyped samples you wish to analyze)



Sample set - Choose the Sample Set listing the samples you wish to analyze

Job number - Specify how many export jobs should be submitted simultaneously. If your Array Server has a cluster enabled, these jobs will be submitted to the cluster. The export is currently split into 700 jobs, so setting this value to 700 will submit all jobs to the cluster queue immediately.
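
As a rough illustration of how the Job number setting relates to the 700 export jobs, the sketch below counts the minimum number of submission rounds for a few settings. It assumes the server simply submits at most that many jobs at a time; this page does not describe the actual scheduling, and the names `TOTAL_EXPORT_JOBS` and `submission_rounds` are hypothetical.

```python
from math import ceil

# Figure quoted on this page; it may differ between releases.
TOTAL_EXPORT_JOBS = 700


def submission_rounds(job_number, total_jobs=TOTAL_EXPORT_JOBS):
    """Minimum number of rounds if at most job_number jobs are submitted at a time.

    Assumption: jobs are submitted in batches of job_number until all are queued.
    """
    if job_number < 1:
        raise ValueError("job_number must be at least 1")
    return ceil(total_jobs / job_number)


# Compare a few candidate settings.
rounds_by_setting = {n: submission_rounds(n) for n in (50, 100, 700)}
```

Under this assumption, a setting of 700 needs a single round, which matches the statement that all jobs reach the cluster queue immediately.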

Output Results

1000 Genomes anchored plots

After the job has completed, view the Result Set by clicking Analytics | Open Result Set.


Then select the Sample set name under the Principal Component Analysis tag and click OK.


The Result Set will contain three views. The first is the table of eigenvectors calculated from the PCA that includes the 1000 Genomes reference samples. The samples from your Sample Set appear at the bottom of the table with a Data Source value of Study. This table also includes the Inferred Population, which is based on the first 5 eigenvectors.


The second view is a scatter plot of eigenvector 1 (X axis) against eigenvector 2 (Y axis), with samples colored by Inferred Population and shaped by Data Source, as indicated in the Legend on the right.


The third view is the same as the second except using eigenvectors 3 and 4.
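
This page does not say how the Inferred Population is derived from the first 5 eigenvectors. One common approach, shown here purely as an assumption and not as the module's actual method, is to assign each study sample to the reference population whose centroid is nearest in eigenvector space. The population labels, eigenvector values, and function names below are made up for illustration.

```python
from collections import defaultdict
from math import dist


def population_centroids(reference, n_pcs=5):
    """Average the first n_pcs eigenvector values per reference population."""
    grouped = defaultdict(list)
    for population, eigenvectors in reference:
        grouped[population].append(eigenvectors[:n_pcs])
    return {
        population: [sum(column) / len(rows) for column in zip(*rows)]
        for population, rows in grouped.items()
    }


def infer_population(eigenvectors, centroids, n_pcs=5):
    """Return the population whose centroid is closest in Euclidean distance."""
    point = eigenvectors[:n_pcs]
    return min(centroids, key=lambda population: dist(point, centroids[population]))


# Made-up reference rows: (population label, eigenvector values).
reference = [
    ("AFR", [0.9, 0.1, 0.0, 0.0, 0.0]),
    ("AFR", [1.1, -0.1, 0.0, 0.0, 0.0]),
    ("EUR", [-0.5, 0.8, 0.0, 0.0, 0.0]),
    ("EUR", [-0.5, 1.0, 0.0, 0.0, 0.0]),
]
centroids = population_centroids(reference)

# Made-up study sample (Data Source = Study in the eigenvector table).
study_sample = [0.95, 0.05, 0.0, 0.0, 0.0]
inferred = infer_population(study_sample, centroids)
```

A nearest-centroid rule is only one option; the inference could equally be based on a different classifier, so treat this as a way to reason about the table rather than a description of the software.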


PCs for use as covariates in association analyses

The first 10 PCs are also calculated separately from the sample genotypes alone (without the 1000 Genomes samples), which gives results appropriate for use as covariates in genetic association analyses to adjust for population structure. These PCs are reported in a new Sample Set, which you can access by clicking Manage | Samples | Manage Sample Sets.

Then select the Sample set_PCA under the Principal Component Analysis tag to see this table of PCs, with the InferredPopulation column joined from the 1000 Genomes-based analysis displayed in the Result Set above.
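
To illustrate why PCs are useful as covariates, the sketch below fits a linear model with and without a PC term on a small made-up dataset. The phenotype depends only on ancestry (PC1), and the genotype is merely correlated with PC1. This is a toy example with a single PC and hypothetical helper names (`solve_linear_system`, `ols`); it is not how any association module in the software is implemented.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Move the row with the largest pivot candidate into place.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def ols(design, y):
    """Least-squares coefficients via the normal equations (X'X) b = X'y."""
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Made-up data: two ancestry groups separated on PC1.
genotype = [0, 0, 1, 1, 2, 2]    # allele counts, more common in the second group
pc1 = [-1, -1, -1, 1, 1, 1]      # stand-in for the first PC from the Sample Set
phenotype = [2 * p for p in pc1]  # driven by ancestry only, not by genotype

# Model 1: phenotype ~ intercept + genotype
unadjusted = ols([[1, g] for g in genotype], phenotype)

# Model 2: phenotype ~ intercept + genotype + PC1
adjusted = ols([[1, g, p] for g, p in zip(genotype, pc1)], phenotype)

genotype_effect_unadjusted = unadjusted[1]
genotype_effect_adjusted = adjusted[1]
```

In this toy data the unadjusted model attributes the ancestry effect to the genotype, while including PC1 moves the estimated genotype effect to approximately zero. In a real analysis the covariates would be the PCs from the Sample Set table, typically several of them rather than one.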



