From Array Suite Wiki
Genetic Principal Component Analysis
This method runs PLINK2's implementation of GCTA's Principal Component Analysis (PCA) on genotypes from a GeneticsLand. It uses the phase 3 genotypes of the 1000 Genomes continental reference populations to infer the ancestry of each sample.
To run this module, open a GeneticsLand and click Analytics | Principal Component Analysis.
Input Data Requirements
This method requires the user to choose a Sample Set specifying which samples to analyze.
Land - Be sure to select the GeneticsLand in which you are working (the one containing the genotyped samples you wish to analyze)
Sample set - Choose the Sample Set listing the samples you wish to analyze
Result set name - Provide a unique name for the Result Set. If a Result Set with that name already exists, you will be prompted to confirm overwriting it.
Job number - Specify how many export jobs should be submitted simultaneously. If your Array Server has a cluster enabled, these jobs will be submitted to the cluster. Currently, the export is broken up into 700 jobs, so setting this value to 700 will submit all jobs to the cluster queue immediately.
After the job has completed, view the Result Set by clicking Analytics | Open Result Set.
Then select your Result Set name under the Principal Component Analysis tag and click OK.
The Result Set will contain three views. The first is a table of the eigenvectors calculated by the PCA, which is run together with the 1000 Genomes reference samples. The samples from your Sample Set appear at the bottom of the table, with a Data Source value of Study. This table also includes the Inferred Population, which is based on the first 5 eigenvectors.
The second view is a scatter plot with eigenvector 1 on the X axis and eigenvector 2 on the Y axis. Samples are colored by Inferred Population and shaped by Data Source, as indicated in the Legend on the right.
The third view is the same as the second except using eigenvectors 3 and 4.
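This page does not say how the Inferred Population value is derived from the first 5 eigenvectors. The sketch below illustrates one plausible approach, nearest-centroid assignment, and should not be read as the method the module actually uses. The function names, the row layout, and the toy numbers are all hypothetical; only the population codes (AFR, EUR, EAS) are real 1000 Genomes continental labels.

```python
from collections import defaultdict
from math import dist

# The wiki states that the first 5 eigenvectors are used for inference.
N_COMPONENTS = 5


def population_centroids(reference):
    """Average the leading eigenvector values per reference population.

    `reference` is a list of (population_label, eigenvector_values) pairs.
    This layout is an assumption made for illustration.
    """
    sums = {}
    counts = defaultdict(int)
    for label, values in reference:
        head = values[:N_COMPONENTS]
        if label not in sums:
            sums[label] = [0.0] * len(head)
        sums[label] = [s + v for s, v in zip(sums[label], head)]
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def infer_population(sample_values, centroids):
    """Return the label whose centroid is closest (Euclidean distance)."""
    head = sample_values[:N_COMPONENTS]
    return min(centroids, key=lambda label: dist(head, centroids[label]))


if __name__ == "__main__":
    # Toy reference rows (made-up numbers, not real 1000 Genomes values).
    reference = [
        ("AFR", [-0.030, 0.002, 0.001, 0.000, 0.001]),
        ("AFR", [-0.028, 0.001, 0.000, 0.001, 0.000]),
        ("EUR", [0.010, -0.020, 0.002, 0.001, 0.000]),
        ("EUR", [0.011, -0.022, 0.001, 0.000, 0.001]),
        ("EAS", [0.012, 0.025, 0.000, 0.002, 0.001]),
        ("EAS", [0.013, 0.027, 0.001, 0.001, 0.000]),
    ]
    centroids = population_centroids(reference)

    # A hypothetical row with Data Source = Study.
    study_sample = [0.009, -0.019, 0.001, 0.001, 0.000]
    print(infer_population(study_sample, centroids))
```

A nearest-centroid rule is the simplest choice for a sketch like this. Other classifiers, such as k-nearest neighbors, could be applied to the same eigenvector columns.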