Annotate Variants in Array Studio
Annotating Variants in Array Studio
Array Studio provides several methods to annotate your NGS variant data with predicted affected genes/consequences, allele frequency, and more. Whether the variant data were generated by Array Studio's RNA-seq/DNA-seq variant-calling functions or an external variant caller, with paired or unpaired samples, you can use dozens of annotation sources to help categorize and prioritize variants.
Summarizing variants from NGS data
By default, OmicSoft's RNA-seq and DNA-seq pipeline functions will run the OmicSoft Variant/SNP-calling function, which can also be run separately on your NGS Data. If your samples have disease-normal pairs, you can also run Summarize Matched-Pair Variation. Both of these functions will output a MutationReport, and can also optionally output merged or individual sample VCF files.
Finally, external tools will commonly generate one or more VCF files. All of these outputs can be annotated in Array Studio.
Choosing an annotation method
Depending on the variant data source, the primary goals of the annotation, and the scale of the data, Array Studio has three different methods tailored to allow you to effectively annotate your variant data.
Briefly, the three functions use the same mutation/variant annotators and the same logic for matching variant position and alteration. They differ in the input they accept and the output they produce:
- The first method annotates an OmicSoft mutation report table; the second method annotates directly from a VCF file; the third annotates a VCF data object, after importing VCF files into your Array Studio project
- The second function annotates a single VCF file into the OSCR format, which is also a central component of OmicSoft GeneticsLand. OSCR-based annotation is very fast for large VCF files, since the results are stored in a database format with multiple indexes and links back to the original VCF file.
- The third function annotates a VCF data object (the user can add multiple VCF files into a single VcfData object) into a single large table (called TableLand, optimized for streaming).
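The idea of matching on variant position and alteration can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the annotation source, the field names, and the exact-match rule on (chromosome, position, reference allele, alternate allele) are hypothetical, and Array Studio's actual matching logic is not described here.

```python
# Hypothetical annotation source keyed by (chromosome, position, ref, alt).
annotation_source = {
    ("chr1", 1000, "A", "G"): {"gene": "GENE_A", "consequence": "missense"},
}

def annotate(variants, source):
    """Attach annotation fields to variants whose position and alleles match exactly."""
    annotated = []
    for v in variants:
        key = (v["chrom"], v["pos"], v["ref"], v["alt"])
        # Variants with no matching entry are kept, just without extra fields.
        annotated.append({**v, **source.get(key, {})})
    return annotated

variants = [
    {"chrom": "chr1", "pos": 1000, "ref": "A", "alt": "G"},
    {"chrom": "chr1", "pos": 1000, "ref": "A", "alt": "T"},  # same position, different alteration
]
result = annotate(variants, annotation_source)
```

In this sketch only the first variant picks up a gene and consequence; the second shares the position but not the alteration, so it is left unannotated.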
Mutation Reports output by Array Studio are Table data objects, held in memory. These can be directly annotated using Annotate Variant Table Report, which will generate a new Table/TableLand object. TableLand objects are read directly from a file, so they are read-only, but they are very efficient for reading tables with millions of rows, even with filtering.
Annotated mutation reports will have one row per-variant, per affected gene, with separate columns displayed for each sample. Thus, annotating a mutation report is best if you are particularly interested in a specific subset of samples, and want to set filtering parameters to identify the genes and variants in those samples.
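As a rough illustration of this layout, the following Python sketch builds a table with one row per variant per affected gene and one column per sample, then filters on a single sample. The data, the gene names, the use of allele frequency as the per-sample value, and the 0.2 cutoff are all hypothetical; this is not Array Studio's report format.

```python
# Hypothetical per-sample allele frequencies for each variant.
calls = {
    ("chr1", 1000, "A", "G"): {"S1": 0.48, "S2": 0.0},
    ("chr2", 2000, "C", "T"): {"S1": 0.0, "S2": 0.95},
}
# Hypothetical annotation: a variant can affect more than one gene.
genes = {
    ("chr1", 1000, "A", "G"): ["GENE_A", "GENE_B"],
    ("chr2", 2000, "C", "T"): ["GENE_C"],
}

def wide_report(calls, genes, samples):
    """One row per (variant, gene); one column per sample."""
    rows = []
    for variant, per_sample in calls.items():
        for gene in genes.get(variant, [None]):
            row = {"variant": variant, "gene": gene}
            for sample in samples:
                row[sample] = per_sample.get(sample, 0.0)
            rows.append(row)
    return rows

report = wide_report(calls, genes, ["S1", "S2"])
# Filter on one sample of interest: genes with a variant in S1 at frequency >= 0.2.
genes_in_s1 = [row["gene"] for row in report if row["S1"] >= 0.2]
```

Because each sample has its own column, a filter on one or a few samples is a simple column condition, which is why this layout suits sample-focused questions.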
Annotated Mutation Report Result Table
A VCF file (as well as GTT, RS_ID, or BED file) can also be annotated directly from-file with Annotate Variant File. This method is designed to generate streaming-optimized .OSCR files, OmicSoft's optimized file type for streaming variant annotation data of immense size.
.OSCR files are used in OmicSoft's GeneticsLand, and are designed for holding variant annotation for huge numbers of samples and variants. For example, OSCR-based annotation of TCGA's BRCA VCF file (26 gigabytes, 3.4 million variants, 2187 samples) can be filtered to 1751 Pathogenic variants (ClinVar annotation) within seconds.
This means that .OSCR-based annotations are optimal for an "overview" of your data, as data are displayed as one variant per row; sample-level information is displayed in the Details window. This approach is best if you are interested primarily in identifying any samples with variants in a particular gene or position.
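The one-variant-per-row view can be pictured with a small Python sketch in which each row carries a summary and the per-sample genotypes sit in a nested "details" field. The data, field names, and genotype strings are hypothetical, and the sketch says nothing about how .OSCR files are stored internally.

```python
# Hypothetical genotypes per variant and sample (VCF-style GT strings).
calls = {
    ("chr1", 1000, "A", "G"): {"S1": "0/1", "S2": "0/0", "S3": "1/1"},
    ("chr2", 2000, "C", "T"): {"S1": "0/0", "S2": "0/1", "S3": "0/0"},
}
genes = {
    ("chr1", 1000, "A", "G"): "GENE_A",
    ("chr2", 2000, "C", "T"): "GENE_B",
}

def is_carrier(genotype):
    # In this toy data the only alternate allele index is 1.
    return "1" in genotype

def overview(calls, genes):
    """One row per variant; sample-level data stays in a nested 'details' field."""
    table = []
    for variant, genotypes in calls.items():
        table.append({
            "variant": variant,
            "gene": genes.get(variant),
            "carrier_count": sum(is_carrier(gt) for gt in genotypes.values()),
            "details": genotypes,
        })
    return table

def carriers_for_gene(table, gene):
    """Samples carrying any variant in the given gene."""
    found = set()
    for row in table:
        if row["gene"] == gene:
            found.update(s for s, gt in row["details"].items() if is_carrier(gt))
    return sorted(found)

table = overview(calls, genes)
gene_a_samples = carriers_for_gene(table, "GENE_A")
```

Here the question "which samples have a variant in this gene?" is answered by filtering rows on the gene and then opening the details of the few rows that remain.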
Annotated VCF (OSCR) Result Table
VCF files, whether generated as output from Array Studio's variant-calling functions or from external tools, can be added as VcfData objects, which are analogous to Array Studio NGS Data, containing pointers to one or more VCF files. An advantage of VcfData objects is that a single VcfData object can refer to a single VCF file with merged variant data for multiple samples, or can refer to a collection of VCF files, each for one sample.
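The difference between a merged multi-sample VCF and a collection of single-sample VCFs shows up in the `#CHROM` header line, where every column after `FORMAT` is a sample. The Python sketch below reads sample names from in-memory VCF text; the file contents and the helper names `vcf_samples` and `samples_in_collection` are hypothetical and are not part of Array Studio.

```python
import io

HEADER = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT"

def vcf_samples(handle):
    """Return the sample names listed on the #CHROM header line of a VCF stream."""
    for line in handle:
        if line.startswith("#CHROM"):
            # The nine fixed columns end with FORMAT; the rest are samples.
            return line.rstrip("\n").split("\t")[9:]
    return []

def samples_in_collection(handles):
    """Collect sample names across several VCF streams, keeping first-seen order."""
    seen = []
    for handle in handles:
        for name in vcf_samples(handle):
            if name not in seen:
                seen.append(name)
    return seen

# One merged file holding two samples.
merged = io.StringIO(
    "##fileformat=VCFv4.2\n"
    + HEADER + "\tS1\tS2\n"
    + "chr1\t1000\t.\tA\tG\t50\tPASS\t.\tGT\t0/1\t0/0\n"
)
# A collection of two files, one sample each.
single_1 = io.StringIO("##fileformat=VCFv4.2\n" + HEADER + "\tS1\n")
single_2 = io.StringIO("##fileformat=VCFv4.2\n" + HEADER + "\tS2\n")

merged_samples = vcf_samples(merged)
collection_samples = samples_in_collection([single_1, single_2])
```

Both inputs describe the same two samples, which is the sense in which one data object can point at either arrangement.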
VcfData objects can be annotated with Annotate VCF Data Object. Its interface is similar to Annotate Variant Table Report, but you can also import file-specific filter values, which will be included as separate columns.
Annotated VcfData Result Table