Getting Started with Microarray Analysis
From Array Suite Wiki
Getting Started with Array Studio Microarray Analysis
The microarray tutorial is a great starting point for new users of Array Studio, whether or not you will be working directly with microarray data. Array Studio was initially designed for microarray analysis, and this tutorial will cover many of the Array Studio data types, data manipulations, Views, and analyses that are commonly used in other workflows.
Feature-level data, such as gene or transcript FPKM measurements from RNA-seq and RT-PCR abundance, can use Array Studio analysis modules that were initially designed for microarray analysis. Before completing the tutorials for those data types, walking through the microarray tutorial will help you get the most out of Array Studio.
Create a new Array Studio Microarray project
Array Studio organizes projects in the Solution Explorer. After you create a new project, Array Studio will guide you through importing your expression microarray datasets, starting with .cel files or other array formats.
- Add a new Project to the Solution Explorer [00:30]
- Import Microarray data to your project [01:30]
- Attach a Design Table to your -Omic Data [03:50]
- Rename your OmicData object [04:15]
- The Annotation Table [04:48]
- The Design Table [04:56]
- The Mas5 QC report [05:00]
Array Studio Data Types
Array Studio stores your data in four main data types. This video briefly explains the differences between data types, and how to convert between the most common data types, table and -Omic data.
- List Data [00:05]
- Table Data [00:20]
- TableLand Data [01:33]
- -Omic Data [02:35]
- -Omic Annotation Tables [02:52]
- -Omic Design Tables [03:03]
- Convert Table to -Omic [03:20]
- Convert -Omic to Table [03:49]
- NGS Data [04:12]
- Metadata are automatically "inherited" from source data [04:33]
Track Your Analysis Steps with the Audit Trail
Array Studio provides several methods to reproduce analysis steps. Omicsoft scripts (Oscript) for analysis functions can be viewed in every function window, by right-clicking on an object name, or by viewing the full Audit Trail.
- Viewing Oscript in Module Windows [00:25]
- View the Oscript that generated a data object [01:23]
- View the full chain of analysis in the Audit Trail [01:42]
Preparing Your Data for Downstream Analysis
Edit -Omic Data Design Tables
-Omic data are not directly editable, but their attached Annotation and Design metadata are. For example, if your Design table contains Time and Treatment columns, it might be convenient to have a column containing "Treatment.Time" to enable more efficient grouping of samples.
- The Combine Columns function [00:29]
- The Delete Columns function [01:17]
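The effect of combining columns can be sketched outside Array Studio in a few lines of Python. This is only an illustration: the sample names and values are invented, and `combine_columns` is a hypothetical helper, not an Array Studio function.

```python
# Invented design table: one dict per sample (illustrative values only).
design = [
    {"Sample": "S1", "Treatment": "Control", "Time": "0h"},
    {"Sample": "S2", "Treatment": "Control", "Time": "6h"},
    {"Sample": "S3", "Treatment": "DrugA", "Time": "0h"},
    {"Sample": "S4", "Treatment": "DrugA", "Time": "6h"},
]


def combine_columns(rows, columns, new_name, sep="."):
    """Return copies of the rows with one extra column that joins `columns` with `sep`."""
    combined = []
    for row in rows:
        new_row = dict(row)  # copy, so the original table is left untouched
        new_row[new_name] = sep.join(str(row[c]) for c in columns)
        combined.append(new_row)
    return combined


design_combined = combine_columns(design, ["Treatment", "Time"], "Treatment.Time")

# Each distinct Treatment.Time value now identifies one sample group.
groups = sorted({row["Treatment.Time"] for row in design_combined})
print(groups)
```

With a single combined column, every Treatment and Time combination can be selected as one grouping factor instead of two.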
QC by PCA and Removing Failed Samples
Array Studio contains modules to identify samples that deviate significantly from the rest of the data set, possibly indicating a failed sample that should be excluded from downstream analysis.
Principal Component Analysis (PCA) identifies the major sources of variance in a data set. That variance can come from real differences between sample groups, or from a failed microarray chip. Failed samples can quickly be removed from your -Omic data objects before downstream analysis.
- The Principal Component Analysis module [00:38]
- Two Component PCA [01:18]
- Three Component PCA [02:15]
- Select samples to exclude [03:10]
- Run a module on a subset of samples [03:46]
- Using a List object [03:57]
- Using a new -Omic object [04:20]
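To make the idea behind PCA-based QC concrete, here is a minimal Python sketch that computes first-component scores by power iteration and picks out the sample that sits farthest from the rest. The expression values are invented, `first_component_scores` is a hypothetical helper, and the approach is a generic textbook one; it is not a description of how the Array Studio module is implemented.

```python
import math

# Invented log-scale expression values: one row per sample, one column per probeset.
samples = ["S1", "S2", "S3", "S4", "S5"]
expression = [
    [1.0, 2.0, 3.0, 4.0],
    [1.1, 2.1, 2.9, 4.0],
    [0.9, 1.9, 3.1, 4.1],
    [1.0, 2.0, 3.0, 3.9],
    [4.0, 1.0, 0.5, 2.0],  # deliberately aberrant sample
]


def first_component_scores(data, iterations=200):
    """Return each sample's score on the first principal component."""
    n, p = len(data), len(data[0])
    # Center every probeset (column) on its mean across samples.
    means = [sum(row[j] for row in data) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in data]
    # Sample-by-sample Gram matrix; its top eigenvector is proportional to the PC1 scores.
    gram = [[sum(a * b for a, b in zip(ri, rk)) for rk in centered] for ri in centered]
    vec = [float(i + 1) for i in range(n)]  # arbitrary non-constant start vector
    for _ in range(iterations):
        nxt = [sum(g * v for g, v in zip(row, vec)) for row in gram]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    # Rayleigh quotient gives the eigenvalue; scores are eigenvector * sqrt(eigenvalue).
    eigenvalue = sum(v * sum(g * w for g, w in zip(row, vec)) for v, row in zip(vec, gram))
    return [v * math.sqrt(eigenvalue) for v in vec]


scores = first_component_scores(expression)

# The sample with the largest absolute PC1 score is the candidate outlier.
outlier = max(range(len(samples)), key=lambda i: abs(scores[i]))
kept = [name for i, name in enumerate(samples) if i != outlier]
print(samples[outlier], kept)
```

The `kept` list plays the same role as the List object in the video: it names the subset of samples to carry forward. The sign of a principal component is arbitrary, so only the magnitude of the score is used here.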
QC by Correlation of Expression
Array Studio can identify samples that deviate significantly from others in your data set by calculating correlation coefficients between samples across all genes/probesets. Samples that correlate unusually poorly with the rest are flagged as possible failed samples and can be excluded from downstream analysis.
- The Correlation-Based QC module [00:05]
- Excluding a failed sample [01:23]
- Summarizing data by Pairwise Correlation [01:47]
- Grouped Correlation [02:20]
- Ungrouped Correlation [03:02]
- Adding Color Bars to a Heatmap [03:45]
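The logic of correlation-based QC can be sketched in plain Python: correlate every sample with every other sample, summarize each sample by its median correlation, and flag the ones below a cutoff. Everything below is an assumption made for illustration (the values, the use of the median, and the 0.8 cutoff); the Array Studio module may use different statistics and thresholds.

```python
import math
from statistics import median


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Invented expression profiles: one list of probeset values per sample.
profiles = {
    "S1": [2.0, 4.0, 6.0, 8.0, 10.0],
    "S2": [2.1, 3.9, 6.2, 7.8, 10.1],
    "S3": [1.9, 4.2, 5.8, 8.1, 9.9],
    "S4": [9.0, 3.0, 7.0, 2.0, 5.0],  # deliberately aberrant sample
}


def median_correlations(data):
    """Median correlation of each sample with all other samples."""
    result = {}
    for name, values in data.items():
        others = [pearson(values, v) for other, v in data.items() if other != name]
        result[name] = median(others)
    return result


CUTOFF = 0.8  # arbitrary illustrative threshold
median_r = median_correlations(profiles)
flagged = [name for name, r in median_r.items() if r < CUTOFF]
print(flagged)
```

The median is used rather than the mean so that one aberrant sample does not drag down the summary for the good samples in a small data set.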
Visualize Data with Array Studio Views
-Omic Data are read-only data constructs. The most common way to explore -Omic data is to add "Views" onto your data, which are ways to visualize the underlying data. Depending on the contents, different Views are available. You can apply filters to your Views without fear of losing data.
The Table View
The most common way to look at your -Omic data is with the Table View. Although it looks like a standard spreadsheet, the Table View is actually a visualization of your underlying data. It is dynamically connected to attached annotation and design metadata, and can be sorted and filtered without worry of altering the underlying data.
- Adding a Table View to your Array Studio data [00:15]
- Sorting and Filtering Table Views [00:33]
- Display context-specific details from metadata [01:15]
- Converting read-only -Omic data to editable Table data [01:23]
- Log2-transform your expression data [02:25]
- Visualize distribution of expression values with Kernel Density [03:12]
- Web Details On-Demand [04:08]
- Changing the Table View in the View Controller [04:31]
- Filtering Table View data by metadata [05:02]
Adding Additional Views: The Variable View and Scatter Plot
Depending on the contents of your -Omic data or table, Array Studio has about 40 Views to interactively display your data. This video briefly walks through some of the more popular Views for gene-level data: the Variable View and the Pairwise Scatter Plot.
- Add a View to your data [00:28]
- The Variable View [00:45]
- Interacting with your data through Views [01:25]
- Customizing Views with the View Controller [01:58]
- Group and color the samples by Design metadata [02:10]
- Change the Variable View to a Violin Plot [04:50]
- Filtering Views by Annotation metadata [05:05]
- Display statistical summary information [05:25]
- Zooming into Views [06:25]
- Change the Variable View to a Bar Plot [07:00]
- Add a Pairwise Scatter Plot [07:15]
- Heatmap View is demonstrated in Hierarchical Clustering
- Venn Diagram View is demonstrated in ANOVA
Statistical Inference and Pattern Discovery
Microarray expression data can be clustered by observation and/or variable in a Heatmap and dendrogram, and you can directly interact with the Views to discover co-regulated genes.
- Run the Hierarchical Clustering module [00:10]
- "Classic Heatmap" output of Hierarchical Clustering [01:00]
- "Modern Heatmap" output [01:35]
- Select heatmap rows matching a probeset name [02:28]
- Select heatmap rows by list [02:54]
Pattern Matching to identify similar Gene Expression Dynamics
You can search datasets for variables/observations whose pattern is similar to that of your variable/observation of interest. You can display these comparisons in multiple ways, including pairwise correlation/MA plots, heatmaps, and 3D scatter plots.
- The Find Neighbors module [00:27]
- Heatmap output of Find Neighbors [01:40]
- Variable Pairwise Scatter Plot [02:30]
- Filter by List Variable to limit listed probesets in "Specify Columns" [03:20]
- Modify Scatter Plot display [03:56]
- The 3D Variable View [04:30]
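A neighbor search of this kind can be illustrated by ranking probesets by their correlation with a query probeset. The sketch below assumes Pearson correlation as the similarity measure and uses invented probeset names and values; `find_neighbors` is a hypothetical helper and does not reproduce the options of the Find Neighbors module.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Invented expression profiles across four samples, keyed by probeset name.
profiles = {
    "probe_A": [1.0, 2.0, 3.0, 4.0],  # query probeset
    "probe_B": [2.1, 3.9, 6.2, 8.0],  # rises together with probe_A
    "probe_C": [4.0, 3.1, 1.9, 1.0],  # falls as probe_A rises
    "probe_D": [2.0, 2.5, 1.5, 2.2],  # no clear trend
}


def find_neighbors(data, query, top_n=2):
    """Return the top_n (name, correlation) pairs most correlated with the query."""
    target = data[query]
    scored = [(name, pearson(target, values))
              for name, values in data.items() if name != query]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]


neighbors = find_neighbors(profiles, "probe_A", top_n=3)
for name, r in neighbors:
    print(name, round(r, 3))
```

Sorting by the absolute value of the correlation instead would also surface anti-correlated probesets such as `probe_C`, which can be just as interesting biologically.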
Discover Differentially Expressed Genes by ANOVA
- Run ANOVA on your data [00:25]
- ANOVA output report and Volcano Plot [01:30]
- Exploring and filtering the ANOVA Volcano Plot [02:25]
- Filter "Match All" vs. "Match Any" [04:10]
- Venn Diagram comparison of ANOVA results by timepoint [05:18]
- Summarize your Inference Report [06:25]
Identify Enriched Gene Ontology Terms
If you are interested in discovering pathways or functionally related genes that are enriched in your data, you can run the Gene Ontology (GO) module. For this module to work, your annotation table must have metadata columns containing GO identifiers.
- Annotation Table requirements for GO analysis [00:17]
- List requirements for GO analysis [00:38]
- Run the Gene Ontology module [00:54]
- GO Analysis output [02:50]
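Term enrichment is commonly scored with a one-sided hypergeometric test, which asks how likely it is to see at least the observed number of annotated genes in a list drawn at random. The sketch below shows that calculation on invented data. The gene names, the `GO:EXAMPLE` identifier, and the helper function are all hypothetical, and the source does not state which test the Gene Ontology module uses.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, selected, overlap):
    """P(X >= overlap) when `selected` genes are drawn without replacement from
    `population` genes, of which `annotated` carry the term."""
    upper = min(annotated, selected)
    total = comb(population, selected)
    favorable = sum(
        comb(annotated, k) * comb(population - annotated, selected - k)
        for k in range(overlap, upper + 1)
    )
    return favorable / total


# Invented annotation table: gene -> set of GO identifiers (placeholder ID).
annotation = {
    f"gene{i}": ({"GO:EXAMPLE"} if i <= 5 else set()) for i in range(1, 21)
}
# Invented list of genes of interest, e.g. significant genes from an earlier step.
gene_list = ["gene1", "gene2", "gene3", "gene10"]

term = "GO:EXAMPLE"
population = len(annotation)
annotated = sum(1 for terms in annotation.values() if term in terms)
overlap = sum(1 for gene in gene_list if term in annotation[gene])

p_value = hypergeom_enrichment_p(population, annotated, len(gene_list), overlap)
print(term, overlap, round(p_value, 4))
```

In a real analysis the same test is repeated for every term, so the resulting p-values need multiple-testing correction before they are interpreted.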