One Color Array Getting Started

From Array Suite Wiki


The easiest way to get started with MicroArray analysis is to work with one-color array datasets. Array Studio provides a MicroArray workflow that goes through the following steps.


Add Data

A MicroArray data object includes expression data (a table of expression levels for each variable in different samples), a design table (descriptions of the samples, such as conditions and groups), and annotation (descriptive information about the variables).
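To make the relationship between the three parts concrete, here is a minimal Python sketch. It is not Array Studio's internal format; the class name `MicroArrayData`, its fields, and the `check` method are hypothetical and serve only to illustrate how expression values, design rows, and annotation rows line up.

```python
from dataclasses import dataclass, field


@dataclass
class MicroArrayData:
    """Hypothetical container mirroring the three parts described above."""
    samples: list[str]                  # sample (observation) IDs, in column order
    expression: dict[str, list[float]]  # variable ID -> one value per sample
    design: dict[str, dict[str, str]] = field(default_factory=dict)      # sample ID -> design columns
    annotation: dict[str, dict[str, str]] = field(default_factory=dict)  # variable ID -> annotation columns

    def check(self) -> list[str]:
        """Return a list of consistency problems (empty if none were found)."""
        problems = []
        for var, values in self.expression.items():
            if len(values) != len(self.samples):
                problems.append(f"{var}: expected {len(self.samples)} values, got {len(values)}")
        for sample in self.samples:
            # Only check the design table if one has been attached.
            if self.design and sample not in self.design:
                problems.append(f"{sample}: missing from design table")
        return problems


data = MicroArrayData(
    samples=["S1", "S2"],
    expression={"probe_1": [7.2, 8.1], "probe_2": [5.0, 4.9]},
    design={"S1": {"Treatment": "Control"}, "S2": {"Treatment": "Drug"}},
    annotation={"probe_1": {"GeneSymbol": "GENE_A"}, "probe_2": {"GeneSymbol": "GENE_B"}},
)
problems = data.check()
print(problems)
```

The point of the sketch is that every sample in the expression table should have a row in the design table, and every variable can carry an annotation row.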

Add Expression Data

OmicSoft’s software can handle most of the commonly used expression data formats.

OmicSoft also provides an interface to GEO Series and ArrayExpress Experiments. Users can download and analyze datasets from GEO or ArrayExpress using valid dataset IDs.

More information can be found here: Add Expression Data

Attach Design Table

The design table contains information about the experimental design and should list the samples and their experimental conditions. Accepted input files are Excel files, SAS dataset files, or text files delimited by tab, space, comma, colon, slash, backslash, semicolon, or a custom delimiter.

Sample design table (2*2 factorial design):
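As a hypothetical illustration of this layout, the following Python sketch generates a tab-delimited design table for a 2*2 factorial design. The column names (`SampleID`, `Treatment`, `Genotype`), the factor levels, and the two replicates per group are assumptions made for the example.

```python
import csv
import io
from itertools import product

# Hypothetical factors and levels; replace with the real experimental conditions.
factors = {"Treatment": ["Control", "Drug"], "Genotype": ["WT", "KO"]}
replicates = 2  # assumed number of samples per treatment/genotype combination

rows = []
for treatment, genotype in product(factors["Treatment"], factors["Genotype"]):
    for rep in range(1, replicates + 1):
        rows.append({
            "SampleID": f"{treatment}_{genotype}_{rep}",
            "Treatment": treatment,
            "Genotype": genotype,
        })

# Write the table as tab-delimited text, one of the accepted input formats.
buffer = io.StringIO()
writer = csv.DictWriter(
    buffer,
    fieldnames=["SampleID", "Treatment", "Genotype"],
    delimiter="\t",
    lineterminator="\n",
)
writer.writeheader()
writer.writerows(rows)
design_text = buffer.getvalue()
print(design_text)
```

Each row describes one sample, and the sample IDs should match the sample names used in the expression data.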


More information can be found here: Attach Design Table

Import Annotation

OmicSoft provides annotation files for many different experiment platforms and will automatically attach the annotation to the variable list. If it is not attached, users can look up existing annotation files (named by platform) or upload a customized annotation file.

More information can be found here: Attach Annotation


Preprocess

The imported expression data is usually not ready for statistical tests. Users should take into account issues such as missing values and skewness. The Preprocess modules include:

Filtering: both variables and observations can be filtered by a variety of criteria (mean, median, etc.). Users can group by an annotation column or a design column and return results if any or all groups meet the criteria.
Imputation: missing values can be imputed using methods such as FixedNumber, FixedPercentile, RowPercentile, ColumnPercentile, RowAverage, or KNN.
Transformation: expression values can be transformed to a different scale (Log2, Exp2, Log10, Exp10, Log, Exp, etc.), or have a constant added or multiplied. Numeric design columns (sample information such as dose and time) can be used for transformations (Divide, Divided by, Multiply, Add, Subtract, Subtracted by).
Normalization: for non-Affymetrix datasets and/or data that was not normalized before import, observations and variables can be normalized using a variety of methods, including Center, Scale, Zscore, Quantile, and Lowess (for two-color arrays). Robust methods can be used, as well as an invariant set.
Summarization: users can combine observations (by design column) or variables (by annotation column) using a variety of summarization methods (e.g. mean, median, min, max). One common use is creating a combined dataset by Gene Symbol, to collapse multiple probesets, features, etc., for a single gene.
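The preprocessing steps above can be sketched outside Array Studio in a few lines of Python. This is a minimal illustration under stated assumptions, not the product's implementation: the toy values, the probe and gene names, and the helper functions are all hypothetical, and only four of the listed options are shown (row-average imputation, Log2 transformation, quantile normalization, and summarization by mean).

```python
import math
from collections import defaultdict
from statistics import mean

# Toy expression table: variable (probe) -> one value per sample; None marks a missing value.
expression = {
    "probe_1": [8.0, 16.0, None],
    "probe_2": [4.0, 2.0, 8.0],
    "probe_3": [32.0, 64.0, 16.0],
}
# Hypothetical annotation column: probe -> gene symbol.
gene_symbol = {"probe_1": "GENE_A", "probe_2": "GENE_A", "probe_3": "GENE_B"}


def impute_row_average(table):
    """Replace each missing value with the mean of the observed values in its row."""
    out = {}
    for var, values in table.items():
        row_mean = mean(v for v in values if v is not None)
        out[var] = [row_mean if v is None else v for v in values]
    return out


def log2_transform(table):
    """Move values to the log2 scale (assumes all values are positive)."""
    return {var: [math.log2(v) for v in values] for var, values in table.items()}


def quantile_normalize(table):
    """Give every sample (column) the same distribution: the mean of the sorted columns.
    Ties are broken by row order in this sketch."""
    variables = list(table)
    n_samples = len(next(iter(table.values())))
    columns = [[table[var][j] for var in variables] for j in range(n_samples)]
    sorted_cols = [sorted(col) for col in columns]
    rank_means = [mean(col[r] for col in sorted_cols) for r in range(len(variables))]
    out = {var: [0.0] * n_samples for var in variables}
    for j, col in enumerate(columns):
        order = sorted(range(len(col)), key=lambda i: col[i])
        for rank, i in enumerate(order):
            out[variables[i]][j] = rank_means[rank]
    return out


def summarize_by(table, groups):
    """Average all variables that share a group label (e.g. a gene symbol)."""
    members = defaultdict(list)
    for var, values in table.items():
        members[groups[var]].append(values)
    return {g: [mean(col) for col in zip(*rows)] for g, rows in members.items()}


imputed = impute_row_average(expression)
logged = log2_transform(imputed)
normalized = quantile_normalize(logged)
by_gene = summarize_by(normalized, gene_symbol)
print(by_gene)
```

The order shown here (impute, transform, normalize, summarize) is one reasonable choice for the example; the appropriate order and methods depend on the dataset.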

Quality Control

The Quality Control modules give users a better idea of the quality of their data. Users can easily view outliers and inconsistent data trends, and can exclude samples that fail QC. OmicSoft provides Correlation-based QC, Model-Based Outlier Detection, and Principal Component Analysis for MicroArray data. Alternatively, users can run the QC Wizard to execute multiple QC modules at once, without having to go through each individual menu.
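The idea behind a correlation-based check can be illustrated with a short Python sketch. It is not a description of how the Correlation-based QC module works internally: the choice of the median pairwise Pearson correlation as the per-sample score, the cutoff of 0.8, and the toy profiles are all assumptions made for this example.

```python
import math
from statistics import mean, median


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (assumes neither is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Toy data: sample -> expression profile across the same five variables.
profiles = {
    "S1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "S2": [1.1, 2.1, 2.9, 4.2, 5.1],
    "S3": [0.9, 1.8, 3.2, 3.9, 4.8],
    "S4": [5.0, 1.0, 4.0, 2.0, 3.0],  # deliberately inconsistent with the others
}


def median_correlations(profiles):
    """Score each sample by its median correlation with every other sample."""
    scores = {}
    for name, values in profiles.items():
        others = [pearson(values, v) for other, v in profiles.items() if other != name]
        scores[name] = median(others)
    return scores


scores = median_correlations(profiles)
threshold = 0.8  # arbitrary cutoff chosen for this example
flagged = [sample for sample, score in scores.items() if score < threshold]
print(flagged)
```

The median is used instead of the mean so that a single poor sample does not drag down the scores of the good samples it is compared against.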

Statistical Inference

The Inference modules help users find variables whose expression levels are significantly influenced by experimental conditions (e.g. treatment, genotype). OmicSoft provides standard tests such as One-way ANOVA, Two-way ANOVA, and General Linear Model, as well as more complex tests such as Cox Model and Multiple Group Test.
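As a worked illustration of the simplest of these tests, the Python sketch below computes the one-way ANOVA F statistic for each variable from its between-group and within-group sums of squares. The toy data, the `Treatment` labels, and the function names are hypothetical, and the sketch stops at the F statistic and its degrees of freedom; it does not compute a p-value or any multiple-testing correction.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of values.
    Assumes at least two groups and some within-group variation."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


def split_by_condition(values, labels):
    """Group one variable's values by the design column label of each sample."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    return list(groups.values())


# Toy design column and expression values (one value per sample).
treatment = ["Control", "Control", "Control", "Drug", "Drug", "Drug"]
expression = {
    "probe_1": [5.0, 5.2, 4.8, 8.0, 8.3, 7.7],  # clearly shifted by treatment
    "probe_2": [6.0, 6.4, 5.6, 6.1, 5.7, 6.2],  # no obvious treatment effect
}

results = {
    var: one_way_anova_f(split_by_condition(values, treatment))
    for var, values in expression.items()
}
for var, (f_stat, df_b, df_w) in results.items():
    print(f"{var}: F({df_b}, {df_w}) = {f_stat:.2f}")
```

A large F indicates that the group means differ by more than the within-group scatter would suggest. In practice a library routine such as `scipy.stats.f_oneway` also returns the corresponding p-value.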

Pattern Recognition

Identifying the differentially expressed genes (or transcripts) is not enough on its own. One of the core goals of MicroArray analysis is to explore the expression patterns of variables or observations that may be biologically related, which can lead to biologically meaningful findings. The Pattern Recognition modules allow users to find neighbors for a variable or observation based on correlation, and to run Hierarchical Clustering or NMF Clustering on up to 10,000 variables and/or observations.
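Finding neighbors by correlation can be sketched as follows. This is a minimal Python illustration assuming Pearson correlation as the similarity measure; the gene names, the toy profiles, and the `find_neighbors` helper are hypothetical and do not reflect the module's actual options.

```python
import math
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (assumes neither is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def find_neighbors(target, table, top_n=2):
    """Return the top_n variables most positively correlated with the target."""
    scores = [
        (other, pearson(table[target], values))
        for other, values in table.items()
        if other != target
    ]
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:top_n]


# Toy expression profiles: variable -> one value per sample.
expression = {
    "gene_a": [1.0, 2.0, 3.0, 4.0],
    "gene_b": [2.1, 3.9, 6.2, 8.0],  # rises together with gene_a
    "gene_c": [4.0, 3.0, 2.0, 1.0],  # mirror image of gene_a
    "gene_d": [1.0, 3.0, 2.0, 4.0],  # loosely follows gene_a
}

neighbors = find_neighbors("gene_a", expression, top_n=2)
for name, r in neighbors:
    print(f"{name}: r = {r:.3f}")
```

The same function works for observations if the table is transposed so that each key is a sample and each list holds that sample's values across variables.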