One Color Array Getting Started
From Array Suite Wiki
The easiest way to get started with MicroArray analysis is to work with one-color array datasets. Array Studio provides a MicroArray workflow that goes through the following steps.
A MicroArray data object includes expression data (a table of expression levels for each variable in different samples), a design table (a description of the samples, such as conditions and groups), and annotation (annotation for the variables).
Add Expression Data
OmicSoft’s product is able to manage most of the commonly used expression data formats, including:
- Tab (or comma) delimited file: variable (e.g. gene) per row
- Tab (or comma) delimited file: observation (e.g. sample) per row
- Excel file: variable (e.g. gene) per row
- Excel file: observation (e.g. sample) per row
- Affymetrix .CEL files (3’ IVT or Gene Arrays)
- Affymetrix .CHP files (3’ IVT or Gene Arrays)
- Illumina expression report file
- NimbleGen expression PAIR report files
- GenePix result files
- Agilent text files
- MAGE-ML files
- NanoString RCC files
- Tab delimited files: large variable per row data
More information can be found here: Add Expression Data
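The two delimited layouts listed above (variable per row and observation per row) are transposes of each other. The following is a minimal Python sketch of that relationship, not Array Studio's own import code; the sample and gene names, the values, and the helper functions `read_delimited` and `transpose` are all hypothetical.

```python
import csv
import io


def read_delimited(text, delimiter="\t"):
    """Parse delimited text into a list of rows (each row is a list of strings)."""
    return [row for row in csv.reader(io.StringIO(text), delimiter=delimiter)]


def transpose(rows):
    """Swap rows and columns, e.g. observation-per-row -> variable-per-row."""
    return [list(column) for column in zip(*rows)]


# Hypothetical observation-per-row file: one sample per line, one gene per column.
observation_per_row = (
    "Sample\tGeneA\tGeneB\n"
    "S1\t7.1\t5.3\n"
    "S2\t6.9\t5.8\n"
)

# After transposing, each line holds one gene and each column one sample.
variable_per_row = transpose(read_delimited(observation_per_row))
for row in variable_per_row:
    print("\t".join(row))
```

Note that the corner header cell is carried over unchanged by the transpose, so it may need renaming by hand.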
Attach Design Table
The design table contains information about the experimental design and should include the samples and the experimental conditions. Accepted input files are Excel files, SAS dataset files, or text files delimited by tab, space, comma, colon, slash, backslash, semicolon, or a custom delimiter.
Sample design table (2*2 factorial design):
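As an illustration only, a tab-delimited design table for a 2×2 factorial experiment could be generated with a short Python script like the one below. The factor names (Treatment, Genotype), their levels, the replicate count, and the sample naming scheme are hypothetical and are not taken from an actual Array Studio dataset.

```python
import csv
import io
from itertools import product

# Hypothetical factors and levels; replace with the real experimental conditions.
treatments = ["Control", "Treated"]
genotypes = ["WT", "KO"]
replicates = 2  # assumed number of replicates per condition

rows = []
for treatment, genotype in product(treatments, genotypes):
    for rep in range(1, replicates + 1):
        rows.append(
            {
                "Sample": f"{treatment}_{genotype}_{rep}",
                "Treatment": treatment,
                "Genotype": genotype,
            }
        )

# Write the table as tab-delimited text, one sample per row.
buffer = io.StringIO()
writer = csv.DictWriter(
    buffer,
    fieldnames=["Sample", "Treatment", "Genotype"],
    delimiter="\t",
    lineterminator="\n",
)
writer.writeheader()
writer.writerows(rows)
design_text = buffer.getvalue()
print(design_text)
```

Each row describes one sample, and the four Treatment/Genotype combinations make up the 2×2 layout.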
More information can be found here: Attach Design Table
Attach Annotation
OmicSoft provides annotation files for many different experiment platforms and will automatically attach the annotation to the variable list. If no annotation is attached, the user can look up existing annotation files (named by platform) or upload a customized annotation file.
More information can be found here: Attach Annotation
Preprocess
Imported expression data is usually not ready for statistical tests; issues such as missing values and skewness need to be addressed first. The Preprocess modules cover the following:
- Filter variables and observations by a variety of criteria (mean, median, etc.), optionally grouping by an annotation column or design column and returning results if any or all groups meet the criteria.
- Impute missing values using methods such as FixedNumber, FixedPercentile, RowPercentile, ColumnPercentile, RowAverage, or KNN.
- Transform expression values to a different scale (Log2, Exp2, Log10, Exp10, Log, Exp, etc.), or add or multiply by a constant. Numeric design columns (sample information like dose and time) can also be used in transformations (Divide, Divided by, Multiply, Add, Subtract, Subtracted by).
- Normalize observations and variables in non-Affymetrix datasets, or in data that was imported without prior normalization, using methods including Center, Scale, Zscore, Quantile, and Lowess (for two-color arrays). Robust methods can be used, as well as an invariant set.
- Summarize observations (by design column) or variables (by annotation column) using a variety of summarization methods (e.g. mean, median, min, max). One common use is creating a combined dataset by Gene Symbol, which collapses multiple probesets, features, etc., for a single gene.
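To make a few of these steps concrete, here is a minimal Python sketch of row-average imputation, a Log2 transform, and summarization by gene symbol using the mean. It illustrates the general techniques under simple assumptions and is not Array Studio's implementation; the function names, the example values, and the order of the steps are all hypothetical.

```python
import math
from collections import defaultdict


def impute_row_average(row):
    """Replace missing entries (None) with the mean of the observed values in the row."""
    observed = [v for v in row if v is not None]
    if not observed:
        return list(row)  # nothing to average; leave the row unchanged
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in row]


def log2_transform(row):
    """Log2-transform a row of strictly positive values."""
    return [math.log2(v) for v in row]


def summarize_by_key(keys, rows):
    """Average, column by column, all rows that share a key (e.g. probesets per gene)."""
    groups = defaultdict(list)
    for key, row in zip(keys, rows):
        groups[key].append(row)
    return {
        key: [sum(column) / len(column) for column in zip(*members)]
        for key, members in groups.items()
    }


# Hypothetical data: three probesets (rows) measured in three samples (columns).
gene_symbols = ["TP53", "TP53", "EGFR"]
intensities = [
    [8.0, None, 32.0],
    [4.0, 16.0, 64.0],
    [2.0, 2.0, 8.0],
]

imputed = [impute_row_average(row) for row in intensities]
logged = [log2_transform(row) for row in imputed]
by_gene = summarize_by_key(gene_symbols, logged)
print(by_gene)
```

Averaging after the log transform, as done here, corresponds to a geometric mean on the original scale; whether that or averaging raw intensities is preferable depends on the analysis.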
Quality Control
Quality Control modules give the user a better idea of the quality of their data. Outliers and inconsistent data trends are easy to spot, and samples that fail QC can be excluded. OmicSoft provides Correlation-based QC, Model-Based Outlier Detection, and Principal Component Analysis for MicroArray data. Alternatively, the user can use the QC Wizard to run multiple QC modules at once, without having to go through each individual menu.
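The idea behind a correlation-based check can be sketched in a few lines of Python: compute the pairwise Pearson correlation between samples and flag any sample whose typical correlation with the others is low. This is a generic illustration, not the algorithm Array Studio uses; the use of the median, the 0.8 threshold, and the example values are assumptions.

```python
import math
from statistics import median


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def median_correlation_per_sample(samples):
    """For each sample, the median of its correlations with every other sample."""
    return {
        name: median(
            pearson(values, other_values)
            for other, other_values in samples.items()
            if other != name
        )
        for name, values in samples.items()
    }


# Hypothetical expression profiles; S4 is deliberately unlike the other samples.
samples = {
    "S1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "S2": [1.1, 2.1, 2.9, 4.2, 5.1],
    "S3": [0.9, 1.8, 3.2, 3.9, 4.8],
    "S4": [5.0, 1.0, 4.0, 2.0, 3.0],
}

THRESHOLD = 0.8  # assumed cutoff; a sensible value depends on the dataset
scores = median_correlation_per_sample(samples)
flagged = [name for name, score in scores.items() if score < THRESHOLD]
print(flagged)
```

The median is used here so that one poor sample does not drag down the scores of the good ones, which can happen with the mean when there are only a few samples.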
Inference
Inference modules help the user find variables whose expression levels are significantly influenced by experimental conditions (e.g. treatment, genotype). OmicSoft provides standard tests such as One-way ANOVA, Two-way ANOVA, and General Linear Model, as well as more complicated tests such as Cox Model and Multiple Group Test.
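As a reminder of what a One-way ANOVA computes for a single variable, the Python sketch below derives the F statistic from the between-group and within-group sums of squares. It is a textbook calculation rather than Array Studio's implementation; the function name and the example values are hypothetical, and the p-value (which requires the F distribution) is left out.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of numeric values."""
    k = len(groups)
    n_total = sum(len(group) for group in groups)
    grand_mean = sum(sum(group) for group in groups) / n_total
    group_means = [sum(group) / len(group) for group in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(group) * (mean - grand_mean) ** 2
        for group, mean in zip(groups, group_means)
    )
    # Variation of the individual values around their own group mean.
    ss_within = sum(
        (value - mean) ** 2
        for group, mean in zip(groups, group_means)
        for value in group
    )

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical expression values of one gene under three conditions.
groups = [
    [1.0, 2.0, 3.0],
    [2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0],
]
f_stat, df_between, df_within = one_way_anova_f(groups)
print(f_stat, df_between, df_within)
```

In a MicroArray setting the same calculation would be repeated for every variable, and the resulting p-values would normally be adjusted for multiple testing.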
Pattern Recognition
Identifying the differentially expressed genes (or transcripts) is not enough. One of the core goals of MicroArray analysis is to explore expression patterns of variables or observations that might be biologically related and can help uncover biologically meaningful knowledge. The Pattern Recognition modules allow the user to find neighbors for a variable or observation based on correlation, and to run Hierarchical Clustering or NMF Clustering on up to 10,000 variables and/or observations.
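Finding neighbors by correlation can be sketched as ranking all other variables by their Pearson correlation with a chosen target. The Python example below shows the general idea only and is not Array Studio's implementation; the gene names, the values, and the `find_neighbors` helper are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def find_neighbors(target, profiles, top_n=2):
    """Return the top_n (name, correlation) pairs most correlated with the target."""
    scores = [
        (name, pearson(profiles[target], values))
        for name, values in profiles.items()
        if name != target
    ]
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:top_n]


# Hypothetical expression profiles across four samples.
profiles = {
    "GeneA": [1.0, 2.0, 3.0, 4.0],
    "GeneB": [2.0, 4.0, 6.0, 8.5],
    "GeneC": [4.0, 3.0, 2.0, 1.0],
    "GeneD": [1.0, 3.0, 2.0, 4.0],
}

neighbors = find_neighbors("GeneA", profiles, top_n=2)
for name, correlation in neighbors:
    print(name, round(correlation, 3))
```

Ranking by the absolute value of the correlation instead would also surface anti-correlated variables such as GeneC.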