Omicsoft Affymetrix Microarray Preprocessing

From Array Suite Wiki

Overview

The Omicsoft Affymetrix microarray preprocessing method is designed for single-chip processing: processing one chip alone gives the same result as processing it together with 100 chips. The method is a hybrid of RMA and frozen RMA, with some concepts from MAS5. It first subtracts the background using the RMA approach. All perfect-match probes are then normalized against a common probe distribution (built from array data for 930 GSK cell lines), similar to frozen RMA. A linear-scale trimmed-mean summarization is then applied to the perfect-match probes only, and intensities are scaled so that the (trimmed) mean is consistent.

This method is used to process Affymetrix CEL files for ArrayLand. Some preprocessing methods, such as RMA, rely on sample-specific effects estimated from a group of samples. If the set of samples changes between processing runs, or new samples are added, all samples have to be re-processed from the CEL files. With the Omicsoft method, samples that have already been analyzed do not have to be re-processed.

Normalization

Normalization is based on a predefined distribution built from a GSK cell line study (930 samples). The dataset is diverse and contains multiple tissue types. Probe intensities are background-subtracted first, then averaged across all chips to obtain the empirical distribution. This empirical distribution is used as the reference for quantile normalization of all Affymetrix chips.
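The reference-based quantile normalization described above can be sketched as follows. This is a minimal illustration, not Omicsoft's implementation: the function names are hypothetical, the sketch assumes that intensities are sorted within each chip before averaging (the usual way a quantile-normalization reference is built), and ties are broken by probe position.

```python
def build_reference(chips):
    """Average sorted intensities position by position across chips.

    Assumption: each chip is sorted before averaging, as in standard
    quantile normalization; the wiki does not spell this step out.
    """
    sorted_chips = [sorted(chip) for chip in chips]
    n_chips = len(sorted_chips)
    return [sum(values) / n_chips for values in zip(*sorted_chips)]


def normalize_to_reference(chip, reference):
    """Replace each probe intensity with the reference value of the same rank."""
    if len(chip) != len(reference):
        raise ValueError("chip and reference must have the same number of probes")
    # Probe indices ordered from lowest to highest intensity.
    order = sorted(range(len(chip)), key=lambda i: chip[i])
    normalized = [0.0] * len(chip)
    for rank, probe_index in enumerate(order):
        normalized[probe_index] = reference[rank]
    return normalized


# Toy data: two "training" chips define the frozen reference.
reference = build_reference([[1.0, 2.0, 3.0], [3.0, 5.0, 4.0]])

# A new chip is normalized on its own, using only the frozen reference.
normalized = normalize_to_reference([30.0, 10.0, 20.0], reference)
```

Because the reference is fixed in advance, the result for a given chip depends only on that chip's own intensity ranks, which is what makes single-chip processing possible.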

Summarization

Summarization is carried out on each chip independently, using the trimmed mean of the normalized probe intensity values. For each probeset, the trimmed mean (excluding the highest 2% and lowest 2%) of the probe intensities is calculated on the linear scale, and the resulting mean is then converted to the log2 scale.
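A minimal sketch of the trimmed-mean summarization for one probeset is shown below. The function name is hypothetical, and the handling of fractional trim counts is an assumption: the sketch rounds the number of trimmed probes down, as R's `mean(x, trim = ...)` does, so a probeset with fewer than 50 probes is not trimmed at all at 2%. The actual Omicsoft rule may differ.

```python
import math


def summarize_probeset(intensities, trim=0.02):
    """Trimmed mean of linear-scale probe intensities, returned on the log2 scale.

    `trim` is the fraction removed from each end and must be below 0.5.
    Intensities are expected to be positive, linear-scale values.
    """
    values = sorted(intensities)
    # Number of probes dropped at each end (rounded down; an assumption).
    k = int(len(values) * trim)
    kept = values[k:len(values) - k]
    return math.log2(sum(kept) / len(kept))


# Example: a small probeset, where 2% trimming removes no probes.
signal = summarize_probeset([100.0, 120.0, 110.0, 130.0])
```

Averaging on the linear scale and taking log2 afterwards is not the same as averaging log2 values; the order used here follows the description above.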

Benchmark

Using the CCLE dataset, we processed 1036 CEL files with both the RMA and Omicsoft methods, and also quantified FPKM values from 767 matched RNA-Seq samples. The figures below show the distribution of gene-expression correlations between Affymetrix expression values (processed by RMA or Omicsoft) and FPKM values from the RNA-Seq samples, and the distribution of probeset-expression correlations between the RMA-processed and Omicsoft-processed Affymetrix values.

Benchmark.png
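For readers who want to run a similar comparison on their own data, the sketch below computes one correlation per gene across matched samples. It is only an illustration: the page does not state which correlation coefficient or FPKM transform was used, so the Pearson coefficient, the `log2(FPKM + 1)` transform, and all function and parameter names are assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (NaN if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


def per_gene_correlations(array_log2, rnaseq_fpkm, pseudocount=1.0):
    """Correlate each gene's array signal with log2(FPKM + pseudocount).

    Both inputs map gene name -> list of values, with samples in the same
    (matched) order. Genes missing from the RNA-Seq table are skipped.
    """
    result = {}
    for gene, array_values in array_log2.items():
        if gene not in rnaseq_fpkm:
            continue
        rna_log2 = [math.log2(v + pseudocount) for v in rnaseq_fpkm[gene]]
        result[gene] = pearson(array_values, rna_log2)
    return result


# Toy input: one gene measured on both platforms, one on the array only.
correlations = per_gene_correlations(
    {"GENE_A": [1.0, 2.0, 3.0], "GENE_B": [5.0, 5.5, 6.0]},
    {"GENE_A": [1.0, 3.0, 7.0]},
)
```

Plotting a histogram of the resulting values for each preprocessing method gives a distribution comparable in kind to the one described in this section.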