Normalize.pdf

From Array Suite Wiki

Normalize

Overview

The Normalize command normalizes variables or observations using linear or quantile normalization. It works on "Microarray" data types and either generates a new "Microarray" data object or overwrites the existing one. To keep the original data, specify a new name for the output.

Input Data Requirements

It works on -Omic data types.

To run this module, select MicroArray | Preprocess | Normalize from the menu.

General Options

Input/Outputs


  • Project & Data: The window includes a dropdown box to select the Project and Data object to be normalized.
  • Variables: Selects which variables are included in the normalization. Options are All variables, Selected variables, Visible variables, and Customized variables (select any pre-generated List).
  • Observations: Selects which observations are included in the normalization. Options are All observations, Selected observations, Visible observations, and Customized observations (select any pre-generated List).
  • Output name: The user can choose to name the output data object.


Options

  • Normalization method - Center, CenterTo0, Scale, ScaleTo1, ZScore, Quantile, Lowess, QSpline, CenterQuantile and ScaleQuantile normalizations (for more details on the different normalizations, see below).
  • For every normalization option (except Lowess and QSpline), the user has the choice of normalizing observations or normalizing variables.
    • In most cases, the user will wish to normalize the observations, as most comparisons are done between chips.
    • If the user wishes to make genes comparable, normalizing the variables may be a valid option. However, for Array Studio algorithms that may require variable normalization (e.g. hierarchical clustering using Pearson for "Distance"), this normalization is done by the program.
  • Quantile/Target - Used with the CenterQuantile and ScaleQuantile normalization methods.
  • Robust method: If this box is checked, robust statistics are used to replace regular statistics (i.e. mean is replaced by median, SD is replaced by MAD).
  • Use a subset to calculate normalization factor: the user can select a pre-existing list to use in normalization. Clicking on the Select button will display the available lists to choose from.
  • Invariant set: The user can choose to use an invariant set and set the percentage of variables that make up the invariant set (default percentage is 80%).
    • Without an invariant set, the assumption is that most genes are equally expressed in the compared samples and that the proportion of differentially expressed genes is low. An invariant set is therefore appropriate when the user is comparing highly heterogeneous samples (e.g. different tissues).
    • The invariant set is calculated by ranking the genes in each chip according to their expression level and finding the genes with the smallest change in ranks. For probesets not in the invariant set, a simple linear interpolation based on two nearby data points is used when doing the quantile normalization (http://en.wikipedia.org/wiki/Linear_interpolation).
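
The rank-based selection of an invariant set can be sketched in Python as follows. This is only an illustration: the function name invariant_set is hypothetical, and measuring "change in ranks" as the difference between a gene's highest and lowest rank across chips is an assumption, since the exact criterion used by Array Studio is not stated here.

```python
def invariant_set(chips, fraction=0.8):
    """Return the indices of the genes whose ranks change least across chips.

    chips: list of equal-length lists, one list of intensities per chip.
    Assumption: rank change is measured as (max rank - min rank) per gene.
    """
    n = len(chips[0])
    ranks = []
    for chip in chips:
        # Rank the genes within this chip by expression level.
        order = sorted(range(n), key=lambda i: chip[i])
        chip_ranks = [0] * n
        for rank, gene in enumerate(order):
            chip_ranks[gene] = rank
        ranks.append(chip_ranks)
    # Rank spread of each gene across all chips.
    spread = [max(r[g] for r in ranks) - min(r[g] for r in ranks) for g in range(n)]
    keep = int(round(fraction * n))
    most_stable = sorted(range(n), key=lambda g: spread[g])[:keep]
    return sorted(most_stable)

chips = [[1.0, 2.0, 3.0, 4.0, 5.0],
         [1.1, 2.1, 5.5, 4.2, 3.1]]
stable = invariant_set(chips, fraction=0.6)  # genes 0, 1 and 3 keep their ranks
```

The normalization factor (or the quantile mapping) would then be estimated from the genes in stable only.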
Different normalization methods
Center/Center to 0

A single normalization factor per chip is computed to balance the chips, normalizing to the median among samples. Adding this factor to (or subtracting it from) the intensities equalizes the mean (or median) intensity among the compared chips. This method should generally be used on logged data. Centering to 0 brings the means to 0.
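
A minimal Python sketch of centering is shown below. The function name center is hypothetical, and using the median of the per-chip centers as the common target is one reading of "normalizing to the median among samples", not a documented detail of Array Studio.

```python
from statistics import mean, median

def center(chips, robust=False, to_zero=False):
    """Shift each chip (a list of log-scale intensities) by one additive factor.

    Assumption: the common target is the median of the per-chip centers,
    or 0 when to_zero is set (CenterTo0).
    """
    stat = median if robust else mean
    centers = [stat(chip) for chip in chips]
    target = 0.0 if to_zero else median(centers)
    # Subtracting (center - target) moves each chip's center onto the target.
    return [[x - (c - target) for x in chip] for chip, c in zip(chips, centers)]

chips = [[6.0, 7.0, 8.0], [7.0, 8.0, 9.0], [8.0, 9.0, 10.0]]
centered = center(chips)                      # every chip now has mean 8
centered_to_zero = center(chips, to_zero=True)  # every chip now has mean 0
```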

Scale/Scale to 1

A single normalization factor per chip is computed to balance the chips. Multiplying the intensities by this factor equalizes the mean (or median) intensity among the compared chips. This method should generally be used with original scale (unlogged) data.
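
The multiplicative counterpart can be sketched the same way. Again, the function name scale and the choice of target (the median of the per-chip means, or 1 for ScaleTo1) are assumptions made for illustration.

```python
from statistics import mean, median

def scale(chips, robust=False, to_one=False):
    """Multiply each chip (a list of unlogged intensities) by one factor.

    Assumption: the common target is the median of the per-chip centers,
    or 1 when to_one is set (ScaleTo1).
    """
    stat = median if robust else mean
    centers = [stat(chip) for chip in chips]
    target = 1.0 if to_one else median(centers)
    # The factor target / center brings each chip's center onto the target.
    return [[x * (target / c) for x in chip] for chip, c in zip(chips, centers)]

chips = [[100.0, 200.0, 300.0], [200.0, 400.0, 600.0]]
scaled = scale(chips)  # both chips now have mean 300
```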

Z-Score

A Z-score transformation, (raw intensity - mean intensity) / standard deviation, is applied to either the observations or the variables. This can be used to compare observations or variables from a wide range of different experiments.
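
The formula can be written out as a short Python sketch, including the robust variant (median and MAD) described under Options. The function name z_score is hypothetical; whether Array Studio uses the sample or population standard deviation, or rescales the MAD by a constant, is not stated here, so the sample SD and the unscaled MAD are assumptions.

```python
from statistics import mean, median, stdev

def z_score(values, robust=False):
    """Return (x - center) / spread for one observation or one variable."""
    if robust:
        center = median(values)
        # Median absolute deviation, without any scaling constant (assumption).
        spread = median([abs(x - center) for x in values])
    else:
        center = mean(values)
        spread = stdev(values)  # sample standard deviation (assumption)
    return [(x - center) / spread for x in values]

values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
z = z_score(values)                      # mean 0, standard deviation 1
z_robust = z_score(values, robust=True)  # median 0, MAD 1
```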

Quantile Normalization

While global normalization (e.g. scaling, including ScaleQuantile) forces the observations or variables to share the same mean, median, or specified target quantile, Quantile Normalization forces the observations/variables to have identical intensity distributions.

With this method, intensities are sorted for each observation/variable, the mean intensity at each rank is calculated, and then the intensity is replaced by the mean intensity at its rank.
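
The sort, average, and replace steps can be sketched in Python as follows. The function name quantile_normalize is hypothetical, and ties are broken by position, which is a simplification rather than a statement about how Array Studio handles ties.

```python
def quantile_normalize(chips):
    """Give every chip the same intensity distribution.

    chips: list of equal-length lists, one list of intensities per chip.
    """
    n = len(chips[0])
    sorted_chips = [sorted(chip) for chip in chips]
    # Mean intensity at each rank, taken across all chips.
    rank_means = [sum(s[r] for s in sorted_chips) / len(chips) for r in range(n)]
    result = []
    for chip in chips:
        order = sorted(range(n), key=lambda i: chip[i])
        normalized = [0.0] * n
        # Replace each intensity by the mean intensity at its rank.
        for rank, i in enumerate(order):
            normalized[i] = rank_means[rank]
        result.append(normalized)
    return result

chips = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
normalized = quantile_normalize(chips)
```

After normalization, sorting any chip yields the same list of values (here 1.5, 3.5, 5.5), while the ordering of genes within each chip is preserved.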

Quantile normalization is the recommended method for most microarray normalizations.

Lowess

Lowess can only be used for two-color experiments, where it helps account for imbalances between red and green intensities. It requires a design column for ArrayID (with ColumnMode set to ArrayID) and a design column for Channel (with ColumnMode set to Channel). The Lowess method first estimates the mean value of the log2(ratio) as a function of the log2(intensity). It then corrects systematic deviations in the R-I or MA plot by carrying out a locally weighted linear regression as a function of the log2(intensity) and subtracting the calculated best-fit average log2(ratio) from the experimentally observed ratio for each data point.

If the user has imported the data from *.gpr files and has imported two channels, special ArrayID and Channel columns are created automatically in the Design Table. In addition, these columns have their Column Mode set to ArrayID and Channel, respectively.

When the user runs Lowess normalization, Array Studio uses the ArrayID and Channel information to normalize each chip. For data imported from other sources, the user can create these two columns manually, provided that the Column Mode is also set properly. To set the column modes, go to Table Menu | Column | Column Properties and change the value of ColumnMode for each column of interest.

QSpline

QSpline is a simple and robust non-linear normalization method that uses array signal distribution analysis and cubic splines. It is reported to compare favorably with normalization using robust local-linear regression (lowess). Applied to oligonucleotide arrays, it reduced the relative error between replicates by 5-10% compared with a standard global normalization method.

CenterQuantile

This method differs from "full" Quantile normalization in that it subtracts a normalization factor (quantile value - targetValue) from the original value so that the specified quantile takes the target value. It is similar to Center normalization, except that the specified Quantile is used instead of the mean/median. This method should generally be used on logged data.

CenterQuantile = Original Value - (quantile value - targetValue)

Quantile (default value = 75)

Target (default value = 10)

ScaleQuantile

The "ScaleQuantile" method divides the original value by a normalization factor (quantile value / targetValue) so that the specified quantile takes the target value. It is similar to Scale normalization, except that the specified Quantile is used instead of the mean/median. This method should generally be used on unlogged data.

ScaleQuantile = Original Value / (quantile value / targetValue)

Quantile (default value = 75)

Target (default value = 10)
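
The CenterQuantile and ScaleQuantile formulas can be sketched together in Python. The helper percentile and the function names are hypothetical, and computing the quantile by linear interpolation between the closest ranks is an assumption; Array Studio's exact quantile definition is not stated here.

```python
def percentile(values, q):
    """q-th percentile (0-100), linearly interpolated between closest ranks."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def center_quantile(values, quantile=75, target=10):
    # CenterQuantile = Original Value - (quantile value - targetValue)
    factor = percentile(values, quantile) - target
    return [x - factor for x in values]

def scale_quantile(values, quantile=75, target=10):
    # ScaleQuantile = Original Value / (quantile value / targetValue)
    factor = percentile(values, quantile) / target
    return [x / factor for x in values]

logged_chip = [6.0, 8.0, 10.0, 12.0, 14.0]          # 75th percentile is 12
unlogged_chip = [50.0, 100.0, 150.0, 200.0, 250.0]  # 75th percentile is 200

centered = center_quantile(logged_chip)  # shifted by 12 - 10 = 2
scaled = scale_quantile(unlogged_chip)   # divided by 200 / 10 = 20
```

In both results the 75th percentile equals the target value of 10.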

Tip: If your data is already log-transformed, you will most likely use CenterQuantile rather than ScaleQuantile.


Output Results

The Output type is either Change input data, in which case the original Data object is permanently changed, or, if the user enters a name in the Output name field, Normalized Microarray Data, in which case a new Data object is created in the Solution Explorer.

WARNING: If no output name is specified, the original MicroArray data will be overwritten by the new normalized data.

