DESeq2Test(R)

From Array Suite Wiki

Overview

The new DESeq v2 R implementation, which will be available starting with OmicSoft Suite v11.3, addresses the output inconsistencies observed in the previous version (see DESeq2Test, section "Inconsistent Results for Factor Columns").

The functionality can be accessed through Analysis | NGS | Inference | DESeq (V2) (R). Prerequisites are a local or server project open in the Solution Explorer and a properly configured R environment with DESeq2 installed (see Install_R_Packages_for_ArrayServer).

[Screenshot: DESeq (V2) (R)]

Configuration

General

Input/Output

  • The window includes a dropdown box to select the Project and Data object on which the command will be run.
  • Selections can be made on which variables should be included in the analysis (options include "all", "selected", "visible", and any pre-generated Lists).
  • Selections can also be made on which observations should be included in the analysis (options include "all", "selected", "visible", and any pre-generated Lists).

Options

If you are not familiar with the General Linear Model (GLM), please also read the General Linear Model function documentation. The Options section of this window includes the following steps:

Step 1

Step 1, which is required, specifies the model: this is where the user chooses the model terms, both main effects and cross (interaction) terms:

[Screenshot: Options configuration, Step 1 (model specification)]

  • The "Columns" section contains columns from the Data object's Design Table. If a column should be treated as a Class term, select the checkbox for that column. By default, Array Studio guesses which columns are Class terms: in general, numeric columns are not treated as Class terms, while other columns, such as "Factor" columns, are. Consult a statistician if you are not sure whether a column should be a Class term.
  • The "Construct Model" section is where the user adds terms to the model. After selecting terms on the left, use the Add, Cross, and Remove buttons to build the model: "Add" adds one or more selected terms to the model, whereas "Cross" adds the cross (interaction) of the terms selected on the left, and "Remove" removes terms from the model.
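To illustrate what the Class checkbox and the Add and Cross buttons mean for the fitted model, here is a minimal Python sketch that expands design-table columns into design-matrix columns. It assumes R-style treatment coding, in which the alphabetically first level of each Class term is the reference level and gets no column of its own; the function design_matrix and the example columns Treatment, Time, and Dose are hypothetical and do not reflect Array Studio's internal code.

```python
def design_matrix(table, main_effects, class_terms, crosses=()):
    """Return (column_names, rows) for an intercept + main effects + crosses model.

    table: dict mapping column name -> list of values (one per observation)
    main_effects: columns added with "Add"
    class_terms: columns treated as Class (categorical) terms
    crosses: tuples of columns added with "Cross"
    Assumes treatment coding: the first level in sorted order is the reference.
    """
    n = len(next(iter(table.values())))

    def expand(col):
        if col in class_terms:
            # Class term: one 0/1 indicator column per non-reference level.
            levels = sorted(set(table[col]))
            return {f"{col}{lvl}": [1.0 if v == lvl else 0.0 for v in table[col]]
                    for lvl in levels[1:]}
        # Numeric term: a single column holding the raw values.
        return {col: [float(v) for v in table[col]]}

    columns = {"Intercept": [1.0] * n}
    for col in main_effects:
        columns.update(expand(col))
    for cross in crosses:
        # Interaction: elementwise product of every combination of expanded columns.
        combos = [((), [1.0] * n)]
        for col in cross:
            combos = [(labels + (name,), [a * b for a, b in zip(vals, new_vals)])
                      for labels, vals in combos
                      for name, new_vals in expand(col).items()]
        for labels, vals in combos:
            columns[":".join(labels)] = vals
    names = list(columns)
    rows = [[columns[name][i] for name in names] for i in range(n)]
    return names, rows


table = {"Treatment": ["Ctrl", "Ctrl", "Drug", "Drug"], "Time": ["T0", "T1", "T0", "T1"]}
names, rows = design_matrix(table, ["Treatment", "Time"], {"Treatment", "Time"},
                            crosses=[("Treatment", "Time")])
print(names)  # ['Intercept', 'TreatmentDrug', 'TimeT1', 'TreatmentDrug:TimeT1']
```

A numeric column such as Dose would contribute a single column, while a Class term with k levels contributes k - 1 indicator columns, which is why the Class checkbox changes the model.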

Clicking "OK" returns the user to the General Linear Model window, where Step 1 is now complete.

Step 2

Step 2, which is also required, specifies the contrasts: the particular comparisons the user is interested in, along with the tests to perform:

[Screenshot: Step 2 (contrasts and tests)]

The user can either build a contrast manually for each comparison or use the "For each" option to let Array Studio generate multiple estimates at once. In the Options section, the user chooses which outputs are created for the Inference Report generated by this command: Estimates, Fold changes, Raw p-values, and Adjusted p-values, and whether to generate a significant list and split it by direction.
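The following minimal Python sketch shows how a contrast vector turns model coefficients into a comparison. It assumes the treatment coding of a Class term (the reference level has no coefficient) and relies on DESeq2 reporting model coefficients on the log2 scale, so the dot product of a contrast vector with the coefficients is a log2 fold change. The pairwise expansion in for_each_pair is only a hypothetical reading of the "For each" option; all function names and coefficient values are illustrative.

```python
from itertools import combinations


def contrast_vector(coef_names, factor, case, control):
    """Contrast for 'case vs control' of one Class term under treatment coding.

    The reference level has no coefficient of its own, so it contributes 0.
    """
    c = [0.0] * len(coef_names)
    for level, weight in ((case, 1.0), (control, -1.0)):
        name = f"{factor}{level}"
        if name in coef_names:
            c[coef_names.index(name)] += weight
    return c


def estimate(c, beta):
    # With coefficients on the log2 scale, c . beta is a log2 fold change.
    return sum(ci * bi for ci, bi in zip(c, beta))


def for_each_pair(coef_names, factor, levels):
    """Hypothetical 'For each' expansion: one contrast per pair of levels."""
    return {f"{b} vs {a}": contrast_vector(coef_names, factor, b, a)
            for a, b in combinations(levels, 2)}


coef_names = ["Intercept", "DoseHigh", "DoseLow"]   # "Ctrl" is the reference level
beta = [5.0, 3.0, 1.0]
c = contrast_vector(coef_names, "Dose", "High", "Low")
print(c, estimate(c, beta))  # [0.0, 1.0, -1.0] 2.0
```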

Advanced

[Screenshot: Advanced configuration]

  • Fit type: the type of fitting of dispersions to the mean intensity; one of "parametric", "local", or "mean" (the fitType argument of DESeq2's DESeq function).
  • Alpha level: the p-value cutoff (default 0.05). If you selected "Generate significant list" or "Split significant list" when specifying the tests in Step 2, this threshold defines which genes are significant.
  • Minimal replicates for replacing: the minimum number of replicates required before the DESeq2 algorithm is allowed to flag a count as an outlier and replace it with a trimmed mean. For example, if you have 7 replicates in your dataset and the algorithm finds an outlier count for some gene, that count is replaced with the trimmed mean for the gene, and the model is then refit with the new values to test for differential expression. In DESeq2, this parameter is called minReplicatesForReplace in the DESeq function; see the DESeq2 manual for details: https://bioconductor.org/packages/release/bioc/manuals/DESeq2/man/DESeq2.pdf
  • PFilterAlpha: corresponds to the alpha argument of DESeq2's results() function, the significance cutoff used to optimize independent filtering (default 0.1). If you will use an adjusted p-value (FDR) cutoff other than 0.1, set PFilterAlpha to that value.
  • Enable fold change: if set to True, fold-change estimates less than 1 are transformed to -1 * (1 / fc), where fc is the fold change. This keeps the advantage of log-transformed fold changes (up- and down-regulation of the same size lie at equal distances on the X-axis) while expressing magnitudes as linear ratios (8x up-regulated or 8x down-regulated is more intuitive than a log2 fold change of 3 or -3).
  • Export dispersion table
  • Export outliers: add an outlier column to the result table that flags genes considered outliers based on Cook's distance. Genes flagged as outliers have fold changes but no p-values.
  • Export group means: add a column with the mean value of each group to the result table.
    • Note that group means are calculated directly from DESeq2-normalized counts, whereas the reported fold changes and estimates are computed by DESeq2 from the contrast estimates. The ratio of group means is therefore not expected to match the estimates exactly.
  • Export max group means: add a column with the maximal group mean for each comparison (the larger of the case-group and control-group means). The means are calculated from normalized counts in the same way as DESeq2 normalization: a size factor is computed for each column (observation), and each count is divided by the size factor of its column.
  • Export normalized gene counts: export a table containing the normalized counts of each gene in each sample.
  • Export contrast vector table: Export a table with contrast vectors
  • Export Wald test: A Wald test for significance is provided as the default inference method.
  • Get sized factors: Export a table containing the 'size factors' used to normalize the gene counts (as computed by R-DESeq2)
  • Filter low count
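The "Enable fold change" transformation from the Advanced options can be sketched in a few lines of Python. This assumes the input is the log2 fold change estimated by DESeq2; the function name signed_fold_change is hypothetical.

```python
def signed_fold_change(log2_fc):
    """Convert a log2 fold change into the 'Enable fold change' representation.

    Ratios >= 1 are kept as-is; ratios < 1 become -1 / fc, so an 8x decrease
    is reported as -8 rather than 0.125 (or -3 on the log2 scale).
    """
    fc = 2.0 ** log2_fc
    return fc if fc >= 1.0 else -1.0 / fc


print(signed_fold_change(3), signed_fold_change(-3))  # 8.0 -8.0
```

Because ratios below 1 are mapped to values below -1, no reported fold change falls strictly between -1 and 1.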
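The normalization behind the Advanced options "Export normalized gene counts", "Export group means", "Export max group means", and "Get sized factors" can be sketched as follows. This minimal Python version implements the median-of-ratios method, the default of DESeq2's estimateSizeFactors, and, like DESeq2, skips genes with a zero count in any sample when estimating size factors. The function names and the groups argument (one group label per sample) are hypothetical.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors, one per sample (column).

    counts: list of genes, each a list of raw counts (one value per sample).
    Genes with a zero in any sample are skipped: their geometric mean is 0.
    """
    usable = [row for row in counts if all(c > 0 for c in row)]
    log_geo_means = [sum(math.log(c) for c in row) / len(row) for row in usable]
    factors = []
    for j in range(len(counts[0])):
        # Log ratio of each usable gene's count to that gene's geometric mean.
        log_ratios = [math.log(row[j]) - g for row, g in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


def normalized_counts(counts, factors):
    # Each count is divided by the size factor of its sample (column).
    return [[c / s for c, s in zip(row, factors)] for row in counts]


def group_means(norm_row, groups):
    """Mean normalized count of one gene per group; groups labels each sample."""
    by_group = {}
    for value, g in zip(norm_row, groups):
        by_group.setdefault(g, []).append(value)
    return {g: sum(v) / len(v) for g, v in by_group.items()}


def max_group_mean(means, case, control):
    # The larger of the case-group and control-group means for one comparison.
    return max(means[case], means[control])


counts = [[10, 20], [0, 7], [100, 200], [5, 10]]   # sample 2 sequenced ~2x deeper
print(size_factors(counts))                          # approximately [0.707, 1.414]
```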
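The rule behind the Advanced option "Minimal replicates for replacing" can be sketched per gene in Python. This is a simplified sketch that assumes the Cook's distances and the cutoff are already computed, and that samples sharing a group label stand in for DESeq2's model-matrix cells. The trimmed mean is taken over all samples with trim = 0.2, matching the default of DESeq2's replaceOutliers; the sketch returns the unrounded replacement, whereas DESeq2 stores integer counts. The function names are hypothetical.

```python
import math


def trimmed_mean(values, trim=0.2):
    """Mean after dropping floor(n * trim) values from each end (like R's mean(x, trim=))."""
    x = sorted(values)
    k = math.floor(len(x) * trim)
    kept = x[k:len(x) - k]
    return sum(kept) / len(kept)


def replace_outliers(raw_row, factors, cooks, groups, cooks_cutoff,
                     min_replicates=7, trim=0.2):
    """Hypothetical per-gene sketch of DESeq2-style outlier replacement.

    A count is replaced only if its Cook's distance exceeds cooks_cutoff and its
    group has at least min_replicates samples. The replacement is the trimmed mean
    of the gene's normalized counts, scaled back by that sample's size factor.
    """
    norm = [c / s for c, s in zip(raw_row, factors)]
    replacement = trimmed_mean(norm, trim)
    group_sizes = {g: groups.count(g) for g in groups}
    new_row = []
    for count, s, d, g in zip(raw_row, factors, cooks, groups):
        if d > cooks_cutoff and group_sizes[g] >= min_replicates:
            new_row.append(replacement * s)
        else:
            new_row.append(count)
    return new_row


row = [10, 12, 11, 9, 10, 11, 12, 500]
print(replace_outliers(row, [1.0] * 8, [0.1] * 7 + [5.0], ["A"] * 8, cooks_cutoff=1.0))
```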
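To show what the PFilterAlpha option controls, here is a simplified Python sketch of independent filtering: genes are filtered by mean normalized count, the remaining p-values are Benjamini-Hochberg adjusted, and the threshold giving the most adjusted p-values below alpha is kept. DESeq2 itself chooses among quantile thresholds after smoothing the rejection curve, so this sketch, its quantile grid, and its function names are assumptions for illustration only.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for k, i in enumerate(order):
        rank = m - k  # 1-based rank of pvalues[i] in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def independent_filter(base_means, pvalues, alpha=0.1, quantiles=None):
    """Pick the mean-count threshold that maximizes BH-adjusted p-values < alpha.

    Returns (threshold, adjusted p-values); filtered-out genes get None.
    """
    if quantiles is None:
        quantiles = [i / 20 for i in range(19)]  # assumed grid: 0.00 .. 0.90
    sorted_means = sorted(base_means)
    best = None
    for q in quantiles:
        threshold = sorted_means[int(q * (len(sorted_means) - 1))]
        keep = [i for i, bm in enumerate(base_means) if bm >= threshold]
        adj = bh_adjust([pvalues[i] for i in keep])
        n_rejected = sum(a < alpha for a in adj)
        if best is None or n_rejected > best[0]:
            full = [None] * len(pvalues)  # None marks a filtered-out gene
            for i, a in zip(keep, adj):
                full[i] = a
            best = (n_rejected, threshold, full)
    return best[1], best[2]


base_means = [1, 2, 3, 100, 200, 300]
pvalues = [0.9, 0.8, 0.7, 0.001, 0.002, 0.028]
threshold, padj = independent_filter(base_means, pvalues, alpha=0.05)
print(threshold, padj)
```

In this example, filtering out the lowest-count gene removes one hypothesis from the multiple-testing correction, which is enough to push a third gene below alpha = 0.05.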

Output

  • A DESeq Inference Report will be generated, containing fold changes and p-values for each tested variable. The default visualization, a volcano plot, will also be generated.
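The volcano plot and the significant lists can be sketched as below. This minimal Python version assumes the plot uses x = log2 fold change and y = -log10(p-value), that a gene is significant when its p-value is below the Alpha level, and that "Split significant list" separates genes by the sign of the fold change. Whether raw or adjusted p-values are passed in is left to the caller, the floor min_p (which keeps p = 0 plottable) is an assumption, and the function names are hypothetical.

```python
import math


def volcano_points(results, min_p=1e-300):
    """(gene, x, y) points with x = log2 fold change and y = -log10(p)."""
    points = []
    for gene, lfc, p in results:
        if p is None:  # e.g. Cook's-distance outliers have no p-value
            continue
        points.append((gene, lfc, -math.log10(max(p, min_p))))
    return points


def split_significant(results, alpha=0.05):
    """Significant genes (p < alpha) split into up- and down-regulated lists."""
    up, down = [], []
    for gene, lfc, p in results:
        if p is None or p >= alpha:
            continue
        if lfc > 0:
            up.append(gene)
        elif lfc < 0:
            down.append(gene)
    return up, down


results = [("g1", 2.0, 0.001), ("g2", -1.5, 0.01), ("g3", 0.5, 0.2), ("g4", 3.0, None)]
print(split_significant(results))  # (['g1'], ['g2'])
```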

[Screenshot: DESeq Inference Report]

[Screenshot: Inference Report volcano plot]

  • A DispersionTable will be generated under the "Summary" folder.
  • A DispersionScatterPlot will be generated automatically for the DispersionTable.