Run DESeq2 RScript

From Array Suite Wiki

Jump to: navigation, search


Contents

Introduction

ArrayStudio users are often interested in latest version of DESeq2 package, like v1.20.*. Currently, Omicsoft implemented DESeq2 v1.10.1 to ArrayStudio, to use the latest version of DESeq2 package, users could put the demo DESeq2R_SingleColumn.rscript to **Documents\Omicsoft\RScripts\MicroArray\ folder, and run RScript from the GUI. By using this RScript, users can get identical results as from RStudio.

This page presents an ambitious Escript that will automatically read count and design from -OmicData and -NGSData from ArrayStudio GUI, and perform DESeq2 analysis.

Input Dataset

For this demonstration, we will use a test dataset GSE6976, which was randomly downloaded from GEO.

Write the RScript, and put it to the target directory

For local project, put the *.RScript file to C:\Users\WINUSER\Documents\Omicsoft\RScripts\MicroArray\

Save rscript.png

After that, the Script would be available under -OmicData -> R -> Run RScript (MicroArray Data)

Run RScript.png

In general, this R Script for Array Studio project needs to be saved in:

  • For local analysis, the .rscript files need to be put in OmicsoftHomeDirectory\RScripts\MicroArray by default.
  • For server analysis, the .rscript files need to be put in BaseDirectory\Pipeline\RScripts\MicroArray by default.

Run the RScript

Workflow of the R Script

The rscript can be found at the bottom of the page. This section will describe how to run the script.

Parameters setup

Within this PScript, there are three parameters to setup, GroupColumn, CompareTo, and CCO (CooksCutOff).

  1. GroupColumn, users could specify the column in the design table to perform the DESeq2 test.
  2. CompareTo, by default DESeq2 will pair each sub group with the first one in the selected column. As in this case, there are 4 levels of factors in the Treatment column, DESeq2 will pair each with the first one (50 mM KCL treatment for 1.5 hours), so users could set CompareTo as the target factor. in the screenshot, "untreated" was set as the target factor.
  3. CCO, cooksCutOff value, there might be some outliers (either for the whole sample or just for several genes of a particular sample), set the CCO=FALSE, DESeq2 will perform the test on outliers, raw p-values and adjust p-values will be available in the output table.

DESeq2 parameter setup.png

When the job is done, a table with sample baseMean, log2foldChange, raw p-value, and adjust p-value, will show up in the local project.

Deseq2 local job done.png


Pairwise comparison setup

Users could also specify a design column to the script, and run pairwise comparison for all possible combinations of two factor levels within that column. As shown in the model setup below, I specified PairwiseColumn as Yes, and the script will get combination from treatments column, as run DESeq2.

Deseq2 pairwise.png

The RScript

<Info>
  Description    Perform DESeq2 GLM analysis, with or without matched-pair or batch information
  Author         Jeff Du
  Created        05/09/2018
  Updated        05/14/2019 
  Requirement    Bioconductor version 3.6, or above; DESeq2 version 1.18 or later
 
<Input>
  GroupColumn =
  CompareTo = 
  ## cookscutoff TRUE or False
  CCO = FALSE
 
PairwiseColumn = No
 
<Output>
  DESeq2.R.output
 
 
 
<Script>
  dat <- input.data
  des <- input.design
 
GroupColumn = input.parameters[["GroupColumn"]]
CCO = input.parameters[["CCO"]]
CompareTo = input.parameters[["CompareTo"]]
PairwiseColumn = input.parameters[["PairwiseColumn"]]
 
print( GroupColumn )
print( CompareTo )
print( CCO )
print( PairwiseColumn )
 
## clean the rownames
rownames(des) = gsub("[^a-zA-Z0-9_.]","_",rownames(des))
rownames(des) = make.names(rownames(des),unique=TRUE)
 
## only keep rows passed from the data table to the rscript
## for example when applying a filter to the count table, there will be fewer rows;
des = des[colnames(dat),]
 
## check if DESeq2 has been installed
if(suppressWarnings(!(require(DESeq2)))){
  cat("DESeq2 pkg is not found; installing bioconductor package DESeq2...")
  source("http://bioconductor.org/biocLite.R")
  biocLite("DESeq2")
  require(DESeq2)
}
 
 
## parse cco from "FALSE" string into boolean
CCO <- eval(parse(text=CCO))
 
## check if column names and group column could be found
if(!(GroupColumn%in%colnames(des))) stop("The grouping column (case-sensitive) NOT found! ","Column names available: ", colnames(des))
## if(!(GroupColumnB%in%colnames(des))) stop("The grouping column (case-sensitive) NOT found! ","Column names available: ", colnames(des))
if(!(CompareTo%in%des[,GroupColumn])) stop("The CompareTo value (case-sensitive) NOT found! ","Possible values are: ",  unique(des[,GroupColumn]))
 
 
designColA <- des[ ,GroupColumn]
#designColB <- des[ ,GroupColumnB]
#designColA <- factor( designColA )
#designColB <- factor( designColB )
design <- data.frame(designColA)
 
#des[ , GroupColumn] <- factor( des[ ,GroupColumn] ) 
 
## levels() works on factor( col )
 
levels <- levels( factor( designColA ) )
 
##########################################################
## get all combination of two factors from the design col
pair.levels <- combn( levels, 2)
 
print( head(pair.levels) )
 
design$designColA <- factor(design$designColA) 
 
dds <- DESeqDataSetFromMatrix(round(dat), design, ~designColA)
dds <- DESeq(dds, minReplicatesForReplace=7, modelMatrixType="standard")
 
resultsNames(dds)
 
 
## initial res as the result from DESeq2 dds
res <- NULL
 
if(PairwiseColumn == "No"){
 
  print(" Run pre-defined comparisions; ")
 
  for( level in levels){
 
    if( level != CompareTo) {
 
      contrast <- c("designColA", level, CompareTo)
 
      if( is.null(res)){
 
        res <- results(dds, contrast, cooksCutoff = CCO)
 
        ## rename the colnames of current dataframe
        colnames(res) <- paste(GroupColumn, level,"vs", CompareTo, colnames(res), sep = "_")
 
      } else {
 
        res2 <- results(dds, contrast, cooksCutoff = CCO)
        colnames(res2) <- paste(GroupColumn, level, "vs", CompareTo, colnames(res2), sep = "_")
        res <- cbind(res, res2)
      }
 
      print( head( res ) )
    }
  }
 
 
 
} else {   ## pairwise column is not NULL
 
  print(" Run pairwise comparisions. ")
 
  for( col.i in 1:dim(pair.levels)[2]){
 
 
 
    contrast <- c("designColA", pair.levels[1], pair.levels[2] )
 
    if( is.null(res)){
 
      res <- results(dds, contrast, cooksCutoff = CCO)
 
      ## rename the colnames of current dataframe
      colnames(res) <- paste(GroupColumn, pair.levels[1],"vs", pair.levels[2], colnames(res), sep = "_")
 
    } else {
 
      res2 <- results(dds, contrast, cooksCutoff = CCO)
      colnames(res2) <- paste(GroupColumn, pair.levels[1], "vs", pair.levels[2], colnames(res2), sep = "_")
      res <- cbind(res, res2)
    }
 
    print( head( res ) )
 
  }
 
 
}
 
 
 
 
#res <- results(dds, cooksCutoff=CCO)
 
DESeq2.R.output <- res
 
## End
#############################################
[back to top]