Seurat.pdf

From Array Suite Wiki

Seurat

Overview

Seurat is an R package developed by the Satija Lab that has become a popular package for QC, analysis, and exploration of single-cell RNA-seq data. The Seurat module in Array Studio does not implement the full Seurat package, but it allows users to run several Seurat functions:

  • FindVariableGenes: Identifies genes that are outliers on a 'mean variability plot'. It first calculates the average expression (mean.function) and dispersion (dispersion.function) for each gene. It then divides genes into num.bin (default 20) bins based on their average expression and calculates z-scores for dispersion within each bin.
  • FindMarkers: Finds markers (differentially expressed genes) for identity classes.
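
The binning and z-score step described for FindVariableGenes can be sketched in Python as follows. This is a minimal illustration, not the Seurat or ArrayStudio implementation: the function name `dispersion_zscores`, the use of a plain mean as the average expression, the log variance-to-mean ratio as the dispersion, and equal-width bins are all assumptions made for the example.

```python
import math
from statistics import mean, stdev, variance


def dispersion_zscores(expr, num_bin=20):
    """expr maps gene name -> list of per-cell expression values.

    Returns gene -> (average expression, dispersion, z-score within its bin).
    """
    stats = {}
    for gene, values in expr.items():
        avg = mean(values)
        var = variance(values)
        # Assumed dispersion measure: log of the variance-to-mean ratio.
        disp = math.log(var / avg) if avg > 0 and var > 0 else 0.0
        stats[gene] = (avg, disp)

    # Assumed binning scheme: num_bin equal-width bins over average expression.
    lo = min(avg for avg, _ in stats.values())
    hi = max(avg for avg, _ in stats.values())
    width = (hi - lo) / num_bin or 1.0  # avoid a zero width when all means match

    bins = {}
    for gene, (avg, _) in stats.items():
        idx = min(int((avg - lo) / width), num_bin - 1)
        bins.setdefault(idx, []).append(gene)

    # Standardize each gene's dispersion against the other genes in its bin.
    result = {}
    for genes in bins.values():
        disps = [stats[g][1] for g in genes]
        mu = mean(disps)
        sd = stdev(disps) if len(disps) > 1 else 0.0
        for g in genes:
            avg, disp = stats[g]
            z = (disp - mu) / sd if sd > 0 else 0.0
            result[g] = (avg, disp, z)
    return result
```

Genes with a large positive z-score are more variable than other genes of similar average expression, which is what makes them outliers on the plot.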

This module is intended for single-cell UMI count data and runs Seurat directly in the R engine integrated with ArrayStudio.

If the user has not run Seurat in ArrayStudio before, it must first be set up by following this wiki page: R packages integration with ArrayStudio.

To open this module, go to Analysis | NGS | Single Cell RNA-Seq | Seurat Marker Analysis.

Seurat01.png



Input Data Requirements

This module works on -Omic data objects and Zero inflated binary matrix (ZIM) data.



General Options

The user can choose to perform this analysis locally:

Seurat02.png

Or perform this analysis on the server:

Seurat03.png

WARNING: If the package compatibility status is not OK, the R installation integrated with ArrayStudio is not ready to run Seurat. Please see R packages integration with ArrayStudio to configure Seurat in ArrayStudio.



Input/Outputs

  • Project & Data: The window includes a dropdown box to select the Project and Data object to be filtered.
  • Variables: Selections can be made on which variables should be included in the filtering (options include All variables, Selected variables, Visible variables, and Customized variables (select any pre-generated Lists)).
  • Observations: Selections can be made on which observations should be included in the filtering (options include All observations, Selected observations, Visible observations, and Customized observations (select any pre-generated Lists)).
  • Output name: The user can choose to name the output data object.


This module can be used as a follow-up step once the user has clustered cells into sub-clusters and generated a list for each sub-cluster:

Seurat08.png



Options

  • Cell Group #1: a list in the project
  • Cell Group #2: a list in the project
  • Min observations per gene: numeric; include genes that are detected in at least this many cells (default: 3)
  • Min genes per cell: numeric; include cells where at least this many genes are detected (default: 200)
  • Multiplicity: Determines the method for generating adjusted p-values. See MultiplicityAdjustment for more details of the different options.
  • UMI based counts: logical; whether the data are UMI-based counts (default: FALSE)
  • Filter UMI counts: logical; whether to filter on UMI counts (default: FALSE). The filtering criteria are set in the Advanced Options.
  • LogNormalize cell counts: logical; whether to log-normalize the cell counts (default: TRUE)
  • Identify marker genes: logical; whether to identify marker genes (default: TRUE)
  • Identify high variable genes: logical; whether to identify high variable genes (default: FALSE)
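
The effect of Min observations per gene, Min genes per cell, and LogNormalize cell counts can be illustrated with a small Python sketch. It is an illustration under stated assumptions, not the module's code: the function name `filter_and_lognormalize` is hypothetical, the scale factor of 10,000 is the default of Seurat's log normalization and is assumed to apply here, and the order in which the two filters are applied is also an assumption.

```python
import math


def filter_and_lognormalize(counts, min_cells=3, min_genes=200, scale_factor=10_000):
    """counts maps cell name -> {gene name: raw count}.

    Returns the retained cells with log-normalized values for the retained genes.
    """
    # Min observations per gene: count the cells in which each gene is detected.
    detected = {}
    for genes in counts.values():
        for gene, count in genes.items():
            if count > 0:
                detected[gene] = detected.get(gene, 0) + 1
    kept_genes = {g for g, n in detected.items() if n >= min_cells}

    normalized = {}
    for cell, genes in counts.items():
        # Min genes per cell: skip cells with too few detected genes.
        if sum(1 for c in genes.values() if c > 0) < min_genes:
            continue
        total = sum(genes.values())
        # Assumed normalization: log(1 + count / cell total * scale factor).
        normalized[cell] = {
            g: math.log1p(c / total * scale_factor)
            for g, c in genes.items()
            if g in kept_genes
        }
    return normalized
```

With the defaults, a gene seen in only two cells is dropped, and so is a cell in which only 150 genes are detected.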

Advanced Options

Seurat04.png


  • Filter UMI:
    • Low: Lower cutoff for filtering UMI counts (default: 200)
    • High: Upper cutoff for filtering UMI counts (default: 20,000,000)


  • Marker Genes:
    • Test method: Denotes which test to use. Available options are:
      • "wilcox" : Wilcoxon rank sum test (default)
      • "bimod" : Likelihood-ratio test for single cell gene expression (McDavid et al., Bioinformatics, 2013)
      • "roc" : Standard AUC classifier
      • "t" : Student's t-test
      • "tobit" : Tobit-test for differential gene expression (Trapnell et al., Nature Biotech, 2014)
      • "poisson" : Likelihood ratio test assuming an underlying Poisson distribution. Use only for UMI-based datasets
      • "negbinom" : Likelihood ratio test assuming an underlying negative binomial distribution. Use only for UMI-based datasets
      • "MAST" : GLM framework that treats cellular detection rate as a covariate (Finak et al., Genome Biology, 2015)
      • "DESeq2" : DE based on a model using the negative binomial distribution (Love et al., Genome Biology, 2014)
    • LogFoldChange cutoff: Limit testing to genes that show, on average, at least X-fold difference (log-scale) between the two groups of cells. Default is 0.25. Increasing logfc.threshold speeds up the function, but can miss weaker signals.
    • Min # of cells per group: Minimum number of cells in one of the groups
    • Min cell percentage per group: Only test genes that are detected in at least this fraction of cells in either of the two populations. Meant to speed up the function by not testing genes that are very infrequently expressed. Default is 0.1
    • Min # of cells per gene: Minimum number of cells expressing the gene in at least one of the two groups; currently only used for the Poisson and negative binomial tests
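
The LogFoldChange cutoff and Min cell percentage per group act as pre-filters that decide whether a gene is tested at all. The Python sketch below shows one way such a pre-filter could work. The function name `passes_prefilter` is hypothetical, and the sketch assumes that the inputs are natural-log-normalized values and that the average log fold change is the difference of log(mean(exp(x) - 1) + 1) between the two groups.

```python
import math


def passes_prefilter(group1, group2, logfc_threshold=0.25, min_pct=0.1):
    """group1 and group2 hold one gene's log-normalized values per cell."""
    # Fraction of cells in each group in which the gene is detected.
    pct1 = sum(v > 0 for v in group1) / len(group1)
    pct2 = sum(v > 0 for v in group2) / len(group2)
    if max(pct1, pct2) < min_pct:
        return False  # too rarely expressed in both groups

    def log_mean(values):
        # Assumed averaging: mean on the original scale, then back to log scale.
        return math.log(sum(math.expm1(v) for v in values) / len(values) + 1)

    # Keep the gene only if the average difference reaches the cutoff.
    return abs(log_mean(group1) - log_mean(group2)) >= logfc_threshold
```

Only genes for which this returns True would go on to the selected statistical test, which is why raising either cutoff shortens the run time.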


  • High Variable Genes:
    • XLow: Bottom cutoff on x-axis for identifying variable genes
    • YLow: Bottom cutoff on y-axis for identifying variable genes
    • XHigh: Top cutoff on x-axis for identifying variable genes
    • YHigh: Top cutoff on y-axis for identifying variable genes
    • # of Bins: Total number of bins to use in the scaled analysis (default is 20)
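
As a rough illustration of how the four cutoffs select genes, the Python sketch below keeps genes whose point on the mean variability plot falls inside the rectangle defined by the cutoffs. It assumes that the x-axis is the average expression and the y-axis is the scaled dispersion (z-score); the function name `select_variable_genes` and the cutoff values in the example call are hypothetical, not the module's defaults.

```python
def select_variable_genes(stats, x_low, x_high, y_low, y_high):
    """stats maps gene name -> (x value, y value) on the mean variability plot."""
    return sorted(
        gene
        for gene, (x, y) in stats.items()
        if x_low < x < x_high and y_low < y < y_high
    )


# Example with made-up values: only g1 lies inside all four cutoffs.
example = {"g1": (0.5, 2.0), "g2": (0.05, 3.0), "g3": (1.0, 0.2)}
print(select_variable_genes(example, x_low=0.1, x_high=8, y_low=1, y_high=float("inf")))
# ['g1']
```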


  • Export group means: Export a column to show mean values for each group in the result table
  • Export maximal group means per contrast: Export a column showing the maximal group mean for each comparison, that is, the larger of the case-group mean and the control-group mean. The means are calculated from the normalized counts.


Output Results

Under the Inference folder, the Seurat module generates a table and a volcano plot view of that table in ArrayStudio:

Seurat05.png

An example volcano plot is shown below:

Seurat06.png

If the user has checked the option Identify high variable genes, and high variable genes are detected in the dataset, an additional table report named SeuratHVGTest is created outside the Inference folder:

Seurat07.png



OmicScript

Seurat.oscript


