From Array Suite Wiki
Single Cell Differential Expression Analysis
Question: If I have thousands of cells in my comparison, can I use SCDE as well? Answer from SCDE team:
“The SCDE error model was really never designed with the intention of accommodating >384 cells. Actually, when there are such a large number of cells, the sophisticated error modeling really becomes overkill and unnecessary, since the law of large numbers starts playing a role. We’ve found simple Wilcox or T tests to be sufficient.”
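For comparisons with very many cells, the simple rank-based test mentioned in the answer above can be sketched as follows. This is an illustrative, standard-library-only version of a two-sided Wilcoxon rank-sum (Mann-Whitney U) test using the normal approximation with a tie correction and no continuity correction. It is not part of Array Studio or the scde package, and the function name rank_sum_test is hypothetical.

```python
import math
from statistics import NormalDist


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation, tie-corrected).

    x, y: per-cell expression values of one gene in the two groups.
    Returns (U statistic for x, approximate two-sided p-value).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    # Pool the values, remembering which group (0 = x, 1 = y) each came from.
    pooled = sorted((v, g) for g, vals in enumerate((x, y)) for v in vals)

    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        # Find the run of tied values starting at position i.
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        t = j - i
        midrank = (i + 1 + j) / 2.0  # average of the 1-based ranks i+1..j
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        tie_term += t ** 3 - t
        i = j

    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u, 1.0  # all values identical: no evidence of a difference
    z = (u - mean_u) / math.sqrt(var_u)
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return u, p


# Usage: made-up UMI counts for one gene in two groups of cells.
group_a = [0, 0, 1, 3, 5, 2, 0, 4]
group_b = [6, 9, 4, 7, 0, 8, 5, 10]
u_stat, p_value = rank_sum_test(group_a, group_b)
print(u_stat, p_value)
```

The normal approximation is the reason such a test suits large cell numbers: with hundreds or thousands of cells per group it becomes accurate, while for very small groups an exact test would be preferable.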
The SCDE module in Array Studio allows the user to run differential expression analysis on UMI counts using the scde package in R. The module is intended for single-cell UMI count data and directly runs the R implementation of SCDE.
To open this module, go to Analysis | NGS | Single Cell RNA-Seq | Single Cell Differential Expression Analysis.
Input Data Requirements
This module works on -Omic data objects and Zero inflated binary matrix (ZIM) data.
The user can choose to perform this analysis locally or on the server. The following options are available:
- Project & Data: The window includes a dropdown box to select the Project and Data object to be analyzed.
- Variables: Selections can be made on which variables should be included in the analysis (options include All variables, Selected variables, Visible variables, and Customized variables (select any pre-generated Lists)).
- Observations: Selections can be made on which observations should be included in the analysis (options include All observations, Selected observations, Visible observations, and Customized observations (select any pre-generated Lists)).
- Output name: The user can choose to name the output data object.
- Group: The user needs to specify the factor in the Group dropdown menu (this will include any columns designated as Factors in the Design Table). Please note that SCDE currently requires group names, which correspond to column names in the design table, to start with an alphabetic character.
- Compare: This option specifies the level of the selected Group column to be compared. For instance, to make the comparison "A vs B", define "A" here.
- CompareTo: This option specifies the level of the selected Group column to compare against. For the comparison "A vs B", define "B" here. In the current design, only one comparison is allowed per run of this module. For instance, in an experiment with 4 time points (0, 1, 2, and 3 hrs), if the user chooses time as Group and 0 as CompareTo, the user then needs to choose one time point from 1 hr, 2 hrs, and 3 hrs as Compare, and run the comparisons one by one.
- Batch: A factor (corresponding to rows of the model matrix) specifying the batch assignment of each cell, used to perform batch correction.
- Thread number: The total number of threads to be allocated to the process. The more threads that are allocated, the faster the algorithm will run. By default, this is set to the number of CPUs on the user’s computer. It should not be set higher than the number of CPUs available, but can be reduced at the user’s discretion.
- Multiplicity: This option specifies the multiple comparisons adjustment used for the analysis. The options include "FDR_BH", "FDR_BY", "Bonferroni", "Sidak", "StepDownBonferroni", "StepDownSidak", "StepUp", and "QValue" (FDR_BH is the default option).
- Clean counts: The user can choose whether to filter the count matrix based on gene and cell requirements.
- Filter uncorrelated cells:
- Clean read counts:
- Minimal library size: Minimum number of genes detected in a cell. Cells with fewer genes will be removed (default: 1800)
- Minimal reads per gene: Minimum number of reads per gene. Genes with fewer reads will be removed (default: 10)
- Minimal cells per gene: Minimum number of cells a gene must be seen in. Genes not seen in a sufficient number of cells will be removed (default: 5)
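The three count-cleaning thresholds above can be sketched as a simple filter. The sketch below is only an illustration of the described logic, not the code that Array Studio or the scde package runs. The function name clean_counts, the dict-of-lists matrix layout, and the use of "greater than or equal to" at each threshold are assumptions made for the example.

```python
def clean_counts(counts, min_lib_size=1800, min_reads=10, min_detected=5):
    """Filter a count matrix by the three thresholds described above.

    counts: dict mapping gene name -> list of per-cell counts
            (all lists have the same length, one entry per cell).
    Returns (filtered_counts, kept_cell_indices).
    """
    genes = list(counts)
    n_cells = len(counts[genes[0]]) if genes else 0

    # Step 1: drop cells in which too few genes were detected.
    detected_per_cell = [
        sum(1 for g in genes if counts[g][c] > 0) for c in range(n_cells)
    ]
    kept_cells = [c for c in range(n_cells) if detected_per_cell[c] >= min_lib_size]

    # Steps 2 and 3: among the remaining cells, drop genes with too few
    # total reads or genes seen in too few cells.
    filtered = {}
    for g in genes:
        row = [counts[g][c] for c in kept_cells]
        total_reads = sum(row)
        cells_seen = sum(1 for v in row if v > 0)
        if total_reads >= min_reads and cells_seen >= min_detected:
            filtered[g] = row
    return filtered, kept_cells


# Usage with a toy matrix and deliberately small thresholds.
toy = {
    "g1": [5, 0, 3, 0],
    "g2": [1, 0, 1, 0],
    "g3": [0, 0, 0, 9],
    "g4": [2, 2, 2, 0],
}
filtered, kept = clean_counts(toy, min_lib_size=2, min_reads=4, min_detected=2)
print(kept, filtered)
```

Cells are filtered before genes in this sketch, so the per-gene totals are computed only over the cells that remain.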
Error model fitting parameters: parameters used when building error models for heterogeneous cell populations.
- Linear fit: whether the newer linear model fit with zero intercept should be used (T), or the originally published log-fit model (F)
- Local theta fit: Boolean specifying whether to fit the overdispersion parameter theta, i.e., the negative binomial size parameter, based on local regression (default: set to be equal to the Linear fit parameter)
- Combine all group priors: an optional factor describing grouping of different cells. If provided, the crossfits and the expected expression magnitudes will be determined separately within each group. The factor should have the same length as ncol(counts). (default: FALSE)
- Threshold segmentation: whether to use a fast threshold-based segmentation during cross-fit (default: TRUE)
- Min non-failed observations per gene: minimum number of non-failed measurements (within the k nearest neighbor cells) required for a gene to be taken into account during error fitting procedure (default: 3)
- Min read count per gene [cross gene comparison]: minimum number of reads required for a measurement to be considered non-failed (default: 4)
- Model zero count threshold: threshold to guess the initial value (failed/non-failed) during error model fitting procedure (default: 4)
- Rate of failure (zero lambda): the rate of the Poisson (failure) component (default: 0.1)
- Min number of genes: minimum number of genes to use for model fitting (default: 2000)
- Max number of cross-fit per group: maximum number of cross-fit comparisons that should be performed per group (default: 5000)
- Min cross-fit per cell: minimum number of pairs that each cell should be cross-compared with (default: 10; scde parameter min.pairs.per.cell)
- Resolution of expression magnitude grid [prior distribution]: number of points (resolution) of the expression magnitude grid (default: 400). Note: larger numbers will linearly increase memory/CPU demands.
- Bootstrap randomizations #[differential expression test]: number of bootstrap/sampling iterations that should be performed (default: 150)
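To give a feel for what the Poisson (failure) component and the negative binomial size parameter theta mean, the following sketch evaluates a two-component mixture for a single observed count. It is a simplified illustration only: it does not reproduce the scde fitting procedure, and the function names as well as the example values for the expected magnitude (mu), theta, and the prior failure probability are assumptions chosen for the example.

```python
import math


def poisson_logpmf(k, lam):
    """Log-probability of count k under a Poisson with rate lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def negbin_logpmf(k, mu, theta):
    """Log-probability of count k under a negative binomial with mean mu
    and size theta (variance = mu + mu**2 / theta)."""
    return (
        math.lgamma(k + theta) - math.lgamma(theta) - math.lgamma(k + 1)
        + theta * math.log(theta / (theta + mu))
        + k * math.log(mu / (theta + mu))
    )


def failure_posterior(k, mu, theta, prior_fail, zero_lambda=0.1):
    """Posterior probability that count k came from the failure component.

    prior_fail is a hypothetical prior probability of a failed measurement;
    zero_lambda is the rate of the Poisson (failure) component.
    """
    log_fail = math.log(prior_fail) + poisson_logpmf(k, zero_lambda)
    log_ok = math.log(1.0 - prior_fail) + negbin_logpmf(k, mu, theta)
    m = max(log_fail, log_ok)  # subtract the maximum for numerical stability
    w_fail = math.exp(log_fail - m)
    w_ok = math.exp(log_ok - m)
    return w_fail / (w_fail + w_ok)


# Usage: a zero count for a gene expected at about 50 reads looks like a
# failed measurement, whereas a count of 40 does not.
print(failure_posterior(0, mu=50, theta=2, prior_fail=0.3))
print(failure_posterior(40, mu=50, theta=2, prior_fail=0.3))
```

A small rate such as 0.1 concentrates the failure component on zero and near-zero counts, which is why any substantial count is attributed almost entirely to the negative binomial component.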
The SCDE module will generate a table and a scatter plot view for this table in Array Studio:
- An SCDE report table similar to the DESeq Inference Report will be generated, containing fold-change and p-values for each tested variable. The default visualization, a volcano plot, will also be generated.
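As an illustration of how such a report relates to the Multiplicity option and to a volcano plot, the sketch below applies a Benjamini-Hochberg adjustment (the FDR_BH default) to raw p-values and converts each row into volcano plot coordinates. The function names, the tuple layout of the report rows, and the choice to plot adjusted rather than raw p-values are assumptions made for the example; they do not describe Array Studio's internal implementation.

```python
import math


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def volcano_points(report):
    """report: list of (gene, log2_fold_change, raw_p_value) tuples.

    Returns a list of (gene, x, y) with x = log2 fold change and
    y = -log10(adjusted p-value).
    """
    adjusted = bh_adjust([p for _, _, p in report])
    points = []
    for (gene, lfc, _), q in zip(report, adjusted):
        q = max(q, 1e-300)  # guard against log10(0)
        points.append((gene, lfc, -math.log10(q)))
    return points


# Usage with made-up report rows.
rows = [("geneA", 2.1, 0.01), ("geneB", -0.3, 0.04),
        ("geneC", 1.2, 0.03), ("geneD", -2.5, 0.005)]
for gene, x, y in volcano_points(rows):
    print(gene, x, round(y, 3))
```

Genes with large absolute fold changes and small adjusted p-values end up in the upper left and upper right corners of the plot.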
An example of a volcano plot is shown below: