RtsneV2.pdf
From Array Suite Wiki
Latest revision as of 10:30, 12 September 2019
t-SNE V2 clustering
Overview
The Rtsne module in ArrayStudio lets the user cluster cells from UMI count data using the R package Rtsne (T-Distributed Stochastic Neighbor Embedding using a Barnes-Hut Implementation). t-SNE is a method for constructing a low-dimensional embedding of high-dimensional data, distances, or similarities. It has become a standard method for identifying subgroups of cells when analyzing single-cell sequencing data. This module is intended for single-cell UMI count data and runs Rtsne directly in the R engine integrated with ArrayStudio.
If the user has not run Rtsne in ArrayStudio before, please follow the wiki page Setup tSNE in R engine to set it up.
To open this module, please go to Analysis | NGS | Single Cell RNA-Seq | t-SNE Clustering | t-SNE Clustering (V2).
Input Data Requirements
This module works on -Omic data objects and Zero inflated binary matrix (ZIM) data.
General Options
The user can choose to perform this analysis either locally or on the server.
Note: the Perplexity value should be less than (observations - 1)/3.
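The perplexity rule in the note can be checked before a run is submitted. The following is a minimal Python sketch of that arithmetic only. The function names `max_perplexity` and `perplexity_is_valid` are hypothetical and are not part of ArrayStudio or Rtsne. The strict "less than" comparison follows the wording of the note.

```python
def max_perplexity(n_observations: int) -> float:
    """Return the bound (observations - 1) / 3 stated in the note."""
    if n_observations < 2:
        raise ValueError("at least two observations are required")
    return (n_observations - 1) / 3


def perplexity_is_valid(perplexity: float, n_observations: int) -> bool:
    """True when perplexity is positive and strictly below the bound."""
    return 0 < perplexity < max_perplexity(n_observations)


if __name__ == "__main__":
    # Print the bound and whether a perplexity of 30 is acceptable
    # for a few example dataset sizes (number of cells).
    for n_cells in (50, 91, 1000):
        print(n_cells, max_perplexity(n_cells), perplexity_is_valid(30, n_cells))
```

For small datasets this bound can fall below commonly used perplexity values, so the Perplexity option may need to be lowered accordingly.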
Input/Outputs
- Project & Data: The window includes a dropdown box to select the Project and Data object to be filtered.
- Variables: Selections can be made on which variables should be included in the filtering (options include All variables, Selected variables, Visible variables, and Customized variables (select any pre-generated Lists)).
- Observations: Selections can be made on which observations should be included in the filtering (options include All observations, Selected observations, Visible observations, and Customized observations (select any pre-generated Lists)).
- Output name: The user can choose to name the output data object.
Options
- Preprocessing:
  - Log2 Transformation: logical; checked by default
  - Filter raw data: logical; checked by default
  - Save filtered data: logical; checked by default
  - UMI: logical; check this option if the input data are UMI counts
  - Log2 TPM cut off: numeric; a threshold used together with the Log2 Transformation (default: 0.01)
  - Min observations per gene: numeric; threshold for filtering out genes with too few observations (default: 3)
  - Min genes per cell: integer; threshold for filtering out cells with too few genes (default: 200)
  - Check duplicates: logical; checks whether duplicates are present. Duplicates are generally assumed to be absent. It is best to verify that no duplicates are present and leave this option set to FALSE, especially for large datasets (default: FALSE)
- PCA settings:
  - Initial PCA dimensions: integer; the number of dimensions that should be retained in the initial PCA step (default: 50)
  - Center data before PCA: logical; should data be centered before PCA is applied? (default: TRUE)
  - Scale data before PCA: logical; should data be scaled before PCA is applied? (default: FALSE)
  - Partial PCA:
  - Run initial PCA: logical; whether an initial PCA step should be performed (default: TRUE)
WARNING: If the package compatibility is reported as not OK, the R engine integrated with ArrayStudio is not ready to run Rtsne. Please see R implementation of t-SNE to configure Rtsne in ArrayStudio.
Advanced Options
- tSNE
  - Dimension: integer; output dimensionality (default: 2)
  - Perplexity: numeric; perplexity parameter
  - Theta: numeric; speed/accuracy trade-off (increase for less accuracy); set to 0.0 for exact t-SNE (default: 0.5)
  - Max iteration: integer; number of iterations (default: 1000)
  - Kmean cluster number lower/upper bound: the minimum and maximum number of cell clusters. Once clustering is performed, cells are automatically assigned to clusters according to their k-means identity
  - Stop lying iteration number: integer; iteration after which the perplexities are no longer exaggerated (default: 250, except when Y_init is used, then 0)
  - Moment switch iteration number: integer; iteration after which the final momentum is used (default: 250, except when Y_init is used, then 0)
  - Momentum: numeric; momentum used in the first part of the optimization (default: 0.5)
  - Final Momentum: numeric; momentum used in the final part of the optimization (default: 0.8)
  - Eta: numeric; learning rate (default: 200.0)
  - Exaggeration factor: numeric; exaggeration factor used to multiply the P matrix in the first part of the optimization (default: 12.0)
  - Set R Random Seed: logical, then integer; sets the R random seed so that results are reproducible
  - Color by: drop-down list of the factors in a column of the design table
- Subset dataset by sample mapping file
  - Sample mapping file:
  - Sample metadata column:
  - Sample metadata column value:
  - Append sample metadata:
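The interaction of the Stop lying iteration number, Moment switch iteration number, Momentum, Final Momentum and Exaggeration factor options can be sketched as a simple schedule. This Python sketch only illustrates the descriptions above and is not the Rtsne source. The class and field names are hypothetical, and the exact handling of the boundary iteration (whether the switch applies at iteration 250 or 251) is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OptimizationSchedule:
    """Hypothetical container for the schedule-related advanced options."""
    stop_lying_iter: int = 250        # Stop lying iteration number
    mom_switch_iter: int = 250        # Moment switch iteration number
    momentum: float = 0.5             # Momentum (first part)
    final_momentum: float = 0.8       # Final Momentum (final part)
    exaggeration_factor: float = 12.0  # Exaggeration factor

    def momentum_at(self, iteration: int) -> float:
        # Assumption: iterations before the switch point use the initial momentum.
        if iteration < self.mom_switch_iter:
            return self.momentum
        return self.final_momentum

    def exaggeration_at(self, iteration: int) -> float:
        # Assumption: the P matrix is multiplied by the factor only before
        # the stop-lying point; afterwards it is used unscaled (factor 1.0).
        if iteration < self.stop_lying_iter:
            return self.exaggeration_factor
        return 1.0


if __name__ == "__main__":
    schedule = OptimizationSchedule()
    for iteration in (0, 249, 250, 999):
        print(iteration,
              schedule.momentum_at(iteration),
              schedule.exaggeration_at(iteration))
```

With the defaults, both switches happen at the same point, so the first 250 of the 1000 iterations run with exaggerated similarities and the lower momentum.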
Output Results
The Rtsne module will generate a table and a scatter plot view for this table in ArrayStudio:
An example of TsneScoreTable is shown below:
An example scatter plot of the two dimensions computed by Rtsne is shown below. Each data point represents a cell:
Additional Options
Once the scatter plot is generated, the user can manually select cells that belong to the same cluster and assign a list name to each cluster:
Once all of the cells have been assigned a list name based on their distribution in the scatter plot, the user can select all the lists defined from this scatter plot and right-click to add the list membership to the original TsneScoreTable:
The user can then go to the scatter plot, choose Change Symbol Properties, color the plot by Categorical value, and select the newly added ListMembership:
With this operation, a different color is assigned to each cluster:
OmicScript