Introduction to SCUmiMouse Land Content

From Array Suite Wiki


SCUmiMouse_B37 Land

In addition to curated DiseaseLand studies with standard RNA-seq and microarray experiments, subscribers to DiseaseLand have access to Lands with single-cell RNA-seq data. SCUmiMouse_B37 Land is part of this collection, focusing on data derived from studies that examine single-cell populations from a number of categories found in our DiseaseLand Collection. Single-cell RNA-Seq experiments are performed with many different technologies, so we distinguish non-UMI studies from UMI studies in our Lands:

Species   non-UMI Land   UMI Land         Reference   GeneModel
Mouse     SCMouse_B37    SCUmiMouse_B37   Mouse.B37   Ensembl.R78

SCHuman_B37 focuses heavily on publicly available RNA-Seq expression data. Because all data are processed through the same pipeline, gene expression can be compared across many different projects, with the added value of built-in visualizations and functions. At Omicsoft, our experienced data curation and processing team follows a systematic method for data curation. Refer to our Curation Pipeline for details.

Samples in single-cell Lands are split between the UMI and non-UMI Lands based on project information and data processing. Generally, any single-cell RNA-Seq project in which individual cells have been barcoded and the reads contain Unique Molecular Identifiers (UMIs) will be found in the UMI Lands. This includes data from platforms such as DropSeq and 10X Genomics. These samples have an inherent 3' bias and are therefore processed and analyzed differently from samples in the non-UMI Lands, which focus on projects in which RNA-Seq was performed on samples from other single-cell populations (e.g. SMART-Seq).

Most samples in these Lands are single cells; however, some samples contain 10, 100, or 1000 cells, or are bulk samples (annotated as “population” in the CellNumber property). These are mostly used for benchmarking or comparison purposes in selected projects. You can use the “CellNumber” property to filter for these samples if you would like to single them out. Samples with a CellNumber annotation of “0” have been filtered out as low-quality samples.
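As an illustration of how the CellNumber property could be used outside of the Land interface, the following is a minimal Python sketch that sorts exported sample metadata into single-cell, multi-cell, and bulk groups. The record layout, the `SampleID` key, and the class labels are assumptions made for this example and are not part of the Land schema; only the CellNumber values ("1", "10", "100", "1000", "population", "0") come from the description above.

```python
def classify_cell_number(value):
    """Map a CellNumber annotation to a coarse sample class.

    The class labels returned here are hypothetical names chosen for
    this sketch, not values used by the Land itself.
    """
    text = str(value).strip().lower()
    if text == "population":
        return "bulk"            # bulk samples are annotated as "population"
    try:
        n = int(text)
    except ValueError:
        return "unknown"         # anything that is not a number or "population"
    if n == 0:
        return "low_quality"     # "0" marks low-quality samples
    if n == 1:
        return "single_cell"
    if n > 1:
        return "multi_cell"      # e.g. 10, 100, or 1000 cells
    return "unknown"             # negative numbers are not expected


# Hypothetical exported metadata records.
samples = [
    {"SampleID": "S1", "CellNumber": "1"},
    {"SampleID": "S2", "CellNumber": "100"},
    {"SampleID": "S3", "CellNumber": "population"},
    {"SampleID": "S4", "CellNumber": "1"},
]

# Keep only true single-cell samples.
single_cells = [
    s["SampleID"] for s in samples
    if classify_cell_number(s["CellNumber"]) == "single_cell"
]

# Single out the benchmarking/comparison samples instead.
controls = [
    s["SampleID"] for s in samples
    if classify_cell_number(s["CellNumber"]) in ("multi_cell", "bulk")
]
```

Normalizing the annotation to a small set of classes first keeps the filtering step independent of whether the value arrives as a string or a number.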

Data Source


Data Types

  • RNA-Seq data

Laboratory Methods

Refer to individual projects' clinical metadata for details of how data were generated.

Processing Methods

RNA-Seq data: OmicScript Pipeline and Building Land From RNA-Seq Data

Expression is normalized as Transcripts per million (TPM) for non-UMI Lands and Reads per million (RPM) for UMI Lands. To ensure that only high-quality data are incorporated into Single Cell Lands, we use the following criteria:

  • MitochondrialRate: < 20%
  • Alignment Mapped Reads: >= 1000
  • RNASeqMappedRate: (Human >=0.4; Mouse >=0.3)
  • Coverage_GeneWithCoverage: >=250
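The criteria above can be expressed as a simple per-sample predicate. The sketch below is only an illustration of the thresholds, not the pipeline's actual implementation. It assumes that each sample is a dictionary, that MitochondrialRate is stored as a fraction (so 20% is 0.20), and that the keys `Species` and `AlignmentMappedReads` exist under those names; these are assumptions for the example.

```python
# Species-specific minimum mapped rate, taken from the criteria above.
MIN_MAPPED_RATE = {"Human": 0.4, "Mouse": 0.3}


def passes_qc(sample):
    """Return True if a sample meets all four quality criteria.

    Assumes MitochondrialRate is a fraction in [0, 1] and that the
    dictionary keys match the names used here (hypothetical layout).
    """
    min_rate = MIN_MAPPED_RATE[sample["Species"]]
    return (
        sample["MitochondrialRate"] < 0.20
        and sample["AlignmentMappedReads"] >= 1000
        and sample["RNASeqMappedRate"] >= min_rate
        and sample["Coverage_GeneWithCoverage"] >= 250
    )


# Hypothetical samples for illustration.
good = {
    "Species": "Mouse",
    "MitochondrialRate": 0.05,
    "AlignmentMappedReads": 25000,
    "RNASeqMappedRate": 0.62,
    "Coverage_GeneWithCoverage": 1800,
}
# Same sample, but with too many mitochondrial reads.
high_mito = dict(good, MitochondrialRate=0.35)

kept = [s for s in (good, high_mito) if passes_qc(s)]
```

Note that the mapped-rate threshold is the only species-dependent criterion, so a mouse sample with a mapped rate of 0.35 would pass while a human sample with the same rate would not.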

Key Meta Data Columns

SCUmiMouse Land data is curated at the sample and project level, with hundreds of meta data columns available.

Sample level:

  • DiseaseCategory (controlled vocabulary) : Disease category of the sample, based on the detailed disease state. (Primary Grouping column)
  • TissueCategory (controlled vocabulary) : Tissue category such as skin, muscle, heart, kidney etc. (Secondary Grouping column)
  • DiseaseState (controlled vocabulary) : Curated at sample level from each project.
  • SampleSource (controlled vocabulary) : Either cell type or tissue information. When a sample has cell type information, cell type is used. Otherwise, tissue category is used.
  • CellNumber : Indicates the number of cells per sample. Will be 1 for most samples, but can be used to filter poor-quality samples (with a value of zero) or controls with more than 1 cell.
  • LibraryStrategy : Indicates the strategy used to obtain single cells for the project (e.g. DROP-seq/10X Genomics).
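To show how the primary and secondary grouping columns relate, here is a small Python sketch that tallies samples by DiseaseCategory and by (DiseaseCategory, TissueCategory) pair. The sample records and their category values are invented for illustration; only the column names come from the list above.

```python
from collections import Counter


def sample_distribution(samples, primary="DiseaseCategory",
                        secondary="TissueCategory"):
    """Count samples per primary group and per (primary, secondary) pair."""
    by_primary = Counter(s[primary] for s in samples)
    by_pair = Counter((s[primary], s[secondary]) for s in samples)
    return by_primary, by_pair


# Hypothetical sample-level metadata (values are placeholders).
samples = [
    {"DiseaseCategory": "normal control", "TissueCategory": "skin"},
    {"DiseaseCategory": "normal control", "TissueCategory": "kidney"},
    {"DiseaseCategory": "cancer", "TissueCategory": "skin"},
    {"DiseaseCategory": "normal control", "TissueCategory": "skin"},
]

by_disease, by_disease_tissue = sample_distribution(samples)

# List groups from largest to smallest.
for category, count in by_disease.most_common():
    print(category, count)
```

Counting the pair as a tuple keeps the secondary grouping nested under the primary one, which mirrors how the two grouping columns are described.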

Sample Distribution by DiseaseCategory:


Project level:

  • ProjectName: The name of the individual project that the data come from.
  • TherapeuticArea: Specific clinical focus of individual project (can be multiple areas depending on project)

Key Views

Project View

Experimental designs can differ considerably between projects within DiseaseLand, and some users may want to quickly check the expression of a gene in the context of a specific study. Omicsoft therefore created project-specific views that display expression values according to the experimental design of each project.

[Figure: Braf mouseUMI projectRPM.png]

Gene RPM View

This view shows gene expression across all projects, so you can see how a gene is expressed in various contexts, such as the tissue the samples were derived from.

[Figure: Braf mouseUMI geneRPM.png]
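For readers unfamiliar with the RPM unit shown in these views, the sketch below applies the usual reads-per-million scaling to the counts of a single sample. This is the standard textbook definition and is offered as an assumption: the exact formula used by the Land pipeline, and whether counts are deduplicated by UMI beforehand, is not stated here. The gene counts are invented.

```python
def counts_to_rpm(counts):
    """Scale per-gene counts of one sample to reads per million (RPM).

    RPM = gene count / total counts in the sample * 1,000,000.
    A sample with no counts is returned as all zeros.
    """
    total = sum(counts.values())
    if total == 0:
        return {gene: 0.0 for gene in counts}
    return {gene: c * 1_000_000 / total for gene, c in counts.items()}


# Hypothetical counts for one cell.
cell_counts = {"Braf": 5, "Actb": 95}
cell_rpm = counts_to_rpm(cell_counts)
```

Because every sample is scaled to the same total, RPM values are comparable across cells with different sequencing depths, although they are not adjusted for gene length.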
