Introduction to CCLE Land Content

From Array Suite Wiki

Cancer Cell Line Encyclopedia

The Cancer Cell Line Encyclopedia (CCLE) project is an effort to conduct a detailed genetic characterization of a large panel of human cancer cell lines. OmicSoft's CCLE Lands provide analysis and visualization of DNA copy number, mRNA expression, mutation data, and more for 1000 cancer cell lines. Through Land Measurement Queries, these data can also link pharmacologic vulnerabilities to genomic and expression patterns.

Land Version               Genome Build   Gene Model
Current Version
CCLE_B38_GC33              Human.B38      OmicsoftGenCode_V33
Legacy versions
CCLE_B38                   Human.B38      OmicsoftGenCode_V24
CCLE_DepMap_Preview_B38    Human.B38      OmicsoftGenCode_V24
CCLE_B37                   Human.B37.3    OmicsoftGene20130723
CCLE_DepMap_Preview_B37    Human.B37.3    OmicsoftGene20130723

CCLE_B38_GC33

CCLE_B38_GC33 is the current CCLE Land and contains the most recent data. We recommend using this Land when possible.

In addition to all data previously available in CCLE_B37/CCLE_B38 (with RNA-seq data re-analyzed on the latest gene model), DepMap data were integrated into this Land.

This Land also adds new mass spectrometry (MS) data, deeper RNA-seq data, and new CNV and DNA-seq mutation data.

CCLE_DepMap_Preview_B37 and CCLE_DepMap_Preview_B38

Starting with the 2019R3 release, we integrated DepMap CRISPR and RNAi dependency data into CCLE Lands; these data can be found in CCLE_DepMap_Preview_B37 and CCLE_DepMap_Preview_B38. They have since been integrated into CCLE_B38_GC33.

Data Source

CCLE and DepMap

Data Types

  • CNV, based on segmented CNV files (downloaded)
  • CNV Call, GISTIC2 calls
  • DNASeq_Mutation
  • DNASeq_Mutation_Exome
  • Expression Intensity Probes (Affymetrix)
  • RNA-Seq, including:
    • Single-end and Paired-end fusion calling
    • RNA-Seq somatic mutation, from matched tumor/normal pairs
    • Exon Junction and Exon Usage
    • Expression (Gene- and Transcript- level quantification)
  • Gene Dependency (CCLE_DepMap_Preview)
    • CRISPR
    • RNAi

Laboratory Methods

  • Affymetrix Expression Array (Affymetrix.HG-U133_Plus_2)
  • Illumina HiSeq RNA sequencing (HiSeq 2000)
  • Hybrid capture sequencing

Processing Methods

Expression Data: Omicsoft Affymetrix Microarray Preprocessing

RNA-Seq data: OmicScript RNAseq Pipeline and Building Lands From RNA-Seq Data

HLA (Class I) typing: HLA Class I alleles are identified from the aligned RNA-seq reads. The OptiType program aligns RNA-seq reads to the HLA reference and then performs an optimization to determine the most likely HLA Class I alleles. See OptiType - precision HLA typing from next-generation sequencing data.pdf for a description of the algorithm.

OmicSoft does not reprocess the other genomic data types; they are extracted directly from the original datasets:

  • CRISPR data: Achilles Gene Effect (2019R3)
  • RNAi data: DEMETER2 Data v5 (combined)
  • Mass Spec data
  • CNV data

DNA-seq mutation calls

OmicSoft mutation calls were extracted from Broad DepMap data (2020Q3). Variant allele counts (reference and alternative) are taken from the following sources, in priority order:

  • HC_AC: in Broad Hybrid capture data from the CCLE2 project
  • RD_AC: in Broad Raindance data from the CCLE2 project
  • CGA_WES_AC: the allelic ratio for this variant in Broad WES using a cell line adapted version of the CGA pipeline (https://docs.google.com/document/d/1VO2kX_fgfUd0x3mBS9NjLUWGZu794WbTepBel3cBg08/edit) that includes germline filtering.
  • SangerWES_AC: in Sanger WES
  • SangerRecalibWES_AC: in Sanger WES after realignment at Broad
  • WGS_AC: in Broad WGS data from the CCLE2 project
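
One way to read the priority order above is "use the first source that reports allele counts for the variant". The following is a minimal Python sketch of that reading, not OmicSoft's implementation. It assumes each variant is available as a dict keyed by the column names listed above; the helper name `pick_allele_counts`, the treatment of empty strings and "NA" as missing, and the example values are all assumptions made for illustration.

```python
# Column names come from the list above; the tuple order mirrors the stated priority.
PRIORITY = (
    "HC_AC",
    "RD_AC",
    "CGA_WES_AC",
    "SangerWES_AC",
    "SangerRecalibWES_AC",
    "WGS_AC",
)

def pick_allele_counts(variant):
    """Return (source_column, raw_value) for the highest-priority source
    with a non-missing value, or None if no source reports counts.

    Treating None, "" and "NA" as missing is an assumption of this sketch.
    """
    for column in PRIORITY:
        value = variant.get(column)
        if value not in (None, "", "NA"):
            return column, value
    return None

# Hypothetical variant record; the count strings are made up for illustration.
example = {"HC_AC": "", "RD_AC": None, "CGA_WES_AC": "12:30", "WGS_AC": "5:9"}
print(pick_allele_counts(example))
```

The raw value is returned unparsed on purpose, so the sketch does not depend on how the counts are encoded in a particular DepMap release.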

Additional columns:

  • isTCGAhotspot: is this mutation commonly found in the TCGA consortium data?
  • TCGAhsCnt: count of this mutation in TCGA (number of samples)
  • isCOSMIChotspot: is this mutation commonly found in COSMIC?
  • COSMIChsCnt: count of this mutation in COSMIC (number of samples)
  • ExAC_AF: the allelic frequency in the Exome Aggregation Consortium (ExAC)
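
As an illustration of how the hotspot columns might be used once mutation data have been exported from the Land, the sketch below keeps only rows flagged as a hotspot in TCGA or COSMIC. It assumes the rows are plain dicts and that the flag columns hold boolean-like values; the accepted spellings ("true", "1", "yes"), the helper names, and the placeholder gene symbols are assumptions for illustration, not a documented export format.

```python
def is_flagged(value):
    """Interpret a flag column; the accepted spellings are an assumption."""
    return str(value).strip().lower() in {"true", "1", "yes"}

def hotspot_rows(rows):
    """Keep rows flagged as a hotspot in TCGA or in COSMIC."""
    return [
        row
        for row in rows
        if is_flagged(row.get("isTCGAhotspot"))
        or is_flagged(row.get("isCOSMIChotspot"))
    ]

# Hypothetical rows with placeholder gene symbols.
rows = [
    {"Hugo_Symbol": "GENE_A", "isTCGAhotspot": "True", "isCOSMIChotspot": "False"},
    {"Hugo_Symbol": "GENE_B", "isTCGAhotspot": "False", "isCOSMIChotspot": "False"},
    {"Hugo_Symbol": "GENE_C", "isTCGAhotspot": False, "isCOSMIChotspot": True},
]
print([row["Hugo_Symbol"] for row in hotspot_rows(rows)])
```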

Descriptions of the remaining columns in the MAF can be found here: https://docs.gdc.cancer.gov/Data/File_Formats/MAF_Format/

More details on how these mutations were called and filtered can be found in the manuscript “Next generation characterization of the Cancer Cell Line Encyclopedia” in Nature.

Key Meta Data Columns

  • Primary Site: The body site from which the cell line was derived.
  • Histology: The histological type of cancer, such as carcinoma, glioma, or sarcoma.
  • Land Tissue: The tissue from which the cell line was derived, using OmicSoft's curated controlled vocabulary.
  • Land Sample Type: A detailed description of the cell type from which the cell line was derived, using OmicSoft's curated controlled vocabulary.
  • Tumor or Normal: Indicates whether a sample is from tumor or normal tissue.

Primary Grouping

Primary Site

Sample Distribution by Primary Site

[Figure: CCLESampleDistribution.png]


Key Views

Gene Expression

One of the most common ways to visualize gene expression data is a per-sample Scatter plot (e.g. Gene FPKM), with each sample grouped by Primary Site on the Y-axis, and expression level plotted on the X-axis:

[Figure: GeneFPKMforEGFRCCLE.png]

Additional Views include transcript-level and exon-level views, pairwise comparison plots, and direct visualization of RNAseq coverage with the OmicSoft Genome Browser.
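
The grouping behind the per-sample view can also be reproduced on exported data. The sketch below groups per-sample expression values by Primary Site and reports the median per group. It assumes each sample is a dict with `primary_site` and `fpkm` keys; those key names, the function names, and the toy values are hypothetical and are not taken from the Land export format.

```python
from collections import defaultdict
from statistics import median

def group_by_primary_site(samples):
    """Collect expression values into one list per Primary Site."""
    groups = defaultdict(list)
    for sample in samples:
        groups[sample["primary_site"]].append(sample["fpkm"])
    return dict(groups)

def median_by_site(samples):
    """Median expression value for each Primary Site."""
    return {
        site: median(values)
        for site, values in group_by_primary_site(samples).items()
    }

# Toy values for illustration only; these are not CCLE measurements.
toy_samples = [
    {"sample_id": "S1", "primary_site": "lung", "fpkm": 10.0},
    {"sample_id": "S2", "primary_site": "lung", "fpkm": 30.0},
    {"sample_id": "S3", "primary_site": "skin", "fpkm": 4.0},
]
print(median_by_site(toy_samples))
```

The same grouping works for any other metadata column, such as Histology, by changing the key used to build the groups.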


DNA Mutation

Multiple visualizations display frequency and locations of gene mutations in CCLE samples, including the Mutation Landscape View.

[Figure: DNAMutationCCLE.png]


Copy Number Variation

Copy number data can be visualized for a gene of interest, grouped by any metadata column, such as Histology.

[Figure: CNV BRAF CCLE.png]

