Introduction to CCLE Land Content

From Array Suite Wiki

CCLE_B37 and CCLE_B38

The Cancer Cell Line Encyclopedia (CCLE) project is an effort to conduct a detailed genetic characterization of a large panel of human cancer cell lines. OmicSoft's CCLE_B37 Land release provides analysis and visualization of DNA copy number, mRNA expression, mutation data, and more for approximately 1,000 cancer cell lines. Through Land Measurement Queries, these data can also link pharmacologic vulnerabilities to genomic and expression patterns.

Land Version    Genome Build    Gene Model
CCLE_B37        Human.B37.3     OmicsoftGene20130723
CCLE_B38        Human.B38       OmicsoftGenCode_V24

Data Source

CCLE

Data Types

  • CNV, based on segmented CNV files (downloaded)
  • CNV Call, GISTIC2 calls
  • DNASeq_Mutation
  • DNASeq_Mutation_Exome
  • Expression Intensity Probes (Affymetrix)
  • RNA-Seq, including:
    • Single-end and Paired-end fusion calling
    • RNA-Seq somatic mutation, from matched tumor/normal pairs
    • Exon Junction and Exon Usage
    • Expression (Gene- and Transcript- level quantification)

Laboratory Methods

  • Affymetrix Expression Array (Affymetrix.HG-U133_Plus_2)
  • Illumina HiSeq RNA sequencing (HiSeq 2000)
  • Hybrid capture sequencing

Processing Methods

Expression Data: Omicsoft Affymetrix Microarray Preprocessing

RNA-Seq data: OmicScript RNAseq Pipeline and Building Lands From RNA-Seq Data

Omicsoft does not reprocess other genomic data, but extracts data directly from original datasets.

Key Meta Data Columns

  • Primary Site: The body site from which the cell line sample was derived.
  • Histology: The histological type of cancer, such as carcinoma, glioma, or sarcoma.
  • Land Tissue: The tissue from which the cell line was derived, using OmicSoft's curated controlled vocabulary.
  • Land Sample Type: A detailed description of the cell type from which the cell line was derived, using OmicSoft's curated controlled vocabulary.
  • Tumor or Normal: Indicates whether a sample is derived from tumor or normal tissue.

Primary Grouping

Primary Site

Sample Distribution by Primary Site

[Figure: CCLESampleDistribution.png]



Key Views

Gene Expression

One of the most common ways to visualize gene expression data is a per-sample Scatter plot (e.g. Gene FPKM), with each sample grouped by Primary Site on the Y-axis, and expression level plotted on the X-axis:

[Figure: GeneFPKMforEGFRCCLE.png]

Additional Views include transcript-level and exon-level views, pairwise comparison plots, and direct visualization of RNAseq coverage with the OmicSoft Genome Browser.
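The grouping behind this view can also be sketched outside the Land viewer. The following minimal Python example assumes per-sample expression values and metadata have already been exported into plain dictionaries. The function name, sample IDs, and numeric values are hypothetical placeholders for illustration, not CCLE measurements or an OmicSoft API.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, List


def group_by_metadata(
    values: Dict[str, float],
    metadata: Dict[str, Dict[str, str]],
    column: str,
) -> Dict[str, List[float]]:
    """Group per-sample values by one metadata column (e.g. Primary Site)."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for sample_id, value in values.items():
        # Samples without an annotation for this column fall into "Unknown".
        group = metadata.get(sample_id, {}).get(column, "Unknown")
        groups[group].append(value)
    return dict(groups)


# Placeholder inputs: hypothetical sample IDs and made-up FPKM values.
fpkm = {"SAMPLE_A": 12.5, "SAMPLE_B": 3.0, "SAMPLE_C": 48.0, "SAMPLE_D": 7.5}
metadata = {
    "SAMPLE_A": {"Primary Site": "lung"},
    "SAMPLE_B": {"Primary Site": "breast"},
    "SAMPLE_C": {"Primary Site": "lung"},
    # SAMPLE_D deliberately has no metadata entry.
}

grouped = group_by_metadata(fpkm, metadata, "Primary Site")
for site in sorted(grouped):
    print(site, len(grouped[site]), median(grouped[site]))
```

Passing a different column name, such as "Histology", groups the same values by that metadata column instead.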



DNA Mutation

Multiple visualizations display frequency and locations of gene mutations in CCLE samples, including the Mutation Landscape View.

[Figure: DNAMutationCCLE.png]



Copy Number Variation

Copy number data can be visualized for a gene of interest, grouped by any metadata column, such as Histology.

[Figure: CNV BRAF CCLE.png]



Update Log

CCLE Land Development Notes

B37 Data History

MetaData

[2019R1] Normalized metadata according to the OncoLand curation standard (controlled for "Tissue" and "DiseaseState"). Generated a design file with project-level metadata. Cell lines are annotated with controlled vocabularies.

[2018Q2] Revised "DiseaseState" and "LandSampleType", added "TissueCategory" column.

[2018Q1] Fixed metadata: corrected the annotation for RDES, as requested by a customer

[2017Q2] Minor modification: renamed WXS BAM file names for streaming purposes

[2016Q3] Removed MB157_BREAST and added COLO699_LUNG

[2016Q2] Renamed CCLE2015 to CCLE_B37, to be published as V3

[2015Q4] Added HLA calls from Optitype

[2015Q2] Added alignment mapping stats to MetaData

CNV

[2014Q3] Added 20 samples

CNVCall

[2017Q1] Re-generate the CNVCalls using the source file below

DnaSeq_Mutation

[2018Q2] Redid DnaSeq_Mutation based on new mutation data available from the Broad Portal

Obtained from: https://portals.broadinstitute.org/ccle/data

See "DnaSeq_Mutation Processing" below for mutation data processing

[2015Q2] Fix/Update DnaSeq_Mutation (related to indel positions in maf/vcf)

DnaSeq_Mutation_Exome

[2018Q2] Removed the "DnaSeq_Mutation_Exome" data type

[2015Q3] Data type added

Expression_Intensity_Probes

[2016Q3] Reran the cel2alv pipeline with updated Oshell version (9.0.6.2)

[2016Q2] Replace general expression

RnaSeq

[2017Q1] Added 7 additional samples (EKVX, SF539, SNB75, SF268, HOP92, HOP62, UO31)

[2014Q3] Added 155 additional samples

Virus and 16s

[2017Q2] Reran with new virus (Virus.RefSeq20170418) and 16S (16SMicrobial.Ncbi20170418) references

[2016Q3] Added alignments to 7 samples

RPPA

[2018Q2] Added RPPA data type for 880 samples

Obtained from: https://portals.broadinstitute.org/ccle/data

CTRP Measurement Data

[2018Q2] Added Cancer Therapeutics Response Portal (CTRPv2.0) small molecule sensitivity measurement data for 545 compounds


B38 Data History

CNV

[2016Q1] Liftover from B37 coordinates

DnaSeq_Mutation

[2018Q2] Redid DnaSeq_Mutation based on new mutation data available from the Broad Portal

Obtained from: https://portals.broadinstitute.org/ccle/data

See "DnaSeq_Mutation Processing" below for mutation data processing

[2016Q1] Liftover from B37 coordinates

Expression_Intensity_Probes

[2016Q3] Reran the cel2alv pipeline with updated Oshell version (9.0.6.2)

RnaSeq

[2017Q1] Added 7 additional samples (EKVX, SF539, SNB75, SF268, HOP92, HOP62, UO31)

[2016Q1] Realign fastq to B38

RPPA

[2018Q2] Added RPPA data type for 880 samples

Obtained from: https://portals.broadinstitute.org/ccle/data

Virus and 16s

[2017Q2] Reran with new virus (Virus.RefSeq20170418) and 16S (16SMicrobial.Ncbi20170418) references

LandQC

DnaSeq_Mutation Processing

The newest mutation file from the Broad Portal (CCLE_DepMap_18Q2_maf_20180502.txt) contains mutation information merged from several different methods (WXS, hybrid-capture, RainDance, and WGS). Because our mutation alvs support mutation allele frequencies from only one data source per mutation, we extract the reference and alternative allele counts from the first available source in the following priority order:

  1. HC_AC (hybrid-capture)
  2. RD_AC ("RainDance")
  3. WES_CCLE (WXS)
  4. SangerRecalibWES_AC
  5. WGS_AC

Additional Notes

  • Mutations supported only by RNA-seq ("RNAseq_AC") are excluded, since this is a DNA-seq alv.
  • Although the file contains data for 1,549 samples, only 993 of these samples are in our current CCLE MetaData (no updated MetaData file was provided), so only that subset of the data is included.
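The priority rule and the two exclusions described above can be sketched in Python. This is a minimal illustration, not OmicSoft's actual processing code. It assumes that each allele-count column holds a string of the form "alt:ref" (for example "5:20"), that the sample column is named Tumor_Sample_Barcode, and that rows have already been read into dictionaries. The function names and sample IDs are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

# Priority order taken from the text above; the first usable column wins.
PRIORITY = ["HC_AC", "RD_AC", "WES_CCLE", "SangerRecalibWES_AC", "WGS_AC"]


def parse_counts(value: Optional[str]) -> Optional[Tuple[int, int]]:
    """Parse an 'alt:ref' string (assumed format) into (ref, alt) counts."""
    if value is None:
        return None
    parts = value.strip().split(":")
    if len(parts) != 2:
        return None  # empty cell, 'NA', or an unexpected layout
    try:
        alt, ref = int(parts[0]), int(parts[1])
    except ValueError:
        return None
    return ref, alt


def pick_allele_counts(row: Dict[str, str]) -> Optional[Tuple[str, int, int]]:
    """Return (source column, ref count, alt count) from the first usable column."""
    for column in PRIORITY:
        counts = parse_counts(row.get(column))
        if counts is not None:
            return column, counts[0], counts[1]
    # No DNA-based counts, e.g. a mutation supported only by RNAseq_AC.
    return None


def select_mutations(
    rows: Iterable[Dict[str, str]],
    known_samples: Set[str],
    sample_column: str = "Tumor_Sample_Barcode",  # assumed column name
) -> List[Tuple[str, str, int, int]]:
    """Keep rows whose sample is in the metadata and that have DNA-based counts."""
    selected = []
    for row in rows:
        if row.get(sample_column) not in known_samples:
            continue  # sample absent from the current MetaData
        picked = pick_allele_counts(row)
        if picked is not None:
            selected.append((row[sample_column],) + picked)
    return selected


# Hypothetical rows for illustration only.
rows = [
    {"Tumor_Sample_Barcode": "SAMPLE_A", "HC_AC": "", "RD_AC": "5:20", "WGS_AC": "3:10"},
    {"Tumor_Sample_Barcode": "SAMPLE_A", "RNAseq_AC": "7:7"},
    {"Tumor_Sample_Barcode": "SAMPLE_X", "HC_AC": "4:4"},
]
print(select_mutations(rows, {"SAMPLE_A"}))
```

In this sketch the first row falls through the empty HC_AC cell to RD_AC, the second row is dropped because only RNA-seq supports it, and the third is dropped because its sample is not in the metadata set.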


CTRP Files

ftp://caftpd.nci.nih.gov/pub/OCG-DCC/CTD2/Broad/CTRPv2.0_2015_ctd2_ExpandedDataset/

Fastq Files: CCLE Raw Fastq Files Locations

Primary CCLE Website: https://portals.broadinstitute.org/ccle/data/browseData

CCLE BAM files (downloaded from the UCSC CGHub repository)

Potential Benchmark Papers

http://www.nature.com/nbt/journal/v33/n11/full/nbt.3344.html

http://www.tandfonline.com/doi/full/10.4161/21624011.2014.954893



