From Array Suite Wiki
MAGeCK Gene Quantify from count file
This module runs the MAGeCK command through Array Studio to generate an inference report from count data. MAGeCK stands for "Model-based Analysis of Genome-wide CRISPR-Cas9 Knockout". It is developed and maintained by Wei Li and Han Xu from Dr. Xiaole Shirley Liu's lab at the Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute and Harvard School of Public Health.
Many independent studies have adopted MAGeCK to identify important genes from genome-scale CRISPR-Cas9 knockout screens. To better assist users who are interested in this computational tool, OmicSoft has designed this GUI so that users can easily set up their parameters and send the job to MAGeCK in the Linux environment. Once the analysis is done, the results are imported into the Array Studio GUI for further visualization and analysis.
The OmicSoft implementation is benchmarked with MAGeCK v0.5.7.
#The first line of command is for the count data generation
#mageck count -l library.txt -n demo --sample-label L1,CTRL --fastq test1.fastq test2.fastq
mageck test -k demo.count.txt -t L1 -c CTRL -n demo
This module can be accessed by:
For installation of the MAGeCK package, please refer to the wiki page MAGeCK Count from Fastq.
This function works on count data generated by MAGeCK and requires a design file that assigns each sample to the control or treat group.
For instance, if the user already has normalized count data generated by CRISPR | CRISPR Mageck | Mageck Count from Fastq, that file can be used as the input.
An example design.txt:
FastqFile        SampleLabel   Group
test1.fastq.gz   C1            control
test2.fastq.gz   T1            treat
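As a minimal sketch of how such a design file could be read, the Python snippet below splits the sample labels into control and treat groups. The function name `read_design` is hypothetical, and the assumption that columns are separated by tabs or spaces is ours; this is not how Array Studio itself parses the file.

```python
import io

EXPECTED_HEADER = ["FastqFile", "SampleLabel", "Group"]

def read_design(handle):
    """Return {"control": [...], "treat": [...]} of sample labels from a design file.

    Hypothetical helper: assumes whitespace- or tab-delimited columns.
    """
    rows = [line.split() for line in handle if line.strip()]
    if not rows or rows[0] != EXPECTED_HEADER:
        raise ValueError("design file must start with: " + " ".join(EXPECTED_HEADER))
    groups = {"control": [], "treat": []}
    for row in rows[1:]:
        if len(row) != 3:
            raise ValueError("expected 3 columns, got: " + repr(row))
        _fastq_file, sample_label, group = row
        if group not in groups:
            # The Group column can only be control or treat.
            raise ValueError("unknown group: " + group)
        groups[group].append(sample_label)
    return groups

# The example design file from this page, held in memory.
example = io.StringIO(
    "FastqFile\tSampleLabel\tGroup\n"
    "test1.fastq.gz\tC1\tcontrol\n"
    "test2.fastq.gz\tT1\ttreat\n"
)
design = read_design(example)
print(design)  # {'control': ['C1'], 'treat': ['T1']}
```

Rejecting any Group value other than control or treat mirrors the restriction stated for the Design File option.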
- Exe Path: The path to the mageck executable, given as a relative server path, through which ArrayServer calls mageck to run the analysis.
- Algorithm: Either classic or mle.
  - classic: The classic "test" command first estimates the mean and variance of the sgRNA read counts and uses them to calculate a p-value for each sgRNA, assuming the counts follow a negative binomial (NB) distribution. It then sorts the p-values and calculates a gene-level score using modified robust rank aggregation (α-RRA).
  - mle: Since version 0.5, MAGeCK provides a new module, mle, to calculate gene essentiality. Compared with the classic "test" command, it performs maximum-likelihood estimation of gene essentiality scores instead of the RRA analysis, and the resulting beta score allows direct comparison across multiple conditions (see mle in the MAGeCK wiki).
- Design File: The design matrix is a txt file indicating the effects of different conditions on different samples. Please check the format of the design file in the section above. Three columns are required: FastqFile, the fastq file name; SampleLabel, the sample name; and Group, which indicates whether the sample is in the control group or the treat group. The Group column is passed to the mageck command and can only be control or treat.
- Output Folder: The folder where the output files will be generated.
- Output Prefix: A prefix added to the names of the result files.
- CNV Matrix: Optional argument for CNV correction. A matrix of copy number variation data across cell lines, used to normalize CNV-biased sgRNA scores prior to gene ranking.
- Norm-Method: Normalization method; the default is median. Available options: none, median, total, control. If control is specified, the size factor is estimated using the control sgRNAs specified in the --control-sgrna option.
- Threads: Number of threads used to run the algorithm. The default is 1.
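The options above roughly correspond to arguments of the `mageck test` command shown at the top of this page. The Python sketch below assembles such a command line for the classic algorithm only. The helper name `build_mageck_test_command` is hypothetical, and how Array Studio actually builds the command (including how it combines Output Folder and Output Prefix) is not documented here, so treat this as an illustration rather than the real implementation.

```python
import shlex

# Normalization methods listed under Norm-Method above.
NORM_METHODS = {"none", "median", "total", "control"}

def build_mageck_test_command(exe_path, count_file, treat_labels, control_labels,
                              output_prefix, norm_method="median"):
    """Return the argument list for a classic `mageck test` run (hypothetical helper)."""
    if norm_method not in NORM_METHODS:
        raise ValueError("unsupported normalization method: " + norm_method)
    if not treat_labels or not control_labels:
        raise ValueError("at least one treat and one control sample are required")
    return [
        exe_path, "test",
        "-k", count_file,                 # count table
        "-t", ",".join(treat_labels),     # treatment sample labels
        "-c", ",".join(control_labels),   # control sample labels
        "-n", output_prefix,              # output prefix
        "--norm-method", norm_method,
    ]

cmd = build_mageck_test_command("mageck", "demo.count.txt", ["L1"], ["CTRL"], "demo")
print(shlex.join(cmd))
# mageck test -k demo.count.txt -t L1 -c CTRL -n demo --norm-method median
```

Building the command as a list and only joining it for display avoids quoting problems if a path contains spaces; the list could be passed directly to `subprocess.run`.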
For more details about these options, please see the usage page of the MAGeCK wiki.
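To make the difference between the total and median normalization options more concrete, the Python sketch below computes per-sample size factors in two ways: scaling every sample to the same total read count, and a median-of-ratios approach. Both functions are illustrative assumptions about what such methods typically do; they are not taken from MAGeCK's source code, and the counts are toy values.

```python
import math
from statistics import median

def total_factors(counts):
    """Scale each sample so that all samples share the same total read count."""
    totals = {sample: sum(values) for sample, values in counts.items()}
    mean_total = sum(totals.values()) / len(totals)
    return {sample: mean_total / total for sample, total in totals.items()}

def median_ratio_factors(counts):
    """Median-of-ratios size factors (illustrative, not MAGeCK's own code).

    For each sgRNA, divide the count in each sample by the geometric mean of
    that sgRNA across samples; the factor is the inverse of the sample's
    median ratio. sgRNAs with a zero count in any sample are skipped.
    """
    samples = list(counts)
    ratios = {sample: [] for sample in samples}
    for row in zip(*(counts[sample] for sample in samples)):
        if min(row) <= 0:
            continue
        geo_mean = math.exp(sum(math.log(x) for x in row) / len(row))
        for sample, x in zip(samples, row):
            ratios[sample].append(x / geo_mean)
    return {sample: 1.0 / median(values) for sample, values in ratios.items()}

# Toy counts for three sgRNAs in two samples; T1 is sequenced twice as deeply.
toy_counts = {"C1": [10, 20, 30], "T1": [20, 40, 60]}
total = total_factors(toy_counts)
med = median_ratio_factors(toy_counts)
normalized = {s: [x * med[s] for x in v] for s, v in toy_counts.items()}
print(total, med, normalized)
```

In this toy case both methods bring the two samples onto the same scale; they differ when a few highly abundant sgRNAs dominate the total, because the median is less sensitive to such outliers.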
In the Array Studio GUI, the user will see a table named "prefix".CripserSummaryReport_test:
In the output folder, the user will see these result files:
For more details about the result files in the output folder, please refer to the MAGeCK wiki: Mageck output
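Outside Array Studio, a tab-delimited gene summary file from the output folder could be filtered with a few lines of Python. In the sketch below, the column names `id` and `neg|fdr` are assumptions based on typical `mageck test` gene summary output, so check the header of your own file first. The helper name `significant_genes` is hypothetical, and the rows are toy values, not real results.

```python
import csv
import io

def significant_genes(handle, fdr_column="neg|fdr", threshold=0.05):
    """Return gene ids whose FDR is below the threshold (hypothetical helper)."""
    reader = csv.DictReader(handle, delimiter="\t")
    fieldnames = reader.fieldnames or []
    if "id" not in fieldnames or fdr_column not in fieldnames:
        raise KeyError("expected columns 'id' and '%s'" % fdr_column)
    return [row["id"] for row in reader if float(row[fdr_column]) < threshold]

# Toy gene summary with made-up values, for illustration only.
toy_summary = io.StringIO(
    "id\tnum\tneg|score\tneg|fdr\n"
    "GENE_A\t4\t0.0001\t0.01\n"
    "GENE_B\t4\t0.5\t0.9\n"
)
hits = significant_genes(toy_summary)
print(hits)  # ['GENE_A']
```

For a real run, replace the in-memory toy table with an opened file from the output folder, for example `open(path, newline="")`.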