From Array Suite Wiki
MAGeCK Count from Fastq
Many independent studies have adopted MAGeCK (Model-based Analysis of Genome-wide CRISPR-Cas9 Knockout) to identify important genes from genome-scale CRISPR-Cas9 knockout screens. MAGeCK was developed and is maintained by Wei Li and Han Xu from Dr. Xiaole Shirley Liu's lab at the Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute and Harvard School of Public Health.
To help biologists analyze CRISPR screening data and to simplify the use of MAGeCK, OmicSoft designed this GUI as a wrapper around the MAGeCK Linux commands. Users set their parameters and submit the job to MAGeCK; the results are imported into the ArrayStudio GUI for further visualization and analysis.
This module runs the MAGeCK count command through Array Studio to generate count data from fastq files.
This module can be accessed by:
Running this module through the GUI issues a command line like the following on the Linux server and returns the results to ArrayStudio:
mageck count -l library.txt -n demo --sample-label L1,CTRL --fastq test1.fastq test2.fastq
The OmicSoft implementation is benchmarked with MAGeCK v0.5.7.
ArrayServer administrators will need to install MAGeCK and its dependencies on the same machine as ArrayServer. Please contact your administrator and request they perform the following steps:
Install required packages for MAGeCK
MAGeCK recommends installing numpy, which is used to calculate the negative binomial p-value. If numpy is not found, MAGeCK uses the normal p-value instead.
The MAGeCK SourceForge wiki page is https://sourceforge.net/p/mageck/wiki/Home/
Prerequisite package: scipy
pip install scipy
Download the source code from https://sourceforge.net/projects/mageck/files/latest/download, extract the archive, and install it in a directory mapped within ArrayServer:
tar xvzf mageck-0.5.7.tar.gz
cd mageck-0.5.7
python setup.py install
MAGeCK comes with a "demo" folder, which contains sample data and a script. Please run the script to make sure MAGeCK is correctly installed.
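Before running the demo, it can help to confirm that the mageck executable is reachable from the account that runs ArrayServer. The following is a minimal Python sketch, not part of MAGeCK or ArrayServer; the helper name resolve_mageck is hypothetical, and it only checks that an executable exists, not that MAGeCK itself runs correctly.

```python
import os
import shutil


def resolve_mageck(path_hint="mageck"):
    """Return the absolute path of the mageck executable, or None if not found.

    path_hint may be a bare command name (looked up on PATH) or a path to
    the executable. The name and default are assumptions for illustration.
    """
    found = shutil.which(path_hint)
    return os.path.abspath(found) if found else None


location = resolve_mageck()
if location is None:
    print("mageck was not found on PATH; check the installation")
else:
    print("mageck found at", location)
```

The absolute path reported here is the kind of value the MAGeCK Path option described below expects.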
This function works on fastq/fastq.gz data and requires a design file and a library file.
For instance, with two fastq files as input:
An example design.txt:
FastqFile       SampleLabel  Group
test1.fastq.gz  C1           control
test2.fastq.gz  T1           treat
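To illustrate how a design file of this shape relates to the mageck count command line shown earlier, here is a minimal Python sketch. It is not the OmicSoft implementation: the function names read_design and build_count_command are hypothetical, the design text is assumed to be whitespace-delimited with a header row, and library.txt and demo are placeholder values taken from the example command.

```python
REQUIRED_COLUMNS = ("FastqFile", "SampleLabel", "Group")
ALLOWED_GROUPS = ("control", "treat")


def read_design(text):
    """Parse a whitespace-delimited design table into a list of row dicts."""
    rows = [line.split() for line in text.strip().splitlines() if line.strip()]
    header, body = rows[0], rows[1:]
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        raise ValueError("design file is missing columns: " + ", ".join(missing))
    records = []
    for fields in body:
        if len(fields) != len(header):
            raise ValueError("row does not match header: " + " ".join(fields))
        record = dict(zip(header, fields))
        if record["Group"] not in ALLOWED_GROUPS:
            raise ValueError("Group must be control or treat: " + record["Group"])
        records.append(record)
    return records


def build_count_command(records, library, prefix, mageck="mageck"):
    """Assemble the argument list for `mageck count` from design records."""
    labels = ",".join(rec["SampleLabel"] for rec in records)
    fastqs = [rec["FastqFile"] for rec in records]
    return [mageck, "count", "-l", library, "-n", prefix,
            "--sample-label", labels, "--fastq", *fastqs]


DESIGN = """FastqFile SampleLabel Group
test1.fastq.gz C1 control
test2.fastq.gz T1 treat
"""

command = build_count_command(read_design(DESIGN), "library.txt", "demo")
print(" ".join(command))
# prints: mageck count -l library.txt -n demo --sample-label C1,T1 --fastq test1.fastq.gz test2.fastq.gz
```

The sketch only validates the Group column; how ArrayServer passes the group information to MAGeCK is not shown here.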
- MAGeCK Path: the full (absolute) path to mageck on the Linux machine running ArrayServer. ArrayServer uses this path to call mageck for the analysis.
- Library File: the library file listing each sgRNA sequence and its target gene. For more details, please see the library file description on the MAGeCK wiki.
- Design File: a .txt file that maps each fastq file to a sample and an experimental group. Please check the format of the design file in the section above. Three columns are required: FastqFile for the fastq file name, SampleLabel for the sample name, and Group to indicate whether the sample belongs to the control group or the treatment group. The Group column is parsed into the mageck command, and its values can only be control or treat.
- Output Folder: the folder where the output files will be generated
- Output Prefix: a prefix added to the names of the result files
- Norm-Method: the normalization method. Default is median. Available options: None, median, total, control. If control is specified, the size factor is estimated using the control sgRNAs specified with the --control-sgrna option.
- Reverse Complementary: whether to reverse-complement the sequences in the library for read mapping.
- Trim 5' End: the number of bases to trim from the 5' end of the reads. Default 0.
- sgRNA Len: the length of the sgRNA. Default 20. ATTENTION: after v0.5.3, the program automatically determines the sgRNA length from the library file, so only use this option if you turn on the --unmapped-to-file option.
For more details about these options, please see the usage page on the MAGeCK wiki.
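As a rough illustration of how the GUI options above might translate into mageck count arguments, here is a minimal Python sketch. The function count_options is hypothetical, and the flag names --norm-method, --reverse-complement, --trim-5 and --sgrna-len are taken from the MAGeCK usage documentation as best recalled; please confirm them against the output of mageck count -h for the installed version.

```python
ALLOWED_NORM_METHODS = ("none", "median", "total", "control")


def count_options(norm_method="median", reverse_complement=False,
                  trim_5=0, sgrna_len=None):
    """Translate option values into extra arguments for `mageck count`.

    Flag names are assumptions to be checked against `mageck count -h`.
    """
    method = norm_method.lower()  # the GUI shows "None"; the CLI value is lowercase
    if method not in ALLOWED_NORM_METHODS:
        raise ValueError("unknown normalization method: " + norm_method)
    if trim_5 < 0:
        raise ValueError("trim length must not be negative")

    args = ["--norm-method", method]
    if reverse_complement:
        args.append("--reverse-complement")
    if trim_5:
        args += ["--trim-5", str(trim_5)]
    # Only pass the sgRNA length when explicitly requested, since newer
    # versions determine it from the library file.
    if sgrna_len is not None:
        args += ["--sgrna-len", str(sgrna_len)]
    return args


print(count_options())
print(count_options("total", reverse_complement=True, trim_5=5))
```

Leaving sgrna_len unset by default mirrors the note above that the length is normally read from the library file.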
In the ArrayStudio GUI, the user will see the result table, named "prefix".sgRNANormalizedCount:
In the output folder, the user will see these result files:
For more details about the result files in the output folder, please refer to MAGeCK's own wiki: MAGeCK output