
From Array Suite Wiki


MAGeCK Count from Fastq

Many independent studies have adopted MAGeCK (Model-based Analysis of Genome-wide CRISPR-Cas9 Knockout) to identify important genes from genome-scale CRISPR-Cas9 knockout screens. MAGeCK is developed and maintained by Wei Li and Han Xu from Dr. Xiaole Shirley Liu's lab at the Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute and Harvard School of Public Health.

To help biologists analyze CRISPR screening data and to simplify running MAGeCK, OmicSoft has designed this GUI as a wrapper around the MAGeCK Linux command line. Users can set their parameters and submit the job to MAGeCK; the results are imported into the ArrayStudio GUI for further visualization and analysis.

WARNING: To use this feature, you need a server project, and ArrayServer must be installed on a Linux machine.


This module runs the MAGeCK count command through Array Studio to generate sgRNA count data from fastq files.

This module can be accessed by:

Mageck01.png

Running this module through the GUI executes a command such as the following on the Linux server and returns the results to ArrayStudio users:

mageck count -l library.txt -n demo --sample-label L1,CTRL --fastq test1.fastq test2.fastq

The OmicSoft implementation has been benchmarked with MAGeCK v0.5.7.
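For readers who want to see how such a call is put together, the argument list behind the command above can be assembled programmatically. The Python sketch below is a minimal illustration, not OmicSoft's implementation; the function names `build_count_command` and `run_count` are hypothetical, and only the flags that appear in the example command are used.

```python
import shlex
import subprocess
from typing import List, Sequence


def build_count_command(library: str, prefix: str, sample_labels: Sequence[str],
                        fastq_files: Sequence[str], mageck_path: str = "mageck") -> List[str]:
    """Assemble the argv list for a basic `mageck count` run (hypothetical helper).

    The i-th entry of sample_labels names the i-th fastq argument.
    """
    if len(sample_labels) != len(fastq_files):
        raise ValueError("need exactly one sample label per fastq argument")
    return [
        mageck_path, "count",
        "-l", library,                              # sgRNA library file
        "-n", prefix,                               # output prefix
        "--sample-label", ",".join(sample_labels),  # comma-separated labels
        "--fastq", *fastq_files,                    # one argument per sample
    ]


def run_count(argv: List[str]) -> None:
    """Run the command on the server; requires MAGeCK to be installed."""
    subprocess.run(argv, check=True)


if __name__ == "__main__":
    argv = build_count_command("library.txt", "demo", ["L1", "CTRL"],
                               ["test1.fastq", "test2.fastq"])
    print(shlex.join(argv))
```

Printed with `shlex.join`, the list reproduces the command line shown above. Keeping the command as a list (rather than one string) avoids shell-quoting problems if file names contain spaces.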

Installation

ArrayServer administrators will need to install MAGeCK and its dependencies on the same machine as ArrayServer. Please contact your administrator and request that they perform the following steps:

Install required packages for MAGeCK

MAGeCK recommends installing numpy, which is used to calculate the negative binomial p value. If numpy is not found, MAGeCK uses the normal p value instead.
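Before installing MAGeCK, an administrator may want to confirm that numpy is importable by the Python interpreter that will run MAGeCK. A minimal check, assuming it is executed with that same interpreter (the function name `numpy_available` is hypothetical):

```python
import importlib.util
import sys


def numpy_available() -> bool:
    """Return True if numpy can be imported by the current Python interpreter."""
    return importlib.util.find_spec("numpy") is not None


if __name__ == "__main__":
    if numpy_available():
        print(f"numpy found for {sys.executable}")
    else:
        print(f"numpy NOT found for {sys.executable}; "
              "MAGeCK would fall back to the normal p value")
```

`find_spec` only locates the package without importing it, so the check is cheap and has no side effects.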

Install MAGeCK

The MAGeCK SourceForge wiki is at https://sourceforge.net/p/mageck/wiki/Home/

Download the source code from https://sourceforge.net/projects/mageck/files/latest/download, extract it, and install it in a directory mapped within ArrayServer:

tar xvzf mageck-0.5.7.tar.gz
cd mageck-0.5.7
python setup.py install

Test Code

MAGeCK comes with a "demo" folder containing sample data and scripts. Please run them to confirm that MAGeCK is correctly installed.
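After installation, the absolute path needed for the MAGeCK Path option can be looked up on the server. The sketch below assumes the `mageck` executable was placed on a directory in PATH by `python setup.py install`; the helper name `find_mageck` is hypothetical, and the optional `search_path` argument lets you mimic the environment ArrayServer runs under.

```python
import os
import shutil
from typing import Optional


def find_mageck(search_path: Optional[str] = None) -> Optional[str]:
    """Return the absolute path of the `mageck` executable, or None if not found.

    search_path defaults to the PATH of the current environment.
    """
    found = shutil.which("mageck", path=search_path)
    return os.path.abspath(found) if found else None


if __name__ == "__main__":
    location = find_mageck()
    print(location or "mageck not found on PATH")
```

If this prints nothing useful, the install location is not on PATH for that user, and the full path must be found manually before it is entered in the GUI.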

General

Mageck06.png

Input/Output

This function accepts fastq or fastq.gz files and requires a design file and a library file.

For instance, given two input fastq files:

test1.fastq.gz
test2.fastq.gz

a design.txt file could look like this (tab-delimited):

FastqFile	SampleLabel	Group
test1.fastq.gz	C1	control
test2.fastq.gz	T1	treat
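To make the design file requirements concrete, here is a small parser that reads the three required columns and rejects any Group value other than control or treat. It is a sketch under stated assumptions (tab-delimited file, case-sensitive column names and values), not OmicSoft's parser; `read_design` and `DesignRow` are hypothetical names.

```python
import csv
import io
from typing import List, NamedTuple, TextIO

REQUIRED_COLUMNS = {"FastqFile", "SampleLabel", "Group"}
VALID_GROUPS = {"control", "treat"}


class DesignRow(NamedTuple):
    fastq_file: str
    sample_label: str
    group: str


def read_design(handle: TextIO) -> List[DesignRow]:
    """Parse a tab-delimited design file with FastqFile, SampleLabel and Group columns."""
    reader = csv.DictReader(handle, delimiter="\t")
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"design file is missing columns: {sorted(missing)}")
    rows = []
    for rec in reader:
        # Short rows yield None for absent fields; treat them as empty strings.
        group = (rec.get("Group") or "").strip()
        if group not in VALID_GROUPS:
            raise ValueError(f"Group must be control or treat, got {group!r}")
        rows.append(DesignRow((rec.get("FastqFile") or "").strip(),
                              (rec.get("SampleLabel") or "").strip(),
                              group))
    return rows


if __name__ == "__main__":
    example = (
        "FastqFile\tSampleLabel\tGroup\n"
        "test1.fastq.gz\tC1\tcontrol\n"
        "test2.fastq.gz\tT1\ttreat\n"
    )
    design = read_design(io.StringIO(example))
    print(",".join(r.sample_label for r in design))  # C1,T1
    print(" ".join(r.fastq_file for r in design))    # test1.fastq.gz test2.fastq.gz
```

The sample labels and fastq names come out in file order, which is the order `mageck count` expects for `--sample-label` and `--fastq`. How the Group column is then passed to MAGeCK is not described on this page, so the sketch only validates it.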

Options

  • MAGeCK Path: the full (absolute) path to the mageck executable on the Linux machine running ArrayServer. ArrayServer uses this path to call mageck for the analysis.
  • Library File: the library file listing each sgRNA sequence and its target gene. For details of the format, see the library file description on the MAGeCK wiki.
  • Design File: a .txt file describing the samples; see the format of the design file in the section above. Three columns are required: FastqFile (the fastq file name), SampleLabel (the sample name), and Group (whether the sample belongs to the control or the treatment group). The Group value is passed to the mageck command and must be either control or treat.
  • Output Folder: the folder where the output files are written.
  • Output Prefix: a prefix added to the names of the output files.


  • Norm-Method: the normalization method. Default: median. Available options: None, median, total, control. If control is selected, the size factors are estimated from the control sgRNAs specified with the --control-sgrna option.
  • Reverse Complementary: whether to reverse-complement the library sgRNA sequences before mapping reads.
  • Trim 5' End: the number of bases to trim from the 5' end of each read. Default: 0.
  • sgRNA Len: the length of the sgRNA. Default: 20. ATTENTION: after v0.5.3, MAGeCK determines the sgRNA length automatically from the library file, so set this only if the --unmapped-to-file option is turned on.
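The effect of Reverse Complementary, Trim 5' End and sgRNA Len on read mapping can be illustrated with a simplified exact-match model: drop the first trim bases of each read, take the next sgRNA-length bases, and look them up in a table of library sequences (reverse-complemented if requested). This is an illustrative sketch, not MAGeCK's own matching code, which may handle details such as mismatches differently; all function names are hypothetical.

```python
from typing import Dict, Optional

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA sequence (A<->T, C<->G, N stays N)."""
    return seq.translate(_COMPLEMENT)[::-1]


def build_index(library: Dict[str, str], reverse: bool = False) -> Dict[str, str]:
    """Map each (optionally reverse-complemented) sgRNA sequence to its sgRNA ID."""
    return {(reverse_complement(seq) if reverse else seq).upper(): sg_id
            for sg_id, seq in library.items()}


def assign_read(read: str, index: Dict[str, str], trim5: int = 0,
                sgrna_len: int = 20) -> Optional[str]:
    """Trim `trim5` bases from the 5' end, then exact-match the next `sgrna_len` bases."""
    candidate = read[trim5:trim5 + sgrna_len].upper()
    if len(candidate) < sgrna_len:
        return None  # read too short after trimming
    return index.get(candidate)


if __name__ == "__main__":
    library = {"sgA_1": "AAAACCCCGGGGTTTTACGT"}  # hypothetical library entry
    index = build_index(library)
    print(assign_read("NNNAAAACCCCGGGGTTTTACGTGTTTTAGAGC", index, trim5=3))  # sgA_1
```

With trim5=3 the three leading N bases are skipped and the following 20 bases match the library entry exactly; with the default trim5=0 the same read would not be counted.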

For more details about these options, see the usage section of the MAGeCK wiki.
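The optional settings presumably translate into extra `mageck count` flags. The sketch below assumes the flag names --norm-method, --reverse-complement, --trim-5 and --sgrna-len from MAGeCK's command-line help, plus --control-sgrna and --unmapped-to-file as named above; verify them with `mageck count -h` on your server. The helper `option_flags` is hypothetical, and the GUI value None is written in lowercase as the CLI expects.

```python
from typing import List, Optional

NORM_METHODS = {"none", "median", "total", "control"}


def option_flags(norm_method: str = "median", reverse_complement: bool = False,
                 trim5: int = 0, sgrna_len: Optional[int] = None,
                 unmapped_to_file: bool = False,
                 control_sgrna: Optional[str] = None) -> List[str]:
    """Translate the optional GUI settings into assumed `mageck count` flags."""
    method = norm_method.lower()
    if method not in NORM_METHODS:
        raise ValueError(f"unknown normalization method: {norm_method!r}")
    if method == "control" and not control_sgrna:
        raise ValueError("control normalization needs a control sgRNA file")
    flags = ["--norm-method", method]
    if control_sgrna:
        flags += ["--control-sgrna", control_sgrna]
    if reverse_complement:
        flags.append("--reverse-complement")
    if trim5 < 0:
        raise ValueError("trim5 must be non-negative")
    if trim5:
        flags += ["--trim-5", str(trim5)]
    if sgrna_len is not None:
        # Per the note above, only needed together with --unmapped-to-file.
        if not unmapped_to_file:
            raise ValueError("set sgrna_len only together with --unmapped-to-file")
        flags += ["--sgrna-len", str(sgrna_len)]
    if unmapped_to_file:
        flags.append("--unmapped-to-file")
    return flags


if __name__ == "__main__":
    print(option_flags("None", reverse_complement=True, trim5=3))
```

The guard on `sgrna_len` simply encodes the advice in the sgRNA Len option; drop it if your MAGeCK version behaves differently.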



Output Results

In the ArrayStudio GUI, the result table is named "prefix".sgRNANormalizedCount, where "prefix" is the Output Prefix:

Mageck05.png
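With the default Norm-Method (median), the values in this table are scaled by per-sample size factors. The sketch below uses the common median-of-ratios scheme (as popularized by DESeq), which we assume is close in spirit to MAGeCK's median normalization; it is not MAGeCK's exact code, which may differ in details such as pseudocounts. The function name `median_ratio_normalize` is hypothetical.

```python
import math
import statistics
from typing import Dict, List, Tuple


def median_ratio_normalize(
        counts: Dict[str, List[float]]) -> Tuple[Dict[str, List[float]], List[float]]:
    """Scale each sample by a median-of-ratios size factor.

    For every sgRNA with nonzero counts in all samples, compute the geometric
    mean across samples; a sample's size factor is the median of count / mean.
    Returns (normalized counts, size factors).
    """
    if not counts:
        raise ValueError("empty count table")
    n = len(next(iter(counts.values())))
    ratios: List[List[float]] = [[] for _ in range(n)]
    for row in counts.values():
        if all(c > 0 for c in row):
            geo_mean = math.exp(sum(math.log(c) for c in row) / n)
            for j, c in enumerate(row):
                ratios[j].append(c / geo_mean)
    if not ratios[0]:
        raise ValueError("no sgRNA has nonzero counts in every sample")
    size_factors = [statistics.median(r) for r in ratios]
    normalized = {sg: [c / size_factors[j] for j, c in enumerate(row)]
                  for sg, row in counts.items()}
    return normalized, size_factors


if __name__ == "__main__":
    # Hypothetical counts: the second sample was sequenced twice as deeply.
    raw = {"sg1": [10, 20], "sg2": [100, 200], "sg3": [50, 100]}
    norm, factors = median_ratio_normalize(raw)
    print(factors)
    print(norm)
```

Because every sgRNA in the second sample has exactly twice the count of the first, its size factor is twice as large and the normalized counts agree across samples. Using the median makes the factors robust to the few sgRNAs that are strongly enriched or depleted in a screen.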

The output folder contains the following result files:

Mageck03.png

For more details about the result files in the output folder, see the output section of the MAGeCK wiki (MAGeCK output).

OmicScript

MageckCount.oscript
