Map reads to both human and virus genome
From Array Suite Wiki
To determine viral load in a sample, Array Studio users can map reads to viral reference genomes. There are three options: 1) Create a combined genome in which viral sequences are added to the reference genome of the species the sample came from (e.g. human). RNA-Seq reads can then be mapped to this human+virus combined genome to quantify counts for human genes as well as individual viral counts. While this may be a useful option, reads that map to viral sequences often also map within the human genome. 2) For that reason, it may be more desirable to map reads to the human genome first, and then align the unmapped reads to virus sequences. 3) Map raw reads directly to a public virus reference genome and quantify counts for the virus genome. This wiki page describes a workflow in which users take unmapped reads and quantify viral expression.
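As an illustration of option 1, the sketch below appends virus FASTA records to a host FASTA and prefixes the virus sequence names so they cannot collide with host chromosome names. It is a generic Python sketch, not the way Array Studio builds reference libraries; the function name `combine_fasta`, the `virus_` prefix, and the sequence names are hypothetical.

```python
def combine_fasta(host_fasta, virus_fasta, virus_prefix="virus_"):
    """Append virus FASTA records to host FASTA text, prefixing virus names.

    Both inputs are FASTA-formatted strings. The prefix is a hypothetical
    convention used here only to keep virus names distinct from host names.
    """
    # Collect host sequence IDs (first word after '>') to detect collisions.
    host_names = {
        line[1:].split()[0]
        for line in host_fasta.splitlines()
        if line.startswith(">") and line[1:].split()
    }
    out = [host_fasta.rstrip("\n")]
    for line in virus_fasta.splitlines():
        if line.startswith(">"):
            new_header = virus_prefix + line[1:].strip()
            if new_header.split()[0] in host_names:
                raise ValueError(f"name collision: {new_header.split()[0]}")
            line = ">" + new_header
        out.append(line)
    return "\n".join(out) + "\n"


# Tiny made-up sequences, for illustration only.
host = ">chr1\nACGT\n"
virus = ">demo1 hypothetical virus\nTTGG\n"
combined = combine_fasta(host, virus)
print(combined)
```

A distinct prefix also makes it easy to separate human and virus counts after quantification on the combined genome.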
Mapping of Reads To Human Genome plus Customized Virus Reference Genome and Gene Model
In this step, users can perform mapping in two stages: 1) to the human genome and 2) to the virus genome. For all OmicSoft provided genome references, please see: References.
Map Reads to Human Genome
Raw fastq reads from a bulk RNA-seq sample can be aligned first to the Human Genome.
In the Map RNA-Seq Reads To Genome window, specify the fastq files and choose the human reference genome and gene model:
In the Advanced tab, be sure to uncheck Exclude unmapped reads in BAM file. Otherwise the unmapped reads will not be available in the bam files.
mapped and unmapped reads in the bam files
When the job is done, one NGS Data object will show up in the solution together with an Alignment Report. The specified output folder contains one bam file per sample, with both mapped and unmapped reads in the same bam file.
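To confirm outside Array Studio that a bam file really contains unmapped reads, the records can be inspected in SAM text form (for example, after converting the bam file to SAM with an external tool). In the SAM format, bit 0x4 of the FLAG field marks an unmapped segment. The following is a minimal Python sketch under that assumption; the function name and the example records are made up for illustration.

```python
from collections import Counter

UNMAPPED_FLAG = 0x4  # SAM FLAG bit: segment unmapped


def count_mapped_unmapped(sam_lines):
    """Count mapped and unmapped alignment records in SAM-format text lines."""
    counts = Counter(mapped=0, unmapped=0)
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        flag = int(line.split("\t")[1])  # FLAG is the second SAM column
        counts["unmapped" if flag & UNMAPPED_FLAG else "mapped"] += 1
    return dict(counts)


# Made-up SAM records: one mapped read and two unmapped reads.
example_sam = [
    "@SQ\tSN:chr1\tLN:1000",
    "read1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tGGCC\tIIII",
    "read3\t4\t*\t0\t0\t*\t*\t0\t0\tTTAA\tIIII",
]
print(count_mapped_unmapped(example_sam))
```

Note that this counts alignment records rather than unique reads, so a read with several reported alignments would be counted more than once.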
Map Unmapped Reads to Virus Genome
To further map to the Virus Genome, the unmapped reads will need to be extracted as fastq files from the previous step and then used for subsequent mapping.
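Conceptually, this extraction keeps only the records flagged as unmapped and writes their names, sequences, and qualities back out as fastq. The Python sketch below illustrates the idea on SAM text lines; it is not how Array Studio implements its export, and the function name and example records are hypothetical.

```python
UNMAPPED_FLAG = 0x4   # SAM FLAG bit: segment unmapped
REVERSE_FLAG = 0x10   # SAM FLAG bit: SEQ stored reverse complemented
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def unmapped_to_fastq(sam_lines):
    """Return FASTQ text for the unmapped records in SAM-format text lines."""
    records = []
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        seq, qual = fields[9], fields[10]
        if not flag & UNMAPPED_FLAG:
            continue  # keep unmapped records only
        if flag & REVERSE_FLAG:
            # Restore the original read orientation.
            seq = seq.translate(_COMPLEMENT)[::-1]
            qual = qual[::-1]
        records.append(f"@{name}\n{seq}\n+\n{qual}\n")
    return "".join(records)


# Made-up SAM records: read1 is mapped, read2 is unmapped.
sam_records = [
    "read1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tGGCC\tIIII",
]
fastq_text = unmapped_to_fastq(sam_records)
print(fastq_text)
```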
Extract unmapped reads
Extract the unmapped reads using NGS -> Manipulation -> Export:
In the Export window, choose the NgsData object that contains both mapped and unmapped reads, and set the output format to UNPAIRED+UNMAPPED_FASTQ_GZ, as shown below.
Output Fastq Files
The extracted fastq files will be found in the output folder:
Map reads to Virus Genome
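To map the fastq files extracted in the previous step, run Map RNA-Seq Reads To Genome again, this time with the customized virus reference genome and gene model.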
Quantification of Counts for Reads Aligned to Human and Virus Genomes
After mapping the original fastq files to the human genome and the unmapped fastq files to the virus genome, there will be two NGS data objects available. Run the Summarize Gene/Transcript Count module on both NGS data objects to get counts data for human genes and virus genes.
The final output will be similar to the snapshot below:
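Once both count tables are available, one simple way to express viral load for a sample is the share of counted reads assigned to virus genes. The Python sketch below assumes the counts have been exported as gene-to-count mappings for a single sample; the function name, gene labels, and numbers are hypothetical, and reads-per-million is only one possible normalization.

```python
def viral_load_summary(human_counts, virus_counts):
    """Summarize virus reads relative to all counted reads for one sample.

    Both arguments map gene names to read counts.
    """
    human_total = sum(human_counts.values())
    virus_total = sum(virus_counts.values())
    total = human_total + virus_total
    if total == 0:
        raise ValueError("no counted reads in either table")
    return {
        "human_reads": human_total,
        "virus_reads": virus_total,
        "virus_fraction": virus_total / total,
        "virus_per_million": 1e6 * virus_total / total,
    }


# Made-up counts for illustration only.
human = {"geneA": 9000, "geneB": 990}
virus = {"viral_gene1": 6, "viral_gene2": 4}
summary = viral_load_summary(human, virus)
print(summary)
```

Because the virus counts here come from reads that did not map to the human genome, the two totals do not overlap and can be added directly.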
Mapping of Reads to Public Virus Reference Genome
The OmicSoft platform provides several publicly available virus genomes, including Virus.RefSeq20170418 and Virus.RefSeq2014.**. Array Studio users can map RNA-Seq reads to these virus reference genomes to analyze virus gene expression in the sample.
Map raw reads to Virus reference genome without customized virus gene model
Similar to the mapping steps demonstrated above, go to Add Data -> Add NGS Data -> Add RNA-Seq Data -> Map Reads To Genome (Illumina). In the Map RNA-Seq Reads To Genome window, choose the raw fastq files and one of the available virus reference genomes; in this demo, Virus.RefSeq20170418 was chosen.
When the alignment job is done, one NGS data object containing the alignment information is expected, as shown below.
Quantification of Virus genome mapping
After getting the NGS data object for the mapped reads, users can go to the NGS -> Quantification -> Report Gene / Transcript Counts module to get the counts data for the virus genome.
When the job is done, the -OmicData expression table for the virus genome will show up in the solution.