Genome browser alignment filters

From Array Suite Wiki


Overview

In OmicSoft Genome Browser, we have implemented alignment filters that let users remove certain types of reads or keep only those reads. The coverage also changes to reflect the remaining reads.

Genome browser alignment filters

CIGAR filter

The CIGAR filter options are self-explanatory:

  • Exon Junction Reads Only will display only reads that span exon junctions.
  • Exclude Exon Junction Reads will display only reads that do not span exon junctions.
  • Indel Reads Only will display only reads containing indels. This is a good way to check whether indel detection is correct.
  • Exclude Indel Reads will remove indel-containing reads from the genome browser.
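As an illustration of what these four options select, the sketch below classifies reads from their SAM CIGAR strings. It assumes that a junction read is one whose CIGAR contains an `N` (skipped reference) operation and that an indel read contains `I` or `D`; this is how spliced aligners commonly encode these events, but it is an assumption, not OmicSoft's documented implementation. The names `CIGAR_FILTERS` and `apply_cigar_filter` are hypothetical.

```python
import re

# One CIGAR element: a length followed by an operation letter (SAM spec)
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def cigar_ops(cigar: str) -> set:
    """Return the set of operation letters used in a CIGAR string."""
    return {op for _, op in CIGAR_OP.findall(cigar)}


def is_junction_read(cigar: str) -> bool:
    # 'N' = skipped region of the reference, i.e. an intron in a spliced alignment
    return "N" in cigar_ops(cigar)


def is_indel_read(cigar: str) -> bool:
    # 'I' = insertion to the reference, 'D' = deletion from the reference
    return bool(cigar_ops(cigar) & {"I", "D"})


# Hypothetical keep-predicates, one per CIGAR filter option
CIGAR_FILTERS = {
    "Exon Junction Reads Only": is_junction_read,
    "Exclude Exon Junction Reads": lambda c: not is_junction_read(c),
    "Indel Reads Only": is_indel_read,
    "Exclude Indel Reads": lambda c: not is_indel_read(c),
}


def apply_cigar_filter(cigars, option):
    """Keep only the CIGAR strings that pass the chosen filter option."""
    keep = CIGAR_FILTERS[option]
    return [c for c in cigars if keep(c)]


if __name__ == "__main__":
    reads = ["100M", "40M500N60M", "50M2I48M", "30M3D70M"]
    print(apply_cigar_filter(reads, "Exon Junction Reads Only"))
    print(apply_cigar_filter(reads, "Indel Reads Only"))
```

Coverage would then be recomputed from whichever reads pass the predicate, which matches the Overview's note that coverage follows the remaining reads.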

Flag Filter

The Flag filter can be confusing, so I will explain it with the example chart below:

RNA-Seq reads from a non-strand-specific protocol. Sequencing reads come from both the forward strand and the reverse complement of two transcripts.

In OmicSoft Genome Browser, the (-) strand is colored green and the (+) strand is colored blue. You can view more default settings by clicking "Track Properties", either by right-clicking the track or from the menu.


  • First Read Only will show only reads and coverage from the first read (read1). For non-strand-specific RNA-Seq data, you still see reads mapped to both the plus and minus strands (colored blue and green).
20130118 ReadStrandExampleRead1Only.png
  • Second Read Only will show only reads and coverage from the second read (read2). You still see reads colored blue and green.
20130118 ReadStrandExampleRead2Only.png
  • Forward Strand Only will show only reads and coverage from reads mapped to the forward (+) strand. You see only blue reads.
20130118 ReadStrandExampleForwardOnly.png
  • Reverse Strand Only will show only reads and coverage from reads mapped to the reverse (-) strand. You see only green reads.
20130118 ReadStrandExampleReverseOnly.png
  • Consistent with First-Read-Forward-Strand will show only reads and coverage from unpaired reads and read pairs (both read1 and read2) in which read1 is mapped to the forward (+) strand. For paired reads, read2 is not required to map to the reverse (-) strand, although in theory it should.
20130118 ReadStrandExampleRead1OnForward.png
  • Consistent with First-Read-Reverse-Strand will show only reads and coverage from unpaired reads and read pairs (both read1 and read2) in which read1 is mapped to the reverse (-) strand. For paired reads, read2 is not required to map to the forward (+) strand, although in theory it should.
20130118 ReadStrandExampleRead1OnReverse.png
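To make the six Flag filter options concrete, here is a minimal sketch that expresses each one as a predicate on the SAM FLAG field. The bit values come from the SAM format specification; the mapping from menu option to predicate, and the names `FLAG_FILTERS` and `read1_on_reverse`, are assumptions for illustration and are not taken from OmicSoft's code. The key point for the two "Consistent with" options is that a read2 record can be kept or dropped based on its mate's strand (bit 0x20), so both reads of a pair are filtered together.

```python
# SAM FLAG bits (SAM format specification)
PAIRED = 0x1         # template has multiple segments (paired-end)
REVERSE = 0x10       # this read is reverse complemented (mapped to the - strand)
MATE_REVERSE = 0x20  # the mate is reverse complemented
READ1 = 0x40         # first segment in the template
READ2 = 0x80         # last segment in the template


def read1_on_reverse(flag: int) -> bool:
    """True if read1 of this record's template maps to the reverse (-) strand.

    Unpaired reads are judged by their own strand. For a read2 record, read1 is
    its mate, so the mate-strand bit (0x20) is used; that bit is only
    meaningful when the mate is mapped.
    """
    if not (flag & PAIRED) or (flag & READ1):
        return bool(flag & REVERSE)
    return bool(flag & MATE_REVERSE)


# Hypothetical keep-predicates, one per Flag filter option
FLAG_FILTERS = {
    "First Read Only": lambda f: bool(f & READ1),
    "Second Read Only": lambda f: bool(f & READ2),
    "Forward Strand Only": lambda f: not (f & REVERSE),
    "Reverse Strand Only": lambda f: bool(f & REVERSE),
    "Consistent with First-Read-Forward-Strand": lambda f: not read1_on_reverse(f),
    "Consistent with First-Read-Reverse-Strand": read1_on_reverse,
}

if __name__ == "__main__":
    # 99/147: pair with read1 on +; 83/163: pair with read1 on -; 0/16: unpaired on +/-
    flags = [99, 147, 83, 163, 0, 16]
    keep = FLAG_FILTERS["Consistent with First-Read-Forward-Strand"]
    print([f for f in flags if keep(f)])  # [99, 147, 0]
```

Note that flag 147 is a read2 on the reverse strand, yet it is kept by the First-Read-Forward option because its read1 (flag 99) is on the forward strand.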


Visualization of strand specific RNA-Seq dataset

The flag filter is extremely useful for visualizing strand-specific RNA-Seq data. In a strand-specific RNA-Seq protocol, one strand, usually the forward strand of the transcript, is enriched. The confusing part is that the forward strand of the transcript lies on the reverse/minus/(-) strand of the genome if the gene itself is on the reverse/minus/(-) strand. In OmicSoft Genome Browser, the Flag filter is based on the strand of the genome, not the strand of the transcript.
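The translation from transcript strand to genome strand can be written down as a small rule. The sketch below assumes a "forward" protocol means read1 comes from the transcript's forward strand (as in the fly example in this article) and that read2 maps opposite to read1; `expected_genome_strand` and its parameters are hypothetical names for illustration only.

```python
def expected_genome_strand(gene_strand: str, read_number: int, protocol: str = "forward") -> str:
    """Genomic strand a read is expected to map to in strand-specific RNA-Seq.

    gene_strand: '+' or '-', the strand of the gene on the genome
    read_number: 1 or 2 (use 1 for single-end reads)
    protocol:    'forward' if read1 comes from the transcript's forward strand,
                 'reverse' if read1 comes from its reverse complement
    """
    if gene_strand not in ("+", "-"):
        raise ValueError("gene_strand must be '+' or '-'")
    if read_number not in (1, 2):
        raise ValueError("read_number must be 1 or 2")
    if protocol not in ("forward", "reverse"):
        raise ValueError("protocol must be 'forward' or 'reverse'")
    flip = {"+": "-", "-": "+"}
    # For a forward protocol, read1 lies on the transcript strand, i.e. the gene's genomic strand
    read1_strand = gene_strand if protocol == "forward" else flip[gene_strand]
    # Read2 is sequenced from the other end of the fragment, so it maps to the opposite strand
    return read1_strand if read_number == 1 else flip[read1_strand]


if __name__ == "__main__":
    # A gene on the minus strand under a forward protocol
    print(expected_genome_strand("-", 1), expected_genome_strand("-", 2))
```

Because the Flag filter works on genome strand, a gene on the minus strand shows its read1s in green even though they come from the transcript's forward strand.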

Before visualizing the data in the genome browser, we highly recommend checking the alignment QC using ArrayStudio's NGS | Aligned Data QC | RNA-Seq QC Metrics (see RnaSeqQCMetrics.pdf). Part of this QC is the strand enrichment rate. In this section, I will use a fly dataset (SRR070272) as an example, in which 99.86% of read pairs are mapped to the forward strand of the transcriptome.
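One way to think about a strand enrichment rate like the 99.86% quoted above is as the fraction of read pairs whose read1 maps to the same genomic strand as the gene it overlaps. The sketch below computes that fraction from (read1 strand, gene strand) tuples; this definition is an assumption, and ArrayStudio's RNA-Seq QC Metrics may count pairs differently (for example, when a pair overlaps genes on both strands). `strand_enrichment` is a hypothetical name.

```python
import math


def strand_enrichment(pairs):
    """Fraction of read pairs whose read1 lies on the transcript's forward strand.

    pairs: iterable of (read1_genome_strand, gene_strand) tuples, each '+' or '-'.
    A pair counts as transcript-forward when read1 maps to the gene's genomic strand.
    Returns NaN when there are no pairs.
    """
    total = 0
    forward = 0
    for read1_strand, gene_strand in pairs:
        total += 1
        forward += read1_strand == gene_strand
    return forward / total if total else math.nan


if __name__ == "__main__":
    # Made-up toy input: four transcript-forward pairs and one transcript-reverse pair
    toy = [("-", "-")] * 3 + [("+", "+"), ("+", "-")]
    print(strand_enrichment(toy))
```

A value near 1 indicates a forward-strand-specific library, a value near 0 a reverse-strand-specific one, and a value near 0.5 a non-strand-specific protocol.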

Strand QC for SRR070272


Here are screenshots of two fly genes from this strand-specific dataset:

Note: In OmicSoft Genome Browser, (-) strand has Green color and (+) strand has Blue color.

Without any filter


Only show the first reads. Because this is a forward-strand-specific protocol and gene Ank is on the minus strand, first reads from Ank are always on the genome's minus strand; the opposite holds for gene CG3200, whose first reads are on the plus strand.


Only show the second reads. Because this is a forward-strand-specific protocol and gene Ank is on the minus strand, all second reads from Ank are on the genome's plus/forward strand.


Only show the forward reads, all in blue.


Only show the reverse reads, all in green.


Only show read pairs whose read1 is mapped to the forward strand. Because this dataset is highly forward-strand-specific, almost all reads from Ank are excluded, since their read1s are mapped to the minus strand.


Only show read pairs whose read1 is mapped to the reverse/minus strand. Almost all reads from CG3200 are excluded, since their read1s are mapped to the forward/plus strand.

Other tricks

The following tricks are useful:

  • In Track Properties | Profile | Connect paired End Profiles, if you choose True, a dashed line is drawn between read 1 and read 2 of each pair.
  • In Track Properties | Coverage | Split Coverage By Strand, if you choose True, the coverage is split according to whether the first read is mapped to the forward (plus) or reverse (minus) strand. Read more in Genome browser split coverage by strand.
  • In Track Properties | Pileup | Exclude MultiReads, if you choose True, it will only show uniquely aligned reads.
  • In Track Properties | Pileup | Exclude Singletons, if you choose True, only paired reads are shown.
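The sketch below shows one way the Split Coverage By Strand, Exclude MultiReads and Exclude Singletons options could combine when computing coverage. Several things are assumptions, not OmicSoft behavior: reads are bucketed by the genomic strand of their read1 (taken from the SAM FLAG, using the mate-strand bit for read2 records), a multiread is one with more than one reported alignment (as an `NH` tag would record), a singleton is an unpaired read or one whose mate is unmapped, and each alignment is treated as one contiguous block (introns and deletions are ignored). The `Read` class and `split_coverage` function are hypothetical.

```python
from dataclasses import dataclass

# SAM FLAG bits (SAM format specification)
PAIRED, MATE_UNMAPPED, REVERSE, MATE_REVERSE, READ1 = 0x1, 0x8, 0x10, 0x20, 0x40


@dataclass
class Read:
    start: int       # 0-based leftmost aligned position
    end: int         # exclusive end position
    flag: int        # SAM FLAG
    n_hits: int = 1  # number of reported alignments (e.g. an NH tag value)


def read1_on_reverse(flag: int) -> bool:
    """True if read1 of this record's template maps to the reverse (-) strand."""
    if not (flag & PAIRED) or (flag & READ1):
        return bool(flag & REVERSE)
    return bool(flag & MATE_REVERSE)  # read2 record: use the mate's strand


def split_coverage(reads, length, exclude_multireads=False, exclude_singletons=False):
    """Return (plus, minus) per-base coverage lists, split by the strand of read1."""
    plus, minus = [0] * length, [0] * length
    for r in reads:
        if exclude_multireads and r.n_hits > 1:
            continue  # keep uniquely aligned reads only
        if exclude_singletons and (not (r.flag & PAIRED) or (r.flag & MATE_UNMAPPED)):
            continue  # assumed definition of a singleton
        track = minus if read1_on_reverse(r.flag) else plus
        # Simplification: the alignment is treated as one contiguous block
        for pos in range(max(r.start, 0), min(r.end, length)):
            track[pos] += 1
    return plus, minus


if __name__ == "__main__":
    reads = [
        Read(0, 4, 99),              # read1 on +
        Read(6, 10, 147),            # its mate: read2 on -, counted with read1 on +
        Read(2, 5, 83),              # read1 on -
        Read(3, 6, 163, n_hits=2),   # read2 on +, counted with read1 on -; a multiread
        Read(0, 3, 0),               # unpaired read on +
    ]
    print(split_coverage(reads, 10))
    print(split_coverage(reads, 10, exclude_multireads=True, exclude_singletons=True))
```

Bucketing by read1 rather than by each read's own strand keeps both mates of a pair in the same coverage track, which is what makes the split useful for strand-specific data.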