MiRNA alignment and quantification

From Array Suite Wiki


Array Studio (AS) supports miRNA-Seq analysis, including read alignment, quantification, and expression comparison. This page walks through each of these steps.




miRNAs are a class of small non-coding RNAs approximately 22 nt long. Because sequencing reads are usually longer than the miRNA itself, each read runs into the 3' adapter, and the adapter sequence must be removed before the reads are aligned to the genome. The adapter sequence is generally supplied by the sequencing service provider. If you do not know the 3' adapter sequence, try the Search Adapters function.
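For illustration only, the idea behind 3' adapter removal can be sketched in Python. This is not Array Studio's trimming algorithm: `trim_3prime_adapter` and its `min_overlap` parameter are hypothetical, the sketch uses exact matching only (it does not tolerate sequencing errors in the adapter), and the adapter and read sequences are example values. Substitute the adapter sequence supplied by your sequencing provider.

```python
def trim_3prime_adapter(read: str, adapter: str, min_overlap: int = 5) -> str:
    """Remove a 3' adapter from a read using exact matching (illustrative sketch)."""
    # Case 1: the whole adapter occurs in the read -> cut at its first occurrence.
    idx = read.find(adapter)
    if idx != -1:
        return read[:idx]
    # Case 2: the read ends partway through the adapter, so the read's 3' end
    # matches a prefix of the adapter. Try the longest prefix first and require
    # at least min_overlap bases to avoid trimming by chance.
    longest = min(len(adapter) - 1, len(read))
    for overlap in range(longest, max(min_overlap, 1) - 1, -1):
        if read.endswith(adapter[:overlap]):
            return read[:-overlap]
    # No adapter found: return the read unchanged.
    return read


if __name__ == "__main__":
    adapter = "TGGAATTCTCGG"            # example adapter sequence (assumption)
    insert = "TAGCTTATCAGACTGATGTTGA"    # example 22-nt small RNA insert
    print(trim_3prime_adapter(insert + adapter[:8], adapter))
```

A real trimmer would also allow mismatches in the adapter match and discard reads that become too short after trimming; both are omitted here to keep the example small.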


Once the adapter sequence is known, users can align the small RNA sequencing dataset against the genome:

Add Data => Add NGS Data => Add miRNA-Seq Data => Map Reads To Genome(Illumina)

Image m a q05.png

Users should specify the genome and the gene model (for example, miRBase.R21) and configure the adapter-trimming method. Tip: the miRBase.R21 and miRBase.R21.Mature gene models are based on miRNA precursor sequences and mature sequences, respectively. Users can choose different modes to quantify miRNA expression.

By default, the alignment uses AutoPenalty, which allows max(2, (read length - 31) / 15) mismatches, where read length is the length after quality trimming and adapter stripping. Reads with mismatches are also used for quantification. To allow only one mismatch, set the fix penalty to 1 in the alignment options.
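The AutoPenalty rule can be written as a small Python function to show how many mismatches it permits at different trimmed read lengths. The function name is hypothetical, and the source does not say how the fractional term is rounded; the sketch assumes it is rounded down.

```python
import math


def auto_penalty_mismatches(trimmed_read_length: int) -> int:
    """Mismatches allowed by max(2, (read length - 31) / 15).

    Assumption: the fractional term is rounded down (floor); the rounding
    used by Array Studio is not stated in the documentation.
    """
    return max(2, math.floor((trimmed_read_length - 31) / 15))


if __name__ == "__main__":
    for length in (18, 22, 61, 76, 100):
        print(length, auto_penalty_mismatches(length))
```

Because (read length - 31) / 15 is at most 2 for any read of 61 nt or shorter, adapter-trimmed miRNA reads (roughly 18 to 25 nt) always fall under the floor of 2 mismatches; the length-dependent term only matters for much longer reads.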

Image m a q06.png

The Advanced tab provides additional options; we recommend leaving them at their default values.

When the alignment finishes, users get an alignment report:

Image m a q08.png


To quantify miRNA expression, click NGS => Quantification => Report Gene/Transcript Counts.

Image m a q10.png

The quantification GUI should automatically select the mature gene model (such as miRBase.R21.Mature) based on the gene model used for alignment (such as miRBase.R21). miRNA-Seq data are usually single-end, so it does not matter whether you choose to count fragments or reads: reads are counted either way. Counting is naive; no EM algorithm is applied. The count for each miRNA is the number of reads in its mature miRNA region. Counts are not normalized by miRNA region length, because miRNA publications usually report count data directly. A read that maps to multiple miRNA regions is counted once in each of those regions. The resulting counts are raw; normalization is a separate step.
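The counting rules above (naive counting, no length normalization, multi-mapped reads counted in every region they hit) can be sketched in Python. This is an illustration, not Array Studio's implementation: the function name, the input layout, the 0-based half-open coordinates, and the choice to count a read when its alignment merely overlaps a mature region are all assumptions, and the miRNA names are hypothetical.

```python
from collections import defaultdict


def naive_mature_counts(alignments, mature_regions):
    """Count reads per mature miRNA region without EM or length normalization.

    alignments: iterable of (read_id, chrom, start, end); a multi-mapped read
        appears once per alignment. Coordinates are 0-based, half-open.
    mature_regions: dict mapping miRNA name -> (chrom, start, end).
    Assumption: an alignment is counted when it overlaps a region.
    """
    reads_per_region = defaultdict(set)
    for read_id, chrom, start, end in alignments:
        for name, (r_chrom, r_start, r_end) in mature_regions.items():
            if chrom == r_chrom and start < r_end and r_start < end:
                # A set keeps each read at most once per region, but the
                # same read can still be counted in several regions.
                reads_per_region[name].add(read_id)
    return {name: len(reads_per_region[name]) for name in mature_regions}


if __name__ == "__main__":
    regions = {"miR-A": ("chr1", 100, 122), "miR-B": ("chr2", 500, 522)}
    alignments = [
        ("r1", "chr1", 100, 122),
        ("r2", "chr1", 101, 123),
        ("r3", "chr1", 100, 122),  # r3 is multi-mapped ...
        ("r3", "chr2", 500, 522),  # ... so it is counted in both regions
        ("r4", "chr3", 0, 22),     # outside every mature region
    ]
    print(naive_mature_counts(alignments, regions))
```

With these inputs, three distinct reads hit miRNA regions but the counts sum to four, which is the multiple-counting behavior described above.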

Click Submit. The quantification result is shown below:

Image m a q11.png

Click Table under Annotation to view more details about the miRNAs:

Image m a q12.png


Array Studio provides a normalization function under NGS => Inference => Normalize RNASeq Data. For miRNA-Seq, TotalCount normalization is usually recommended: it scales each sample so that its total read count is 1,000,000, which expresses the data as counts per million mapped reads.
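As a sketch of the arithmetic behind TotalCount normalization, each raw count is multiplied by 1,000,000 divided by the sample's total count. The function name and the dictionary layout are assumptions for illustration, and the miRNA names are hypothetical; this is not the Array Studio implementation.

```python
def total_count_normalize(counts, target=1_000_000):
    """Scale one sample's raw counts so that they sum to `target`.

    counts: dict mapping miRNA name -> raw read count for one sample.
    With the default target, the result is counts per million.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no counts to normalize")
    return {name: c * target / total for name, c in counts.items()}


if __name__ == "__main__":
    sample = {"miR-A": 250, "miR-B": 750}  # hypothetical raw counts
    print(total_count_normalize(sample))
```

Every sample ends up with the same total, so values are comparable across libraries of different sequencing depth.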

TMM (trimmed mean of M-values), another common normalization method for miRNA-Seq, is also available in Normalize RNASeq Data.
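The idea behind TMM (trimmed mean of M-values) can be sketched as follows: compute per-miRNA log ratios (M) and average log abundances (A) between a sample and a reference, trim the extremes of both, and average the remaining M values. This is a simplified illustration, not the Array Studio or edgeR implementation. It omits edgeR's precision weighting, breaks ties by position when trimming, and returns one sample's factor relative to the reference without rescaling factors across samples. The 30% and 5% trim fractions follow edgeR's defaults.

```python
import math


def tmm_factor(sample, reference, m_trim=0.3, a_trim=0.05):
    """Simplified, unweighted TMM scaling factor of `sample` vs `reference`.

    sample, reference: raw counts for the same miRNAs in the same order.
    """
    n_s, n_r = sum(sample), sum(reference)
    m_vals, a_vals = [], []
    for x, y in zip(sample, reference):
        if x > 0 and y > 0:  # log ratios are undefined for zero counts
            p_s, p_r = x / n_s, y / n_r
            m_vals.append(math.log2(p_s / p_r))        # M: log fold change
            a_vals.append(0.5 * math.log2(p_s * p_r))  # A: mean log abundance
    n = len(m_vals)
    if n == 0:
        raise ValueError("no miRNAs with non-zero counts in both samples")

    def kept_by_rank(values, trim):
        # Indices left after dropping floor(n * trim) values from each end.
        k = int(n * trim)
        order = sorted(range(n), key=lambda i: values[i])
        return set(order[k:n - k])

    keep = kept_by_rank(m_vals, m_trim) & kept_by_rank(a_vals, a_trim)
    if not keep:
        raise ValueError("trimming removed every miRNA")
    mean_m = sum(m_vals[i] for i in keep) / len(keep)
    return 2 ** mean_m
```

The effective library size of a sample is its total count multiplied by this factor. When a few very abundant miRNAs inflate one sample's total, the factor shrinks that total back toward the level implied by the unchanged miRNAs, which plain TotalCount scaling cannot do.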

Differential expression analysis

Once miRNA expression has been quantified and normalized, users can apply the same downstream analyses (GLM, ANOVA, clustering) as for microarray data.
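As a minimal sketch of one such downstream test, the one-way ANOVA F statistic for a single miRNA can be computed from its normalized values in each group. The log2(x + 1) transform, the pseudocount of 1, and both function names are assumptions for illustration; Array Studio's ANOVA options may differ.

```python
import math


def log2_transform(values, pseudocount=1.0):
    # Assumption: log2(x + pseudocount) applied to normalized (e.g. CPM) values.
    return [math.log2(v + pseudocount) for v in values]


def one_way_anova_f(groups):
    """One-way ANOVA F statistic; groups is a list of lists of values."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]
    # Between-group and within-group sums of squares.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, group_means) for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)
```

The p-value is the upper tail of the F distribution with (number of groups - 1, number of samples - number of groups) degrees of freedom, for example via `scipy.stats.f.sf`. Repeating the test for every miRNA also requires a multiple-testing correction.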

Further reading