PairedEnd

From Array Suite Wiki


Specifies whether alignment should run in paired-end mode. When set to true, the following strategy is used:

Paired-end reads need a different strategy from single-end reads. They have at least two advantages over single-end reads: they can help resolve the ambiguity of reads that do not map uniquely, and they can help assemble the transcriptome in downstream analysis. Concordant alignments are searched for first, with the best paired alignment defined as the one with the smallest total number of mismatches across both reads; if no concordant alignment can be found, the best alignment(s) for each read are reported. OSA uses the following steps to perform paired-end alignment:

1. Get the best alignment for each read. All equally good alignments are returned and paired to find concordant solutions. If any concordant solutions are found, steps 2 and beyond are skipped.
2. If read 1 has a uniquely best alignment, try to perform a constrained alignment on read 2, where the candidate positions of read 2 are constrained by the position of read 1 (by default we extend the read 1 position by expected insert size + 6 * standard deviation of insert size). If concordant solutions can be found, return the result directly. For RNA-Seq, 64000 bp is also added to this extension to allow for an intron between the two reads.
3. If read 2 has a uniquely best alignment, try to perform a constrained alignment on read 1, where the candidate positions of read 1 are constrained by the position of read 2 (by default we extend the read 2 position by expected insert size + 6 * standard deviation of insert size). If concordant solutions can be found, return the result directly. For RNA-Seq, 64000 bp is also added to this extension to allow for an intron between the two reads.
4. Consider all possible pairings of suboptimal alignments of read 1 and read 2. Instead of searching through all possible pairs at every mismatch level and then picking the best paired alignment, we raise the number of mismatches allowed in the single-read alignments one level at a time and stop searching as soon as a concordant solution is found. This shortcut does not appear to affect the accuracy of the algorithm.
5. If no concordant solution can be found, the best alignment(s) for each read are reported.
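Steps 1 and 5 can be sketched in Python as below. This is a minimal illustration under stated assumptions, not OSA's implementation: the `Hit` record, the function names, and the concordance rule (same chromosome, opposite strands, start positions at most `max_span` bases apart) are all hypothetical. Steps 2 to 4, which would run between the two outcomes, are not modelled.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Hit:
    """One candidate alignment of a read (hypothetical record, not an OSA type)."""
    chrom: str
    pos: int          # leftmost reference coordinate of the alignment
    strand: str       # "+" or "-"
    mismatches: int


def best_hits(hits):
    """Step 1: keep every alignment tied for the fewest mismatches."""
    if not hits:
        return []
    fewest = min(h.mismatches for h in hits)
    return [h for h in hits if h.mismatches == fewest]


def is_concordant(h1, h2, max_span):
    """Assumed rule: same chromosome, opposite strands, starts within max_span bases."""
    return (h1.chrom == h2.chrom
            and h1.strand != h2.strand
            and abs(h1.pos - h2.pos) <= max_span)


def pair_or_fallback(hits1, hits2, max_span):
    """Return ("paired", pairs) if the mates' best hits pair concordantly (step 1),
    otherwise ("single", (best1, best2)) with each mate's best hits (step 5)."""
    best1, best2 = best_hits(hits1), best_hits(hits2)
    pairs = [(a, b) for a, b in product(best1, best2)
             if is_concordant(a, b, max_span)]
    if pairs:
        return "paired", pairs
    return "single", (best1, best2)
```

For example, with read 1 hits `Hit("chr1", 1000, "+", 0)`, `Hit("chr5", 200, "+", 0)` and `Hit("chr1", 5000, "+", 2)`, and read 2 hits `Hit("chr1", 1250, "-", 1)` and `Hit("chr2", 70, "-", 1)`, a `max_span` of 600 yields `"paired"` with the single chr1:1000/chr1:1250 pair; the 2-mismatch chr1:5000 hit is never considered because it is not among read 1's best hits.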
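The early-stopping search of step 4 can be sketched as follows. This is one reading of the description, not OSA's code: a per-read mismatch cap is raised one level at a time, and at the first cap that produces any concordant pair, the pairs with the smallest total mismatch count are kept. The `Hit` tuple, the concordance rule, and the `max_mismatches` limit are assumptions made for illustration.

```python
from collections import namedtuple
from itertools import product

# Hypothetical alignment record; "mismatches" counts mismatches of one read.
Hit = namedtuple("Hit", "chrom pos strand mismatches")


def concordant(h1, h2, max_span):
    """Assumed concordance rule: same chromosome, opposite strands, close starts."""
    return (h1.chrom == h2.chrom
            and h1.strand != h2.strand
            and abs(h1.pos - h2.pos) <= max_span)


def tiered_pair_search(hits1, hits2, max_span, max_mismatches):
    """Raise a per-read mismatch cap one level at a time and stop at the first
    cap that yields a concordant pair; among those pairs, keep the ones with
    the smallest total mismatch count."""
    for cap in range(max_mismatches + 1):
        pool1 = [h for h in hits1 if h.mismatches <= cap]
        pool2 = [h for h in hits2 if h.mismatches <= cap]
        pairs = [(a, b) for a, b in product(pool1, pool2)
                 if concordant(a, b, max_span)]
        if pairs:  # early stop: higher caps are never examined
            fewest = min(a.mismatches + b.mismatches for a, b in pairs)
            return [(a, b) for a, b in pairs
                    if a.mismatches + b.mismatches == fewest]
    return []  # no concordant pair: fall back to per-read reporting (step 5)


if __name__ == "__main__":
    read1 = [Hit("chr1", 100, "+", 0), Hit("chr1", 900, "+", 2)]
    read2 = [Hit("chr1", 300, "-", 3), Hit("chr1", 1000, "-", 2)]
    print(tiered_pair_search(read1, read2, max_span=500, max_mismatches=3))
```

With the hits in the `__main__` block, the search stops at cap 2 and returns the chr1:900/chr1:1000 pair (2 + 2 mismatches), although the chr1:100/chr1:300 pair has a lower total (0 + 3) and would only be reached at cap 3. In this sketch, that is the price of stopping early. The page reports that OSA's shortcut does not seem to affect accuracy in practice.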

Note: For RNA-Seq data, the parameters “/InsertSizeStandardDeviation” and “/ExpectedInsertSize” have minimal effect in the second stage of the alignment (i.e., when aligning to the genome), because there could be an intron between the two paired-end reads.
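The extension used in steps 2 and 3 is simple arithmetic, and working it through shows why the insert-size parameters matter little for RNA-Seq. The Python sketch below assumes the extension is computed exactly as stated in those steps; the function name and the 300 bp insert / 50 bp standard deviation library are illustrative values, not OSA defaults.

```python
def search_window(expected_insert, insert_sd, rna_seq=False):
    """Extension (bp) applied to the uniquely aligned mate's position in steps 2
    and 3: expected insert size + 6 * insert-size SD, plus 64000 bp for RNA-Seq."""
    window = expected_insert + 6 * insert_sd
    if rna_seq:
        window += 64000  # allowance for an intron between the two mates
    return window


if __name__ == "__main__":
    # Hypothetical library: 300 bp expected insert, SD of 50 bp and then 100 bp.
    for sd in (50, 100):
        dna = search_window(300, sd)
        rna = search_window(300, sd, rna_seq=True)
        print(f"SD={sd}: DNA window {dna} bp, RNA-Seq window {rna} bp")
```

Doubling the standard deviation from 50 to 100 bp grows the DNA window from 600 to 900 bp (50%), but the RNA-Seq window only from 64600 to 64900 bp (under 0.5%), because the fixed 64000 bp intron allowance dominates.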


See also Logic of filenames used for paired-end read