Here, we provide detailed performance comparisons of NGS read aligners. In light of the heated debates, we would like to stress that benchmarks only measure specific aspects and should not be used to claim any universal superiority or inferiority of a particular tool.
To compare different short read aligners, we use a published real-life paired-end DNA/RNA-Seq dataset. All optimal alignments (including multiple mapping loci) of 100,000 read pairs of each sample were obtained with RazerS 3, a full-sensitivity mapping tool. In the benchmark shown below, we measured how well different NGS mappers, run with default parameters, find all optimal hits. True positives are reads with up to 10 multiple mapping loci, allowing up to 10 errors (mismatches and indels).
Note that we explicitly want to find all multiple mapping loci in this benchmark, not only unique mapping loci or just one random hit out of several. We believe that reads mapping multiple times should not be discarded, since gene duplications and repeat regions are known to be biologically relevant.
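As an illustration of what "finding all optimal hits" means, the following minimal Python sketch computes the fraction of gold-standard loci that a mapper recovers. It is not the evaluation script used for this benchmark. The function names, the tuple layout `(read_id, chrom, pos, strand)`, and the `tolerance` parameter are assumptions made for this example, and the limit of 10 errors per alignment is not modeled here.

```python
from collections import defaultdict


def group_by_read(hits):
    """Group (read_id, chrom, pos, strand) tuples into sets of loci per read."""
    by_read = defaultdict(set)
    for read_id, chrom, pos, strand in hits:
        by_read[read_id].add((chrom, pos, strand))
    return by_read


def all_hit_recall(gold_hits, mapper_hits, max_loci=10, tolerance=0):
    """Fraction of gold-standard loci that the mapper reported.

    Reads with more than `max_loci` gold-standard loci are skipped.
    `tolerance` (an assumption of this sketch) is the allowed offset in
    bases between a reported position and the gold-standard position.
    """
    gold = group_by_read(gold_hits)
    found = group_by_read(mapper_hits)
    total = 0
    recovered = 0
    for read_id, loci in gold.items():
        if len(loci) > max_loci:
            continue  # read exceeds the multiple-mapping limit
        total += len(loci)
        reported = found.get(read_id, set())
        for chrom, pos, strand in loci:
            # A locus counts as recovered if any reported hit matches it.
            if any(c == chrom and s == strand and abs(p - pos) <= tolerance
                   for c, p, s in reported):
                recovered += 1
    return recovered / total if total else 0.0


# Hypothetical toy data: read r1 has two optimal loci, read r2 has one.
gold = [("r1", "chr1", 100, "+"), ("r1", "chr2", 500, "-"), ("r2", "chr1", 900, "+")]
mapper = [("r1", "chr1", 100, "+"), ("r2", "chr1", 902, "+")]

print(all_hit_recall(gold, mapper))               # exact positions only
print(all_hit_recall(gold, mapper, tolerance=5))  # allow small offsets
```

A mapper that reports only one random hit per read is penalized under this measure, because every missed locus of a multiply mapping read lowers the score.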
The following real-life datasets were used for this benchmark:
The down-sampled dataset as well as the optimal alignments are published and can be accessed here. The corresponding publication (Otto et al. Bioinformatics, 2014) can be found here. Please feel free to repeat the benchmark on your own machine to double-check the results!
To improve this benchmark, we encourage all readers to reproduce these results and to come up with alternative benchmarks.
If you plan to use the results of this benchmark, please cite:
Otto C, Stadler PF, Hoffmann S: 'Lacking alignments? The next generation sequencing mapper segemehl revisited', Bioinformatics. 2014 Jul 1;30(13):1837-43