One question comes up very often during our workshops: what is the best NGS read aligner? Which alignment tool do you recommend for my next-generation sequencing experiments? It is a relevant question, since read mapping is a central step in most NGS analysis pipelines.
Traditional sequence alignment tools like BLAST are not suited for NGS. To extract meaningful information from NGS data, one needs to align many millions of mostly short sequences. BLAST and similar tools are far too slow for the vast amount of data produced by modern sequencing machines. The advent of these machines ignited the development of a new class of much faster read aligners. And there are many of them: more than 90 short-read alignment programs are now available.
Timeline of NGS read aligners. Image from Nuno Fonseca, HTS Mappers.
Every time a new read alignment program is developed and officially presented in a peer-reviewed journal, the authors are asked to provide a comparison to existing tools. This is typically done in a benchmark in which certain aspects of the tool are assessed in an (ideally) scientifically sound manner. You can then compare these benchmarks and use them to decide on the optimal tool for your case. However, this procedure has its limitations: only a small subset of the many relevant aspects (typically things like mapping rate, sensitivity, and speed) can be assessed in a short paper, and only for particular program versions, parameter settings, and data sets.
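To make the benchmark aspects above concrete, here is a minimal sketch of how mapping rate, sensitivity, and precision could be computed for simulated reads whose true origin is known. The `ReadResult` record, the `benchmark_metrics` function, the 5 bp tolerance, and the toy reads are all assumptions made for illustration. Published benchmarks define these measures in somewhat different ways, and a real evaluation would also compare chromosome and strand.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReadResult:
    """One simulated read (hypothetical record type for this sketch)."""
    true_pos: int               # position the read was simulated from
    mapped_pos: Optional[int]   # position reported by the aligner, None if unmapped


def benchmark_metrics(results, tolerance=5):
    """Return mapping rate, sensitivity, and precision for a list of reads.

    A read counts as correctly mapped if the reported position lies within
    `tolerance` bases of the true position (an assumed criterion).
    """
    total = len(results)
    mapped = [r for r in results if r.mapped_pos is not None]
    correct = sum(1 for r in mapped if abs(r.mapped_pos - r.true_pos) <= tolerance)
    return {
        # fraction of all reads that were placed anywhere
        "mapping_rate": len(mapped) / total if total else 0.0,
        # fraction of all reads that were placed correctly
        "sensitivity": correct / total if total else 0.0,
        # fraction of placed reads that were placed correctly
        "precision": correct / len(mapped) if mapped else 0.0,
    }


# Toy data: two correct reads, one misplaced read, one unmapped read.
results = [
    ReadResult(true_pos=100, mapped_pos=100),
    ReadResult(true_pos=250, mapped_pos=252),
    ReadResult(true_pos=400, mapped_pos=9000),
    ReadResult(true_pos=700, mapped_pos=None),
]
metrics = benchmark_metrics(results)
print(metrics)
```

Note that a high mapping rate alone says little: the misplaced read in the toy data raises the mapping rate but lowers the precision.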
Assume you have a benchmark of your favorite alignment tools. What aspects should you look for? In general, you should try to answer the following questions:
Note that besides this “hard” benchmark there are also other factors to consider: Are the input and output formats of the program usable for you? Does the software have special features relevant to you? Is the program easy to use? Are there special license requirements or fees associated with it?
...there is no best read aligner. It really depends on your goals and the specific use case. What is the application? Which sequencing technology was used? What is the species? What are the computational constraints? Weigh the answers to those questions, then choose the read mapper based on its performance in the aspects important to you as well as on its features.
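One way to picture this is a weighted score, sketched below. The aligner names, the normalized scores (0 to 1, higher is better), and the weights are invented placeholders rather than measurements of any real tool, and `rank_aligners` is a hypothetical helper. The point is only that the same benchmark numbers can produce a different ranking once the priorities change.

```python
def rank_aligners(scores, weights):
    """Rank aligners by the weighted mean of their per-aspect scores.

    `scores` maps aligner name -> {aspect: score in [0, 1]}.
    `weights` maps aspect -> importance for the use case at hand.
    """
    total_weight = sum(weights.values())
    weighted = {
        name: sum(weights[aspect] * values[aspect] for aspect in weights) / total_weight
        for name, values in scores.items()
    }
    # Best-scoring aligner first.
    return sorted(weighted, key=weighted.get, reverse=True)


# Placeholder values for illustration only, not real benchmark results.
scores = {
    "aligner_A": {"sensitivity": 0.98, "speed": 0.40, "memory": 0.50},
    "aligner_B": {"sensitivity": 0.80, "speed": 0.90, "memory": 0.80},
}

# Use case 1: accuracy matters most (for example, variant calling).
accuracy_first = {"sensitivity": 0.8, "speed": 0.1, "memory": 0.1}
# Use case 2: limited compute and many samples.
throughput_first = {"sensitivity": 0.2, "speed": 0.5, "memory": 0.3}

print(rank_aligners(scores, accuracy_first))
print(rank_aligners(scores, throughput_first))
```

With these placeholder numbers, the first set of weights puts aligner_A on top and the second puts aligner_B on top. Factors such as file formats, usability, and licensing do not reduce to a number so easily and still need to be judged separately.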
Last updated on April 07, 2016
ecSeq is a bioinformatics solution provider with solid expertise in the analysis of high-throughput sequencing data. We organize public workshops and conduct on-site trainings on NGS data analysis.