Seminar - Philip Clausen - Optimizing alignment for microbial detection and analysis with KMA
Philip T.L.C. Clausen¹, Frank M. Aarestrup¹ & Ole Lund¹
¹ Research Group for Genomic Epidemiology, National Food Institute, Technical University of Denmark, 2800 Kgs Lyngby, Denmark
Over the past decade, several alignment methods have been designed to cope with the vast amounts of sequence data produced by the second generation of sequencing machines. We are now facing the third generation of sequencing technology, and new alignment methods are on the rise to match its new features. As a result, we now have an arsenal of alignment methods to choose from, each optimized for a particular sequencing technology but, unfortunately, mostly optimized for calling SNPs in the human genome as well. These optimizations come with a number of assumptions and features that are reasonable for analyses resembling the human genome projects and SNP calling. However, the analogy often ends quite quickly when we move into the world of microbial diagnostics, surveillance and analysis based on bioinformatics. We present KMA (K-Mer Alignment), designed specifically to handle the problems seen in microbial sequence analysis. These include scaling with the ever-growing amounts of sequence data, keeping turnaround time short so that decisions can be made in time, and distinguishing and handling highly similar sequences without the need for clustering or LCA algorithms, which KMA achieves by introducing the novel ConClave algorithm. With KMA, we have given the bioinformaticians in public health microbiology a powerful tool to save the world. So far we have seen promising results in global antimicrobial resistance (AMR) surveillance, phylogeny based solely on MinION data, species composition and SNP calling in metagenomic samples. All of these are being tested and implemented in remote parts of the world, currently at KCRI Tanzania, without the use of large cluster computers. Additionally, we have migrated all the CGE tools (http://www.genomicepidemiology.org) to use KMA, which shortens turnaround time and thereby lets our old servers last even longer.
In some extreme cases, such as global AMR surveillance based on sewage sequencing, KMA is fast enough that decompressing the input data has become the bottleneck, taking up more than 90% of the computational resources. This has enabled global AMR surveillance in minutes, while species composition is determined in hours.
Host: Heroen Verbruggen