SysGen Seminar – Aaron Darling – 25th August, 2017
The ithree Institute, University of Technology Sydney
Friday 25th August
FW Jones Theatre, Level 3 Medical Building, The University of Melbourne
Metagenome assembly, probabilistic graph factorisation, and real-time Bayesian phylogenetics
Metagenome sequencing has the potential to offer deep insights into the microbial communities that are pervasive in nature. Metagenomic data, however, are fragmentary: individual sequence reads are each likely to have been derived from a different cell in the sample, and they carry no information about their cellular origin. Recent analysis methods have succeeded in reconstructing genomes from metagenomes through a process called genome binning. These so-called metagenome-assembled genomes (MAGs), or population genomes, represent a major technical advance because they transform metagenome analysis into the familiar, well-understood problem of isolate genome analysis. However, genome binning methods fail to capture some of the most important genomes in major ecosystems, often because of strain-level diversity within those ecosystems. In the first half of this seminar I will describe recent work by international teams to resolve strain-level genomic diversity, including an overview of probabilistic approaches to decomposing the metagenome assembly graph into strain-level subgraphs.
In the second half of this seminar I will discuss the use of phylogenetic algorithms in genomic epidemiology. Phylogenetics has found a very practical application in infectious disease outbreak surveillance and management; however, current algorithms struggle to keep pace with the rate of data generation. Databases such as the US FDA's GenomeTrakr now contain more than 50,000 Salmonella genomes and accrue new genomes at a rate of more than 50 per day. Current gold-standard methods for phylogenetic analysis, such as the MCMC methods implemented in BEAST or MrBayes, have difficulty scaling beyond several hundred genomes, and existing analyses cannot be updated when new data become available. Yet new data must be analysed quickly if the promise of genomic epidemiology is to be fully realised. I will describe recent work on algorithms that use Sequential Monte Carlo to update phylogenetic posteriors with new sequences, making it possible to quickly update existing analyses when new data become available.
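The abstract does not spell out the algorithm, so what follows is only a minimal sketch of the generic reweight-and-resample step that Sequential Monte Carlo uses to fold new data into an existing posterior without revisiting earlier data. It assumes a toy one-parameter Bernoulli model rather than phylogenetic trees. All function names and the data batches are hypothetical, and this is not the implementation described in the talk. In an online phylogenetic setting each particle would be a tree that must also be extended, for example by proposing where each new sequence attaches; that step is omitted here.

```python
# Toy illustration only: a hypothetical Bernoulli model, not a phylogenetic sampler.
import math
import random


def log_likelihood(p, data):
    """Log-likelihood of 0/1 observations under a toy Bernoulli(p) model."""
    k = sum(data)
    return k * math.log(p) + (len(data) - k) * math.log(1.0 - p)


def normalise(log_weights):
    """Turn log-weights into weights summing to 1 (subtracting the max avoids overflow)."""
    m = max(log_weights)
    w = [math.exp(lw - m) for lw in log_weights]
    total = sum(w)
    return [x / total for x in w]


def effective_sample_size(weights):
    """ESS of normalised weights: n when uniform, near 1 when one particle dominates."""
    return 1.0 / sum(w * w for w in weights)


def smc_update(particles, log_weights, new_data, rng, ess_fraction=0.5):
    """Fold a new batch of data into a weighted-particle posterior.

    Only the likelihood of the new batch is evaluated; earlier batches are
    already summarised by the current particles and weights.
    """
    n = len(particles)
    log_weights = [lw + log_likelihood(p, new_data)
                   for p, lw in zip(particles, log_weights)]
    weights = normalise(log_weights)
    if effective_sample_size(weights) < ess_fraction * n:
        # Multinomial resampling: copy high-weight particles, drop low-weight
        # ones, then reset to equal weights.
        particles = rng.choices(particles, weights=weights, k=n)
        log_weights = [0.0] * n
    return particles, log_weights


def posterior_mean(particles, log_weights):
    weights = normalise(log_weights)
    return sum(p * w for p, w in zip(particles, weights))


rng = random.Random(1)
n = 1000
# A regular grid over (0, 1) stands in for draws from a uniform prior.
particles = [(i + 0.5) / n for i in range(n)]
log_weights = [0.0] * n
# Hypothetical batches of 0/1 observations arriving over time.
for batch in ([1, 0, 0, 1], [0, 0, 1, 0], [0, 0, 0, 1]):
    particles, log_weights = smc_update(particles, log_weights, batch, rng)
estimate = posterior_mean(particles, log_weights)
print(round(estimate, 3))
```

Under this toy model the exact posterior after all three batches is Beta(5, 9), with mean 5/14 (about 0.357), so the particle estimate should land close to that value. A practical sampler would also add MCMC "rejuvenation" moves after resampling to counter particle degeneracy; they are left out to keep the update step visible.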
A/Prof Aaron Darling has a long-standing interest in bioinformatics and computational biology. He currently leads a group at the ithree Institute, an infectious disease research institute affiliated with the University of Technology Sydney. Research in the group focuses on the development and application of computational, statistical, and molecular methods for characterising human-associated microbial communities (the microbiome).
Enquiries: Andrew Siebel (email@example.com)