MIG Seminar – Jukka Corander – 11th December, 2017

More Information

Andrew Siebel

asiebel@unimelb.edu.au

T: +61 3 8344 0707

Jukka Corander

Department of Mathematics and Statistics, University of Helsinki

Monday 11th December
12-1pm
Evan Williams Theatre, Peter Hall Building, The University of Melbourne

Statistical Learning of Ultra High-Dimensional Potts Models in Genomics

Abstract
The potential for genome-wide modeling of epistasis has recently emerged, thanks to the possibility of sequencing densely sampled populations and to new families of statistical interaction models. Direct coupling analysis (DCA) with Potts models has previously been shown to yield valuable predictions for single protein structures, and has recently been extended to genome-wide analysis of bacteria, identifying novel interactions in the co-evolution between resistance, virulence and core genome elements. However, earlier computational DCA methods have not scaled to fitting a model simultaneously to 10,000-100,000 polymorphisms, which is the amount of core genomic variation observed in analyses of many bacterial species. Here we introduce a novel inference method (SuperDCA) which employs a new scoring principle, efficient parallelization, optimization and filtering on phylogenetic information to achieve scalability for up to 100,000 polymorphisms. Using two large population samples of Streptococcus pneumoniae, we demonstrate the ability of SuperDCA to make additional significant biological findings about this major human pathogen. We also show that our method can uncover signals of selection that are not detectable by genome-wide association analysis, even though our analysis does not require phenotypic measurements. SuperDCA thus holds considerable potential for building an understanding of numerous organisms at a systems-biological level.

Bio
Jukka Corander is a professor at the University of Helsinki and at the University of Oslo. He received his PhD from Stockholm University in 2000, with a thesis on Bayesian learning of graphical models. His scientific areas of particular interest are statistical genetics, bioinformatics, graphical models, stochastic simulation, machine learning and the theory of classification.
