Students


Interested in Biology and Maths?
Like working with computers?
Why not try Computational Biology?

________________________________________

With continued advances in the technologies for collecting biological information and data, there is an ever-increasing demand for professional scientists with skills in computational biology.

MORE INFORMATION ON...

Bachelor of Science - Major in Computational Biology

COMBINE - An organisation for Australian students and early career researchers in Bioinformatics and Computational Biology


Student research projects & opportunities


Laboratory head: Prof David Balding
UoM Organisational unit(s): School of BioSciences - http://courses.science.unimelb.edu.au/study/degrees/master-of-philosophy-science/overview OR School of Maths & Stats - http://www.ms.unimelb.edu.au/#study
Email: dbalding@unimelb.edu.au
Research Group website: https://sites.google.com/site/baldingstatisticalgenetics/

A note to potential PhD and MSc students, and some potential project areas
I work in different areas of computational statistical inference, focussing on applications in population, evolutionary, medical and forensic genetics.  Some of the main themes of my research are:
- mathematical modelling of
  ◦ ancestry,
  ◦ relatedness,
  ◦ demographic history of populations,
  ◦ evolutionary processes such as mechanisms of selection;
- identifying sources of biological material (identification of individuals and of tissues);
- measures of relatedness among two or more individuals and the role of relatedness in genomics analyses, including heritability analyses;
- predicting phenotypes from genotype and other data;
- association analysis, particularly in the presence of complex relatedness/population structure;
- analysis of other omics data (transcriptome, methylome, etc.) in conjunction with genomics data.

Much of the above is informed by the coalescent-with-recombination model of population genetics.  Performing statistical inference for large datasets under this model remains one of the major open problems in statistical genomics.  Other statistical tools include mixed- (or penalised-) regression for large numbers of predictors (p >> n) and various multivariate statistical techniques.

I work with collaborators in many different fields: forensic science, crop research, ancient DNA, pharmaceutical companies and diseases of humans, animals and plants.  There is potential for collaborative PhD projects in many of these areas.  Most projects have a strong element of statistical modelling based on data generated by the collaborative partner or from public databases.

Students wishing to study with me should be strong in computation using R or other scientific computing languages/programs: at a minimum, they should have some experience of, and aptitude for, writing programs in R or another scientific/statistical language.  They also need a good background in maths/stats: some, but not all, projects require a maths/stats undergraduate degree.  Most projects require a basic knowledge of, and enthusiasm to explore further, genetics and genomics or other relevant areas of biology.


Project in Complex Trait Genomics

Project 1 
Project title: Genome-wide models for heritability and prediction
Supervisor: Prof David Balding
Location: Melbourne Integrative Genomics, University of Melbourne
Description: The heritability of a phenotype is the fraction of its variance that can be modelled by genetics. "Genetics" for this purpose used to be measured by pedigree relatedness, but this has two major limitations: the result depends on the available pedigree, and the pedigree relatedness of two individuals describes their expected genome sharing, not the realised value.  Nowadays, genome-wide allele sharing can be measured directly from SNP genotypes, but this has raised many questions about how best to represent the genetic similarity between two individuals, given their genome-wide genotypes. Analogy with the pedigree-based statistical model (a mixed regression model with variance matrix computed from pedigree-based kinship coefficients) has led researchers to a specific statistical model relating genome-wide SNP genotypes to a phenotype.
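To make "genetic similarity from genome-wide genotypes" concrete, here is a minimal Python sketch of one commonly used representation: average the products of standardised SNP genotypes for each pair of individuals. This is an illustrative assumption, not necessarily the exact model referred to above, and the function name and toy genotypes are hypothetical.

```python
from statistics import mean


def allele_sharing_matrix(genotypes):
    """Return a genomic similarity matrix from SNP genotypes.

    genotypes[i][j] is the count (0, 1 or 2) of a chosen allele for
    individual i at SNP j. Each SNP is standardised using its sample
    allele frequency; this standardisation is one common choice, assumed
    here for illustration only.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    standardised = [[0.0] * m for _ in range(n)]
    used = 0
    for j in range(m):
        p = mean(row[j] for row in genotypes) / 2.0   # sample allele frequency
        var = 2.0 * p * (1.0 - p)                     # binomial genotype variance
        if var == 0.0:
            continue                                  # monomorphic SNP: skip
        used += 1
        for i in range(n):
            standardised[i][j] = (genotypes[i][j] - 2.0 * p) / var ** 0.5
    if used == 0:
        raise ValueError("no polymorphic SNPs")
    # Entry (i, k) is the mean product of standardised genotypes over SNPs.
    return [
        [sum(a * b for a, b in zip(standardised[i], standardised[k])) / used
         for k in range(n)]
        for i in range(n)
    ]


# Toy data: individuals 0 and 1 have identical genotypes, individual 2 differs.
toy_genotypes = [[0, 1, 2], [0, 1, 2], [2, 1, 0]]
similarity = allele_sharing_matrix(toy_genotypes)
```

In a mixed regression model, a matrix like this would take the place of the pedigree-based kinship matrix as the variance structure of the genetic effects.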

My work with collaborator Doug Speed (published recently in Nature Genetics, see below) has shown this model to be deficient.  Based on a large-scale reanalysis of GWAS data for 43 phenotypes, we developed a model that better fits real data by taking into account the effects of minor allele fraction, linkage disequilibrium and genotype quality on the heritability of a SNP.  Our new model leads to dramatic revisions of some published results in complex trait genomics.

This project will explore further implications of our superior heritability model.  These include extending the model to multivariate phenotype analysis, and in particular to improved estimates of the genetic correlation between traits.  Our LD model has substantial implications for LD Score Regression, a popular approach to analysing GWAS data that is available only in the form of summary statistics, rather than individual genotype data.  In this project we will work both to further refine the new heritability model, including its extension to summary statistics, and to improve LD Score Regression and other related methods.  This will in turn lead to better prediction models for individual and multiple phenotypes.

Speed D, …, Balding D (2017) Reevaluation of SNP heritability in complex human traits, Nature Genetics doi:10.1038/ng.3865

Available for: PhD (preferably)/MSc


Projects in Population Genomics
Project 2
Project title: Quantifying genetic variation across multiple populations
Supervisor: Prof David Balding
Location: Melbourne Integrative Genomics, University of Melbourne
Description: The genetic distance between two populations has traditionally been measured by a parameter called FST (S is for subpopulation, T is for total population).  There has long been disagreement over its precise definition and the best way to estimate it.  These disagreements became important with the advent of genome-wide genotype data, with estimates of inter-continental human genetic distances differing by almost a factor of two because different definitions have different sensitivities to rare variants. The problem has now largely been resolved for pairs of populations, but there remains the problem of defining and estimating FST for multiple populations.  A recent academic visitor to Melbourne, Dr Tristan Mary-Huard from AgroParisTech, France, made considerable progress in clarifying the definition and developing fast and efficient method-of-moments estimators of FST for multiple populations.  He also developed a fast and simple procedure for building population trees in which the branch lengths correspond to FST. In this project we will develop this work further, in collaboration with Dr Mary-Huard, to construct more general graphical structures representing the genetic variation across a set of populations that may have had historical episodes of admixture, in particular the human populations of the Americas.
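The Python sketch below illustrates, under simplifying assumptions, how the way per-SNP values are combined can change an FST value when rare variants are present. It uses a heterozygosity-based definition, FST = (H_T - H_S) / H_T, for two equally weighted populations with known allele frequencies. This is only one of several definitions and not necessarily the one used in the work described above; the function names and example frequencies are hypothetical.

```python
def heterozygosities(p1, p2):
    """Within-population (H_S) and total (H_T) expected heterozygosity
    for one biallelic SNP with allele frequencies p1 and p2."""
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)
    return h_s, h_t


def fst_ratio_of_averages(freqs):
    """Sum numerators and denominators over SNPs, then divide.
    Rare variants contribute little because their H_T is small."""
    numerator = denominator = 0.0
    for p1, p2 in freqs:
        h_s, h_t = heterozygosities(p1, p2)
        numerator += h_t - h_s
        denominator += h_t
    return numerator / denominator


def fst_average_of_ratios(freqs):
    """Compute FST per SNP, then average.
    Every polymorphic SNP, rare or common, gets equal weight."""
    ratios = []
    for p1, p2 in freqs:
        h_s, h_t = heterozygosities(p1, p2)
        if h_t > 0:  # skip SNPs that are monomorphic in both populations
            ratios.append((h_t - h_s) / h_t)
    return sum(ratios) / len(ratios)


# One common SNP and one rare SNP (illustrative frequencies only).
example = [(0.4, 0.6), (0.0, 0.02)]
print(fst_ratio_of_averages(example), fst_average_of_ratios(example))
```

The two functions agree for a single SNP but differ once common and rare SNPs are mixed, which is one way that differences between definitions can arise.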

Available for: MSc only

Project 3
Project title: The population history of indigenous Australians: what can the available genetic data tell us?
Supervisor: Prof David Balding and Dr Ashley Farlow
Location: Melbourne Integrative Genomics, University of Melbourne
Description: In the past year or so, three major papers have appeared making strong claims about the population history of indigenous Australians from genetic data; two appeared in Nature. One used autosomal DNA, while the others relied only on mitochondrial DNA. The claims from these papers appear to conflict with each other, and many appear to be too precise to be adequately supported by genetic data alone. Much of the data from these papers is available to other researchers, and other genetic data resources are available for indigenous Australians and New Guineans. Broadly speaking, the thinking behind this project is that more careful statistical inferences may be able to resolve some of the differences among these authors, and to distinguish claims that are strongly supported from those that are more speculative. The student will investigate and critically appraise a range of publicly available software for demographic inference from genetic data. We will also examine the support for alternative population histories using simulation-based approximate Bayesian computation, for which generic software is also available, although many parameter settings will require careful assessment.

Available for: MSc/PhD


Project in Forensic Genetics
Project 4
Project title: Forensic weight-of-evidence for unilineal markers
Supervisor: Prof David Balding
Location: Melbourne Integrative Genomics, University of Melbourne

Description: Together with Dr Mikkel Andersen of Aalborg University in Denmark, I have recently published a paper and accompanying software that can approximate the distribution of the number of men matching a given Y-chromosome profile under various population genetics models.  This leads to a scientifically robust and easy-to-understand way to present Y-profile evidence in courts.  More work is needed to extend this method to mixtures of Y chromosomes from different men, to take account of any known Y profiles among relatives of the suspected contributor, and to address the corresponding problem for female-lineage mitochondrial DNA profiles.

Andersen MM, Balding DJ (2017) How convincing is a matching Y-chromosome profile? PLoS Genet 13(11): e1007028. https://doi.org/10.1371/journal.pgen.1007028

See also the Pursuit article.

Available for: MSc only


Project in Computational Statistics
Project 5
Project title: Gaussian process regression for ABC inference
Supervisor: Prof David Balding
Location: Melbourne Integrative Genomics, University of Melbourne

Description: ABC, or Approximate Bayesian Computation, has revolutionised statistical inference, initially in population genetics but later spreading to other application areas, by allowing principled albeit approximate statistical inference under complex models that can have large numbers of latent (usually nuisance) variables. The key idea is very simple: (1) simulate data under the model; (2) if the simulated data are sufficiently similar to the observed data, then retain the parameters of interest from the simulation; (3) repeat until the set of retained values is large enough for accurate inferences.
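The three steps above can be sketched in a few lines of Python. This is a minimal illustration on a toy problem (inferring the mean of a normal distribution with known standard deviation), not the software used in the project. The prior, the summary statistic (the sample mean), the tolerance and all function names are assumptions chosen for the example.

```python
import random
from statistics import mean


def rejection_abc(observed, sample_prior, simulate, distance,
                  tolerance, n_accept, rng):
    """Basic rejection ABC: keep parameter values whose simulated data
    fall within `tolerance` of the observed data."""
    accepted = []
    while len(accepted) < n_accept:                     # step (3): repeat
        theta = sample_prior(rng)                       # draw from the prior
        simulated = simulate(theta, rng)                # step (1): simulate
        if distance(simulated, observed) <= tolerance:  # step (2): compare
            accepted.append(theta)
    return accepted


# Toy model (hypothetical): data are 20 draws from Normal(theta, 1).
def sample_prior(rng):
    return rng.uniform(-5.0, 5.0)


def simulate(theta, rng):
    return [rng.gauss(theta, 1.0) for _ in range(20)]


def distance(simulated, observed):
    # Compare datasets through a single summary statistic, the sample mean.
    return abs(mean(simulated) - mean(observed))


rng = random.Random(1)
observed = simulate(2.0, rng)   # pretend these are the real data
posterior_sample = rejection_abc(observed, sample_prior, simulate,
                                 distance, tolerance=0.1,
                                 n_accept=200, rng=rng)
print("approximate posterior mean:", mean(posterior_sample))
```

A smaller tolerance gives a better approximation to the posterior but rejects more simulations, which is the main computational trade-off in this basic form of the algorithm.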

There are many variations on this basic algorithm; for example, retained values can be weighted rather than simply accepted or rejected.  In particular there are many ways to quantify the similarity between observed and simulated datasets, often using summary statistics.  My colleagues and I previously proposed a regression approach in which we viewed a parameter that is the target of inference as the dependent variable in a regression, with each simulated dataset a realisation of a high-dimensional predictor variable.  The simulations provide the training data from which a regression model can be fitted, and the task is then to use the fitted model to predict the unobserved parameter value corresponding to the observed dataset.
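The regression view can be illustrated with a deliberately simplified Python sketch: simulate (parameter, summary) pairs, fit an ordinary least-squares line with the parameter as the response, and predict the parameter at the observed summary. This uses a single scalar summary and a global linear fit purely for illustration, rather than a high-dimensional predictor; the toy model and all names are hypothetical.

```python
import random
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def regression_abc_estimate(observed_summary, sample_prior,
                            simulate_summary, n_sim, rng):
    """Use simulations as training data: the parameter is the dependent
    variable and the simulated summary is the predictor."""
    thetas, summaries = [], []
    for _ in range(n_sim):
        theta = sample_prior(rng)
        thetas.append(theta)
        summaries.append(simulate_summary(theta, rng))
    intercept, slope = fit_line(summaries, thetas)
    # Predict the unobserved parameter for the observed dataset.
    return intercept + slope * observed_summary


# Toy model (hypothetical): the summary is the mean of 20 Normal(theta, 1) draws.
def uniform_prior(rng):
    return rng.uniform(-5.0, 5.0)


def summary_of_simulation(theta, rng):
    return mean(rng.gauss(theta, 1.0) for _ in range(20))


rng = random.Random(3)
estimate = regression_abc_estimate(2.0, uniform_prior,
                                   summary_of_simulation, 2000, rng)
print("predicted parameter at observed summary:", estimate)
```

Unlike the rejection algorithm, every simulation contributes to the fit, so no tolerance needs to be chosen in this simplified version.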

We originally fitted a local-linear regression for the posterior mean of the parameter.  In this project we will investigate more sophisticated modelling approaches, including Gaussian Process Regression (GPR) models in the parameter space.  Note that GPR models have been used in the data space, for example in synthetic regression, but we will investigate ways to model the posterior distribution of the parameter using Gaussian processes, a convenient class of models for which efficient software is already available.  If successful, this project could generate another major step forward in statistical inference under complex models in many application areas.

Available for: PhD only

____________________

Laboratory head: A/Prof Kathryn Holt
UoM Organisational unit: Department of Biochemistry & Molecular Biology
http://biomedicalsciences.unimelb.edu.au/departments/biochemistry/study/honours-and-masters
Email: kholt@unimelb.edu.au
Research Group website: https://holtlab.net/

Project title: Unravelling antibiotic resistant bacterial genomes using nanopore sequencing
Supervisor: A/Prof Kathryn Holt
Location: Bio21 Molecular Science and Biotechnology Institute

Description: Antibiotic resistant infections are a major global health problem, largely driven by the spread of drug resistance genes through populations of bacterial pathogens. The most common mechanism for the spread of drug resistance genes is via plasmids – small circles of DNA that can move between bacterial cells and transfer antibiotic resistance.

The movement of plasmids and drug resistance genes in natural bacterial populations can be studied by whole genome sequencing of cultured bacteria. This is most commonly done using Illumina sequencing, which involves breaking up genomic DNA (including chromosome and plasmid DNA) into 500 base pair fragments, sequencing them, and attempting to reconstruct (“assemble”) the whole genome from the resulting sequence fragments. However, the assembly process is complicated by the presence of DNA repeats, which are common in bacterial genomes and often make it impossible to disentangle chromosomal from plasmid DNA. As a result, it is often not possible to resolve plasmids from Illumina data, which poses a significant barrier to studying the spread of drug resistance.

To solve this problem, our lab has been using new DNA sequencing technology that can generate very long sequence reads (up to 1 million bp long) by passing high molecular weight DNA strands through nano-sized pores. The aim of this project will be to apply the new “nanopore sequencing” approach to completely sequence and compare dozens of antibiotic resistance plasmids from bacteria isolated in Melbourne hospitals, in order to investigate the evolution and spread of drug resistance.

Available for: Hons/MSc

____________________

Laboratory head: Dr Kim-Anh Lê Cao
UoM Organisational unit: School of Maths & Stats - http://www.ms.unimelb.edu.au/#study
Email: kimanh.lecao@unimelb.edu.au
Research Group website: http://sysgen.unimelb.edu.au/research/computational-biostatistics-methods-le-cao

Project Title: Multivariate computational methods for data integration of single cell assays
Supervisors: Dr Kim-Anh Lê Cao and Dr Jarny Choi (jarnyc@unimelb.edu.au)
Location: Melbourne Integrative Genomics and Centre for Stem Cell Systems, University of Melbourne

Description: High-throughput single cell molecular profiling gives our scientific community the unique opportunity to define cell types with distinctive molecular profiles to unprecedented depths. However, identifying novel cell types relies on the ability to combine and integrate different types of independent assays (performed in different laboratories) to obtain generalizable and reproducible results. Our main challenges are data heterogeneity and large-scale datasets (many cells and many transcripts). The project will focus on the extension of our projection-based multivariate methods implemented in mixOmics (www.mixOmics.org) to single cell sequencing datasets to address these challenges and identify robust gene signatures that characterize the novel cell subtypes.

Background reading: Regev et al. (2017) The Human Cell Atlas bioRxiv
http://www.biorxiv.org/content/early/2017/05/08/121202

Suitable for: Students with a background in statistics or computer science, and an interest in cell biology.

Available for: MSc/PhD, but also smaller undergraduate projects.

____________________

Laboratory head: A/Prof James McCaw
UoM Organisational unit: School of Mathematics and Statistics
http://www.ms.unimelb.edu.au/#study
Email: jamesm@unimelb.edu.au
Research Group website: https://sites.google.com/site/jamesmccaw/home

Project 1
Project title: Within host pathogen dynamics modelling
Supervisor: A/Prof James McCaw

Description: Projects focusing on varied aspects of the host-pathogen interaction are available, on diseases such as influenza and malaria. Biological topics for investigation include: the role of innate and adaptive immunity in controlling infection; the role of drugs in controlling infection and the development of drug resistance; and evolutionary aspects of infection, including genetic drift and selection. Mathematical techniques required are varied but include: deterministic and stochastic modelling; dynamical systems analyses, including numerical bifurcation studies; and biostatistical studies, including Bayesian hierarchical modelling of dynamic non-linear systems.

Suitable for: Students with a background in applied mathematics, applied probability, physics and other quantitative and computational sciences, and those with training in biology (e.g. microbiology, immunology or parasitology) who have a strong mathematical and/or computational interest.

Project 2
Project title: Mathematical epidemiology and infectious diseases modelling
Supervisor: A/Prof James McCaw

Description: Projects are available on the development, analysis and application of models of infectious disease transmission in human populations. Both theoretically focussed projects and applied public health projects are available. Diseases of interest include influenza, malaria, emerging and re-emerging diseases such as Ebola, and vaccine-preventable diseases such as pertussis.

Suitable for: Students with a background in applied mathematics, applied probability, physics and other quantitative and computational sciences, and those with training in epidemiology or public health who have a strong mathematical and/or computational interest.

Available for: Hons/MSc

___________________

Laboratory head: Prof Jodie McVernon
UoM Organisational unit: The Peter Doherty Institute for Infection and Immunity
Email: j.mcvernon@unimelb.edu.au
Research Group website: http://mspgh.unimelb.edu.au/research-groups/centre-for-epidemiology-and-biostatistics-research/modelling-and-simulation

Project title: Developing mathematical models of influenza immunity from cohort studies
Supervisor: Prof Jodie McVernon

Description: Following influenza (flu) infection, antibodies are produced that protect against re-infection with the same flu strain. However, over successive flu seasons, circulating viruses accumulate mutations that render this immunity relatively ineffective, a process known as antigenic drift. We have tracked flu infections and illnesses in a cohort of Vietnamese households over 11 years. These rare data, which document infection intervals and antibody responses, will be used to develop and validate statistical and mathematical models of influenza infection and immunity, and to better understand the contributions of immune waning and cross-strain immunity. Insights gained will inform public health strategies for flu prevention, including vaccination.

Available for: PhD

___________________

Laboratory heads: Dr Bernard Pope, Head of Clinical Genomics and Head of Cancer Genomics & Dr Daniel Park, Head of the Melbourne Bioinformatics Platform and Head of the Genomic Technologies Group
UoM Organisational unit: VLSCI, Faculty of Medicine, Dentistry & Health Sciences
Email: bjpope@unimelb.edu.au OR djp@unimelb.edu.au
Research Group website: www.vlsci.org.au

Project Title: Research Masters and PhD projects targeting health and medical research in life science computing
Supervisors: Dr Bernard Pope and Dr Daniel Park

Description: Traditionally, medical advances have come from analyses of small datasets generated by relatively contained experimental designs. In recent times, however, great progress has been made in generating large, curated and publicly available collections of data that have the potential to be harnessed to address important medical questions. The VLSCI combines a wealth of human expertise with powerful computational resources. We are a multidisciplinary organisation, consisting of computer scientists, molecular biologists and bioinformaticians with a strong track record of collaborative research. We have identified a number of exciting biomedical research projects with the potential for significant real-world impact, such as improving cancer treatments and diagnostics, and are seeking expressions of interest from students. The focal areas of research will expose students to functional genomics, the application and development of machine learning techniques for bioinformatics, and the development of computational tools to decipher molecular mechanisms of diseases.

Suitable for: Prospective Research Masters or PhD candidates who excel in data analytics, and/or computer science and are driven to apply these skills in this rapidly developing area of medical research.

Available for: MSc/PhD

___________________

Laboratory head: Dr Matthew Ritchie
UoM Organisational unit: WEHI, Department of Medical Biology
http://mdhs.unimelb.edu.au/our-organisation/institutes-centres-departments/department-of-medical-biology
Email: mritchie@wehi.edu.au
Research Group website: http://www.wehi.edu.au/people/matthew-ritchie

Project Title: Long-read sequencing for transcriptome and epigenome analysis
Supervisors: Dr Matthew Ritchie, Dr Charity Law and A/Prof Marnie Blewitt

Description: Long-read sequencing, as generated by Pacific Biosciences’ new Sequel platform or Oxford Nanopore Technologies’ MinION and PromethION devices, provides researchers with the ideal tool for resolving transcript architecture, studying methylation patterns genome-wide and assembling genomes. This project will explore a number of applications of long-read technology using data sets that look at splicing patterns in platelets, methylation changes associated with X-inactivation and genome structure in cell lines. It will involve collaboration with researchers at CSL and the successful candidate may be eligible for an industry top-up scholarship.

Suitable for: A student with a strong statistical or computational background (e.g. an undergraduate degree in Statistics and Mathematics), programming skills and an interest in biology.

Available for: MSc/PhD

___________________

Laboratory head: Dr Wei Shi
UoM Organisational unit: WEHI, Department of Medical Biology
http://mdhs.unimelb.edu.au/our-organisation/institutes-centres-departments/department-of-medical-biology
Email: shi@wehi.edu.au
Research Group website: http://www.wehi.edu.au/people/wei-shi

Project Title: Biological sequence analysis and genomic variant discovery
Supervisors: Dr Wei Shi, Prof Gordon Smyth

Description: Next-generation sequencing (NGS) technologies are increasingly used in laboratories and clinics worldwide to facilitate better understanding and diagnosis of diseases. The massive volume of data from these technologies continues to pose significant challenges for bioinformaticians. We are interested in developing novel methods for mapping both long and short NGS reads (and other biological sequences) to a reference genome to find the true origin of biological sequences (Liao et al., Nucleic Acids Research, 2013, 41(10):e108). We would also like to develop more accurate methods for detecting genomic variants (e.g. insertions, deletions and translocations) in cancer genomes using NGS data.

Suitable for: Prospective students are expected to have a computer science background and/or have strong programming skills. One or two projects are available for PhD or Masters study.

Available for: MSc/PhD

___________________

Laboratory heads: Dr Wei Shi & Prof Phil Hodgkin
UoM Organisational unit: WEHI, Department of Medical Biology
http://mdhs.unimelb.edu.au/our-organisation/institutes-centres-departments/department-of-medical-biology
Email: shi@wehi.edu.au OR hodgkin@wehi.edu.au
Research Group websites: http://www.wehi.edu.au/people/wei-shi OR http://www.wehi.edu.au/people/phil-hodgkin

Project Title: Reconstructing the immune response: from molecules to cells to systems
Supervisors: Prof Phil Hodgkin, Dr Wei Shi, Dr Andrey Kan

Description: Lymphocyte responses are complex processes whereby B and T cells undergo programmed rounds of proliferation and death triggered by pathogenic stimuli. Adequate timing and magnitude of the response are essential for effective pathogen clearance. The collective behaviour of a population of cells emerges from gene regulatory networks controlling individual cells. However, transcriptional regulation of lymphocyte responses is poorly understood. In this interdisciplinary project we will use bioinformatics methods to develop predictive mathematical models of lymphocyte responses. We will experimentally establish transcriptional profiles at different stages of the response, and interrogate these data using advanced statistical methods.

Suitable for: The student will learn both “wet lab” experimental techniques, such as cell proliferation assays and DNA sequencing, and “dry lab” skills for data analysis, such as developing statistical models using R and Python.

Available for: Hons/PhD

___________________

Laboratory head: Prof Karin Verspoor
UoM Organisational unit: School of Computing and Information Systems
http://cis.unimelb.edu.au/study/graduate/
Email: karin.verspoor@unimelb.edu.au OR saralph@unimelb.edu.au
Research Group website: http://www.textminingscience.com/

Project title: Text mining for extraction of subcellular localisation.
Supervisors: Prof Karin Verspoor and Dr Stuart Ralph.

Description: We are interested in automatically extracting information from the literature that specifies the subcellular localisation of proteins in the malaria parasite Plasmodium falciparum, and in related human parasites. Information about subcellular localisation in infectious agents is crucial to prioritising targets for drugs and vaccines. We have built a database that details the subcellular localisation of hundreds of Plasmodium proteins (http://apiloc.biochem.unimelb.edu.au/apiloc/apiloc), and will use this as a training set for Biomedical Natural Language Processing. The project will involve construction of a tool to recognise and extract records of cellular localisation for proteins.

Suitable for: Students with a background in computing and interests in Natural Language Processing, text mining and bioinformatics.

Available for: Masters/PhD

___________________

Laboratory head: Prof Christine Wells
UoM Organisational unit: Department of Anatomy & Neuroscience, Kenneth Myer Building, ph. 8344 3795
http://biomedicalsciences.unimelb.edu.au/departments/anatomy-and-neuroscience/study/honours-And-masters-by-coursework
Email: wells.c@unimelb.edu.au
Research Group website: http://www.stemformatics.org

Project 1
Project Title: Predicting stem cell behaviour.
Supervisors: Prof Christine Wells and Dr Kim-Anh Lê Cao (kimanh.lecao@unimelb.edu.au)

Description: Mesenchymal stromal cells (MSC) are resident tissue cells that are increasingly used for treatment of inflammatory disorders. However, the identity and function of these cells remain controversial. We recently published a computational tool that can predict the identity of MSC, and this project builds on this tool to find predictors of MSC function. The project is held jointly between the laboratory of Dr Kim-Anh Lê Cao in the Centre for Systems Genomics and the laboratory of Professor Christine Wells in the Centre for Stem Cell Systems.

Suitable for: Maths-stats savvy students, interested in dimension reduction of large biological data sets and variable selection methodologies to find clinical predictors of cell function.

Project 2
Project Title: Finding signatures of cell identity in Stemformatics
Supervisors: Dr Jarny Choi (jarnyc@unimelb.edu.au) and Prof Christine Wells.

Description: Stemformatics (stemformatics.org) is an established data portal containing over 350 gene expression datasets and ~10,000 samples, with a major focus on stem cells and their stages of differentiation. A major question arising from these data is whether we can identify robust signatures that distinguish stem cells and their differentiated progeny. This project involves extensive mining of the datasets within Stemformatics to find those signatures of cell identity. The project will give the student some basic skills in navigating and integrating large-scale datasets. The student will identify genes that can be used as controls in data normalisation and integration; evaluate patterns that distinguish different cell types; and benchmark laboratory-derived cells against their in vivo developmental equivalents. The project has scope for exploring innovative visualisations that can summarise large amounts of data succinctly.

Suitable for: A student with a strong maths or computer science background who either has some knowledge of programming or is able to learn data manipulation techniques rapidly.

Available for: Hons/MSc

___________________