Students


Interested in Biology and Maths?
Like working with computers?
Why not try Computational Biology?

________________________________________

With continued advances in the technologies for collecting biological information and data, there is an ever-increasing demand for professional scientists with skills in computational biology.

MORE INFORMATION ON...

Bachelor of Science - Major in Computational Biology

COMBINE - An organisation for Australian students and early career researchers in Bioinformatics and Computational Biology


Student research projects & opportunities


Laboratory head: Prof David Balding
UoM Organisational unit(s): School of BioSciences - http://courses.science.unimelb.edu.au/study/degrees/master-of-philosophy-science/overview OR School of Maths & Stats - http://www.ms.unimelb.edu.au/#study
Email: dbalding@unimelb.edu.au
Research Group website: https://sites.google.com/site/baldingstatisticalgenetics/

Project 1

Project title: Single cell RNAseq and the Cell Atlas
Supervisors: Prof David Balding and Prof Christine Wells
Location: Centre for Systems Genomics and Centre for Stem Cell Systems, University of Melbourne

Description: The announcement of the Human Cell Atlas has generated a lot of excitement in the biology community: see https://www.humancellatlas.org and https://www.humancellatlas.org/files/1-key%20questions%20254.pdf. One key problem is the deconvolution of the state changes of populations of cells from the same tissue. Cells are in flux, and different cell subtypes can vary in distinct ways, even when populations of cells maintain a stable phenotype; large-scale state changes can also sometimes occur. RNA sequencing destroys the target cell, so fluctuations over time in a single cell cannot be measured. How do we distinguish changes that represent normal homeostatic flux from differences in cell subtypes or a long-term transition of cell states?
Populations of cells from the same tissue represent an ensemble of that cell state (each cell captured in isolation will be at a slightly different point in the normal homeostatic flux). Important covariates for the analysis of single-cell transcriptomes are the expression quantitative trait loci (eQTL) obtained from populations of isogenic neighbouring cells. eQTL are sequence variants correlated with expression levels, so different genotypes at eQTL sites can explain some between-cell differences and thus help to deconvolve the states of a population of cells.

In this project we will explore different statistical modelling approaches to single-cell transcriptomic data, using eQTL as covariates, to try to understand the states of different cells and how they evolve over time.

Suitable for: Students with a background in statistics, mathematics or computation and an interest in cell biology.

Available for: PhD/MSc

Project 2
Project title: Inference under the coalescent-with-recombination
Supervisor: Prof David Balding
Location: Centre for Systems Genomics, University of Melbourne

Description: This is a challenging computational-statistics project aimed at one of the central problems of statistical genetics: how to draw valid and efficient statistical inferences from genome-wide SNP or sequence data from large numbers of individuals. A widely-accepted probabilistic model for the shared ancestry of genomes is the coalescent-with-recombination (CwR), which can be combined with standard models for evolutionary parameters such as mutation and recombination rates, as well as demographic parameters such as migration and population growth rates. The challenge is to convert the resulting joint prior distribution into a posterior distribution given observed sequence data.

Currently ArgWeaver represents the state of the art for approximate Bayesian inference under the CwR, but it is limited to small sample sizes. Li and Durbin made progress at genome-wide scale for the two haploid genomes within an individual by assuming the sequentially Markov coalescent model, which simplifies the CwR by treating it as Markov when viewed as a process along the genome. My collaborators at UCL (London) have developed a novel "bridge sampling" approach that uses a "divide and conquer" technique in which exhaustive searches are performed in short, overlapping genome intervals. This approach is promising because the algorithm is highly parallelisable, but it is currently limited in scope. Recently Kelleher et al. (2016) published a highly efficient way to simulate and store genome-wide coalescent trees. This potentially opens up new possibilities for inference that will be explored in this project.
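
As a rough illustration of the kind of model involved, the sketch below simulates the time to the most recent common ancestor (TMRCA) under the standard coalescent without recombination, which is a much simpler setting than the CwR targeted by this project. It is not the method of ArgWeaver or of Kelleher et al.; the function name simulate_tmrca and the parameter values are hypothetical and chosen only for the example. Time is measured in coalescent units, in which each pair of lineages merges at rate 1.

```python
import random


def simulate_tmrca(n_samples, rng):
    """Return one simulated TMRCA for n_samples lineages.

    Standard coalescent, no recombination: while k lineages remain,
    the waiting time to the next merger is exponential with rate
    k * (k - 1) / 2 (the number of lineage pairs).
    """
    total_time = 0.0
    for k in range(n_samples, 1, -1):
        rate = k * (k - 1) / 2.0
        total_time += rng.expovariate(rate)
    return total_time


# Illustrative settings (assumptions, not values from the project).
rng = random.Random(1)
n = 10
times = [simulate_tmrca(n, rng) for _ in range(20000)]
mean_tmrca = sum(times) / len(times)

# Known theoretical expectation: E[TMRCA] = 2 * (1 - 1/n).
expected = 2.0 * (1.0 - 1.0 / n)
print(f"simulated mean TMRCA: {mean_tmrca:.3f}, theory: {expected:.3f}")
```

Adding recombination means the genealogy changes along the genome, which is what makes inference under the CwR so much harder than this single-tree case.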

Reference: Kelleher J, Etheridge AM, McVean G (2016) Efficient Coalescent Simulation and Genealogical Analysis for Large Sample Sizes. PLoS Comp Biol 12(5): e1004842. doi:10.1371/journal.pcbi.1004842

Suitable for: Students with a background in computational statistics

Available for: PhD only

Project 3

Project title: Genome-wide models for heritability and prediction
Supervisors: Prof David Balding, in collaboration with Dr Doug Speed (UCL Genetics Institute)
Location: Centre for Systems Genomics, University of Melbourne

Description: The heritability of a phenotype is the fraction of its variance that can be modelled by genetics. "Genetics" for this purpose used to be measured by pedigree relatedness, but this has two major limitations: the result depends on the available pedigree, and the pedigree relatedness of two individuals can describe their expected genome sharing, but not the realised value. Nowadays, genome-wide allele sharing can be measured directly from SNP genotypes, but this has raised many questions about how best to represent the genetic similarity between two individuals, given their genome-wide genotypes. Analogy with the pedigree-based statistical model (a mixed regression model in which the variance matrix was modelled by pedigree-based kinship coefficients) has led researchers to a specific statistical model relating genome-wide SNP genotypes with a phenotype.
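
To make the idea of measuring allele sharing from SNP genotypes concrete, here is a minimal sketch of one widely used construction: standardise each SNP by its allele frequency and average the cross-products over SNPs. The text above does not name a specific estimator, so this choice is an assumption; the function name allele_sharing_kinship and the random genotypes are hypothetical.

```python
import numpy as np


def allele_sharing_kinship(genotypes):
    """Genome-wide allele-sharing matrix from SNP genotypes.

    genotypes: (n_individuals, n_snps) array of allele counts 0/1/2.
    Each SNP is centred by twice its allele frequency and scaled by
    the binomial standard deviation; monomorphic SNPs are dropped.
    """
    g = np.asarray(genotypes, dtype=float)
    freq = g.mean(axis=0) / 2.0              # allele frequency per SNP
    keep = (freq > 0.0) & (freq < 1.0)       # avoid division by zero
    g, freq = g[:, keep], freq[keep]
    z = (g - 2.0 * freq) / np.sqrt(2.0 * freq * (1.0 - freq))
    return z @ z.T / z.shape[1]              # average over SNPs


# Toy data: 5 individuals, 50 SNPs, purely random genotypes.
rng = np.random.default_rng(0)
toy_genotypes = rng.integers(0, 3, size=(5, 50))
kinship = allele_sharing_kinship(toy_genotypes)
print(np.round(kinship, 2))
```

In the mixed model, a matrix of this kind replaces the pedigree-based kinship matrix as the covariance structure of the genetic effects. How each SNP should be weighted in that matrix is the kind of modelling choice this project examines.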

My work with collaborator Doug Speed (Speed et al., bioRxiv, https://doi.org/10.1101/074310) has shown this model to be deficient. Based on a large-scale reanalysis of GWAS data for 43 phenotypes, we developed a model that better fits real data by taking into account the effects of minor allele fraction, linkage disequilibrium and genotype quality on the heritability of a SNP.

Our superior heritability model has many implications for complex trait genetics that this project will explore. These include extending the model to multivariate phenotype analysis, and in particular to improved estimates of the genetic correlation between traits. Our LD model has substantial implications for LD Score Regression, a popular approach to analysing GWAS data that is available only in the form of summary statistics, rather than individual genotype data. In this project we will work both to further refine the new heritability model, including its extension to summary statistics, and to improve LD Score Regression and other related methods. This will in turn lead to better prediction models for individual and multiple phenotypes.

Suitable for: Students with a background in statistics and some experience of statistical software.

Available for: PhD/MSc

____________________

Laboratory head: A/Prof Kathryn Holt
UoM Organisational unit: Department of Biochemistry & Molecular Biology
http://biomedicalsciences.unimelb.edu.au/departments/biochemistry/study/honours-and-masters
Email: kholt@unimelb.edu.au
Research Group website: https://holtlab.net/

Project title: Unravelling antibiotic resistant bacterial genomes using nanopore sequencing
Supervisor: A/Prof Kathryn Holt
Location: Bio21 Molecular Science and Biotechnology Institute

Description: Antibiotic resistant infections are a major global health problem, largely driven by the spread of drug resistance genes through populations of bacterial pathogens. The most common mechanism for the spread of drug resistance genes is via plasmids – small circles of DNA that can move between bacterial cells and transfer antibiotic resistance.

The movement of plasmids and drug resistance genes in natural bacterial populations can be studied by whole genome sequencing of cultured bacteria. This is most commonly done using Illumina sequencing, which involves breaking up genomic DNA (including chromosome and plasmid DNA) into 500 base pair fragments, sequencing them, and attempting to reconstruct (“assemble”) the whole genome from the resulting sequence fragments. However, the assembly process is complicated by the presence of DNA repeats, which are common in bacterial genomes and often make it impossible to disentangle chromosomal from plasmid DNA. As a result, it is often not possible to resolve plasmids from Illumina data, which poses a significant barrier to studying the spread of drug resistance.

To solve this problem, our lab has been using new DNA sequencing technology that can generate very long sequence reads (up to 1 million bp long) by passing high molecular weight DNA strands through nano-sized pores. The aim of this project will be to apply the new “nanopore sequencing” approach to completely sequence and compare dozens of antibiotic resistance plasmids from bacteria isolated in Melbourne hospitals, in order to investigate the evolution and spread of drug resistance.

Available for: Hons/MSc

____________________

Laboratory head: Dr Kim-Anh Lê Cao
UoM Organisational unit: School of Maths & Stats - http://www.ms.unimelb.edu.au/#study
Email: kimanh.lecao@unimelb.edu.au
Research Group website: http://sysgen.unimelb.edu.au/research/computational-biostatistics-methods-le-cao

Project Title: Multivariate computational methods for data integration of single cell assays
Supervisors: Dr Kim-Anh Lê Cao and Dr Jarny Choi (jarnyc@unimelb.edu.au)
Location: Centre for Systems Genomics and Centre for Stem Cell Systems, University of Melbourne

Description: High-throughput single cell molecular profiling gives our scientific community the unique opportunity to define cell types with distinctive molecular profiles to unprecedented depths. However, identifying novel cell types relies on the ability to combine and integrate different types of independent assays (performed in different laboratories) to obtain generalizable and reproducible results. Our main challenges are data heterogeneity and large-scale datasets (many cells and many transcripts). The project will focus on the extension of our projection-based multivariate methods implemented in mixOmics (www.mixOmics.org) to single cell sequencing datasets to address these challenges and identify robust gene signatures that characterize the novel cell subtypes.

Background reading: Regev et al. (2017) The Human Cell Atlas bioRxiv
http://www.biorxiv.org/content/early/2017/05/08/121202

Suitable for: Students with a background in statistics or computer science, and an interest in cell biology.

Available for: MSc/PhD, but also smaller undergraduate projects.

____________________

Laboratory head: A/Prof James McCaw
UoM Organisational unit: School of Mathematics and Statistics
http://www.ms.unimelb.edu.au/#study
Email: jamesm@unimelb.edu.au
Research Group website: https://sites.google.com/site/jamesmccaw/home

Project 1
Project title: Within host pathogen dynamics modelling
Supervisor: A/Prof James McCaw

Description: Projects focusing on varied aspects of the host-pathogen interaction are available, on diseases such as influenza and malaria. Biological topics for investigation include: the role of innate and adaptive immunity in controlling infection, the role of drugs in controlling infection and the development of drug resistance, and evolutionary aspects of infection including genetic drift and selection. Mathematical techniques required are varied but include: deterministic and stochastic modelling; dynamical systems analyses, including numerical bifurcation studies; and biostatistical studies, including Bayesian hierarchical modelling of dynamic non-linear systems.

Suitable for: Students with a background in applied mathematics, applied probability, physics and other quantitative and computational sciences, and those with training in biology (e.g. microbiology, immunology or parasitology) who have a strong mathematical and/or computational interest.

Project 2
Project title: Mathematical epidemiology and infectious diseases modelling
Supervisor: A/Prof James McCaw

Description: Projects are available on the development, analysis and application of models of infectious disease transmission in human populations. Both theoretically focussed projects and applied public health projects are available. Diseases of interest include influenza, malaria, emerging and re-emerging diseases such as Ebola, and vaccine-preventable diseases such as pertussis.
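
For readers new to transmission modelling, the sketch below shows a textbook susceptible-infected-recovered (SIR) model, integrated with simple Euler steps. It is a generic illustration, not a model from this research group; the function name simulate_sir and all parameter values (transmission rate beta, recovery rate gamma, population size) are hypothetical.

```python
def simulate_sir(beta, gamma, population, initial_infected, days, dt=0.1):
    """Deterministic SIR model, integrated with forward Euler steps.

    dS/dt = -beta * S * I / N
    dI/dt =  beta * S * I / N - gamma * I
    dR/dt =  gamma * I
    Returns a list of (time, S, I, R) tuples.
    """
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    trajectory = [(0.0, s, i, r)]
    steps = int(round(days / dt))
    for step in range(1, steps + 1):
        new_infections = beta * s * i / population * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        trajectory.append((step * dt, s, i, r))
    return trajectory


# Illustrative parameters: basic reproduction number beta / gamma = 2.
traj = simulate_sir(beta=0.5, gamma=0.25, population=1000,
                    initial_infected=1, days=365)
_, s_end, i_end, r_end = traj[-1]
peak_infected = max(point[2] for point in traj)
print(f"peak infected: {peak_infected:.0f}, ever infected: {r_end:.0f}")
```

Models used in research typically add structure that this sketch omits, such as age groups, stochastic transmission, waning immunity or vaccination.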

Suitable for: Students with a background in applied mathematics, applied probability, physics and other quantitative and computational sciences, and those with training in epidemiology or public health who have a strong mathematical and/or computational interest.

Available for: Hons/MSc

___________________

Laboratory head: Prof Jodie McVernon
UoM Organisational unit: The Peter Doherty Institute for Infection and Immunity
Email: j.mcvernon@unimelb.edu.au
Research Group website: http://mspgh.unimelb.edu.au/research-groups/centre-for-epidemiology-and-biostatistics-research/modelling-and-simulation

Project title: Developing mathematical models of influenza immunity from cohort studies
Supervisor: Prof Jodie McVernon

Description: Following influenza (flu) infection, antibodies are produced that protect against re-infection with the same flu strain. However, over successive flu seasons, circulating viruses accumulate mutations that render this immunity relatively ineffective, a process known as antigenic drift. We have tracked flu infections and illnesses in a cohort of Vietnamese households over 11 years. These rare data documenting infection intervals and antibody responses will be used to develop and validate statistical and mathematical models of influenza infection and immunity, and to better understand the contributions of immune waning and cross-strain immunity. Insights gained will inform public health strategies for flu prevention, including vaccination.

Available for: PhD

___________________

Laboratory heads: Dr Bernard Pope, Head of Clinical Genomics and Head of Cancer Genomics & Dr Daniel Park, Head of the Melbourne Bioinformatics Platform and Head of the Genomic Technologies Group
UoM Organisational unit: VLSCI, Faculty of Medicine, Dentistry & Health Sciences
Email: bjpope@unimelb.edu.au OR djp@unimelb.edu.au
Research Group website: www.vlsci.org.au

Project Title: Research Masters and PhD projects targeting health and medical research in life science computing
Supervisors: Dr Bernard Pope and Dr Daniel Park

Description: Traditionally, medical advances have come from analyses of small datasets resulting from relatively contained experimental designs. In recent times, however, great progress has been made in generating large, curated and publicly available collections of data that have the potential to be harnessed to address important medical questions. The VLSCI combines a wealth of human expertise with powerful computational resources. We are a multidisciplinary organisation, consisting of computer scientists, molecular biologists and bioinformaticians with a strong track record of collaborative research. We have identified a number of exciting biomedical research projects with the potential for significant real-world impact, such as improving cancer treatments and diagnostics, and are seeking expressions of interest from students. The focal areas of research will expose students to functional genomics, the application and development of machine learning techniques for bioinformatics, and the development of computational tools to decipher molecular mechanisms of diseases.

Suitable for: Prospective Research Masters or PhD candidates who excel in data analytics, and/or computer science and are driven to apply these skills in this rapidly developing area of medical research.

Available for: MSc/PhD

___________________

Laboratory head: Dr Matthew Ritchie
UoM Organisational unit: WEHI, Department of Medical Biology
http://mdhs.unimelb.edu.au/our-organisation/institutes-centres-departments/department-of-medical-biology
Email: mritchie@wehi.edu.au
Research Group website: http://www.wehi.edu.au/people/matthew-ritchie

Project Title: Long-read sequencing for transcriptome and epigenome analysis
Supervisors: Dr Matthew Ritchie, Dr Charity Law and A/Prof Marnie Blewitt

Description: Long-read sequencing, as generated by Pacific Biosciences’ new Sequel platform or Oxford Nanopore Technologies’ MinION and PromethION devices, provides researchers with the ideal tool for resolving transcript architecture, studying methylation patterns genome-wide and assembling genomes. This project will explore a number of applications of long-read technology using data sets that look at splicing patterns in platelets, methylation changes associated with X-inactivation and genome structure in cell lines. It will involve collaboration with researchers at CSL, and the successful candidate may be eligible for an industry top-up scholarship.

Suitable for: A student with a strong statistical or computational background (e.g. an undergraduate degree in Statistics and Mathematics), programming skills and an interest in biology.

Available for: MSc/PhD

___________________

Laboratory head: Dr Wei Shi
UoM Organisational unit: WEHI, Department of Medical Biology
http://mdhs.unimelb.edu.au/our-organisation/institutes-centres-departments/department-of-medical-biology
Email: shi@wehi.edu.au
Research Group website: http://www.wehi.edu.au/people/wei-shi

Project Title: Biological sequence analysis and genomic variant discovery
Supervisors: Dr Wei Shi, Prof Gordon Smyth

Description: Next-generation sequencing (NGS) technologies are increasingly used in laboratories and clinics worldwide to facilitate better understanding and diagnosis of diseases. The massive volume of data from these technologies continues to pose significant challenges for bioinformaticians. We are interested in developing novel methods for mapping both long and short NGS reads (and other biological sequences) to a reference genome to find the true origin of biological sequences (Liao et al., Nucleic Acids Research, 2013, 41(10):e108). We would also like to develop more accurate methods for detecting genomic variants (e.g. insertions, deletions and translocations) in cancer genomes using NGS data.

Suitable for: Prospective students are expected to have a computer science background and/or have strong programming skills. One or two projects are available for PhD or Masters study.

Available for: MSc/PhD

___________________

Laboratory head: Dr Wei Shi; Prof Phil Hodgkin
UoM Organisational unit: WEHI, Department of Medical Biology
http://mdhs.unimelb.edu.au/our-organisation/institutes-centres-departments/department-of-medical-biology
Email: shi@wehi.edu.au OR hodgkin@wehi.edu.au
Research Group website: http://www.wehi.edu.au/people/wei-shi OR http://www.wehi.edu.au/people/phil-hodgkin

Project Title: Reconstructing the immune response: from molecules to cells to systems
Supervisors: Prof Phil Hodgkin, Dr Wei Shi, Dr Andrey Kan

Description: Lymphocyte responses are complex processes whereby B and T cells undergo programmed rounds of proliferation and death triggered by pathogenic stimuli. Adequate timing and magnitude of the response are essential for effective pathogen clearance. The collective behaviour of a population of cells emerges from gene regulatory networks controlling individual cells. However, transcriptional regulation of lymphocyte responses is poorly understood. In this interdisciplinary project we will use bioinformatics methods to develop predictive mathematical models of lymphocyte responses. We will experimentally establish transcriptional profiles at different stages of the response, and interrogate these data using advanced statistical methods.

Suitable for: The student will learn both “wet lab” experimental techniques, such as cell proliferation assays and DNA sequencing, and “dry lab” skills for data analysis, such as developing statistical models using R and Python.

Available for: Hons/PhD

___________________

Laboratory head: Prof Karin Verspoor
UoM Organisational unit: School of Computing and Information Systems
http://cis.unimelb.edu.au/study/graduate/
Email: karin.verspoor@unimelb.edu.au OR saralph@unimelb.edu.au
Research Group website: http://www.textminingscience.com/

Project title: Text mining for extraction of subcellular localisation
Supervisors: Prof Karin Verspoor and Dr Stuart Ralph

Description: We are interested in automatically extracting information from the literature that specifies the sub-cellular localisation of proteins in the malaria parasite Plasmodium falciparum, and in related human parasites. Information about sub-cellular localisation in infectious agents is crucial to prioritising targets for drugs and vaccines. We have built a database that details subcellular localisation of hundreds of Plasmodium proteins (http://apiloc.biochem.unimelb.edu.au/apiloc/apiloc), and will use this as a training set for Biomedical Natural Language Processing. The project will involve construction of a tool to recognise and extract records of cellular localisation for proteins.

Suitable for: Students with a background in computing and interests in natural language processing, text mining and bioinformatics.

Available for: Masters/PhD

___________________

Laboratory head: Prof Christine Wells
UoM Organisational unit: Department of Anatomy & Neuroscience, Kenneth Myer Building, ph. 8344 3795
http://biomedicalsciences.unimelb.edu.au/departments/anatomy-and-neuroscience/study/honours-And-masters-by-coursework
Email: wells.c@unimelb.edu.au
Research Group website: http://www.stemformatics.org

Project 1
Project Title: Predicting stem cell behaviour
Supervisors: Prof Christine Wells and Dr Kim-Anh Lê Cao (kimanh.lecao@unimelb.edu.au)

Description: Mesenchymal stromal cells (MSC) are resident tissue cells that are increasingly used for treatment of inflammatory disorders. However, the identity and function of these cells remain controversial. We recently published a computational tool that can predict the identity of MSC, and this project builds on this tool to find predictors of MSC function. The project is held jointly between the Centre for Systems Genomics, in the laboratory of Dr Kim-Anh Lê Cao, and the Centre for Stem Cell Systems, in the laboratory of Professor Christine Wells.

Suitable for: Maths-stats savvy students, interested in dimension reduction of large biological data sets and variable selection methodologies to find clinical predictors of cell function.

Project 2
Project Title: Finding signatures of cell identity in Stemformatics
Supervisors: Dr Jarny Choi (jarnyc@unimelb.edu.au) and Prof Christine Wells.

Description: Stemformatics (stemformatics.org) is an established data portal containing over 350 gene expression datasets and ~10,000 samples, with a major focus on stem cells and their stages of differentiation. A major question arising from these data is whether we can find robust signatures that identify stem cells and their differentiated progeny. This project involves extensive mining of the datasets within Stemformatics to find those signatures of cell identity. The project will give the student some basic skills in navigating and integrating large-scale datasets. The student will identify genes that can be used as controls in data normalisation and integration, evaluate patterns that distinguish different cell types, and benchmark laboratory-derived cells against in vivo, developmental equivalents. The project has scope for exploring innovative visualisations that can summarise large amounts of data succinctly.

Suitable for: A student with a strong maths or computer science background, and either some knowledge of programming or the ability to learn data manipulation techniques rapidly.

Available for: Hons/MSc

___________________