Students


Interested in Biology and Maths?
Like working with computers?
Why not try Computational Biology?

________________________________________

With continued advances in technologies for collecting biological data, there is an ever-increasing demand for professional scientists with skills in computational biology.

MORE INFORMATION ON...

Bachelor of Science - Major in Computational Biology

COMBINE - An organisation for Australian students and early career researchers in Bioinformatics and Computational Biology


Student research projects & opportunities


Laboratory head: Prof David Balding
UoM Organisational unit(s): School of BioSciences - http://courses.science.unimelb.edu.au/study/degrees/master-of-philosophy-science/overview OR School of Maths & Stats - http://www.ms.unimelb.edu.au/#study
Email: dbalding@unimelb.edu.au
Research Group website: https://sites.google.com/site/baldingstatisticalgenetics/

Project 1

Project title: Single cell RNAseq and the Cell Atlas
Supervisors: Prof David Balding and Prof Christine Wells
Location: Centre for Systems Genomics and Centre for Stem Cell Systems, University of Melbourne
Description: The announcement of the Human Cell Atlas has generated considerable excitement in the biology community: see https://www.humancellatlas.org and https://www.humancellatlas.org/files/1-key%20questions%20254.pdf. One key problem is deconvolving the state changes of populations of cells from the same tissue. Cells are in flux, and different cell subtypes can vary in distinct ways even when a population of cells maintains a stable phenotype; large-scale state changes can also occur. Because RNA sequencing destroys the target cell, fluctuations over time in a single cell cannot be measured. How do we distinguish changes that represent normal homeostatic flux from differences between cell subtypes or a long-term transition of cell states?
Populations of cells from the same tissue represent an ensemble of that cell state (each cell captured in isolation will be at a slightly different point in the normal homeostatic flux). Important covariates for the analysis of single-cell transcriptomes are the expression quantitative trait loci (eQTL) obtained from populations of isogenic neighbouring cells. eQTL are sequence variants whose genotypes are correlated with gene expression levels, so different genotypes at eQTL sites can explain some between-cell differences and thereby help to deconvolve the states of a population of cells.

In this project we will explore different statistical modelling approaches to single-cell transcriptomic data, using eQTL as covariates, to try to understand the states of different cells and how they evolve over time.
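As a minimal sketch of what using an eQTL as a covariate can mean, the Python below fits an ordinary least-squares regression of one gene's expression on the genotype dosage (0, 1 or 2 copies of the alternative allele) at a single eQTL, and returns the residuals: the between-cell variation left once genotype is accounted for, which a downstream model could attribute to cell state. The function name and toy data are hypothetical, and real single-cell analyses would normally use count-based models with many covariates rather than plain least squares.

```python
def fit_eqtl_covariate(dosage, expression):
    """Least-squares fit of expression = intercept + slope * dosage.

    dosage: genotype at the eQTL for each cell (0, 1 or 2 alt alleles).
    expression: expression level of the gene in the same cells.
    Returns (intercept, slope, residuals); the slope is the estimated eQTL
    effect and the residuals are the variation not explained by genotype.
    """
    n = len(dosage)
    mean_x = sum(dosage) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in dosage)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosage, expression))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (intercept + slope * x) for x, y in zip(dosage, expression)]
    return intercept, slope, residuals


# Toy data (made up): six cells, two for each genotype.
dosage = [0, 0, 1, 1, 2, 2]
expression = [1.0, 1.2, 2.0, 2.2, 3.0, 3.2]
intercept, slope, residuals = fit_eqtl_covariate(dosage, expression)
print(f"eQTL effect per allele: {slope:.2f}")
print("residual variation:", [round(r, 2) for r in residuals])
```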
Suitable for: Students with a background in statistics, mathematics or computation and an interest in cell biology.

Available for: PhD/MSc

Project 2
Project title: Inference under the coalescent-with-recombination
Supervisors: Prof David Balding
Location: Centre for Systems Genomics, University of Melbourne
Description: This is a challenging computational-statistics project aimed at one of the central problems of statistical genetics: how to draw valid and efficient statistical inferences from genome-wide SNP or sequence data from large numbers of individuals. A widely accepted probabilistic model for the shared ancestry of genomes is the coalescent-with-recombination (CwR), which can be combined with standard models for evolutionary parameters such as mutation and recombination rates, as well as demographic parameters such as migration and population growth rates. The challenge is to convert the resulting joint prior distribution into a posterior distribution given observed sequence data.
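To make the underlying model concrete, the sketch below simulates its simplest special case, the standard (Kingman) coalescent at a single non-recombining locus: while k lineages remain, the waiting time to the next common-ancestor event is exponential with rate k(k-1)/2 in coalescent time units. It compares the simulated mean time to the most recent common ancestor (TMRCA) with the known value 2(1 - 1/n). Recombination, mutation and demography are deliberately left out; adding them, and inverting the model to obtain a posterior, is the hard part this project addresses.

```python
import random


def simulate_tmrca(n, rng):
    """Time to the most recent common ancestor (TMRCA) of n sampled lineages
    under the standard Kingman coalescent, in coalescent time units.

    While k lineages remain, the waiting time until two of them merge is
    exponential with rate k * (k - 1) / 2; there is no recombination here.
    """
    t = 0.0
    for k in range(n, 1, -1):
        t += rng.expovariate(k * (k - 1) / 2)
    return t


def expected_tmrca(n):
    """Exact mean TMRCA: the sum of 2 / (k(k - 1)) telescopes to 2(1 - 1/n)."""
    return 2.0 * (1.0 - 1.0 / n)


rng = random.Random(1)
n, reps = 10, 20000
mean_t = sum(simulate_tmrca(n, rng) for _ in range(reps)) / reps
print(f"simulated mean TMRCA: {mean_t:.3f}  (theory: {expected_tmrca(n):.3f})")
```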

ARGweaver currently represents the state of the art for approximate Bayesian inference under the CwR, but it is limited to small sample sizes. Li and Durbin made progress at genome-wide scale for the two haploid genomes within an individual by assuming the sequentially Markov coalescent, which simplifies the CwR by treating it as a Markov process along the genome. My collaborators at UCL (London) have developed a novel "bridge sampling" approach that uses a "divide and conquer" technique in which exhaustive searches are performed in short, overlapping genome intervals. This approach is promising because the algorithm is highly parallelisable, but it is currently limited in scope. Kelleher et al (2016) recently published a highly efficient way to simulate and store genome-wide coalescent trees, which potentially opens new possibilities for inference that this project will explore.

Reference: Kelleher J, Etheridge AM, McVean G (2016) Efficient Coalescent Simulation and Genealogical Analysis for Large Sample Sizes. PLoS Comp Biol 12(5): e1004842. doi:10.1371/journal.pcbi.1004842
Suitable for: Students with a background in computational statistics

Available for: PhD only

Project 3

Project title: Genome-wide models for heritability and prediction
Supervisors: Prof David Balding, in collaboration with Dr Doug Speed (UCL Genetics Institute)
Location: Centre for Systems Genomics, University of Melbourne
Description: The heritability of a phenotype is the fraction of its variance that can be modelled by genetics. "Genetics" for this purpose used to be measured by pedigree relatedness, but this has two major limitations: the result depends on the available pedigree, and the pedigree relatedness of two individuals describes their expected genome sharing but not the realised value. Genome-wide allele sharing can now be measured directly from SNP genotypes, but this has raised many questions about the best way to represent the genetic similarity between two individuals given their genome-wide genotypes. Analogy with the pedigree-based statistical model (a mixed regression model in which the variance matrix was modelled by pedigree-based kinship coefficients) has led researchers to a specific statistical model relating genome-wide SNP genotypes to a phenotype.
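One standard way to write the model described above, assumed here for illustration rather than taken from the project text, is a linear mixed model for the phenotypes y of n individuals, with fixed effects Xβ, a genetic random effect g whose covariance is proportional to a relatedness matrix K, and residual noise e. Heritability is then the genetic share of the total variance, and the pedigree and genomic versions differ only in how K is built:

```latex
y = X\beta + g + e, \qquad
g \sim \mathcal{N}\left(0, \sigma_g^2 K\right), \qquad
e \sim \mathcal{N}\left(0, \sigma_e^2 I_n\right)

h^2 = \frac{\sigma_g^2}{\sigma_g^2 + \sigma_e^2}

% Pedigree version: K_{ik} = 2\phi_{ik} (twice the kinship coefficient).
% Genomic version, from m SNPs with genotypes x_{ij} \in \{0, 1, 2\}
% and allele frequencies p_j:
K_{ik} = \frac{1}{m} \sum_{j=1}^{m}
  \frac{(x_{ij} - 2p_j)(x_{kj} - 2p_j)}{2p_j(1 - p_j)}
```

In this genomic form every standardised SNP makes the same expected contribution to h²; a model that lets that contribution depend on properties of the SNP, such as allele frequency or local linkage disequilibrium, changes how K is weighted.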

My work with collaborator Doug Speed (Speed et al, bioRxiv, https://doi.org/10.1101/074310) has shown this model to be deficient. Based on a large-scale reanalysis of GWAS data for 43 phenotypes, we developed a model that better fits real data by taking into account the effects of minor allele frequency, linkage disequilibrium and genotype quality on the heritability of a SNP.

Our superior heritability model has many implications for complex trait genetics that this project will explore. These include extending the model to multivariate phenotype analysis, and in particular to improved estimates of the genetic correlation between traits. Our LD model has substantial implications for LD Score Regression, a popular approach to analysing GWAS data that is available only in the form of summary statistics, rather than individual genotype data. In this project we will work both to further refine the new heritability model, including its extension to summary statistics, and to improve LD Score Regression and other related methods. This will in turn lead to better prediction models for individual and multiple phenotypes.
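LD Score Regression rests on the relationship E[χ²_j] = N h² ℓ_j / M + N a + 1, where χ²_j is the GWAS test statistic for SNP j, ℓ_j its LD score, N the sample size, M the number of SNPs and a a confounding term. The minimal sketch below recovers h² from the slope of an unweighted least-squares fit; the published method additionally uses regression weights and block-jackknife standard errors, which are omitted here. The toy inputs are constructed to fit the model exactly and are not real data.

```python
def ld_score_regression(chi2, ld_scores, n_samples, n_snps):
    """Unweighted least-squares LD Score Regression.

    Fits chi2_j = intercept + slope * ell_j and converts the slope to a SNP
    heritability estimate via slope = N * h2 / M.
    """
    k = len(chi2)
    mean_l = sum(ld_scores) / k
    mean_c = sum(chi2) / k
    s_ll = sum((ell - mean_l) ** 2 for ell in ld_scores)
    s_lc = sum((ell - mean_l) * (c - mean_c) for ell, c in zip(ld_scores, chi2))
    slope = s_lc / s_ll
    intercept = mean_c - slope * mean_l
    h2 = slope * n_snps / n_samples
    return h2, intercept


# Synthetic inputs built to satisfy the model with h2 = 0.5 and no confounding.
N, M = 10_000, 100_000
ld_scores = [10.0, 20.0, 50.0, 100.0, 200.0]
chi2 = [1.0 + N * 0.5 * ell / M for ell in ld_scores]
h2_hat, intercept = ld_score_regression(chi2, ld_scores, N, M)
print(f"h2 estimate: {h2_hat:.3f}, intercept: {intercept:.3f}")
```

An intercept above 1 would, under this model, point to confounding such as population stratification rather than polygenic signal.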
Suitable for: Students with a background in statistics and some experience of statistical software.

Available for: PhD/MSc

____________________

Laboratory head: A/Prof Kathryn Holt
UoM Organisational unit: Department of Biochemistry & Molecular Biology
http://biomedicalsciences.unimelb.edu.au/departments/biochemistry/study/honours-and-masters
Email: kholt@unimelb.edu.au
Research Group website: https://holtlab.net/

Project title: Mobile genetic elements in the evolution of antibiotic-resistant Klebsiella pneumoniae
Supervisors: Dr Kelly Wyres and A/Prof Kathryn Holt
Location: Centre for Systems Genomics, Building 184, Royal Parade
Description: Klebsiella pneumoniae is an opportunistic bacterial pathogen that frequently causes healthcare-associated infections and is recognised as an emerging public health threat. Antibiotic resistance is also a major concern, and rates have been increasing globally. Recent comparative genomic studies have highlighted the extensive genomic diversity of K. pneumoniae, which can readily acquire and lose mobile genetic elements (MGEs), including plasmids, transposons and others. Plasmids are highly mobile and distinct from the bacterial chromosome, and have been the subject of many epidemiological and evolutionary studies. In contrast, less is known about MGEs such as phages and transposons, which integrate directly into the chromosome. However, these elements can also carry important genes, and there is emerging evidence that they can play a role in the evolution of globally distributed, multi-drug resistant clones.

This project will use comparative genomics techniques to characterise a range of chromosomally integrated MGEs among K. pneumoniae genomes from our large collection. Protein domain and sequence homology searches will be used to predict the functions of novel genes that are transferred into K. pneumoniae chromosomes by the MGEs. The distribution of the different MGEs across the K. pneumoniae population will be investigated, and any associations with different types of disease or drug resistance will be explored.

Available for: Hons/MSc 

____________________

Laboratory head: A/Prof James McCaw
UoM Organisational unit: School of Mathematics and Statistics
http://www.ms.unimelb.edu.au/#study
Email: jamesm@unimelb.edu.au
Research Group website: https://sites.google.com/site/jamesmccaw/home

Project 1
Project title: Within-host pathogen dynamics modelling
Supervisor: A/Prof James McCaw

Description: Projects focusing on varied aspects of the host-pathogen interaction are available, on diseases such as influenza and malaria. Biological topics for investigation include: the role of innate and adaptive immunity in controlling infection, the role of drugs in controlling infection and the development of drug resistance, and evolutionary aspects of infection including genetic drift and selection. The mathematical techniques required vary but include: deterministic and stochastic modelling; dynamical systems analyses, including numerical bifurcation studies; and biostatistical studies, including Bayesian hierarchical modelling of dynamic non-linear systems.
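As a hedged illustration of deterministic within-host modelling, the sketch below integrates a basic target-cell-limited model (uninfected target cells T, infected cells I, free virus V), a common textbook starting point for acute infections such as influenza, with a classical Runge-Kutta scheme. The parameter values are illustrative and dimensionless, not fitted to any data; with them the within-host basic reproduction number βpT₀/(δc) equals 10, so the virus grows, peaks as target cells are depleted, then clears.

```python
def target_cell_limited(state, beta, delta, p, c):
    """Right-hand side of a basic target-cell-limited model.

    T: uninfected target cells, I: infected cells, V: free virus.
    """
    T, I, V = state
    return (-beta * T * V, beta * T * V - delta * I, p * I - c * V)


def rk4_step(f, state, dt, params):
    """One classical fourth-order Runge-Kutta step for dy/dt = f(y)."""
    def shifted(k, h):
        return tuple(s + h * ki for s, ki in zip(state, k))
    k1 = f(state, *params)
    k2 = f(shifted(k1, dt / 2), *params)
    k3 = f(shifted(k2, dt / 2), *params)
    k4 = f(shifted(k3, dt), *params)
    return tuple(
        s + dt / 6 * (a + 2 * b + 2 * cc + d)
        for s, a, b, cc, d in zip(state, k1, k2, k3, k4)
    )


def simulate(t_end=20.0, dt=0.01, params=(5.0, 1.0, 10.0, 5.0),
             initial=(1.0, 0.0, 1e-3)):
    """Integrate the model; params = (beta, delta, p, c), illustrative only."""
    state = initial
    trajectory = [state]
    for _ in range(int(round(t_end / dt))):
        state = rk4_step(target_cell_limited, state, dt, params)
        trajectory.append(state)
    return trajectory


traj = simulate()
virus = [v for _, _, v in traj]
peak_step = max(range(len(virus)), key=virus.__getitem__)
print(f"viral peak at t = {peak_step * 0.01:.2f}, "
      f"target cells remaining: {traj[-1][0]:.4f}")
```

Adding immune-response compartments or a drug-efficacy term to the right-hand side is the usual way such a model is extended.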
Suitable for: Students with a background in applied mathematics, applied probability, physics and other quantitative and computational sciences, and those with training in biology (e.g. microbiology, immunology or parasitology) who have a strong mathematical and/or computational interest.

Project 2
Project title: Mathematical epidemiology and infectious diseases modelling
Supervisor: A/Prof James McCaw

Description: Projects are available on the development, analysis and application of models of infectious disease transmission in human populations. Both theoretically focussed projects and applied public health projects are available. Diseases of interest include influenza, malaria, emerging and re-emerging diseases such as Ebola and vaccine preventable diseases such as pertussis.
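A minimal sketch of a stochastic transmission model, assuming the textbook SIR (susceptible-infectious-recovered) structure in a closed, well-mixed population and simulated exactly with Gillespie's algorithm. Infections occur at rate βSI/N and recoveries at rate γI, so the basic reproduction number is R0 = β/γ; the parameter values below are illustrative only, with rates per arbitrary time unit.

```python
import random


def gillespie_sir(n, i0, beta, gamma, rng):
    """Exact stochastic simulation of a closed SIR epidemic.

    Returns the final numbers susceptible and recovered, plus the event
    times and the number infectious after each event.
    """
    s, i, r = n - i0, i0, 0
    t = 0.0
    times, infectious = [t], [i]
    while i > 0:
        infection_rate = beta * s * i / n
        recovery_rate = gamma * i
        total_rate = infection_rate + recovery_rate
        t += rng.expovariate(total_rate)          # time to the next event
        if rng.random() < infection_rate / total_rate:
            s, i = s - 1, i + 1                   # infection event
        else:
            i, r = i - 1, r + 1                   # recovery event
        times.append(t)
        infectious.append(i)
    return s, r, times, infectious


rng = random.Random(42)
final_s, final_r, times, infectious = gillespie_sir(
    n=1000, i0=5, beta=0.3, gamma=0.1, rng=rng)   # R0 = 3
print(f"final size: {final_r} of 1000, epidemic ended at t = {times[-1]:.1f}")
```

Repeating the simulation with different seeds shows the run-to-run variability, including early extinction, that a deterministic model cannot capture.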
Suitable for: Students with a background in applied mathematics, applied probability, physics and other quantitative and computational sciences, and those with training in epidemiology or public health who have a strong mathematical and/or computational interest.

Available for: Hons/MSc

___________________

Laboratory heads: Dr Bernard Pope (Head of Clinical Genomics and Head of Cancer Genomics) and Dr Daniel Park (Head of the Melbourne Bioinformatics Platform and Head of the Genomic Technologies Group)
UoM Organisational unit: VLSCI, Faculty of Medicine, Dentistry & Health Sciences
Email: bjpope@unimelb.edu.au OR djp@unimelb.edu.au
Research Group website: www.vlsci.org.au

Project Title: Research Masters and PhD projects targeting health and medical research in life science computing
Supervisors: Dr Bernard Pope and Dr Daniel Park

Description: Traditionally, medical advances have come from analyses of small datasets generated by relatively contained experimental designs. In recent times, however, great progress has been made in generating large, curated and publicly available collections of data that could be harnessed to address important medical questions. The VLSCI combines a wealth of human expertise with powerful computational resources. We are a multidisciplinary organisation of computer scientists, molecular biologists and bioinformaticians with a strong track record of collaborative research. We have identified a number of exciting biomedical research projects with the potential for significant real-world impact, such as improving cancer treatments and diagnostics, and are seeking expressions of interest from students. The focal areas of research will expose students to functional genomics, the application and development of machine learning techniques for bioinformatics, and the development of computational tools to decipher the molecular mechanisms of disease.
Suitable for: Prospective Research Masters or PhD candidates who excel in data analytics and/or computer science and are driven to apply these skills in this rapidly developing area of medical research.

Available for: MSc/PhD

___________________

Laboratory head: Dr Matthew Ritchie
UoM Organisational unit: WEHI, Department of Medical Biology
http://mdhs.unimelb.edu.au/our-organisation/institutes-centres-departments/department-of-medical-biology
Email: mritchie@wehi.edu.au
Research Group website: http://www.wehi.edu.au/people/matthew-ritchie

Project Title: Long-read sequencing for transcriptome and epigenome analysis
Supervisors: Dr Matthew Ritchie, Dr Charity Law and A/Prof Marnie Blewitt

Description: Long-read sequencing, as generated by Pacific Biosciences’ new Sequel platform or Oxford Nanopore Technologies’ MinION and PromethION devices, provides researchers with an ideal tool for resolving transcript architecture, studying genome-wide methylation patterns and assembling genomes. This project will explore a number of applications of long-read technology using data sets that examine splicing patterns in platelets, methylation changes associated with X-inactivation, and genome structure in cell lines. It will involve collaboration with researchers at CSL, and the successful candidate may be eligible for an industry top-up scholarship.
Suitable for: A student with a strong statistical or computational background (e.g. an undergraduate degree in Statistics and Mathematics), programming skills and an interest in biology.

Available for: PhD

___________________

Laboratory head: Dr Wei Shi
UoM Organisational unit: WEHI, Department of Medical Biology
http://mdhs.unimelb.edu.au/our-organisation/institutes-centres-departments/department-of-medical-biology
Email: shi@wehi.edu.au
Research Group website: http://www.wehi.edu.au/people/wei-shi

Project Title: Biological sequence analysis and genomic variant discovery
Supervisors: Dr Wei Shi, Prof Gordon Smyth

Description: Next-generation sequencing (NGS) technologies are increasingly used in laboratories and clinics worldwide to facilitate better understanding and diagnosis of diseases. The massive volume of data from these technologies continues to pose significant challenges for bioinformaticians. We are interested in developing novel methods for mapping both long and short NGS reads (and other biological sequences) to a reference genome to find the true origin of each sequence (Liao et al., Nucleic Acids Research, 2013, 41(10):e108). We would also like to develop more accurate methods for detecting genomic variants (e.g. insertions, deletions and translocations) in cancer genomes using NGS data.
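The paper cited above (Liao et al., 2013) describes a "seed-and-vote" mapping strategy. The toy Python below illustrates only the voting idea under simplifying assumptions (exact k-mer seeds, a hash index of a short random reference, no gapped alignment or tie-breaking); it is not that paper's implementation, and the function names are hypothetical. Because each seed that matches votes for the same implied start position, a read with a sequencing error still maps correctly as long as most seeds survive.

```python
import random
from collections import Counter, defaultdict


def build_index(reference, k):
    """Map every k-mer in the reference to the positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def seed_and_vote(read, index, k, n_seeds):
    """Look up evenly spaced k-mer seeds from the read in the index and let
    each hit vote for the read start position it implies.

    Returns (best_position, votes), or None if no seed matched.
    """
    votes = Counter()
    step = max(1, (len(read) - k) // max(1, n_seeds - 1))
    for offset in range(0, len(read) - k + 1, step):
        for hit in index.get(read[offset:offset + k], []):
            votes[hit - offset] += 1
    if not votes:
        return None
    return votes.most_common(1)[0]


rng = random.Random(0)
reference = "".join(rng.choice("ACGT") for _ in range(1000))
read = reference[100:140]
# Introduce one mismatch so that one seed fails to match exactly.
mutated = read[:10] + ("A" if read[10] != "A" else "C") + read[11:]
index = build_index(reference, k=8)
result = seed_and_vote(mutated, index, k=8, n_seeds=5)
print(result)
```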
Suitable for: Prospective students are expected to have a computer science background and/or have strong programming skills. One or two projects are available for PhD or Masters study.

Available for: PhD/MSc

___________________

Laboratory heads: Dr Wei Shi and Prof Phil Hodgkin
UoM Organisational unit: WEHI, Department of Medical Biology
http://mdhs.unimelb.edu.au/our-organisation/institutes-centres-departments/department-of-medical-biology
Email: shi@wehi.edu.au OR hodgkin@wehi.edu.au
Research Group website: http://www.wehi.edu.au/people/wei-shi OR http://www.wehi.edu.au/people/phil-hodgkin

Project Title: Reconstructing the immune response: from molecules to cells to systems
Supervisors: Prof Phil Hodgkin, Dr Wei Shi, Dr Andrey Kan

Description: Lymphocyte responses are complex processes whereby B and T cells undergo programmed rounds of proliferation and death triggered by pathogenic stimuli. Adequate timing and magnitude of the response are essential for effective pathogen clearance. The collective behaviour of a population of cells emerges from gene regulatory networks controlling individual cells. However, transcriptional regulation of lymphocyte responses is poorly understood. In this interdisciplinary project we will use bioinformatics methods to develop predictive mathematical models of lymphocyte responses. We will experimentally establish transcriptional profiles at different stages of the response, and interrogate these data using advanced statistical methods.
Suitable for: The student will learn both “wet lab” experimental techniques, such as cell proliferation assays and DNA sequencing, and “dry lab” skills for data analysis, such as developing statistical models using R and Python.

Available for: Hons/PhD

___________________

Laboratory head: Prof Karin Verspoor
UoM Organisational unit: School of Computing and Information Systems
http://cis.unimelb.edu.au/study/graduate/
Email: karin.verspoor@unimelb.edu.au OR saralph@unimelb.edu.au
Research Group website: http://www.textminingscience.com/

Project title: Text mining for extraction of subcellular localisation.
Supervisors: Prof Karin Verspoor and Dr Stuart Ralph.

Description: We are interested in automatically extracting information from the literature that specifies the sub-cellular localisation of proteins in the malaria parasite Plasmodium falciparum, and in related human parasites. Information about sub-cellular localisation in infectious agents is crucial to prioritising targets for drugs and vaccines. We have built a database that details subcellular localisation of hundreds of Plasmodium proteins (http://apiloc.biochem.unimelb.edu.au/apiloc/apiloc), and will use this as a training set for Biomedical Natural Language Processing. The project will involve construction of a tool to recognise and extract records of cellular localisation for proteins.
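As a deliberately simple baseline, assumed here for illustration, the sketch below splits text into sentences and records every co-occurrence of a protein mention with a localisation term. The term list, the protein-name patterns (PlasmoDB-style PF3D7_ gene identifiers and Pf-prefixed names) and the example text are hypothetical; the example identifier PF3D7_0000000 is made up. A tool trained on the localisation database would replace these hand-written rules with learned models.

```python
import re

# Hypothetical, tiny vocabulary of localisation terms; a real system would
# use a curated ontology rather than a fixed list.
LOCALISATION_TERMS = [
    "apicoplast", "mitochondrion", "rhoptry", "microneme",
    "parasitophorous vacuole", "food vacuole", "nucleus", "cytoplasm",
    "Maurer's clefts",
]
# Longest terms first so multi-word terms win over their substrings.
LOC_PATTERN = re.compile(
    "|".join(re.escape(t) for t in sorted(LOCALISATION_TERMS, key=len, reverse=True)),
    re.IGNORECASE,
)
# Heuristic protein mentions: PF3D7_ gene IDs or names such as PfAMA1.
PROTEIN_PATTERN = re.compile(r"\bPF3D7_\d{7}\b|\bPf[A-Z][A-Za-z0-9]+\b")


def extract_localisations(text):
    """Return (protein, location) pairs that co-occur within a sentence."""
    records = []
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        proteins = PROTEIN_PATTERN.findall(sentence)
        locations = [m.group(0).lower() for m in LOC_PATTERN.finditer(sentence)]
        for protein in proteins:
            for location in locations:
                records.append((protein, location))
    return records


example = ("PfAMA1 is localised to the microneme. "
           "PF3D7_0000000 was detected in the nucleus and cytoplasm.")
print(extract_localisations(example))
```

Sentence-level co-occurrence over-generates (a negated statement still produces a pair), which is exactly the kind of error a trained model is meant to reduce.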
Suitable for: Students with a background in computing and interests in Natural Language Processing, text mining or bioinformatics.

Available for: Masters/PhD

___________________

Laboratory head: Prof Christine Wells
UoM Organisational unit: Department of Anatomy & Neuroscience
http://biomedicalsciences.unimelb.edu.au/departments/anatomy-and-neuroscience/study/honours-And-masters-by-coursework
Email: wells.c@unimelb.edu.au
Research Group website: http://www.stemformatics.org

Project 1
Project title: Pathways of development: developing high-resolution stem cell networks that reproducibly describe stem cell differentiation.
Supervisors: Prof Christine Wells and Dr Daniel Hurley.

Background, goals and skills: The behaviours of cells can be described as coordinated networks of genes, proteins and metabolites, and these complex networks are best conceptualised in terms of ‘pathways’ for different aspects of cellular function. Surprisingly, fewer than one-third of the genes in the human genome have been annotated to a pathway, and the molecules expressed in stem cells are particularly poorly represented. Better stem cell pathways are urgently needed as basic research tools for interpreting genetic, biochemical/pharmacological or metabolic perturbations during development and disease. Pluripotent stem cells provide useful models of differentiation-in-a-dish. Systems biology methodologies such as transcriptomics, proteomics or metabolomics allow unbiased monitoring of the molecules that change as stem cells differentiate to the lineage of interest. Making sense of these data, however, requires framing the analysis in terms of molecular networks and pathways, to identify the molecules that drive differentiation and regulate cell lineage commitment. Current maps of lineage commitment rely heavily on our understanding of embryonic development in model organisms such as mouse, Drosophila and C. elegans. The goal of this project is therefore to assess the ‘fit’ of these developmental pathways to a human pluripotent stem cell model of differentiation. The project will make use of an existing systems biology dataset on in vitro stem cell differentiation to neural crest, blood progenitors, endothelial progenitors and cardiac progenitors. The data include transcriptome, proteome and metabolome measurements of stem cells over a time course of directed differentiation.
In this project, students will use text mining via semantic web applications to construct pathways from existing public databases (such as WikiPathways) and from the scientific literature, and to build up-to-date networks of early mesendoderm and neuroectoderm differentiation. The student will work with bioinformaticians and stem cell biologists to identify stem cell networks, and combine these to build definitive pathways that describe stem cell differentiation.
Suitable for: Students with a background in biological sciences, training in cell biology or molecular biology, and an interest in bioinformatics, scientific illustration and digital design. Students will be given training in text mining, semantic web and pathway design using WikiPathways.

Project 2
Project Title: Visualising stem cell networks: developing web-based computational tools to compare networks and pathways relevant to stem cell biology.
Supervisors: Prof Christine Wells and Dr Daniel Hurley.

Background, goals and skills: Network approaches are becoming important for making sense of biological data, and there is a need to develop tools to compare and relate networks from different sources. In this project, students will use open-source data visualisation libraries to build tools which compare networks at the level of pathways and regions, then display the comparisons in an interactive web-based format. Students will then use these tools to analyse some existing stem cell network resources, and interpret the results in the context of known stem cell biology.
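As an illustration of comparing networks at the level of shared interactions, the sketch below treats each network as a set of undirected edges and reports shared and unique edges plus the Jaccard similarity (shared edges divided by all distinct edges). This is one simple comparison metric, assumed here; the gene names are well-known pluripotency factors, but the edges are made up for the example and are not curated interactions.

```python
def edge_set(edges):
    """Normalise an undirected edge list so (a, b) and (b, a) compare equal."""
    return {frozenset(e) for e in edges}


def as_sorted_tuples(edges):
    """Turn a set of frozenset edges into a sorted list of name pairs."""
    return sorted(tuple(sorted(e)) for e in edges)


def compare_networks(edges_a, edges_b):
    """Summarise the overlap between two undirected networks."""
    a, b = edge_set(edges_a), edge_set(edges_b)
    shared, union = a & b, a | b
    return {
        "shared": as_sorted_tuples(shared),
        "only_a": as_sorted_tuples(a - b),
        "only_b": as_sorted_tuples(b - a),
        "jaccard": len(shared) / len(union) if union else 1.0,
    }


# Illustrative edges only.
network_a = [("POU5F1", "SOX2"), ("SOX2", "NANOG"), ("NANOG", "POU5F1")]
network_b = [("SOX2", "POU5F1"), ("NANOG", "LIN28A")]
print(compare_networks(network_a, network_b))
```

The same summary dictionary could be serialised to JSON and handed to a web-based visualisation library for interactive display.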
Suitable for: Students with a background in statistics, computational biology or bioinformatics, and an interest in data visualisation methods.  Students will be given training in using a standard bioinformatics software environment, code development and visualisation of data on the Web.

Available for: Hons/MSc

___________________