Inferring transmission from genomic + epi data

There is a flurry of new papers on this, how exciting! One is through peer review and published in final form at PLoS, and the others in preprint (arXiv and the new BioRXiv)!!

It will take me a while to read and understand these, especially the subtleties of the different applications they are aiming for…. so in the meantime let’s just compare them the lazy way – is the software available, and who has the best pictures?

1. In PLoS Computational Biology, using SARS to demonstrate utility:

Bayesian Reconstruction of Disease Outbreaks by Combining Epidemiologic and Genomic Data

Thibaut Jombart, Xavier Didelot, Simon Cauchemez, Christophe Fraser, Neil Ferguson

Abstract: Recent years have seen progress in the development of statistically rigorous frameworks to infer outbreak transmission trees (“who infected whom”) from epidemiological and genetic data. Making use of pathogen genome sequences in such analyses remains a challenge, however, with a variety of heuristic approaches having been explored to date. We introduce a statistical method exploiting both pathogen sequences and collection dates to unravel the dynamics of densely sampled outbreaks. Our approach identifies likely transmission events and infers dates of infections, unobserved cases and separate introductions of the disease. It also proves useful for inferring numbers of secondary infections and identifying heterogeneous infectivity and super-spreaders. After testing our approach using simulations, we illustrate the method with the analysis of the beginning of the 2003 Singaporean outbreak of Severe Acute Respiratory Syndrome (SARS), providing new insights into the early stage of this epidemic. Our approach is the first tool for disease outbreak reconstruction from genetic data widely available as free software, the R package outbreaker. It is applicable to various densely sampled epidemics, and improves previous approaches by detecting unobserved and imported cases, as well as allowing multiple introductions of the pathogen. Because of its generality, we believe this method will become a tool of choice for the analysis of densely sampled disease outbreaks, and will form a rigorous framework for subsequent methodological developments.

Software: R package ‘outbreaker’, which includes functions for simulating outbreak data. Very nice.

thumbsupthumbsupthumbsup

Pictures: Pretty good!

thumbsupthumbsupthumbsup

2. In preprint on BioRXiv, using tuberculosis as an example:

Bayesian inference of infectious disease transmission from whole genome sequence data

Xavier Didelot, Jennifer Gardy, Caroline Colijn

Abstract: Genomics is increasingly being used to investigate disease outbreaks, but an important question remains unanswered — how well do genomic data capture known transmission events, particularly for pathogens with long carriage periods or large within-host population sizes? Here we present a novel Bayesian approach to reconstruct densely-sampled outbreaks from genomic data whilst considering within-host diversity. We infer a time-labelled phylogeny using BEAST, then infer a transmission network via a Monte-Carlo Markov Chain. We find that under a realistic model of within-host evolution, reconstructions of simulated outbreaks contain substantial uncertainty even when genomic data reflect a high substitution rate. Reconstruction of a real-world tuberculosis outbreak displayed similar uncertainty, although the correct source case and several clusters of epidemiologically linked cases were identified. We conclude that genomics cannot wholly replace traditional epidemiology, but that Bayesian reconstructions derived from sequence data may form a useful starting point for a genomic epidemiology investigation.

Software: No software package. Uses BEAST trees, for data visualisation uses DensiTree, R and Cytoscape.

thumbsup

Pictures: I probably shouldn’t post these as it’s still preprint, so you’ll have to just look at the PDF. But there are some nice figures and a few different ways of presenting the data. I particularly like Figure 4A, which shows the phylogenetic tree against a binary matrix indicating SNP alleles in each sample, for those that vary within the group. This would be good to have handy for inspecting fine phylogenies… it looks like it was done in R, I think I might add this to my tree visualisation R functions (to be posted soon).

thumbsup

3. In preprint on arXiv, using a Staph. aureus hospital outbreak example

Reconstructing transmission networks for communicable diseases using densely sampled genomic data: a generalized approach

Colin J. Worby1, Philip D. O’Neill, Theodore Kypraios, Julie V. Robotham, Daniela De Angelis, Edward J. P. Cartwright, Sharon J. Peacock and Ben S. Cooper

Abstract: Genomics is increasingly being used to investigate disease outbreaks, but an important question remains unanswered — how well do genomic data capture known transmission events, particularly for pathogens with long carriage periods or large within-host population sizes? Here we present a novel Bayesian approach to reconstruct densely-sampled outbreaks from genomic data whilst considering within-host diversity. We infer a time-labelled phylogeny using BEAST, then infer a transmission network via a Monte-Carlo Markov Chain. We find that under a realistic model of within-host evolution, reconstructions of simulated outbreaks contain substantial uncertainty even when genomic data reflect a high substitution rate. Reconstruction of a real-world tuberculosis outbreak displayed similar uncertainty, although the correct source case and several clusters of epidemiologically linked cases were identified. We conclude that genomics cannot wholly replace traditional epidemiology, but that Bayesian reconstructions derived from sequence data may form a useful starting point for a genomic epidemiology investigation.

Software: No mention of any software package or how this was implemented.

Pictures: I probably shouldn’t post these as it’s still preprint, so you’ll have to just look at the PDF. But in my opinion figures are much less clear and harder to follow than the others (and mostly black and white…why???).

Leave a comment