Inference of Transmission Network Structure from HIV Phylogenetic Trees.
ABSTRACT: Phylogenetic inference is an attractive means to reconstruct transmission histories and epidemics. However, there is not a perfect correspondence between transmission history and virus phylogeny. Both node height and topological differences may occur, depending on the interaction between within-host evolutionary dynamics and between-host transmission patterns. To investigate these interactions, we added a within-host evolutionary model in epidemiological simulations and examined if the resulting phylogeny could recover different types of contact networks. To further improve realism, we also introduced patient-specific differences in infectivity across disease stages, and on the epidemic level we considered incomplete sampling and the age of the epidemic. Second, we implemented an inference method based on approximate Bayesian computation (ABC) to discriminate among three well-studied network models and jointly estimate both network parameters and key epidemiological quantities such as the infection rate. Our ABC framework used both topological and distance-based tree statistics for comparison between simulated and observed trees. Overall, our simulations showed that a virus time-scaled phylogeny (genealogy) may be substantially different from the between-host transmission tree. This has important implications for the interpretation of what a phylogeny reveals about the underlying epidemic contact network. In particular, we found that while the within-host evolutionary process obscures the transmission tree, the diversification process and infectivity dynamics also add discriminatory power to differentiate between different types of contact networks. We also found that the possibility to differentiate contact networks depends on how far an epidemic has progressed, where distance-based tree statistics have more power early in an epidemic. Finally, we applied our ABC inference on two different outbreaks from the Swedish HIV-1 epidemic.
Project description:Models of the spread of disease in a population often make the simplifying assumption that the population is homogeneously mixed, or is divided into homogeneously mixed compartments. However, human populations have complex structures formed by social contacts, which can have a significant influence on the rate of epidemic spread. Contact network models capture this structure by explicitly representing each contact which could possibly lead to a transmission. We developed a method based on approximate Bayesian computation (ABC), a likelihood-free inference strategy, for estimating structural parameters of the contact network underlying an observed viral phylogeny. The method combines adaptive sequential Monte Carlo for ABC, Gillespie simulation for propagating epidemics though networks, and a kernel-based tree similarity score. We used the method to fit the Barabási-Albert network model to simulated transmission trees, and also applied it to viral phylogenies estimated from ten published HIV sequence datasets. This model incorporates a feature called preferential attachment (PA), whereby individuals with more existing contacts accumulate new contacts at a higher rate. On simulated data, we found that the strength of PA and the number of infected nodes in the network can often be accurately estimated. On the other hand, the mean degree of the network, as well as the total number of nodes, was not estimable with ABC. We observed sub-linear PA power in all datasets, as well as higher PA power in networks of injection drug users. These results underscore the importance of considering contact structures when performing phylodynamic inference. Our method offers the potential to quantitatively investigate the contact network structure underlying viral epidemics.
Project description:The shapes of phylogenetic trees relating virus populations are determined by the adaptation of viruses within each host, and by the transmission of viruses among hosts. Phylodynamic inference attempts to reverse this flow of information, estimating parameters of these processes from the shape of a virus phylogeny reconstructed from a sample of genetic sequences from the epidemic. A key challenge to phylodynamic inference is quantifying the similarity between two trees in an efficient and comprehensive way. In this study, I demonstrate that a new distance measure, based on a subset tree kernel function from computational linguistics, confers a significant improvement over previous measures of tree shape for classifying trees generated under different epidemiological scenarios. Next, I incorporate this kernel-based distance measure into an approximate Bayesian computation (ABC) framework for phylodynamic inference. ABC bypasses the need for an analytical solution of model likelihood, as it only requires the ability to simulate data from the model. I validate this "kernel-ABC" method for phylodynamic inference by estimating parameters from data simulated under a simple epidemiological model. Results indicate that kernel-ABC attained greater accuracy for parameters associated with virus transmission than leading software on the same data sets. Finally, I apply the kernel-ABC framework to study a recent outbreak of a recombinant HIV subtype in China. Kernel-ABC provides a versatile framework for phylodynamic inference because it can fit a broader range of models than methods that rely on the computation of exact likelihoods.
Project description:The use of genetic data to reconstruct the transmission tree of infectious disease epidemics and outbreaks has been the subject of an increasing number of studies, but previous approaches have usually either made assumptions that are not fully compatible with phylogenetic inference, or, where they have based inference on a phylogeny, have employed a procedure that requires this tree to be fixed. At the same time, the coalescent-based models of the pathogen population that are employed in the methods usually used for time-resolved phylogeny reconstruction are a considerable simplification of epidemic process, as they assume that pathogen lineages mix freely. Here, we contribute a new method that is simultaneously a phylogeny reconstruction method for isolates taken from an epidemic, and a procedure for transmission tree reconstruction. We observe that, if one or more samples is taken from each host in an epidemic or outbreak and these are used to build a phylogeny, a transmission tree is equivalent to a partition of the set of nodes of this phylogeny, such that each partition element is a set of nodes that is connected in the full tree and contains all the tips corresponding to samples taken from one and only one host. We then implement a Monte Carlo Markov Chain (MCMC) procedure for simultaneous sampling from the spaces of both trees, utilising a newly-designed set of phylogenetic tree proposals that also respect node partitions. We calculate the posterior probability of these partitioned trees based on a model that acknowledges the population structure of an epidemic by employing an individual-based disease transmission model and a coalescent process taking place within each host. We demonstrate our method, first using simulated data, and then with sequences taken from the H7N7 avian influenza outbreak that occurred in the Netherlands in 2003. We show that it is superior to established coalescent methods for reconstructing the topology and node heights of the phylogeny and performs well for transmission tree reconstruction when the phylogeny is well-resolved by the genetic data, but caution that this will often not be the case in practice and that existing genetic and epidemiological data should be used to configure such analyses whenever possible. This method is available for use by the research community as part of BEAST, one of the most widely-used packages for reconstruction of dated phylogenies.
Project description:Spread of HIV typically involves uneven transmission patterns where some individuals spread to a large number of individuals while others to only a few or none. Such transmission heterogeneity can impact how fast and how much an epidemic spreads. Further, more efficient interventions may be achieved by taking such transmission heterogeneity into account. To address these issues, we developed two phylogenetic methods based on virus sequence data: 1) to generally detect if significant transmission heterogeneity is present, and 2) to pinpoint where in a phylogeny high-level spread is occurring. We derive inference procedures to estimate model parameters, including the amount of transmission heterogeneity, in a sampled epidemic. We show that it is possible to detect transmission heterogeneity under a wide range of simulated situations, including incomplete sampling, varying levels of heterogeneity, and including within-host genetic diversity. When evaluating real HIV-1 data from different epidemic scenarios, we found a lower level of transmission heterogeneity in slowly spreading situations and a higher level of heterogeneity in data that included a rapid outbreak, while R0 and Sackin's index (overall tree shape statistic) were similar in the two scenarios, suggesting that our new method is able to detect transmission heterogeneity in real data. We then show by simulations that targeted prevention, where we pinpoint high-level spread using a coalescence measurement, is efficient when sequence data are collected in an ongoing surveillance system. Such phylogeny-guided prevention is efficient under both single-step contact tracing as well as iterative contact tracing as compared to random intervention.
Project description:Contact structure is believed to have a large impact on epidemic spreading and consequently using networks to model such contact structure continues to gain interest in epidemiology. However, detailed knowledge of the exact contact structure underlying real epidemics is limited. Here we address the question whether the structure of the contact network leaves a detectable genetic fingerprint in the pathogen population. To this end we compare phylogenies generated by disease outbreaks in simulated populations with different types of contact networks. We find that the shape of these phylogenies strongly depends on contact structure. In particular, measures of tree imbalance allow us to quantify to what extent the contact structure underlying an epidemic deviates from a null model contact network and illustrate this in the case of random mixing. Using a phylogeny from the Swiss HIV epidemic, we show that this epidemic has a significantly more unbalanced tree than would be expected from random mixing.
Project description:Motivation:Genomic analysis has become one of the major tools for disease outbreak investigations. However, existing computational frameworks for inference of transmission history from viral genomic data often do not consider intra-host diversity of pathogens and heavily rely on additional epidemiological data, such as sampling times and exposure intervals. This impedes genomic analysis of outbreaks of highly mutable viruses associated with chronic infections, such as human immunodeficiency virus and hepatitis C virus, whose transmissions are often carried out through minor intra-host variants, while the additional epidemiological information often is either unavailable or has a limited use. Results:The proposed framework QUasispecies Evolution, Network-based Transmission INference (QUENTIN) addresses the above challenges by evolutionary analysis of intra-host viral populations sampled by deep sequencing and Bayesian inference using general properties of social networks relevant to infection dissemination. This method allows inference of transmission direction even without the supporting case-specific epidemiological information, identify transmission clusters and reconstruct transmission history. QUENTIN was validated on experimental and simulated data, and applied to investigate HCV transmission within a community of hosts with high-risk behavior. It is available at https://github.com/skumsp/QUENTIN. Contact:firstname.lastname@example.org or email@example.com or firstname.lastname@example.org or email@example.com. Supplementary information:Supplementary data are available at Bioinformatics online.
Project description:BACKGROUND:Transmission patterns of sexually-transmitted infections (STIs) could relate to the structure of the underlying sexual contact network, whose features are therefore of interest to clinicians. Conventionally, we represent sexual contacts in a population with a graph, that can reveal the existence of communities. Phylogenetic methods help infer the history of an epidemic and incidentally, may help detecting communities. In particular, phylogenetic analyses of HIV-1 epidemics among men who have sex with men (MSM) have revealed the existence of large transmission clusters, possibly resulting from within-community transmissions. Past studies have explored the association between contact networks and phylogenies, including transmission clusters, producing conflicting conclusions about whether network features significantly affect observed transmission history. As far as we know however, none of them thoroughly investigated the role of communities, defined with respect to the network graph, in the observation of clusters. METHODS:The present study investigates, through simulations, community detection from phylogenies. We simulate a large number of epidemics over both unweighted and weighted, undirected random interconnected-islands networks, with islands corresponding to communities. We use weighting to modulate distance between islands. We translate each epidemic into a phylogeny, that lets us partition our samples of infected subjects into transmission clusters, based on several common definitions from the literature. We measure similarity between subjects' island membership indices and transmission cluster membership indices with the adjusted Rand index. RESULTS AND CONCLUSION:Analyses reveal modest mean correspondence between communities in graphs and phylogenetic transmission clusters. We conclude that common methods often have limited success in detecting contact network communities from phylogenies. The rarely-fulfilled requirement that network communities correspond to clades in the phylogeny is their main drawback. Understanding the link between transmission clusters and communities in sexual contact networks could help inform policymaking to curb HIV incidence in MSMs.
Project description:The characteristics of the host contact network over which a pathogen is transmitted affect both epidemic spread and the projected effectiveness of control strategies. Given the importance of understanding these contact networks, it is unfortunate that they are very difficult to measure directly. This challenge has led to an interest in methods to infer information about host contact networks from pathogen phylogenies, because in shaping a pathogen's opportunities for reproduction, contact networks also shape pathogen evolution. Host networks influence pathogen phylogenies both directly, through governing opportunities for evolution, and indirectly by changing the prevalence and incidence. Here, we aim to separate these two effects by comparing pathogen evolution on different host networks that share similar epidemic trajectories. This approach allows use to examine the direct effects of network structure on pathogen phylogenies, largely controlling for confounding differences arising from population dynamics. We find that networks with more heterogeneous degree distributions yield pathogen phylogenies with more variable cluster numbers, smaller mean cluster sizes, shorter mean branch lengths, and somewhat higher tree imbalance than networks with relatively homogeneous degree distributions. However, in particular for dynamic networks, we find that these direct effects are relatively modest. These findings suggest that the role of the epidemic trajectory, the dynamics of the network and the inherent variability of metrics such as cluster size must each be taken into account when trying to use pathogen phylogenies to understand characteristics about the underlying host contact network.
Project description:The gut microbiomes of mammals appear to mirror their hosts' phylogeny, suggesting host-driven microbial community assembly. Yet, much of this evidence stems from comparative studies of distinct wild or captive populations that lack data for disentangling the relative influences of shared phylogeny and environment. Here, we present phylogenetic and multivariate analyses of gut microbiomes from six sympatric (i.e., co-occurring) mammal species inhabiting a 1-km2 area in western Madagascar-three lemur and three non-primate species-that consider genetic, dietary, and ecological predictors of microbiome functionality and composition. Host evolutionary history, indeed, appears to shape gut microbial patterns among both closely and distantly related species. However, we also find that diet-reliance on leaves versus fruit-is the best predictor of microbiome similarity among closely related lemur species, and that host substrate use-ground versus tree-constrains horizontal transmission via incidental contact with feces, with arboreal species harboring far more distinct communities than those of their terrestrial and semi-terrestrial counterparts.
Project description:Genomic data are increasingly being used to understand infectious disease epidemiology. Isolates from a given outbreak are sequenced, and the patterns of shared variation are used to infer which isolates within the outbreak are most closely related to each other. Unfortunately, the phylogenetic trees typically used to represent this variation are not directly informative about who infected whom-a phylogenetic tree is not a transmission tree. However, a transmission tree can be inferred from a phylogeny while accounting for within-host genetic diversity by coloring the branches of a phylogeny according to which host those branches were in. Here we extend this approach and show that it can be applied to partially sampled and ongoing outbreaks. This requires computing the correct probability of an observed transmission tree and we herein demonstrate how to do this for a large class of epidemiological models. We also demonstrate how the branch coloring approach can incorporate a variable number of unique colors to represent unsampled intermediates in transmission chains. The resulting algorithm is a reversible jump Monte-Carlo Markov Chain, which we apply to both simulated data and real data from an outbreak of tuberculosis. By accounting for unsampled cases and an outbreak which may not have reached its end, our method is uniquely suited to use in a public health environment during real-time outbreak investigations. We implemented this transmission tree inference methodology in an R package called TransPhylo, which is freely available from https://github.com/xavierdidelot/TransPhylo.