QS-Net: Reconstructing Phylogenetic Networks Based on Quartet and Sextet.
ABSTRACT: Phylogenetic networks are used to estimate evolutionary relationships among biological entities or taxa involving reticulate events such as horizontal gene transfer, hybridization, recombination, and reassortment. In the past decade, many phylogenetic tree and network reconstruction methods have been proposed. Despite that they are highly accurate in reconstructing simple to moderate complex reticulate events, the performance decreases when several reticulate events are present simultaneously. In this paper, we proposed QS-Net, a phylogenetic network reconstruction method taking advantage of information on the relationship among six taxa. To evaluate the performance of QS-Net, we conducted experiments on three artificial sequence data simulated from an evolutionary tree, an evolutionary network involving three reticulate events, and a complex evolutionary network involving five reticulate events. Comparison with popular phylogenetic methods including Neighbor-Joining, Split-Decomposition, Neighbor-Net, and Quartet-Net suggests that QS-Net is comparable with other methods in reconstructing tree-like evolutionary histories, while it outperforms them in reconstructing reticulate events. In addition, we also applied QS-Net in real data including a bacterial taxonomy data consisting of 36 bacterial species and the whole genome sequences of 22 H7N9 influenza A viruses. The results indicate that QS-Net is capable of inferring commonly believed bacterial taxonomy and influenza evolution as well as identifying novel reticulate events. The software QS-Net is publically available at https://github.com/Tmyiri/QS-Net.
Project description:Phylogenetic networks can model reticulate evolutionary events such as hybridization, recombination, and horizontal gene transfer. However, reconstructing such networks is not trivial. Popular character-based methods are computationally inefficient, whereas distance-based methods cannot guarantee reconstruction accuracy because pairwise genetic distances only reflect partial information about a reticulate phylogeny. To balance accuracy and computational efficiency, here we introduce a quartet-based method to construct a phylogenetic network from a multiple sequence alignment. Unlike distances that only reflect the relationship between a pair of taxa, quartets contain information on the relationships among four taxa; these quartets provide adequate capacity to infer a more accurate phylogenetic network. In applications to simulated and biological data sets, we demonstrate that this novel method is robust and effective in reconstructing reticulate evolutionary events and it has the potential to infer more accurate phylogenetic distances than other conventional phylogenetic network construction methods such as Neighbor-Joining, Neighbor-Net, and Split Decomposition. This method can be used in constructing phylogenetic networks from simple evolutionary events involving a few reticulate events to complex evolutionary histories involving a large number of reticulate events. A software called "Quartet-Net" is implemented and available at http://sysbio.cvm.msstate.edu/QuartetNet/.
Project description:The role of hybridization and subsequent introgression has been demonstrated in an increasing number of species. Recently, Fontaine et al. (Science, 347, 2015, 1258524) conducted a phylogenomic analysis of six members of the Anopheles gambiae species complex. Their analysis revealed a reticulate evolutionary history and pointed to extensive introgression on all four autosomal arms. The study further highlighted the complex evolutionary signals that the co-occurrence of incomplete lineage sorting (ILS) and introgression can give rise to in phylogenomic analyses. While tree-based methodologies were used in the study, phylogenetic networks provide a more natural model to capture reticulate evolutionary histories. In this work, we reanalyse the Anopheles data using a recently devised framework that combines the multispecies coalescent with phylogenetic networks. This framework allows us to capture ILS and introgression simultaneously, and forms the basis for statistical methods for inferring reticulate evolutionary histories. The new analysis reveals a phylogenetic network with multiple hybridization events, some of which differ from those reported in the original study. To elucidate the extent and patterns of introgression across the genome, we devise a new method that quantifies the use of reticulation branches in the phylogenetic network by each genomic region. Applying the method to the mosquito data set reveals the evolutionary history of all the chromosomes. This study highlights the utility of 'network thinking' and the new insights it can uncover, in particular in phylogenomic analyses of large data sets with extensive gene tree incongruence.
Project description:Phylogenetic networks are a generalization of evolutionary trees and are an important tool for analyzing reticulate evolutionary histories. Recently, there has been great interest in developing new methods to construct rooted phylogenetic networks, that is, networks whose internal vertices correspond to hypothetical ancestors, whose leaves correspond to sampled taxa, and in which vertices with more than one parent correspond to taxa formed by reticulate evolutionary events such as recombination or hybridization. Several methods for constructing evolutionary trees use the strategy of building up a tree from simpler building blocks (such as triplets or clusters), and so it is natural to look for ways to construct networks from smaller networks. In this article, we shall demonstrate a fundamental issue with this approach. Namely, we show that even if we are given all of the subnetworks induced on all proper subsets of the leaves of some rooted phylogenetic network, we still do not have all of the information required to completely determine that network. This implies that even if all of the building blocks for some reticulate evolutionary history were to be taken as the input for any given network building method, the method might still output an incorrect history. We also discuss some potential consequences of this result for constructing phylogenetic networks.
Project description:The tree structure is currently the accepted paradigm to represent evolutionary relationships between organisms, species or other taxa. However, horizontal, or reticulate, genomic exchanges are pervasive in nature and confound characterization of phylogenetic trees. Drawing from algebraic topology, we present a unique evolutionary framework that comprehensively captures both clonal and reticulate evolution. We show that whereas clonal evolution can be summarized as a tree, reticulate evolution exhibits nontrivial topology of dimension greater than zero. Our method effectively characterizes clonal evolution, reassortment, and recombination in RNA viruses. Beyond detecting reticulate evolution, we succinctly recapitulate the history of complex genetic exchanges involving more than two parental strains, such as the triple reassortment of H7N9 avian influenza and the formation of circulating HIV-1 recombinants. In addition, we identify recurrent, large-scale patterns of reticulate evolution, including frequent PB2-PB1-PA-NP cosegregation during avian influenza reassortment. Finally, we bound the rate of reticulate events (i.e., 20 reassortments per year in avian influenza). Our method provides an evolutionary perspective that not only captures reticulate events precluding phylogeny, but also indicates the evolutionary scales where phylogenetic inference could be accurate.
Project description:The multispecies coalescent (MSC) is a statistical framework that models how gene genealogies grow within the branches of a species tree. The field of computational phylogenetics has witnessed an explosion in the development of methods for species tree inference under MSC, owing mainly to the accumulating evidence of incomplete lineage sorting in phylogenomic analyses. However, the evolutionary history of a set of genomes, or species, could be reticulate due to the occurrence of evolutionary processes such as hybridization or horizontal gene transfer. We report on a novel method for Bayesian inference of genome and species phylogenies under the multispecies network coalescent (MSNC). This framework models gene evolution within the branches of a phylogenetic network, thus incorporating reticulate evolutionary processes, such as hybridization, in addition to incomplete lineage sorting. As phylogenetic networks with different numbers of reticulation events correspond to points of different dimensions in the space of models, we devise a reversible-jump Markov chain Monte Carlo (RJMCMC) technique for sampling the posterior distribution of phylogenetic networks under MSNC. We implemented the methods in the publicly available, open-source software package PhyloNet and studied their performance on simulated and biological data. The work extends the reach of Bayesian inference to phylogenetic networks and enables new evolutionary analyses that account for reticulation.
Project description:Over the past two decades, phylogenetic networks have been studied to model reticulate evolutionary events. The relationships among phylogenetic networks, phylogenetic trees and clusters serve as the basis for reconstruction and comparison of phylogenetic networks. To understand these relationships, two problems are raised: the tree containment problem, which asks whether a phylogenetic tree is displayed in a phylogenetic network, and the cluster containment problem, which asks whether a cluster is represented at a node in a phylogenetic network. Both the problems are NP-complete.A fast exponential-time algorithm for the cluster containment problem on arbitrary networks is developed and implemented in C. The resulting program is further extended into a computer program for fast computation of the Soft Robinson-Foulds distance between phylogenetic networks.Two computer programs are developed for facilitating reconstruction and validation of phylogenetic network models in evolutionary and comparative genomics. Our simulation tests indicated that they are fast enough for use in practice. Additionally, the distribution of the Soft Robinson-Foulds distance between phylogenetic networks is demonstrated to be unlikely normal by our simulation data.
Project description:Hybridization plays an important role in the evolution of certain groups of organisms, adaptation to their environments, and diversification of their genomes. The evolutionary histories of such groups are reticulate, and methods for reconstructing them are still in their infancy and have limited applicability. We present a maximum likelihood method for inferring reticulate evolutionary histories while accounting simultaneously for incomplete lineage sorting. Additionally, we propose methods for assessing confidence in the amount of reticulation and the topology of the inferred evolutionary history. Our method obtains accurate estimates of reticulate evolutionary histories on simulated datasets. Furthermore, our method provides support for a hypothesis of a reticulate evolutionary history inferred from a set of house mouse (Mus musculus) genomes. As evidence of hybridization in eukaryotic groups accumulates, it is essential to have methods that infer reticulate evolutionary histories. The work we present here allows for such inference and provides a significant step toward putting phylogenetic networks on par with phylogenetic trees as a model of capturing evolutionary relationships.
Project description:Phylogenetic tree reconstruction is usually done by local search heuristics that explore the space of the possible tree topologies via simple rearrangements of their structure. Tree rearrangement heuristics have been used in combination with practically all optimization criteria in use, from maximum likelihood and parsimony to distance-based principles, and in a Bayesian context. Their basic components are rearrangement moves that specify all possible ways of generating alternative phylogenies from a given one, and whose fundamental property is to be able to transform, by repeated application, any phylogeny into any other phylogeny. Despite their long tradition in tree-based phylogenetics, very little research has gone into studying similar rearrangement operations for phylogenetic network-that is, phylogenies explicitly representing scenarios that include reticulate events such as hybridization, horizontal gene transfer, population admixture, and recombination. To fill this gap, we propose "horizontal" moves that ensure that every network of a certain complexity can be reached from any other network of the same complexity, and "vertical" moves that ensure reachability between networks of different complexities. When applied to phylogenetic trees, our horizontal moves-named rNNI and rSPR-reduce to the best-known moves on rooted phylogenetic trees, nearest-neighbor interchange and rooted subtree pruning and regrafting. Besides a number of reachability results-separating the contributions of horizontal and vertical moves-we prove that rNNI moves are local versions of rSPR moves, and provide bounds on the sizes of the rNNI neighborhoods. The paper focuses on the most biologically meaningful versions of phylogenetic networks, where edges are oriented and reticulation events clearly identified. Moreover, our rearrangement moves are robust to the fact that networks with higher complexity usually allow a better fit with the data. Our goal is to provide a solid basis for practical phylogenetic network reconstruction.
Project description:Phylogenetic trees are widely used to display estimates of how groups of species are evolved. Each phylogenetic tree can be seen as a collection of clusters, subgroups of the species that evolved from a common ancestor. When phylogenetic trees are obtained for several datasets (e.g. for different genes), then their clusters are often contradicting. Consequently, the set of all clusters of such a dataset cannot be combined into a single phylogenetic tree. Phylogenetic networks are a generalization of phylogenetic trees that can be used to display more complex evolutionary histories, including reticulate events, such as hybridizations, recombinations and horizontal gene transfers. Here, we present the new Cass algorithm that can combine any set of clusters into a phylogenetic network. We show that the networks constructed by Cass are usually simpler than networks constructed by other available methods. Moreover, we show that Cass is guaranteed to produce a network with at most two reticulations per biconnected component, whenever such a network exists. We have implemented Cass and integrated it into the freely available Dendroscope software.Supplementary data are available at Bioinformatics online.
Project description:MOTIVATION: The presence of reticulate evolutionary events in phylogenies turn phylogenetic trees into phylogenetic networks. These events imply in particular that there may exist multiple evolutionary paths from a non-extant species to an extant one, and this multiplicity makes the comparison of phylogenetic networks much more difficult than the comparison of phylogenetic trees. In fact, all attempts to define a sound distance measure on the class of all phylogenetic networks have failed so far. Thus, the only practical solutions have been either the use of rough estimates of similarity (based on comparison of the trees embedded in the networks), or narrowing the class of phylogenetic networks to a certain class where such a distance is known and can be efficiently computed. The first approach has the problem that one may identify two networks as equivalent, when they are not; the second one has the drawback that there may not exist algorithms to reconstruct such networks from biological sequences. RESULTS: We present in this article a distance measure on the class of semi-binary tree-sibling time consistent phylogenetic networks, which generalize tree-child time consistent phylogenetic networks, and thus also galled-trees. The practical interest of this distance measure is 2-fold: it can be computed in polynomial time by means of simple algorithms, and there also exist polynomial-time algorithms for reconstructing networks of this class from DNA sequence data. AVAILABILITY: The Perl package Bio::PhyloNetwork, included in the BioPerl bundle, implements many algorithms on phylogenetic networks, including the computation of the distance presented in this article. SUPPLEMENTARY INFORMATION: Some counterexamples, proofs of the results not included in this article, and some computational experiments are available at Bioinformatics online.