Evaluating evolutionary history in the face of high gene tree discordance in Australian Gehyra (Reptilia: Gekkonidae).
ABSTRACT: Species tree methods have provided improvements for estimating species relationships and the timing of diversification in recent radiations by allowing for gene tree discordance. Although gene tree discordance is often observed, most discordance is attributed to incomplete lineage sorting rather than other biological phenomena, and the causes of discordance are rarely investigated. We use species trees from multi-locus data to estimate the species relationships, evolutionary history and timing of diversification among Australian Gehyra, a group renowned for taxonomic uncertainty and showing a large degree of gene tree discordance. We find support for a recent Asian origin and two major clades: a tropically adapted clade and an arid adapted clade, with some exceptions, but no support for allopatric speciation driven by chromosomal rearrangement in the group. Bayesian concordance analysis revealed high gene tree discordance and comparisons of Robinson-Foulds distances showed that discordance between gene trees was significantly higher than that generated by topological uncertainty within each gene. Analysis of gene tree discordance and incomplete taxon sampling revealed that gene tree discordance was high whether terminal taxon or gene sampling was maximized, indicating discordance is due to biological processes, which may be important in contributing to gene tree discordance in many recently diversified organisms.
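For readers unfamiliar with the Robinson-Foulds distance mentioned above, the following is a minimal sketch of how it can be computed for two unrooted topologies. It is an illustration only, not the authors' pipeline: trees are assumed to be nested tuples of taxon names, and the function names (`leaf_sets`, `bipartitions`, `rf_distance`) are hypothetical.

```python
def leaf_sets(tree):
    """Return (leaves under this node, leaf sets of all internal nodes below it)."""
    if isinstance(tree, str):  # a tip is represented by its taxon name
        return frozenset([tree]), []
    leaves = frozenset()
    clades = []
    for child in tree:
        child_leaves, child_clades = leaf_sets(child)
        leaves |= child_leaves
        clades.extend(child_clades)
    clades.append(leaves)
    return leaves, clades


def bipartitions(tree):
    """Non-trivial splits of an unrooted tree, each stored as the side
    that does NOT contain a fixed reference taxon."""
    all_leaves, clades = leaf_sets(tree)
    ref = min(all_leaves)
    splits = set()
    for clade in clades:
        side = all_leaves - clade if ref in clade else clade
        # keep only splits with at least two taxa on each side
        if 1 < len(side) < len(all_leaves) - 1:
            splits.add(side)
    return splits


def rf_distance(tree_a, tree_b):
    """Number of splits present in exactly one of the two trees."""
    return len(bipartitions(tree_a) ^ bipartitions(tree_b))


# Toy topologies (assumed data, not from the study)
t1 = (("A", "B"), ("C", "D"))
t2 = (("A", "C"), ("B", "D"))
t3 = ("A", ("B", ("C", "D")))  # same unrooted topology as t1, rooted differently
print(rf_distance(t1, t2), rf_distance(t1, t3))
```

In the spirit of the comparison described in the abstract, one could compute such distances between trees sampled for different genes and between trees sampled for the same gene, then compare the two distributions; the statistical test used for that comparison is not specified here.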
Project description:The species rich butterfly family Nymphalidae has been used to study evolutionary interactions between plants and insects. Theories of insect-hostplant dynamics predict accelerated diversification due to key innovations. In evolutionary biology, analysis of maximum credibility trees in the software MEDUSA (modelling evolutionary diversity using stepwise AIC) is a popular method for estimation of shifts in diversification rates. We investigated whether phylogenetic uncertainty can produce different results by extending the method across a random sample of trees from the posterior distribution of a Bayesian run. Using the MultiMEDUSA approach, we found that phylogenetic uncertainty greatly affects diversification rate estimates. Different trees produced diversification rates ranging from high values to almost zero for the same clade, and both significant rate increase and decrease in some clades. Only four out of 18 significant shifts found on the maximum clade credibility tree were consistent across most of the sampled trees. Among these, we found accelerated diversification for Ithomiini butterflies. We used the binary speciation and extinction model (BiSSE) and found that a hostplant shift to Solanaceae is correlated with increased net diversification rates in Ithomiini, congruent with the diffuse cospeciation hypothesis. Our results show that taking phylogenetic uncertainty into account when estimating net diversification rate shifts is of great importance, as very different results can be obtained when using the maximum clade credibility tree and other trees from the posterior distribution.
Project description:The origin and timing of the diversification of modern birds remains controversial, primarily because phylogenetic relationships are incompletely resolved and uncertainty persists in molecular estimates of lineage ages. Here, we present a species tree for the major palaeognath lineages using 27 nuclear genes and 27 archaic retroposon insertions. We show that rheas are sister to the kiwis, emu and cassowaries, and confirm ratite paraphyly because tinamous are sister to moas. Divergence dating using 10 genes with broader taxon sampling, including emu, cassowary, ostrich, five kiwis, two rheas, three tinamous, three extinct moas and 15 neognath lineages, suggests that three vicariant events and possibly two dispersals are required to explain their historical biogeography. The age of crown group birds was estimated at 131 Ma (95% highest posterior density 122-138 Ma), similar to previous molecular estimates. Problems associated with gene tree discordance and incomplete lineage sorting in birds will require much larger gene sets to increase species tree accuracy and reduce error in divergence time estimates. The relatively rapid branching within Neoaves pre-dates the extinction of dinosaurs, suggesting that the genesis of the radiation within this diverse clade of birds was not in response to the Cretaceous-Paleogene extinction event.
Project description:Gene tree discordance in large genomic data sets can be caused by evolutionary processes such as incomplete lineage sorting and hybridization, as well as model violation, and errors in data processing, orthology inference, and gene tree estimation. Species tree methods that identify and accommodate all sources of conflict are not available, but a combination of multiple approaches can help tease apart alternative sources of conflict. Here, using a phylotranscriptomic analysis in combination with reference genomes, we test a hypothesis of ancient hybridization events within the plant family Amaranthaceae s.l. that was previously supported by morphological, ecological, and Sanger-based molecular data. The data set included seven genomes and 88 transcriptomes, 17 generated for this study. We examined gene-tree discordance using coalescent-based species trees and network inference, gene tree discordance analyses, site pattern tests of introgression, topology tests, synteny analyses, and simulations. We found that a combination of processes might have generated the high levels of gene tree discordance in the backbone of Amaranthaceae s.l. Furthermore, we found evidence that three consecutive short internal branches produce anomalous trees contributing to the discordance. Overall, our results suggest that Amaranthaceae s.l. might be a product of an ancient and rapid lineage diversification, and remains, and probably will remain, unresolved. This work highlights the potential problems of identifiability associated with the sources of gene tree discordance including, in particular, phylogenetic network methods. Our results also demonstrate the importance of thoroughly testing for multiple sources of conflict in phylogenomic analyses, especially in the context of ancient, rapid radiations. We provide several recommendations for exploring conflicting signals in such situations. 
[Amaranthaceae; gene tree discordance; hybridization; incomplete lineage sorting; phylogenomics; species network; species tree; transcriptomics.].
Project description:PREMISE:Cornales is an order of flowering plants containing ecologically and horticulturally important families, including Cornaceae (dogwoods) and Hydrangeaceae (hydrangeas), among others. While many relationships in Cornales are strongly supported by previous studies, some uncertainty remains with regard to the placement of Hydrostachyaceae and to relationships among families in Cornales and within Cornaceae. Here we analyzed hundreds of nuclear loci to test published phylogenetic hypotheses and estimated a robust species tree for Cornales. METHODS:Using the Angiosperms353 probe set and existing data sets, we generated phylogenomic data for 158 samples, representing all families in the Cornales, with intensive sampling in the Cornaceae. RESULTS:We curated an average of 312 genes per sample, constructed maximum likelihood gene trees, and inferred a species tree using the summary approach implemented in ASTRAL-III, a method statistically consistent with the multispecies coalescent model. CONCLUSIONS:The species tree we constructed generally shows high support values and a high degree of concordance among individual nuclear gene trees. Relationships among families are largely congruent with previous molecular studies, except for the placement of the nyssoids and the Grubbiaceae-Curtisiaceae clades. Furthermore, we were able to place Hydrostachyaceae within Cornales, and within Cornaceae, the monophyly of known morphogroups was well supported. However, patterns of gene tree discordance suggest potential ancient reticulation, gene flow, and/or ILS in the Hydrostachyaceae lineage and the early diversification of Cornus. Our findings reveal new insights into the diversification process across Cornales and demonstrate the utility of the Angiosperms353 probe set.
Project description:BACKGROUND:Sequence data used in reconstructing phylogenetic trees may include various sources of error. Typically errors are detected at the sequence level, but when missed, the erroneous sequences often appear as unexpectedly long branches in the inferred phylogeny. RESULTS:We propose an automatic method to detect such errors. We build a phylogeny including all the data then detect sequences that artificially inflate the tree diameter. We formulate an optimization problem, called the k-shrink problem, that seeks to find k leaves that could be removed to maximally reduce the tree diameter. We present an algorithm to find the exact solution for this problem in polynomial time. We then use several statistical tests to find outlier species that have an unexpectedly high impact on the tree diameter. These tests can use a single tree or a set of related gene trees and can also adjust to species-specific patterns of branch length. The resulting method is called TreeShrink. We test our method on six phylogenomic biological datasets and an HIV dataset and show that the method successfully detects and removes long branches. TreeShrink removes sequences more conservatively than rogue taxon removal and often reduces gene tree discordance more than rogue taxon removal once the amount of filtering is controlled. CONCLUSIONS:TreeShrink is an effective method for detecting sequences that lead to unrealistically long branch lengths in phylogenetic trees. The tool is publicly available at https://github.com/uym2/TreeShrink .
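The idea of removing leaves to reduce the tree diameter can be illustrated with a brute-force sketch for the special case k = 1. This is not TreeShrink's polynomial-time algorithm and does not use its statistical tests; the edge-list input format, the function names, and the toy tree are all assumptions made for illustration.

```python
from itertools import combinations


def leaf_distances(edges):
    """edges: list of (node_u, node_v, branch_length) describing a tree.
    Returns the sorted leaf names and a dict of leaf-to-leaf path lengths."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    leaves = sorted(n for n, nbrs in adj.items() if len(nbrs) == 1)
    dist = {}
    for start in leaves:
        # depth-first walk from each leaf, accumulating branch lengths
        stack = [(start, None, 0.0)]
        while stack:
            node, parent, d = stack.pop()
            if node != start and len(adj[node]) == 1:
                dist[(start, node)] = d
            for nxt, w in adj[node]:
                if nxt != parent:
                    stack.append((nxt, node, d + w))
    return leaves, dist


def diameter(leaves, dist):
    """Longest leaf-to-leaf path among the given leaves."""
    return max(dist[(a, b)] for a, b in combinations(leaves, 2))


def best_single_removal(edges):
    """Try every leaf and return the one whose removal shrinks the diameter most
    (requires at least three leaves)."""
    leaves, dist = leaf_distances(edges)
    full = diameter(leaves, dist)

    def remaining_diameter(removed):
        return diameter([leaf for leaf in leaves if leaf != removed], dist)

    best = min(leaves, key=remaining_diameter)
    return best, full, remaining_diameter(best)


# Toy tree (assumed data): taxon X sits on a suspiciously long branch
toy_edges = [
    ("A", "n1", 1), ("B", "n1", 1), ("n1", "n2", 1),
    ("C", "n2", 1), ("X", "n2", 10),
]
print(best_single_removal(toy_edges))
```

Removing a leaf never changes the path lengths between the remaining leaves, which is why the pairwise distances only need to be computed once in this sketch.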
Project description:A major challenge in phylogenetics and phylogenomics is to resolve young rapidly radiating groups. The fast succession of species increases the probability of incomplete lineage sorting (ILS), and different topologies of the gene trees are expected, leading to gene tree discordance, i.e., not all gene trees represent the species tree. Phylogenetic discordance is common in phylogenomic datasets, and apart from ILS, additional sources include hybridization, whole-genome duplication, and methodological artifacts. Despite a high degree of gene tree discordance, species trees are often well supported and the sources of discordance are not further addressed in phylogenomic studies, which can eventually lead to incorrect phylogenetic hypotheses, especially in rapidly radiating groups. We chose the high-Andean Asteraceae genus Loricaria to shed light on the potential sources of phylogenetic discordance and generated a phylogenetic hypothesis. By accounting for paralogy during gene tree inference, we generated a species tree based on hundreds of nuclear loci, using Hyb-Seq, and a plastome phylogeny obtained from off-target reads during target enrichment. We observed a high degree of gene tree discordance, which we found implausible at first sight, because the genus did not show evidence of hybridization in previous studies. We used various phylogenomic analyses (trees and networks) as well as D-statistics to test for ILS and hybridization, which we developed into a workflow on how to tackle phylogenetic discordance in recent radiations. We found strong evidence for ILS and hybridization within the genus Loricaria. Low genetic differentiation was evident between species located in different Andean cordilleras, which could be indicative of substantial introgression between populations, promoted during Pleistocene glaciations, when alpine habitats shifted creating opportunities for secondary contact and hybridization.
Project description:BACKGROUND:Habronattus is a diverse clade of jumping spiders with complex courtship displays and repeated evolution of Y chromosomes. A well-resolved species phylogeny would provide an important framework to study these traits, but has not yet been achieved, in part because the few genes available in past studies gave conflicting signals. Such discordant gene trees could be the result of incomplete lineage sorting (ILS) in recently diverged parts of the phylogeny, but there are indications that introgression could be a source of conflict. RESULTS:To infer Habronattus phylogeny and investigate the cause of gene tree discordance, we assembled transcriptomes for 34 Habronattus species and 2 outgroups. The concatenated 2.41 Mb of nuclear data (1877 loci) resolved phylogeny by Maximum Likelihood (ML) with high bootstrap support (95-100%) at most nodes, with some uncertainty surrounding the relationships of H. icenoglei, H. cambridgei, H. oregonensis, and Pellenes canadensis. Species tree analyses by ASTRAL and SVDQuartets gave almost completely congruent results. Several nodes in the ML phylogeny from 12.33 kb of mitochondrial data are incongruent with the nuclear phylogeny and indicate possible mitochondrial introgression: the internal relationships of the americanus and the coecatus groups, the relationship between the altanus, decorus, banksi, and americanus group, and between H. clypeatus and the coecatus group. To determine the relative contributions of ILS and introgression, we analyzed gene tree discordance for nuclear loci longer than 1 kb using Bayesian Concordance Analysis (BCA) for the americanus group (679 loci) and the VCCR clade (viridipes/clypeatus/coecatus/roberti groups) (517 loci) and found signals of introgression in both. Finally, we tested specifically for introgression in the concatenated nuclear matrix with Patterson's D statistics and DFOIL.
We found nuclear introgression resulting in substantial admixture between americanus group species, between H. roberti and the clypeatus group, and between the clypeatus and coecatus groups. CONCLUSIONS:Our results indicate that the phylogenetic history of Habronattus is predominantly a diverging tree, but that hybridization may have been common between phylogenetically distant species, especially in subgroups with complex courtship displays.
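As background on the Patterson's D statistic named above, here is a minimal sketch of the standard ABBA-BABA site-pattern count for four aligned sequences ordered as ((P1, P2), P3), outgroup. It is a simplified illustration rather than the analysis used in the study: it assumes one haploid sequence per taxon, skips sites with ambiguous characters, and performs no significance test (such as a block jackknife). The function name and the toy sequences are hypothetical.

```python
def patterson_d(p1, p2, p3, outgroup):
    """Count ABBA and BABA site patterns and return (abba, baba, D).

    D = (ABBA - BABA) / (ABBA + BABA); D is None when no informative sites exist.
    """
    abba = baba = 0
    for a, b, c, o in zip(p1, p2, p3, outgroup):
        if any(base not in "ACGT" for base in (a, b, c, o)):
            continue  # skip gaps and ambiguity codes
        if len({a, b, c, o}) != 2 or c == o:
            continue  # need a biallelic site where P3 carries the derived allele
        if a == o and b == c:
            abba += 1  # P2 shares the derived allele with P3
        elif a == c and b == o:
            baba += 1  # P1 shares the derived allele with P3
    total = abba + baba
    return abba, baba, (abba - baba) / total if total else None


# Toy alignment (assumed data): two ABBA sites and one BABA site
p1 = "AAGTAC"
p2 = "ATGCAG"
p3 = "ATGCTC"
og = "AAGTAG"
print(patterson_d(p1, p2, p3, og))
```

Under incomplete lineage sorting alone, ABBA and BABA patterns are expected in roughly equal numbers, so D is expected to be near zero; a consistent excess of one pattern is the signal usually interpreted as introgression.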
Project description:BACKGROUND:The flood of genomic data to help build and date the tree of life requires automation at several critical junctures, most importantly during sequence assembly and alignment. It is widely appreciated that automated alignment protocols can yield inaccuracies, but the relative impact of various sources of error on phylogenomic analysis is not yet known. This study employs an updated mammal data set of 5162 coding loci sampled from 90 species to evaluate the effects of alignment uncertainty, substitution models, and fossil priors on gene tree, species tree, and divergence time estimation. Additionally, a novel coalescent likelihood ratio test is introduced for comparing competing species trees against a given set of gene trees. RESULTS:The aligned DNA sequences of 5162 loci from 90 species were trimmed and filtered using trimAl and two filtering protocols. The final dataset contains 4 sets of alignments: before trimming, after trimming, filtered by a recently proposed pipeline, and further filtered by comparing ML gene trees for each locus with the concatenation tree. Our analyses suggest that the average discordance among the coalescent trees is significantly smaller than that among the concatenation trees estimated from the 4 sets of alignments or with different substitution models. There is no significant difference among the divergence times estimated with different substitution models. However, the divergence dates estimated from the alignments after trimming are more recent than those estimated from the alignments before trimming. CONCLUSIONS:Our results highlight that alignment uncertainty of the updated mammal data set and the choice of substitution models have little impact on tree topologies yielded by coalescent methods for species tree estimation, whereas they are more influential on the trees made by concatenation.
Given the choice of calibration scheme and clock models, divergence time estimates are robust to the choice of substitution models, but removing alignments deemed problematic by trimming algorithms can lead to more recent dates. Although the fossil prior is important in divergence time estimation, Bayesian estimates of divergence times in this data set are driven primarily by the sequence data.
Project description:Here, I review phylogenetic studies of the lizard family Pygopodidae, a group of 47 extant species that diversified in Australia and New Guinea. The goal of this study was to examine published phylogenetic and phylogenomic hypotheses on pygopodids to identify the strengths and weaknesses in our understanding of their phylogeny. Many parts of the pygopodid family tree are well established by multiple independent tree inferences including: (1) all multispecies genera (i.e., Aprasia, Delma, Lialis, Pletholax, and Pygopus) are monophyletic groups; (2) the root of the pygopodid tree is located along the branch leading to the Delma clade, thus showing that Delma is the sister group to all other pygopodid genera; (3) the Aprasia repens group, Delma tincta group, and several other groups of closely related species are demonstrated to be monophyletic entities; and (4) the monotypic Paradelma orientalis is the sister lineage to the Pygopus clade. Based on accumulated phylogenetic evidence, two taxonomic recommendations are given: Paradelma merits generic status rather than being subsumed into Pygopus as some earlier studies had suggested, and the monotypic Aclys concinna should be recognized as a member of Delma (following current practice) until future studies clarify its placement inside or outside the Delma clade. One chronic problem with phylogenetic studies of pygopodids, which has limited the explanatory power of many tree hypotheses, concerns the undersampling of known species. Although the continual addition of newly described species, especially over the past two decades, has been a major reason for these taxon sampling gaps, deficits in species sampling for ingroups and/or outgroups in several studies of pygopodid species complexes have confounded the testing of some ingroup monophyly hypotheses.
Ancient hybridization between non-sister lineages may also be confounding attempts to recover the relationships among pygopodids using molecular data. Indeed, such a phenomenon can explain at least five cases of mito-nuclear discordance and conflicts among trees based on nuclear DNA datasets. Another problem has been the lack of consensus on the relationships among most pygopodid genera, an issue that may stem from rapid diversification of these lineages early in the group's history. Despite current weaknesses in our understanding of pygopodid phylogeny, enough evidence exists to clarify many major and minor structural parts of their family tree. Accordingly, a composite tree for the Pygopodidae could be synthesized. This novel tree hypothesis contains all recognized pygopodid species and reveals that about half of the clades are corroborated by multiple independent tree hypotheses, while the remaining clades have less empirical support.
Project description:In the age of next-generation sequencing, the number of loci available for phylogenetic analyses has increased by orders of magnitude. But despite this dramatic increase in the amount of data, some phylogenomic studies have revealed rampant gene-tree discordance that can be caused by many historical processes, such as rapid diversification, gene duplication, or reticulate evolution. We used a target enrichment approach to sample 400 single-copy nuclear genes and estimate the phylogenetic relationships of 13 genera in the lichen-forming family Lobariaceae to address the effect of data type (nucleotides and amino acids) and phylogenetic reconstruction method (concatenation and species tree approaches). Furthermore, we examined datasets for evidence of historical processes, such as rapid diversification and reticulate evolution. We found incongruence associated with sequence data types (nucleotide vs. amino acid sequences) and with different methods of phylogenetic reconstruction (species tree vs. concatenation). The resulting phylogenetic trees provided evidence for rapid and reticulate evolution based on extremely short branches in the backbone of the phylogenies. The observed rapid and reticulate diversifications may explain conflicts among gene trees and the challenges to resolving evolutionary relationships. Based on divergence times, the diversification at the backbone occurred near the Cretaceous-Paleogene (K-Pg) boundary (65 Mya), which is consistent with other rapid diversifications in the tree of life. Although some phylogenetic relationships within the Lobariaceae remain poorly supported, even with our powerful phylogenomic dataset of up to 376 genes, our use of target-capture data allowed for the novel exploration of the mechanisms underlying phylogenetic and systematic incongruence.