LVTree Viewer: An Interactive Display for the All-Species Living Tree Incorporating Automatic Comparison with Prokaryotic Systematics.
ABSTRACT: We describe an interactive viewer for the All-Species Living Tree (LVTree). The viewer incorporates treeing and lineage information from the ARB-SILVA website. It allows collapsing the tree branches at different taxonomic ranks and expanding the collapsed branches as well, keeping the overall topology of the tree unchanged. It also enables the user to observe the consequence of trial lineage modifications by re-collapsing the tree. The system reports taxon statistics at all ranks automatically after each collapsing and re-collapsing. These features greatly facilitate the comparison of the 16S rRNA sequence phylogeny with prokaryotic taxonomy in a taxon by taxon manner. In view of the fact that the present prokaryotic systematics is largely based on 16S rRNA sequence analysis, the current viewer may help reveal discrepancies between phylogeny and taxonomy. As an application, we show that in the latest release of LVTree, based on 11,939 rRNA sequences, as few as 24 lineage modifications are enough to bring all but two phyla (Proteobacteria and Firmicutes) to monophyletic clusters.
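The taxon-by-taxon comparison described above can be illustrated with a minimal sketch. It is not the LVTree Viewer's code: the nested-list tree format, the `lineage` mapping, and the function names are all hypothetical, chosen only to show what "monophyletic" means operationally and how a trial lineage modification changes the report.

```python
from typing import Dict, List, Set, Union

# Hypothetical toy format: a tree is a nested list, leaves are species names.
Tree = Union[str, List["Tree"]]


def leaves(node: Tree) -> Set[str]:
    """Return the set of leaf names below a node."""
    if isinstance(node, str):
        return {node}
    result: Set[str] = set()
    for child in node:
        result |= leaves(child)
    return result


def is_monophyletic(tree: Tree, members: Set[str]) -> bool:
    """True if some clade of the tree has exactly `members` as its leaf set."""
    clade = leaves(tree)
    if clade == members:
        return True
    # A leaf cannot be split further, and a clade that misses some members
    # cannot contain the clade we are looking for.
    if isinstance(tree, str) or not members <= clade:
        return False
    return any(is_monophyletic(child, members) for child in tree)


def taxon_report(tree: Tree, lineage: Dict[str, str]) -> Dict[str, bool]:
    """Map each taxon name (at one rank) to whether it forms a single clade."""
    taxa: Dict[str, Set[str]] = {}
    for leaf, taxon in lineage.items():
        taxa.setdefault(taxon, set()).add(leaf)
    return {taxon: is_monophyletic(tree, m) for taxon, m in taxa.items()}


if __name__ == "__main__":
    tree = [["a1", "a2"], [["b1", "a3"], "b2"]]
    lineage = {"a1": "A", "a2": "A", "a3": "A", "b1": "B", "b2": "B"}
    print(taxon_report(tree, lineage))   # both taxa are non-monophyletic
    lineage["a3"] = "B"                  # one trial lineage modification
    print(taxon_report(tree, lineage))   # both taxa become monophyletic
```

The tree topology is never altered; only the lineage assignment changes, which mirrors the re-collapsing workflow described in the abstract.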
Project description:A faithful phylogeny and an objective taxonomy for prokaryotes should agree with each other and ultimately follow the genome data. With the number of sequenced genomes reaching tens of thousands, both tree inference and detailed comparison with taxonomy are great challenges. We now provide one solution in the latest Release 3.0 of the alignment-free and whole-genome-based web server CVTree3. The server resides in a cluster of 64 cores and is equipped with an interactive, collapsible, and expandable tree display. It is capable of comparing the tree branching order with prokaryotic classification at all taxonomic ranks from domains down to species and strains. CVTree3 allows for inquiry by taxon names and trial on lineage modifications. In addition, it reports a summary of monophyletic and non-monophyletic taxa at all ranks as well as produces print-quality subtree figures. After giving an overview of retrospective verification of the CVTree approach, the power of the new server is described for the mega-classification of prokaryotes and determination of taxonomic placement of some newly-sequenced genomes. A few discrepancies between CVTree and 16S rRNA analyses are also summarized with regard to possible taxonomic revisions. CVTree3 is freely accessible to all users at http://tlife.fudan.edu.cn/cvtree3/ without login requirements.
Project description:A monospecific genus is one that has contained a single species ever since it was proposed. Although more than half of the known prokaryotic genera are formally monospecific, we single out those that actually raise taxonomic problems by violating the monophyly of the taxon within which they reside. Taking monophyly as a guiding principle, our arguments are based on simultaneous support from 16S rRNA sequence analysis and whole-genome phylogeny of prokaryotes, as provided by the LVTree Viewer and CVTree Web Server, respectively. The main purpose of this study is to call attention to this specific way of global taxonomic analysis. Therefore, we refrain from making formal emendations for the time being.
Project description:Reference phylogenies are crucial for providing a taxonomic framework for interpretation of marker gene and metagenomic surveys, which continue to reveal novel species at a remarkable rate. Greengenes is a dedicated full-length 16S rRNA gene database that provides users with a curated taxonomy based on de novo tree inference. We developed a 'taxonomy to tree' approach for transferring group names from an existing taxonomy to a tree topology, and used it to apply the Greengenes, National Center for Biotechnology Information (NCBI) and cyanoDB (Cyanobacteria only) taxonomies to a de novo tree comprising 408,315 sequences. We also incorporated explicit rank information provided by the NCBI taxonomy to group names (by prefixing rank designations) for better user orientation and classification consistency. The resulting merged taxonomy improved the classification of 75% of the sequences by one or more ranks relative to the original NCBI taxonomy with the most pronounced improvements occurring in under-classified environmental sequences. We also assessed candidate phyla (divisions) currently defined by NCBI and present recommendations for consolidation of 34 redundantly named groups. All intermediate results from the pipeline, which includes tree inference, jackknifing and transfer of a donor taxonomy to a recipient tree (tax2tree) are available for download. The improved Greengenes taxonomy should provide important infrastructure for a wide range of megasequencing projects studying ecosystems on scales ranging from our own bodies (the Human Microbiome Project) to the entire planet (the Earth Microbiome Project). The implementation of the software can be obtained from http://sourceforge.net/projects/tax2tree/.
Project description:Sequencing of the 16S ribosomal RNA (rRNA) gene is widely used to survey microbial communities. Specialized 16S rRNA databases have been developed to support this approach including Greengenes, RDP and SILVA. Most taxonomy annotations in these databases are predictions from sequence rather than authoritative assignments based on studies of type strains or isolates. In this work, I investigated the taxonomy annotations and guide trees provided by these databases. Using a blinded test, I estimated that the annotation error rate of the RDP database is ~10%. The branching orders of the Greengenes and SILVA guide trees were found to disagree at comparable rates with each other and with taxonomy annotations according to the training set (authoritative reference) provided by RDP, indicating that the trees have comparable quality. Pervasive conflicts between tree branching order and type strain taxonomies strongly suggest that the guide trees are unreliable guides to phylogeny. I found 249,490 identical sequences with conflicting annotations in SILVA v128 and Greengenes v13.5 at ranks up to phylum (7,804 conflicts), indicating that the annotation error rate in these databases is ~17%.
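The comparison of identical sequences across two databases can be sketched as follows. This is an illustrative outline only, not the author's pipeline: the rank list, the dictionary layout (sequence string mapped to a list of names, one per rank), and the rule that blank names are skipped rather than counted as conflicts are all assumptions made for the example.

```python
from typing import Dict, List, Optional

# Assumed rank order, highest first; each lineage is a list aligned to it.
RANKS = ["phylum", "class", "order", "family", "genus"]


def highest_conflict(a: List[str], b: List[str]) -> Optional[str]:
    """Return the highest rank at which two lineages name different taxa.

    Blank names are treated as "not annotated" and are skipped
    (an assumption of this sketch).
    """
    for rank, x, y in zip(RANKS, a, b):
        if x and y and x != y:
            return rank
    return None


def find_conflicts(db_a: Dict[str, List[str]],
                   db_b: Dict[str, List[str]]) -> Dict[str, str]:
    """For sequences present in both databases, report the conflicting rank."""
    out: Dict[str, str] = {}
    for seq in db_a.keys() & db_b.keys():
        rank = highest_conflict(db_a[seq], db_b[seq])
        if rank is not None:
            out[seq] = rank
    return out
```

Counting the values of the returned dictionary by rank would give a per-rank conflict tally of the kind the abstract reports for the phylum level.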
Project description:Genomic information has already been applied to prokaryotic species definition and classification. However, the contribution of the genome sequence to prokaryotic genus delimitation has been less studied. To gain insights into genus definition for the prokaryotes, we attempted to reveal the genus-level genomic differences in the current prokaryotic classification system and to delineate the boundary of a genus on the basis of genomic information. The average nucleotide sequence identity between two genomes can be used for prokaryotic species delineation, but it is not suitable for genus demarcation. We used the percentage of conserved proteins (POCP) between two strains to estimate their evolutionary and phenotypic distance. A comprehensive genomic survey indicated that the POCP can serve as a robust genomic index for establishing the genus boundary for prokaryotic groups. Basically, two species belonging to the same genus would share at least half of their proteins. In a specific lineage, the genus and family/order ranks showed slight or no overlap in terms of POCP values. A prokaryotic genus can be defined as a group of species with all pairwise POCP values higher than 50%. Integration of whole-genome data into the current taxonomy system can provide comprehensive information for prokaryotic genus definition and delimitation.
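The genus criterion stated above can be written as a short check. The abstract does not give the POCP formula; the sketch below assumes the commonly cited definition, POCP = (C1 + C2) / (T1 + T2) × 100%, where C1 and C2 are the numbers of conserved proteins in the two genomes and T1 and T2 are their total protein counts. How "conserved" is decided (for example, by alignment thresholds) is outside the sketch, and the function names are hypothetical.

```python
from typing import Dict, Tuple


def pocp(c1: int, c2: int, t1: int, t2: int) -> float:
    """Percentage of conserved proteins between two genomes.

    Assumed formula (not stated in the abstract): (C1 + C2) / (T1 + T2) * 100.
    """
    if t1 <= 0 or t2 <= 0:
        raise ValueError("total protein counts must be positive")
    return 100.0 * (c1 + c2) / (t1 + t2)


def forms_one_genus(pairwise_pocp: Dict[Tuple[str, str], float],
                    threshold: float = 50.0) -> bool:
    """True if every pairwise POCP value is strictly above the threshold."""
    return all(value > threshold for value in pairwise_pocp.values())


if __name__ == "__main__":
    # Made-up counts for illustration only.
    print(pocp(c1=2000, c2=2100, t1=3000, t2=3200))
```

The strict inequality follows the abstract's wording ("higher than 50%"), so a pair at exactly 50% does not satisfy the criterion in this sketch.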
Project description:Sponges (Porifera) are abundant and diverse members of benthic filter feeding communities in most marine ecosystems, from the deep sea to tropical reefs. A characteristic feature is the associated dense and diverse prokaryotic community present within the sponge mesohyl. Previous molecular genetic studies revealed the importance of host identity for the community composition of the sponge-associated microbiota. However, little is known about whether sponge host-specific prokaryotic community patterns observed at 97% 16S rRNA gene sequence similarity are consistent at high taxonomic ranks (from genus to phylum level). In the present study, we investigated the prokaryotic community structure and variation of 24 sponge specimens (seven taxa) and three seawater samples from Sweden. Results show that the resemblance of prokaryotic communities at different taxonomic ranks is consistent with patterns present at 97% operational taxonomic unit level.
Project description:The advent of next-generation sequencing technologies has greatly promoted the field of metagenomics which studies genetic material recovered directly from an environment. Characterization of genomic composition of a metagenomic sample is essential for understanding the structure of the microbial community. Multiple genomes contained in a metagenomic sample can be identified and quantitated through homology searches of sequence reads with known sequences catalogued in reference databases. Traditionally, reads with multiple genomic hits are assigned to non-specific or high ranks of the taxonomy tree, thereby impacting on accurate estimates of relative abundance of multiple genomes present in a sample. Instead of assigning reads one by one to the taxonomy tree as many existing methods do, we propose a statistical framework to model the identified candidate genomes to which sequence reads have hits. After obtaining the estimated proportion of reads generated by each genome, sequence reads are assigned to the candidate genomes and the taxonomy tree based on the estimated probability by taking into account both sequence alignment scores and estimated genome abundance. The proposed method is comprehensively tested on both simulated datasets and two real datasets. It assigns reads to the low taxonomic ranks very accurately. Our statistical approach of taxonomic assignment of metagenomic reads, TAMER, is implemented in R and available at http://faculty.wcas.northwestern.edu/hji403/MetaR.htm.
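The idea of estimating genome proportions first and then assigning ambiguous reads can be sketched with a generic mixture-model EM. This is not the TAMER implementation (which the abstract says is written in R), and the details are assumptions: `lik[i][j]` is a hypothetical non-negative weight derived from the alignment score of read `i` against candidate genome `j`, with 0 meaning no hit.

```python
from typing import List, Tuple


def _responsibilities(lik: List[List[float]], pi: List[float]) -> List[List[float]]:
    """Posterior probability that each read came from each genome."""
    resp = []
    for row in lik:
        weighted = [p * l for p, l in zip(pi, row)]
        total = sum(weighted)
        if total == 0:
            raise ValueError("every read needs at least one hit")
        resp.append([w / total for w in weighted])
    return resp


def em_abundance(lik: List[List[float]], n_iter: int = 200,
                 tol: float = 1e-9) -> Tuple[List[float], List[int]]:
    """Estimate genome proportions by EM, then assign each read.

    Returns (proportions, assignment), where assignment[i] is the index of
    the genome with the highest posterior probability for read i.
    """
    n_reads, n_genomes = len(lik), len(lik[0])
    pi = [1.0 / n_genomes] * n_genomes           # start from equal abundance
    for _ in range(n_iter):
        resp = _responsibilities(lik, pi)        # E-step
        new_pi = [sum(r[j] for r in resp) / n_reads
                  for j in range(n_genomes)]     # M-step
        converged = max(abs(a - b) for a, b in zip(pi, new_pi)) < tol
        pi = new_pi
        if converged:
            break
    final = _responsibilities(lik, pi)           # posteriors under final pi
    assignment = [max(range(n_genomes), key=lambda j: r[j]) for r in final]
    return pi, assignment


if __name__ == "__main__":
    # Three reads hit only genome 0, one hits only genome 1, one hits both.
    lik = [[1, 0], [1, 0], [1, 0], [0, 1], [1, 1]]
    print(em_abundance(lik))
```

In the toy input the last read hits both genomes equally well, so its assignment is decided by the estimated abundances rather than by the alignment weight alone, which is the behavior the abstract contrasts with read-by-read assignment.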
Project description:The family Richtersiidae, although established recently with the use of phylogenetic methods, was considered potentially paraphyletic at the time of its erection. Until now, the family comprised four genera, Richtersius, Diaforobiotus, Adorybiotus and a newly erected genus Crenubiotus. However, the genetic characterisation for the latter two genera was very limited or absent. To address concerns about the phylogenetic affinity of these two genera, we present a multilocus phylogeny of the families Richtersiidae and Murrayidae based on four molecular markers (18S rRNA, 28S rRNA, ITS-2 and COI). Our results show a distinct evolutionary lineage composed of Adorybiotus and Crenubiotus, which is sister to Murrayidae. In order to accommodate the phylogenetic and morphological distinctiveness of this lineage, we erect a new family, Adorybiotidae fam. nov. The new taxon differs morphologically from other families in the superfamily Macrobiotoidea by a unique combination of traits: (1) the presence of tubercles/cushions with aggregations of microgranules on their surfaces present on all legs and on the dorso-caudal cuticle, (2) a system of internal septa in claws, and (3) buccal apparatus morphology. Moreover, in order to stabilise the taxonomy and nomenclature in the genus Crenubiotus, we redescribe its type species, Crenubiotus crenulatus, by means of integrative taxonomy and designate a new neotype based on a population from the original terra typica.
Project description:The diversity of gymnotid electric fishes has been intensely studied over the past 25 years, with 35 species named since 1994, compared to 11 species in the previous 236 years. Substantial effort has also been applied in recent years to documenting gymnotid interrelationships, with seven systematic studies published using morphological and molecular datasets. Nevertheless, until now, all gymnotids have been assigned to one of just two supraspecific taxa, the subfamily Electrophorinae with one genus Electrophorus and three valid species and the subfamily Gymnotinae also with one genus Gymnotus and 43 valid species. This simple classification has obscured the substantial phenotypic and lineage diversity within the subfamily Gymnotinae and hampered ecological and evolutionary studies of gymnotid biology. Here we present the most well-resolved and taxon-complete phylogeny of the Gymnotidae to date, including materials from all but one of the valid species. This phylogeny was constructed using a five-gene molecular dataset and a 115-character morphological dataset, enabling the inclusion of several species for which molecular data are still lacking. This phylogeny was time-calibrated using biogeographical priors in the absence of a fossil record. The tree topology is similar to those of previous studies, recovering all the major clades previously recognized with informal names. We propose a new gymnotid classification including two subfamilies (Electrophorinae and Gymnotinae) and six subgenera within the genus Gymnotus. Each subgenus exhibits a distinctive biogeographic distribution, within which most species have allopatric distributions and the subgenera are diverged from one another by an estimated 5-35 million years. We further provide robust taxonomic diagnoses, descriptions and identification keys to all gymnotid subgenera and all but four species.
This new taxonomy more equitably partitions species diversity among supra-specific taxa, employing the previously vacant subgenus and subfamily ranks. This new taxonomy renders known gymnotid diversity more accessible to study by highlighting the deep divergences (chronological, geographical, genetic and morphological) among its several clades.
Project description:Comparable taxonomic ranks within clades can facilitate more consistent classifications and objective comparisons among taxa. Here we use a temporal approach to identify taxonomic ranks. This is an extension of the temporal banding approach including a Temporal Error Score that finds an objective cut-off for each taxonomic rank using information from the current classification. We illustrate this method using a data set of the lichenized fungal family Parmeliaceae. To assess its performance, we simulated the effect of taxon sampling and compared our method with the other temporal banding method. For our sampled phylogeny, 11 of the 12 included families remained intact and 55 genera were confirmed, whereas 32 genera were lumped and 15 genera were split. Taxon sampling impacted the method at the genus level, whereas it yielded only insignificant changes at the family level. The other available temporal approach gives a cut-off point similar to that of our method. Our approach to identifying taxonomic ranks enables taxonomists to revise and propose classifications on an objective basis, changing ranks of clades only when inconsistent with most taxa in a phylogenetic tree. An R script to find the time point with the minimal temporal error is provided.