Defining DNA-based operational taxonomic units for microbial-eukaryote ecology.
ABSTRACT: DNA sequence information has increasingly been used in ecological research on microbial eukaryotes. Sequence-based approaches have included studies of the total diversity of selected ecosystems, studies of the autecology of ecologically relevant species, and identification and enumeration of species of interest for human health. It is still uncommon, however, to delineate protistan species based on their genetic signatures. The reluctance to assign species-level designations based on DNA sequences is in part a consequence of the limited amount of sequence information presently available for many free-living microbial eukaryotes and in part a consequence of the problematic nature of and debate surrounding the microbial species concept. Despite the difficulties inherent in assigning species names to DNA sequences, there is a growing need to attach meaning to the burgeoning amount of sequence information entering the literature, and there is a growing desire to apply this information in ecological studies. We describe a computer-based tool that assigns DNA sequences from environmental databases to operational taxonomic units at approximately species-level distinctions. This approach provides a practical method for ecological studies of microbial eukaryotes (primarily protists) by enabling semiautomated analysis of large numbers of samples spanning great taxonomic breadth. Derivation of the algorithm was based on an analysis of complete small-subunit (18S) rRNA gene sequences and partial gene sequences obtained from the GenBank database for morphologically described protistan species. The program was tested using environmental 18S rRNA data sets for two oceanic ecosystems. A total of 388 operational taxonomic units were observed for 2,207 sequences obtained from samples collected in the western North Atlantic and eastern North Pacific oceans.
Project description:Molecular surveys suggest that communities of microbial eukaryotes are remarkably rich, because even large clone libraries seem to capture only a minority of species. This provides a qualitative picture of protistan richness but does not measure its real extent either locally or globally. Statistical analysis can estimate a community's richness, but the specific methods used to date are not always well grounded in statistical theory. Here we study a large protistan molecular survey from an anoxic water column in the Cariaco Basin (Caribbean Sea). We group individual 18S rRNA gene sequences into operational taxonomic units (OTUs) using different cutoff values for sequence similarity (99 to 50%) and systematically apply parametric models and nonparametric estimators to the OTU frequency data to estimate the total protistan diversity. The parametric models provided statistically sound estimates of protistan richness, with biologically meaningful standard errors, maximal data usage, and extensive model diagnostics and were preferable to the available nonparametric tools. Our clone library exceeded 700 clones but still covered only a minority of species and less than half of the larger protistan clades. Our estimates of total protistan richness portray the target community as very rich at all OTU levels, with hundreds of different populations apparently co-occurring in the small (3-liter) volume of our sample, as well as dozens of clades of the highest taxonomic order. These estimates are among the first for microbial eukaryotes that are obtained using state-of-the-art statistical methods and can serve as benchmark numbers for the local diversity of protists.
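As a rough illustration of the nonparametric side of this comparison, the sketch below computes the bias-corrected Chao1 estimator from OTU frequency data. The abstract does not name the estimators it applied, so Chao1 is an assumption here, and the function name `chao1` and the toy counts are hypothetical.

```python
def chao1(otu_counts):
    """Bias-corrected Chao1 richness estimate.

    otu_counts: iterable of clone counts, one entry per observed OTU.
    Chao1 is used here only as a familiar example of a nonparametric
    estimator; the source does not state which estimators were applied.
    """
    counts = [c for c in otu_counts if c > 0]
    s_obs = len(counts)                       # observed OTUs
    f1 = sum(1 for c in counts if c == 1)     # singletons
    f2 = sum(1 for c in counts if c == 2)     # doubletons
    # Bias-corrected form, which stays defined when f2 == 0.
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


if __name__ == "__main__":
    toy_counts = [10, 5, 2, 2, 1, 1, 1, 1]    # hypothetical clone counts
    print(chao1(toy_counts))
```

The estimate depends only on the observed richness and the singleton and doubleton counts, so it ignores most of the frequency data. Parametric models, by contrast, fit the whole frequency distribution.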
Project description:Next-generation DNA sequencing (NGS) approaches are rapidly surpassing Sanger sequencing for characterizing the diversity of natural microbial communities. Despite this rapid transition, few comparisons exist between Sanger sequences and the generally much shorter reads of NGS. Operational taxonomic units (OTUs) derived from full-length (Sanger sequencing) and pyrotag (454 sequencing of the V9 hypervariable region) sequences of 18S rRNA genes from 10 global samples were analyzed in order to compare the resulting protistan community structures and species richness. Pyrotag OTUs called at 98% sequence similarity yielded numbers of OTUs that were similar overall to those for full-length sequences when the latter were called at 97% similarity. Singleton OTUs strongly influenced estimates of species richness but not the higher-level taxonomic composition of the community. The pyrotag and full-length sequence data sets had slightly different taxonomic compositions of rhizarians, stramenopiles, cryptophytes, and haptophytes, but the two data sets had similarly high compositions of alveolates. Pyrotag-based OTUs were often derived from sequences that mapped to multiple full-length OTUs at 100% similarity. Thus, pyrotags sequenced from a single hypervariable region might not be appropriate for establishing protistan species-level OTUs. However, nonmetric multidimensional scaling plots constructed with the two data sets yielded similar clusters, indicating that beta diversity analysis results were similar for the Sanger and NGS sequences. Short pyrotag sequences can provide holistic assessments of protistan communities, although care must be taken in interpreting the results. The longer reads (>500 bp) that are now becoming available through NGS should provide powerful tools for assessing the diversity of microbial eukaryotic assemblages.
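Ordinations such as nonmetric multidimensional scaling start from a matrix of pairwise dissimilarities between samples. The abstract does not say which dissimilarity was used, so the sketch below assumes Bray-Curtis, a common choice for OTU abundance tables. The function name and the example counts are hypothetical.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two OTU abundance vectors.

    a, b: sequences of non-negative counts, aligned so that position i
    refers to the same OTU in both samples. The choice of Bray-Curtis is
    an assumption; the source does not specify the measure.
    """
    if len(a) != len(b):
        raise ValueError("abundance vectors must have the same length")
    total = sum(a) + sum(b)
    if total == 0:
        return 0.0  # two empty samples are treated as identical
    shared = sum(min(x, y) for x, y in zip(a, b))
    return 1.0 - 2.0 * shared / total


if __name__ == "__main__":
    sanger_sample = [10, 0, 5]   # hypothetical OTU counts
    pyrotag_sample = [5, 5, 5]
    print(bray_curtis(sanger_sample, pyrotag_sample))
```

Values range from 0 for identical composition to 1 for no shared OTUs. An NMDS routine would take the full sample-by-sample matrix of these values as input.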
Project description:BACKGROUND: Recent advances in sequencing strategies make possible unprecedented depth and scale of sampling for molecular detection of microbial diversity. Two major paradigm-shifting discoveries include the detection of bacterial diversity that is one to two orders of magnitude greater than previous estimates, and the discovery of an exciting 'rare biosphere' of molecular signatures ('species') of poorly understood ecological significance. We applied a high-throughput parallel tag sequencing (454 sequencing) protocol adopted for eukaryotes to investigate protistan community complexity in two contrasting anoxic marine ecosystems (Framvaren Fjord, Norway; Cariaco deep-sea basin, Venezuela). Both sampling sites have previously been scrutinized for protistan diversity by traditional clone library construction and Sanger sequencing. By comparing these clone library data with 454 amplicon library data, we assess the efficiency of high-throughput tag sequencing strategies. We here present a novel, highly conservative bioinformatic analysis pipeline for the processing of large tag sequence data sets. RESULTS: The analyses of ca. 250,000 sequence reads revealed that the number of detected Operational Taxonomic Units (OTUs) far exceeded previous richness estimates from the same sites based on clone libraries and Sanger sequencing. More than 90% of this diversity was represented by OTUs with less than 10 sequence tags. We detected a substantial number of taxonomic groups like Apusozoa, Chrysomerophytes, Centroheliozoa, Eustigmatophytes, hyphochytriomycetes, Ichthyosporea, Oikomonads, Phaeothamniophytes, and rhodophytes which remained undetected by previous clone library-based diversity surveys of the sampling sites. 
The most important innovations in our newly developed bioinformatics pipeline employ (i) BLASTN with query parameters adjusted for highly variable domains and a complete database of public ribosomal RNA (rRNA) gene sequences for taxonomic assignments of tags; (ii) a clustering of tags at k differences (Levenshtein distance) with a newly developed algorithm enabling very fast OTU clustering for large tag sequence data sets; and (iii) a novel parsing procedure to combine the data from individual analyses. CONCLUSION: Our data highlight the magnitude of the under-sampled 'protistan gap' in the eukaryotic tree of life. This study illustrates that our current understanding of the ecological complexity of protist communities, and of the global species richness and genome diversity of protists, is severely limited. Even though 454 pyrosequencing is not a panacea, it allows for more comprehensive insights into the diversity of protistan communities, and combined with appropriate statistical tools, enables improved ecological interpretations of the data and projections of global diversity.
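Clustering tags at k differences can be illustrated with a naive sketch: compute the Levenshtein distance between every pair of tags and join any two tags that differ by at most k edits. This is not the authors' fast algorithm. The all-pairs comparison and the single-linkage merging via union-find are assumptions chosen for clarity, and the names `levenshtein` and `cluster_at_k` are hypothetical.

```python
def levenshtein(a, b):
    """Edit distance (substitutions, insertions, deletions) between two strings."""
    if len(a) < len(b):
        a, b = b, a                      # keep the inner row short
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,         # deletion
                            curr[j - 1] + 1,     # insertion
                            prev[j - 1] + cost)) # match or substitution
        prev = curr
    return prev[-1]


def cluster_at_k(tags, k):
    """Group tags so that any two tags within k differences share a cluster.

    Naive O(n^2) single-linkage sketch; unsuitable for large data sets.
    """
    parent = list(range(len(tags)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]        # path compression
            i = parent[i]
        return i

    for i in range(len(tags)):
        for j in range(i + 1, len(tags)):
            if levenshtein(tags[i], tags[j]) <= k:
                parent[find(i)] = find(j)

    clusters = {}
    for i, tag in enumerate(tags):
        clusters.setdefault(find(i), []).append(tag)
    return sorted(sorted(c) for c in clusters.values())


if __name__ == "__main__":
    print(cluster_at_k(["ACGT", "ACGA", "ACCA", "TTTT"], k=1))
```

With single linkage, two tags that differ by more than k can still end up in the same cluster if a chain of intermediate tags connects them, as happens for ACGT and ACCA in the example.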
Project description:Microbial diversity and distribution are topics of intensive research. In two companion papers in this issue, we describe the results of the Cariaco Microbial Observatory (Caribbean Sea, Venezuela). The Basin contains the largest body of marine anoxic water, and presents an opportunity to study protistan communities across biogeochemical gradients. In the first paper, we survey 18S ribosomal RNA (rRNA) gene sequence diversity using both Sanger- and pyrosequencing-based approaches, employing multiple PCR primers, and state-of-the-art statistical analyses to estimate microbial richness missed by the survey. Sampling the Basin at three stations, in two seasons, and at four depths with distinct biogeochemical regimes, we obtained the largest, and arguably the least biased collection of over 6000 nearly full-length protistan rRNA gene sequences from a given oceanographic regime to date, and over 80,000 pyrosequencing tags. These represent all major and many minor protistan taxa, at frequencies globally similar between the two sequence collections. This large data set provided, via the recently developed parametric modeling, the first statistically sound prediction of the total size of protistan richness in a large and varied environment, such as the Cariaco Basin: over 36,000 species, defined as almost full-length 18S rRNA gene sequence clusters sharing over 99% sequence homology. This richness is a small fraction of the grand total of known protists (over 100,000-500,000 species), suggesting a degree of protistan endemism.
Project description:Present knowledge of microbial diversity is decidedly incomplete (S. J. Giovannoni and M. S. Rappé, p. 47-84, in D. Kirchman, ed., Microbial Ecology of the Oceans, 2000; E. Stackebrandt and T. M. Embley, p. 57-75, in R. R. Colwell and D. J. Grimes, ed., Nonculturable Microorganisms in the Environment, 2000). Protistan phylogenies are particularly deficient and undoubtedly exclude clades of principal ecological and evolutionary importance (S. L. Baldauf, Science 300:1703-1706, 2003). The rRNA approach has been extraordinarily successful in expanding the global prokaryotic record (S. J. Giovannoni and M. S. Rappé, p. 47-84, in D. Kirchman, ed., Microbial Ecology of the Oceans, 2000; E. Stackebrandt and T. M. Embley, p. 57-75, in R. R. Colwell and D. J. Grimes, ed., Nonculturable Microorganisms in the Environment, 2000) but has rarely been used in protistan discovery. Here we report the first application of the 18S rRNA approach to a permanently anoxic environment, the Cariaco Basin off the Venezuelan coast. On the basis of rRNA sequences, we uncovered a substantial number of novel protistan lineages. These included new clades of the highest taxonomic level unrelated to any known eukaryote as well as deep branches within established protistan groups. Three novel lineages branch at the base of the eukaryotic evolutionary tree preceding, contemporary with, or immediately following the earliest eukaryotic branches. These newly discovered protists may retain traits reminiscent of an early eukaryotic ancestor(s).
Project description:BACKGROUND: The impact of climate on biodiversity is indisputable. Climate changes over geological time must have significantly influenced the evolution of biodiversity, ultimately leading to its present pattern. Here we consider the paleoclimate data record, inferring that present-day hot and cold environments should contain, respectively, the largest and the smallest diversity of ancestral lineages of microbial eukaryotes. METHODOLOGY/PRINCIPAL FINDINGS: We investigate this hypothesis by analyzing an original dataset of 18S rRNA gene sequences from Western Greenland in the Arctic, and data from the existing literature on 18S rRNA gene diversity in hydrothermal vent, temperate sediments, and anoxic water column communities. Unexpectedly, the community from the cold environment emerged as one of the richest observed to date in protistan species, and most diverse in ancestral lineages. CONCLUSIONS/SIGNIFICANCE: This pattern is consistent with natural selection sweeps on aerobic non-psychrophilic microbial eukaryotes repeatedly caused by low temperatures and global anoxia of snowball Earth conditions. It implies that cold refuges persisted through the periods of greenhouse conditions, which agrees with some, although not all, current views on the extent of the past global cooling and warming events. We therefore identify cold environments as promising targets for microbial discovery.
Project description:To resolve the fine-scale architecture of anoxic protistan communities, we conducted a cultivation-independent 18S rRNA survey in the superanoxic Framvaren Fjord in Norway. We generated three clone libraries along the steep O(2)/H(2)S gradient, using the multiple-primer approach. Of 1,100 clones analyzed, 753 proved to be high-quality protistan target sequences. These sequences were grouped into 92 phylotypes, which displayed high protistan diversity in the fjord (17 major eukaryotic phyla). Only a few were closely related to known taxa. Several sequences were dissimilar to all previously described sequences and occupied a basal position in the inferred phylogenies, suggesting that the sequences recovered were derived from novel, deeply divergent eukaryotes. We detected sequence clades with evolutionary importance (for example, clades in the euglenozoa) and clades that seem to be specifically adapted to anoxic environments, challenging the hypothesis that the global dispersal of protists is uniform. Moreover, with the detection of clones affiliated with jakobid flagellates, we present evidence that primitive descendants of early eukaryotes are present in this anoxic environment. To estimate sample coverage and phylotype richness, we used parametric and nonparametric statistical methods. The results show that although our data set is one of the largest published inventories, our sample missed a substantial proportion of the protistan diversity. Nevertheless, statistical and phylogenetic analyses of the three libraries revealed the fine-scale architecture of anoxic protistan communities, which may exhibit adaptation to different environmental conditions along the O(2)/H(2)S gradient.
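Sample coverage is often approximated from the share of sequences that fall into singleton phylotypes. The abstract does not state which coverage measure was used, so the sketch below assumes Good's estimator, C = 1 - f1/N. The function name and the toy counts are hypothetical.

```python
def goods_coverage(phylotype_counts):
    """Good's coverage estimate, C = 1 - f1 / N.

    phylotype_counts: clone counts, one entry per observed phylotype.
    f1 is the number of phylotypes seen exactly once and N is the total
    number of clones. Good's estimator is an assumed example; the source
    does not name the coverage measure it applied.
    """
    counts = [c for c in phylotype_counts if c > 0]
    n_total = sum(counts)
    if n_total == 0:
        raise ValueError("no sequences in sample")
    singletons = sum(1 for c in counts if c == 1)
    return 1.0 - singletons / n_total


if __name__ == "__main__":
    toy_library = [5, 3, 1, 1]   # hypothetical clone counts per phylotype
    print(goods_coverage(toy_library))
```

A library in which many phylotypes appear only once yields low coverage, which matches the conclusion that even a large inventory can miss a substantial share of the diversity.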
Project description:Characterizing ecological relationships between viruses, bacteria and protists in the ocean is critical to understanding ecosystem function, yet these relationships are infrequently investigated together. We evaluated these relationships through microbial association network analysis of samples collected approximately monthly from March 2008 to January 2011 in the surface ocean (0-5 m) at the San Pedro Ocean Time series station. Bacterial, T4-like myoviral and protistan communities were described by Automated Ribosomal Intergenic Spacer Analysis and terminal restriction fragment length polymorphism of the gene encoding the major capsid protein (g23) and 18S ribosomal DNA, respectively. Concurrent shifts in community structure suggested similar timing of responses to environmental and biological parameters. We linked T4-like myoviral, bacterial and protistan operational taxonomic units by local similarity correlations, which were then visualized as association networks. Network links (correlations) potentially represent synergistic and antagonistic relationships such as viral lysis, grazing, competition or other interactions. We found that virus-bacteria relationships were more cross-linked than protist-bacteria relationships, suggestive of increased taxonomic specificity in virus-bacteria relationships. We also found that 80% of bacterial-protist and 74% of bacterial-viral correlations were positive, with the latter suggesting that at monthly and seasonal timescales, viruses may be following their hosts more often than controlling host abundance.
Project description:Operational Taxonomic Units (OTUs), usually defined as clusters of similar 16S/18S rRNA sequences, are the most widely used basic diversity units in large-scale characterizations of microbial communities. However, it remains unclear how well the various proposed OTU clustering algorithms approximate 'true' microbial taxa. Here, we explore the ecological consistency of OTUs--based on the assumption that, like true microbial taxa, they should show measurable habitat preferences (niche conservatism). In a global and comprehensive survey of available microbial sequence data, we systematically parse sequence annotations to obtain broad ecological descriptions of sampling sites. Based on these, we observe that sequence-based microbial OTUs generally show high levels of ecological consistency. However, different OTU clustering methods result in marked differences in the strength of this signal. Assuming that ecological consistency can serve as an objective external benchmark for cluster quality, we conclude that hierarchical complete linkage clustering, which provided the most ecologically consistent partitions, should be the default choice for OTU clustering. To our knowledge, this is the first approach to assess cluster quality using an external, biologically meaningful parameter as a benchmark, on a global scale.
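To make the recommended method concrete, the sketch below shows a naive agglomerative clustering with complete linkage and a distance cutoff. It is a minimal illustration, not the pipeline evaluated in the study. The function name, the precomputed distance matrix and the 0.03 cutoff (roughly 97% similarity) are assumptions.

```python
def complete_linkage_otus(dist, threshold):
    """Agglomerative complete-linkage clustering with a distance cutoff.

    dist: symmetric matrix (list of lists) of pairwise sequence distances.
    threshold: two clusters merge only if their most distant pair of
    members is within this distance.
    Naive sketch that recomputes every linkage on each pass; for illustration only.
    """
    clusters = [[i] for i in range(len(dist))]

    def linkage(a, b):
        # Complete linkage: distance between the two farthest members.
        return max(dist[i][j] for i in a for j in b)

    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        if d > threshold:
            break                        # no pair of clusters is close enough
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]                  # y > x, so index x stays valid
    return sorted(sorted(c) for c in clusters)


if __name__ == "__main__":
    # Hypothetical distances for four sequences.
    d = [
        [0.00, 0.01, 0.05, 0.30],
        [0.01, 0.00, 0.02, 0.30],
        [0.05, 0.02, 0.00, 0.30],
        [0.30, 0.30, 0.30, 0.00],
    ]
    print(complete_linkage_otus(d, threshold=0.03))
```

In the example, sequence 2 is within the cutoff of sequence 1 but not of sequence 0, so complete linkage keeps it out of their cluster. Single linkage would have merged all three. Complete linkage therefore guarantees that every pair of members in an OTU is within the cutoff.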
Project description:BACKGROUND AND OBJECTIVES: Phytoplankton are highly diverse organisms with a global distribution across different habitats. Their wide distribution is due to their ecological flexibility and their ability to tolerate different climatic conditions and environmental stress. Phytoplankton are among the most sensitive biological indicators of the state of water resources. The purpose of this study was to identify phytoplankton species with an emphasis on the DNA barcoding method. Studying phytoplankton variation and identifying species composition can provide useful information about water quality. MATERIALS AND METHODS: In this research project, a clone library of the nuclear small-subunit ribosomal RNA gene (18S rDNA) was constructed by PCR using primers A and SSU-inR1; after the clones were examined, selected clones were sequenced. RESULTS: Eleven analyzed sequences were identified correctly and characterized by a similarity search of the GenBank database using BLAST (NCBI). In this study, we revealed a wide range of taxonomic groups in the Alveolata (Ciliophora and Dinophyceae), Stramenopiles (Bacillariophyta and Bicosoecida), Rhodophyta and Haptophyceae. Moreover, we found species of fungi and Metazoa (Arthropoda). Most of the sequences were previously unknown but could still be assigned to important marine phyla. CONCLUSION: An 18S rDNA clone library is an accurate method for identifying marine specimens and is recommended as an efficient method for phylogenetic studies in marine environments. There seems to be a high diversity and abundance of small eukaryotes in the marine regions of the Persian Gulf.