Project description:An unexpectedly large fraction of genes in metazoans (human, mouse, zebrafish, worm, fruit fly) express high levels of circularized RNAs containing canonical exons. Here we report that circular RNA isoforms are found in diverse species whose most recent common ancestor existed more than one billion years ago: fungi (Schizosaccharomyces pombe and Saccharomyces cerevisiae), a plant (Arabidopsis thaliana), and protists (Plasmodium falciparum and Dictyostelium discoideum). For all species studied to date, including those in this report, only a small fraction of the theoretically possible circular RNA isoforms from a given gene are actually observed. Unlike metazoans, Arabidopsis, D. discoideum, P. falciparum, S. cerevisiae, and S. pombe have very short introns (∼ 100 nucleotides or shorter), yet they still produce circular RNAs. A minority of genes in S. pombe and P. falciparum have documented examples of canonical alternative splicing, making it unlikely that all circular RNAs are by-products of alternative splicing or 'piggyback' on signals used in alternative RNA processing. In S. pombe, the relative abundance of circular to linear transcript isoforms changed in a gene-specific pattern during nitrogen starvation. Circular RNA may be an ancient, conserved feature of eukaryotic gene expression programs.
Project description:Rapid advances in DNA sequencing technologies have resulted in the accumulation of large data sets in the public domain, facilitating comparative studies to provide novel insights into the evolution of life. Phylogenetic studies across the eukaryotic taxa have been reported but on the basis of a limited number of genes. Here we present a genome-wide analysis across different plant, fungal, protist, and animal species, with reference to the 36,002 expressed genes of the rice genome. Our analysis revealed 9831 genes unique to rice and 98 genes conserved across all 49 eukaryotic species analysed. The 98 genes conserved across diverse eukaryotes mostly exhibited binding and catalytic activities and shared common sequence motifs, and hence appeared to have a common origin. The 98 conserved genes belonged to 22 functional gene families including 26S protease, actin, ADP-ribosylation factor, ATP synthase, casein kinase, DEAD-box protein, DnaK, elongation factor 2, glyceraldehyde 3-phosphate, phosphatase 2A, ras-related protein, Ser/Thr protein phosphatase family protein, tubulin, ubiquitin and others. The consensus Bayesian eukaryotic tree of life developed in this study demonstrated widely separated clades of plants, fungi, and animals. Musa acuminata provided an evolutionary link between monocotyledons and dicotyledons, and Salpingoeca rosetta provided an evolutionary link between fungi and animals, indicating that protozoan species are close relatives of fungi and animals. The divergence times for 1176 species pairs were estimated accurately by integrating fossil information with synonymous substitution rates in the comprehensive set of 98 genes. The present study provides valuable insight into the evolution of eukaryotes.
Project description:The first analyses of gene sequence data indicated that the eukaryotic tree of life consisted of a long stem of microbial groups "topped" by a crown containing plants, animals, and fungi and their microbial relatives. Although more recent multigene concatenated analyses have refined the relationships among the many branches of eukaryotes, the root of the eukaryotic tree of life has remained elusive. Inferring the root of extant eukaryotes is challenging because of the age of the group (~1.7-2.1 billion years old), tremendous heterogeneity in rates of evolution among lineages, and lack of obvious outgroups for many genes. Here, we reconstruct a rooted phylogeny of extant eukaryotes based on minimizing the number of duplications and losses among a collection of gene trees. This approach does not require outgroup sequences or assumptions of orthology among sequences. We also explore the impact of taxon and gene sampling and assess support for alternative hypotheses for the root. Using 20 gene trees from 84 diverse eukaryotic lineages, this approach recovers robust eukaryotic clades and reveals evidence for a eukaryotic root that lies between the Opisthokonta (animals, fungi and their microbial relatives) and all remaining eukaryotes.
Project description:High-throughput sequencing of reduced representation libraries obtained through digestion with restriction enzymes, generically known as restriction site associated DNA sequencing (RAD-seq), is a common strategy to generate genome-wide genotypic and sequence data from eukaryotes. A critical design element of any RAD-seq study is knowledge of the approximate number of genetic markers that can be obtained for a taxon using different restriction enzymes, as this number determines the scope of a project, and ultimately defines its success. This number can only be directly determined if a reference genome sequence is available, or it can be estimated if the genome size and restriction recognition sequence probabilities are known. However, both scenarios are uncommon for nonmodel species. Here, we performed systematic in silico surveys of recognition sequences for diverse and commonly used type II restriction enzymes across the eukaryotic tree of life. Our observations reveal that recognition sequence frequencies for a given restriction enzyme are strikingly variable among broad eukaryotic taxonomic groups, being largely determined by phylogenetic relatedness. We demonstrate that genome sizes can be predicted from cleavage frequency data obtained with restriction enzymes targeting "neutral" elements. Models based on genomic compositions are also effective tools to accurately calculate probabilities of recognition sequences across taxa, and can be applied to species for which reduced representation data are available (including transcriptomes and neutral RAD-seq data sets). The analytical pipeline developed in this study, PredRAD (https://github.com/phrh/PredRAD), and the resulting databases constitute valuable resources that will help guide the design of any study using RAD-seq or related methods.
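The estimate described above (expected number of recognition sites from genome size and recognition sequence probability) can be illustrated with a minimal sketch. It assumes the simplest possible composition model, in which bases are independent and the only parameter is GC content; this is an illustration, not PredRAD's implementation. The function names `site_probability` and `expected_sites` are hypothetical.

```python
def site_probability(recognition_seq: str, gc_content: float) -> float:
    """Probability that a random position starts the recognition sequence.

    Assumes independent bases with P(G) = P(C) = gc_content / 2 and
    P(A) = P(T) = (1 - gc_content) / 2 (a mononucleotide model).
    """
    p_gc = gc_content / 2.0
    p_at = (1.0 - gc_content) / 2.0
    prob = 1.0
    for base in recognition_seq.upper():
        if base in "GC":
            prob *= p_gc
        elif base in "AT":
            prob *= p_at
        else:
            raise ValueError(f"unsupported base: {base!r}")
    return prob


def expected_sites(genome_size_bp: int, recognition_seq: str, gc_content: float) -> float:
    """Expected number of recognition sites in a genome of the given size.

    Counts start positions on one strand only, which is appropriate for
    palindromic recognition sequences because both strands share the site.
    """
    positions = max(genome_size_bp - len(recognition_seq) + 1, 0)
    return positions * site_probability(recognition_seq, gc_content)


if __name__ == "__main__":
    # Illustrative inputs only: a 1 Gb genome at 40% GC.
    for name, seq in [("EcoRI", "GAATTC"), ("SbfI", "CCTGCAGG")]:
        print(name, round(expected_sites(1_000_000_000, seq, 0.40)))
```

Under this model a GC-rich eight-base site such as SbfI's becomes much rarer as GC content drops, which is one reason a single enzyme can yield very different marker counts across taxa. Real genomes deviate from base independence, so such figures are rough planning estimates at best.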
Project description:Messenger RNA (mRNA) has broad potential for application in biological systems. However, one fundamental limitation to its use is its relatively short half-life in biological systems. Here we develop exogenous circular RNA (circRNA) to extend the duration of protein expression from full-length RNA messages. First, we engineer a self-splicing intron to efficiently circularize a wide range of RNAs up to 5 kb in length in vitro by rationally designing ubiquitous accessory sequences that aid in splicing. We maximize translation of functional protein from these circRNAs in eukaryotic cells, and we find that engineered circRNA purified by high performance liquid chromatography displays exceptional protein production qualities in terms of both quantity of protein produced and stability of production. This study pioneers the use of exogenous circRNA for robust and stable protein expression in eukaryotic cells and demonstrates that circRNA is a promising alternative to linear mRNA.
Project description:An accurate reconstruction of the eukaryotic tree of life is essential to identify the innovations underlying the diversity of microbial and macroscopic (e.g., plants and animals) eukaryotes. Previous work has divided eukaryotic diversity into a small number of high-level "supergroups," many of which receive strong support in phylogenomic analyses. However, the abundance of data in phylogenomic analyses can lead to highly supported but incorrect relationships due to systematic phylogenetic error. Furthermore, the paucity of major eukaryotic lineages (19 or fewer) included in these genomic studies may exaggerate systematic error and reduce power to evaluate hypotheses. Here, we use a taxon-rich strategy to assess eukaryotic relationships. We show that analyses emphasizing broad taxonomic sampling (up to 451 taxa representing 72 major lineages) combined with a moderate number of genes yield a well-resolved eukaryotic tree of life. The consistency across analyses with varying numbers of taxa (88-451) and levels of missing data (17-69%) supports the accuracy of the resulting topologies. The resulting stable topology emerges without the removal of rapidly evolving genes or taxa, a practice common to phylogenomic analyses. Several major groups are stable and strongly supported in these analyses (e.g., SAR, Rhizaria, Excavata), whereas the proposed supergroup "Chromalveolata" is rejected. Furthermore, extensive instability among photosynthetic lineages suggests the presence of systematic biases including endosymbiotic gene transfer from symbiont (nucleus or plastid) to host. Our analyses demonstrate that stable topologies of ancient evolutionary relationships can be achieved with broad taxonomic sampling and a moderate number of genes. Finally, taxon-rich analyses such as presented here provide a method for testing the accuracy of relationships that receive high bootstrap support (BS) in phylogenomic analyses and enable placement of the multitude of lineages that lack genome scale data.
Project description:By modeling the homoeologous gene losses that occurred in 50 genomes deriving from ten distinct polyploidy events, we show that the evolutionary forces acting on polyploids are remarkably similar, regardless of whether they occur in flowering plants, ciliates, fishes, or yeasts. We show that many of the events show a relative rate of duplicate gene loss before the first postpolyploidy speciation that is significantly higher than in later phases of their evolution. The relatively weak selective constraint experienced by the single-copy genes these losses produced leads us to suggest that most of the purely selectively neutral duplicate gene losses occur in the immediate postpolyploid period. Nearly all of the events show strong evidence of biases in the duplicate losses, consistent with them being allopolyploidies, with two distinct progenitors contributing to the modern species. We also find ongoing and extensive reciprocal gene losses (alternative losses of duplicated ancestral genes) between these genomes. With the exception of a handful of closely related taxa, all of these polyploid organisms are separated from each other by tens to thousands of reciprocal gene losses. As a result, it is very unlikely that viable diploid hybrid species could form between these taxa, since matings between such hybrids would tend to produce offspring lacking essential genes. It is, therefore, possible that the relatively high frequency of recurrent polyploidies in some lineages may be due to the ability of new polyploidies to bypass reciprocal gene loss barriers.
Project description:The use of molecular methods is altering our understanding of the microbial biosphere and the complexity of the tree of life. Here, we report a newly discovered uncultured plastid-bearing eukaryotic lineage named the rappemonads. Phylogenies using near-complete plastid ribosomal DNA (rDNA) operons demonstrate that this group represents an evolutionarily distinct lineage branching with haptophyte and cryptophyte algae. Environmental DNA sequencing revealed extensive diversity at North Atlantic, North Pacific, and European freshwater sites, suggesting a broad ecophysiology and wide habitat distribution. Quantitative PCR analyses demonstrate that the rappemonads are often rare but can form transient blooms in the Sargasso Sea, where high 16S rRNA gene copies mL(-1) were detected in late winter. This pattern is consistent with these microbes being a member of the rare biosphere, whose constituents have been proposed to play important roles under ecosystem change. Fluorescence in situ hybridization revealed that cells from this unique lineage were 6.6 ± 1.2 × 5.7 ± 1.0 μm, larger than numerically dominant open-ocean phytoplankton, and appear to contain two to four plastids. The rappemonads are unique, widespread, putatively photosynthetic algae that are absent from present-day ecosystem models and current versions of the tree of life.
Project description:Explaining the dramatic variation in species richness across the tree of life remains a key challenge in evolutionary biology. At the largest phylogenetic scales, the extreme heterogeneity in species richness observed among different groups of organisms is almost certainly a function of many complex and interdependent factors. However, the most fundamental expectation in macroevolutionary studies is simply that species richness in extant clades should be correlated with clade age: all things being equal, older clades will have had more time for diversity to accumulate than younger clades. Here, we test the relationship between stem clade age and species richness across 1,397 major clades of multicellular eukaryotes that collectively account for more than 1.2 million described species. We find no evidence that clade age predicts species richness at this scale. We demonstrate that this decoupling of age and richness is unlikely to result from variation in net diversification rates among clades. At the largest phylogenetic scales, contemporary patterns of species richness are inconsistent with unbounded diversity increase through time. These results imply that a fundamentally different interpretative paradigm may be needed in the study of phylogenetic diversity patterns in many groups of organisms.
Project description:The successful colonization of new habitats has played a fundamental role during the evolution of life. Salinity is one of the strongest barriers for organisms to cross, which has resulted in the evolution of distinct marine and non-marine (including both freshwater and soil) communities. Although microbes represent by far the vast majority of eukaryote diversity, the role of the salt barrier in shaping the diversity across the eukaryotic tree is poorly known. Traditional views suggest rare and ancient marine/non-marine transitions but this view is being challenged by the discovery of several recently transitioned lineages. Here, we investigate habitat evolution across the tree of eukaryotes using a unique set of taxon-rich phylogenies inferred from a combination of long-read and short-read environmental metabarcoding data spanning the ribosomal DNA operon. Our results show that, overall, marine and non-marine microbial communities are phylogenetically distinct but transitions have occurred in both directions in almost all major eukaryotic lineages, with hundreds of transition events detected. Some groups have experienced relatively high rates of transitions, most notably fungi for which crossing the salt barrier has probably been an important aspect of their successful diversification. At the deepest phylogenetic levels, ancestral habitat reconstruction analyses suggest that eukaryotes may have first evolved in non-marine habitats and that the two largest known eukaryotic assemblages (TSAR and Amorphea) arose in different habitats. Overall, our findings indicate that the salt barrier has played an important role during eukaryote evolution and provide a global perspective on habitat transitions in this domain of life.