A Comprehensive Analysis of Transcript-Supported De Novo Genes in Saccharomyces sensu stricto Yeasts.
ABSTRACT: Novel genes arising from random DNA sequences (de novo genes) have been suggested to be widespread in the genomes of different organisms. However, our knowledge about the origin and evolution of de novo genes is still limited. To systematically understand the general features of de novo genes, we established a robust pipeline to analyze >20,000 transcript-supported coding sequences (CDSs) from the budding yeast Saccharomyces cerevisiae. Our analysis pipeline combined phylogeny, synteny, and sequence alignment information to identify possible orthologs across 20 Saccharomycetaceae yeasts and discovered 4,340 S. cerevisiae-specific de novo genes and 8,871 S. sensu stricto-specific de novo genes. We further combined information on CDS positions and transcript structures to show that >65% of de novo genes arose from transcript isoforms of ancient genes, especially in the upstream and internal regions of ancient genes. Fourteen identified de novo genes with high transcript levels were chosen to verify their protein expression. Ten of them, including eight transcript isoform-associated CDSs, showed translation signals, and five proteins exhibited specific cytosolic localizations. Our results suggest that de novo genes frequently arise in the S. sensu stricto complex and have the potential to be quickly integrated into ancient cellular networks.
Project description:Advances in second-generation sequencing of RNA made a near-complete characterization of transcriptomes affordable. However, the reconstruction of full-length mRNAs via de novo RNA-seq assembly is still difficult due to the complexity of eukaryote transcriptomes with highly similar paralogs and multiple alternative splice variants. Here, we present FRAMA, a genome-independent annotation tool for de novo mRNA assemblies that addresses several post-assembly tasks, such as reduction of contig redundancy, ortholog assignment, correction of misassembled transcripts, scaffolding of fragmented transcripts and coding sequence identification. We applied FRAMA to assemble and annotate the transcriptome of the naked mole-rat and assessed the quality of the obtained compilation of transcripts with the aid of publicly available naked mole-rat gene annotations. Based on a de novo transcriptome assembly (Trinity), FRAMA annotated 21,984 naked mole-rat mRNAs (12,100 full-length CDSs), corresponding to 16,887 genes. The scaffolding of 3488 genes increased the median sequence information 1.27-fold. In total, FRAMA detected and corrected 4774 misassembled genes, which were predominantly caused by fusion of genes. A comparison with three different sources of naked mole-rat transcripts reveals that FRAMA's gene models are better supported by RNA-seq data than any other transcript set. Further, our results demonstrate the competitiveness of FRAMA with state-of-the-art genome-based transcript reconstruction approaches. FRAMA realizes the de novo construction of a low-redundancy transcript catalog for eukaryotes, including the extension and refinement of transcripts. Thereby, results delivered by FRAMA provide the basis for comprehensive downstream analyses like gene expression studies or comparative transcriptomics. FRAMA is available at https://github.com/gengit/FRAMA .
Project description:Members of the genus Gordonia are known to degrade various xenobiotics and produce secondary metabolites. The genome of a halotolerant phthalic acid ester (PAE)-degrading actinobacterium, Gordonia alkanivorans strain YC-RL2, was sequenced using the Pacific Biosciences RS II platform and Single Molecule Real-Time (SMRT) technology. The reads were assembled de novo by the hierarchical genome assembly process (HGAP) algorithm version 2. Genes were annotated by the NCBI Prokaryotic Genome Annotation Pipeline. The generated genome sequence was 4,979,656 bp with an average G+C content of 67.45%. Calculation of ANI confirmed the previous classification of strain YC-RL2 as G. alkanivorans. The sequences were searched against the KEGG and COG databases; 3132 CDSs were assigned to COG families and 1808 CDSs were predicted to be involved in 111 pathways. Ninety-five of the KEGG-annotated genes were predicted to be involved in the degradation of xenobiotics. A phthalate degradation operon could not be identified in the genome, indicating that strain YC-RL2 possesses a novel way of phthalate degradation. A total of 203 and 22 CDSs were annotated as esterase/hydrolase and dioxygenase genes, respectively. A total of 53 biosynthetic gene clusters (BGCs) were predicted by antiSMASH (antibiotics & Secondary Metabolite Analysis Shell) bacterial version 4.0. The genome also contained putative genes for heavy metal metabolism. The strain could tolerate 1 mM of Cd2+, Co2+, Cu2+, Ni2+, Zn2+, Mn2+ and Pb2+ ions. These results show that strain YC-RL2 has great potential to degrade various xenobiotics in different environments and will provide a rich genetic resource for further biotechnological and remediation studies.
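As an illustration of the G+C content figure quoted above, the following is a minimal Python sketch of how such a percentage is commonly computed from an assembled sequence. The function name `gc_content` and the choice to ignore ambiguous bases (such as N) are assumptions made for this example and are not taken from the study.

```python
def gc_content(sequence: str) -> float:
    """Return the G+C percentage of a DNA sequence.

    Assumption for this sketch: symbols other than A, C, G and T
    (for example N) are excluded from both numerator and denominator.
    """
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return 100.0 * gc / total


if __name__ == "__main__":
    # Toy sequence, not data from the YC-RL2 genome.
    print(round(gc_content("GGCCGATC"), 2))
```

For a multi-contig assembly, the same counts would be summed over all contigs before dividing, so that longer contigs contribute proportionally.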
Project description:Lyme disease is caused by spirochaetes of the Borrelia burgdorferi sensu lato genospecies. Complete genome assemblies are available for fewer than ten strains of Borrelia burgdorferi sensu stricto, the primary cause of Lyme disease in North America. MM1 is a sensu stricto strain originally isolated in the midwestern United States. Aside from a small number of genes, the complete genome sequence of this strain has not been reported. Here we present the complete genome sequence of MM1 in relation to other sensu stricto strains and in terms of its Multi Locus Sequence Typing. Our results indicate that MM1 is a new sequence type which contains a conserved main chromosome and 15 plasmids. Our results include the first contiguous 28.5 kb assembly of lp28-8, a linear plasmid carrying the vls antigenic variation system, from a Borrelia burgdorferi sensu stricto strain.
Project description:The edible white rot fungus Lentinula edodes possesses a variety of lignin-degrading enzymes such as manganese peroxidases and laccases. Laccases belong to the multicopper oxidases, which have a wide range of catalytic activities including polyphenol degradation and synthesis, lignin degradation, and melanin formation. The exact number of laccases in L. edodes is unknown, as are their complete properties and biological functions. We analyzed the draft genome sequence of L. edodes D703PP-9 and identified 13 multicopper oxidase-encoding genes: 11 laccases sensu stricto, of which three are new, and two ferroxidases. lcc8, a laccase previously reported in L. edodes, was not identified in the D703PP-9 genome. Phylogenetic analysis showed that the 13 multicopper oxidases can be classified into laccase sensu stricto subfamily 1, laccase sensu stricto subfamily 2 and ferroxidases. From sequence similarities and expression patterns, laccase sensu stricto subfamily 1 can be divided into two subgroups. Laccase sensu stricto subfamily 1 group A members are mainly secreted from mycelia, while laccase sensu stricto subfamily 1 group B members are expressed mainly in fruiting bodies during growth or after harvesting but are expressed at low levels in mycelia. Laccase sensu stricto subfamily 2 members are mainly expressed in mycelia, and the two ferroxidases are mainly expressed in the fruiting body during growth or after harvesting and are expressed at very low levels in mycelia. Our data suggest that L. edodes laccases in the same group share expression patterns and would have common biological functions.
Project description:The de novo assembly of transcriptomes from short shotgun sequences raises challenges due to random and non-random sequencing biases and inherent transcript complexity. We sought to define a pipeline for de novo transcriptome assembly to aid researchers working with emerging model systems where well annotated genome assemblies are not available as a reference. To detail this experimental and computational method, we used early embryos of the sea anemone, Nematostella vectensis, an emerging model system for studies of animal body plan evolution. We performed RNA-seq on embryos up to 24 h of development using Illumina HiSeq technology and evaluated independent de novo assembly methods. The resulting reads were assembled using either the Trinity assembler on all quality controlled reads or both the Velvet and Oases assemblers on reads passing a stringent digital normalization filter. A control set of mRNA standards from the National Institute of Standards and Technology (NIST) was included in our experimental pipeline to invest our transcriptome with quantitative information on absolute transcript levels and to provide additional quality control. We generated >200 million paired-end reads from directional cDNA libraries representing well over 20 Gb of sequence. The Trinity assembler pipeline, including preliminary quality control steps, resulted in more than 86% of reads aligning with the reference transcriptome thus generated. Nevertheless, digital normalization combined with assembly by Velvet and Oases required far less computing power and decreased processing time while still mapping 82% of reads. 
We have made the raw sequencing reads and assembled transcriptome publicly available. Nematostella vectensis was chosen for its strategic position in the tree of life for studies into the origins of the animal body plan; however, the challenge of reference-free transcriptome assembly is relevant to all systems for which well annotated gene models and independently verified genome assemblies may not be available. To navigate this new territory, we have constructed a pipeline for library preparation and computational analysis for de novo transcriptome assembly. The gene models defined by this reference transcriptome define the set of genes transcribed in early Nematostella development and will provide a valuable dataset for further gene regulatory network investigations.
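The digital normalization filter mentioned above can be sketched roughly as follows. This is a simplified illustration of the general median k-mer abundance idea, not the authors' pipeline: the function names, the k-mer size `k` and the coverage `cutoff` are hypothetical, and an exact in-memory counter is used here where production tools typically use a probabilistic counting structure and canonical (strand-independent) k-mers.

```python
from collections import Counter
from statistics import median


def kmers(read: str, k: int) -> list[str]:
    """Return all overlapping k-mers of a read, in order."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def digital_normalization(reads, k=20, cutoff=20):
    """Keep a read only while its estimated coverage is below `cutoff`.

    Coverage is estimated as the median count of the read's k-mers among
    the reads kept so far. `k` and `cutoff` are illustrative defaults,
    not values reported in the abstract.
    """
    counts = Counter()
    kept = []
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:
            continue  # read is shorter than k, so coverage cannot be estimated
        if median(counts[km] for km in read_kmers) < cutoff:
            kept.append(read)
            counts.update(read_kmers)  # only kept reads contribute to counts
    return kept


if __name__ == "__main__":
    # Toy input: one heavily repeated read plus one rare read.
    toy_reads = ["ACGTTGCA"] * 10 + ["GGGGCCCCAA"]
    print(len(digital_normalization(toy_reads, k=4, cutoff=3)))
```

Because redundant reads from highly expressed transcripts are discarded once their estimated coverage reaches the cutoff, the assembler sees far fewer reads, which is consistent with the reduced computing requirements reported above.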
Project description:The order Piroplasmida contains a diverse group of intracellular parasites, many of which can cause significant disease in humans, domestic animals, and wildlife. Two piroplasm species have been reported from raccoons (Procyon lotor), Babesia lotori (Babesia sensu stricto clade) and a species related to Babesia microti (called B. microti-like sp.). The goal of this study was to investigate prevalence, distribution, and diversity of Babesia in raccoons. We tested raccoons from selected regions in the United States and Canada for the presence of Babesia sensu stricto and Babesia microti-like sp. piroplasms. Infections of Babesia microti-like sp. were found in nearly all locations sampled, often with high prevalence, while Babesia sensu stricto infections had higher prevalence in the Southeastern United States (20-45% prevalence). Co-infections with both Babesia sp. were common. Sequencing of the partial 18S rRNA and cytochrome oxidase subunit 1 (cox1) genes led to the discovery of two new Babesia species, both found in several locations in the eastern and western United States. One novel Babesia sensu stricto sp. was most similar to Babesia gibsoni while the other Babesia species was present in the 'western piroplasm' group and was related to Babesia conradae. Phylogenetic analysis of the cox1 sequences indicated possible eastern and western genetic variants for the three Babesia sensu stricto species. Additional analyses are needed to characterize these novel species; however, this study indicates there are now at least four species of piroplasms infecting raccoons in the United States and Canada (Babesia microti-like sp., Babesia lotori, a novel Babesia sensu stricto sp., a novel western Babesia sp.) and a possible fifth species (Babesia sensu stricto) in raccoons in Japan.
Project description:The ospC genes of 20 southern Borrelia strains were sequenced. The strains consisted of B. burgdorferi sensu stricto, B. andersonii, B. bissettii, one undescribed genospecies, MI-8, and one probably new Borrelia species, TXW-1. A high degree of similarity exists between B. burgdorferi sensu stricto and B. bissettii and between B. bissettii and B. andersonii. Lateral transfers of the ospC gene probably occurred between B. burgdorferi sensu stricto and B. bissettii and between B. bissettii and B. andersonii. Internal gene recombination appears to occur among them. The highest degree of genetic diversity among them was observed in the two variable domains (V1 and V2), semivariable domain (SV), and the species-specific epitopes (between amino acids 28 and 31). Differences in ospC sequences among southern strains reflect diversity at the strain and genospecies levels. MI-8, which was recognized as an undescribed genospecies in our previous reports, remains distinguishable in our current analysis of ospC genes and is distinct from B. burgdorferi sensu stricto. Interestingly, another undescribed southern isolate, TXW-1, was not amplified under various PCR conditions. Compared to European B. burgdorferi sensu stricto strains, American B. burgdorferi sensu stricto strains show greater genetic heterogeneity. Southern B. burgdorferi sensu stricto, B. andersonii, and B. bissettii isolates were intermixed with each other in the phylogenetic trees. In the derived trees in our work, at least one southeastern strain of B. burgdorferi, MI-2, most closely aligns with a so-called invasive cluster that possesses many proven human-invasive strains. Transmission experiments show that MI-2 and the strains in this group of southern spirochetes are able to infect mice and hamsters and that the typical vector of Lyme disease, Ixodes scapularis, can acquire the spirochetes from infected mammals. 
Currently, strain MI-2 appears to be the only southern isolate among the 20 we analyzed that clusters with an OspC invasive group and thus might be invasive for humans.
Project description:To examine the role of nucleosome occupancy in the evolution of gene expression, we measured the genome-wide nucleosome profiles of four yeast species, three belonging to the Saccharomyces sensu stricto lineage and the more distantly related Candida glabrata. Nucleosomes and associated promoter elements at C. glabrata genes are typically shifted upstream by ~20 bp, compared to their orthologs from sensu stricto species. Nonetheless, all species display the same global organization features first described for Saccharomyces cerevisiae: a stereotypical nucleosome organization along genes and a division of promoters into those that contain or lack a pronounced nucleosome-depleted region (NDR), with the latter displaying a more dynamic pattern of gene expression. Despite this global similarity, however, nucleosome occupancy at specific genes diverged extensively between sensu stricto and C. glabrata orthologs (~50 million years). Orthologs with dynamic expression patterns tend to maintain their lack of NDR, but apart from that, sensu stricto and C. glabrata orthologs are nearly as similar in nucleosome occupancy patterns as nonorthologous genes. This extensive divergence in nucleosome occupancy contrasts with a conserved pattern of gene expression. Thus, while some evolutionary changes in nucleosome occupancy contribute to gene expression divergence, nucleosome occupancy often diverges extensively with apparently little impact on gene expression.
Project description:The goal of this study was to develop Listeria species-specific PCR assays based on a house-keeping gene (lmo1634) encoding alcohol acetaldehyde dehydrogenase (Aad), previously designated as Listeria adhesion protein (LAP), and to compare the results with a label-free light scattering sensor, BARDOT (bacterial rapid detection using optical scattering technology). PCR primer sets targeting the lap genes from the species of Listeria sensu stricto were designed and tested with 47 Listeria and 8 non-Listeria strains. The resulting PCR primer sets detected either all species of Listeria sensu stricto or individual L. innocua, L. ivanovii and L. seeligeri, L. welshimeri, and L. marthii without producing any amplified products from other bacteria tested. The PCR assays with Listeria sensu stricto-specific primers also successfully detected all species of Listeria sensu stricto and/or Listeria innocua from mixed culture-inoculated food samples, and each bacterium in food was verified by using the light scattering sensor, which generated a unique scatter signature for each species of Listeria tested. The PCR assays based on the house-keeping gene aad (lap) can be used for detection of either all species of Listeria sensu stricto or certain individual Listeria species in a mixture from food with a detection limit of about 10⁴ CFU/mL.
Project description:Here we report the genome sequence of Helicobacter heilmannii sensu stricto ASB1 isolated from the gastric mucosa of a kitten with severe gastritis. Helicobacter heilmannii sensu stricto has also been associated with gastric disease in humans. Availability of this genome sequence will contribute to the identification of genes involved in the pathogen's virulence and carcinogenic properties.