Project description:In aquatic environments, fungal communities remain little studied despite their taxonomic and functional diversity. To extend the ecological coverage of this group, we conducted an in-depth analysis of fungal sequences within our collection of 3.6 million V4 18S rRNA pyrosequences originating from 319 individual marine (including sea-ice) and freshwater samples from libraries generated within diverse projects studying Arctic and temperate biomes in the past decade. Among the ~1.7 million post-filtered reads of highest taxonomic and phylogenetic quality, 23,263 fungal sequences were identified. The overall mean proportion was 1.35%, but with large variability; for example, from 0.01 to 59% of total sequences for Arctic seawater samples. Almost all sample types were dominated by Chytridiomycota-like sequences, followed by moderate-to-minor contributions of Ascomycota, Cryptomycota and Basidiomycota. Species and/or strain richness was high, with many novel sequences and high niche separation. The affinity of the most common reads to phytoplankton parasites suggests that aquatic fungi deserve renewed attention for their role in algal succession and carbon cycling.
Project description:BACKGROUND: Fungi from environmental samples are typically identified to species level through DNA sequencing of the nuclear ribosomal internal transcribed spacer (ITS) region for use in BLAST-based similarity searches in the International Nucleotide Sequence Databases. These searches are time-consuming and regularly require a significant amount of manual intervention and complementary analyses. We here present software - in the form of an identification pipeline for large sets of fungal ITS sequences - developed to automate the BLAST process and several additional analysis steps. The performance of the pipeline was evaluated on a dataset of 350 ITS sequences from fungi growing as epiphytes on building material. RESULTS: The pipeline was written in Perl and uses a local installation of NCBI-BLAST for the similarity searches of the query sequences. The variable subregion ITS2 of the ITS region is extracted from the sequences and used for additional searches of higher sensitivity. Multiple alignments of each query sequence and its closest matches are computed, and query sequences sharing at least 50% of their best matches are clustered to facilitate the evaluation of hypothetically conspecific groups. The pipeline proved to speed up the processing, as well as enhance the resolution, of the evaluation dataset considerably, and the fungi were found to belong chiefly to the Ascomycota, with Penicillium and Aspergillus as the two most common genera. The ITS2 was found to indicate a different taxonomic affiliation than did the complete ITS region for 10% of the query sequences, though this figure is likely to vary with the taxonomic scope of the query sequences. CONCLUSION: The present software readily assigns large sets of fungal query sequences to their respective best matches in the international sequence databases and places them in a larger biological context. 
The output is highly structured to be easy to process, although it still needs to be inspected and possibly corrected for the impact of the incomplete and sometimes erroneously annotated fungal entries in these databases. The open source pipeline is available for UNIX-type platforms, and updated releases of the target database are made available biweekly. The pipeline is easily modified to operate on other molecular regions and organism groups.
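The clustering step described above (grouping query sequences that share at least 50% of their best matches) can be sketched as follows. This is a minimal illustration, not the pipeline's Perl implementation: the function names, the single-linkage merging, the choice to measure overlap against the smaller match set, and the accession labels are all assumptions made for the example.

```python
from itertools import combinations


def shared_fraction(matches_a, matches_b):
    """Fraction of the smaller best-match set that both queries share (assumed measure)."""
    if not matches_a or not matches_b:
        return 0.0
    return len(matches_a & matches_b) / min(len(matches_a), len(matches_b))


def cluster_queries(best_matches, threshold=0.5):
    """Single-linkage clustering of queries by overlap of their best-match sets."""
    parent = {query: query for query in best_matches}

    def find(query):
        # Follow parent links to the cluster representative, compressing the path.
        while parent[query] != query:
            parent[query] = parent[parent[query]]
            query = parent[query]
        return query

    for q1, q2 in combinations(best_matches, 2):
        if shared_fraction(best_matches[q1], best_matches[q2]) >= threshold:
            parent[find(q1)] = find(q2)

    clusters = {}
    for query in best_matches:
        clusters.setdefault(find(query), []).append(query)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical best-match accessions for three query sequences.
best_matches = {
    "query1": {"acc_A", "acc_B", "acc_C", "acc_D"},
    "query2": {"acc_B", "acc_C", "acc_E", "acc_F"},
    "query3": {"acc_X", "acc_Y", "acc_Z", "acc_A"},
}
clusters = cluster_queries(best_matches)
print(clusters)
```

Here query1 and query2 share two of four matches and fall into one putatively conspecific group, while query3 shares only one match with query1 and stays on its own.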
Project description:Studying fungal biodiversity using data generated on the Illumina MiSeq sequencing platform poses a number of bioinformatic challenges, with the analysis typically involving a large number of tools for each analytical step, from quality filtering to generating identified operational taxonomic unit (OTU) abundance tables. Here, we introduce PIPITS, an open-source stand-alone suite of software for automated processing of Illumina MiSeq sequences for fungal community analysis. PIPITS exploits a number of state-of-the-art applications to process paired-end reads from quality filtering to producing OTU abundance tables. We provide detailed descriptions of the pipeline and show its utility in the analysis of 9,396,092 sequences generated on the Illumina MiSeq platform. PIPITS is the first automated bioinformatics pipeline dedicated to fungal ITS sequences that incorporates ITSx to extract subregions of ITS and exploits the latest RDP Classifier to classify sequences against the curated UNITE fungal data set.
Project description:The data in this article contain the sequences of the fungal internal transcribed spacer (ITS) and 18S rRNA gene from a metagenome of Lonar soda lake, India. Sequences were amplified using fungal-specific primers targeting the region lying between the 18S and 28S rRNA genes. Data were obtained using the fungal tag-encoded FLX amplicon pyrosequencing (fTEFAP) technique and used to analyze the fungal profile by a culture-independent method. Primary analysis using the PlutoF 454 pipeline suggests that the Lonar lake mycobiome contained 29 different fungal species. The raw sequencing data used to perform this analysis, along with the FASTQ file, are located in the NCBI Sequence Read Archive (SRA) under accession No. SRX889598 (http://www.ncbi.nlm.nih.gov/sra/SRX889598).
Project description:One of the most crucial steps in high-throughput sequence-based microbiome studies is the taxonomic assignment of sequences belonging to operational taxonomic units (OTUs). Without taxonomic classification, functional and biological information of microbial communities cannot be inferred or interpreted. The internal transcribed spacer (ITS) region of the ribosomal DNA is the conventional marker region for fungal community studies. While bioinformatics pipelines that cluster reads into OTUs have received much attention in the literature, less attention has been given to the taxonomic classification of these sequences, upon which biological inference is dependent. Here we compare how three common fungal OTU taxonomic assignment tools (RDP Classifier, UTAX, and SINTAX) handle ITS fungal sequence data. The classification power, defined as the proportion of assigned OTUs at a given taxonomic rank, varied among the classifiers. Classifiers were generally consistent (assignment of the same taxonomy to a given OTU) across datasets and ranks; a small number of OTUs were assigned unique classifications across programs. We developed CONSTAX (CONSensus TAXonomy), a Python tool that compares taxonomic classifications of the three programs and merges them into an improved consensus taxonomy. This tool also produces summary classification outputs that are useful for downstream analyses. Our results demonstrate that independent taxonomy assignment tools classify unique members of the fungal community, and greater classification power is realized by generating consensus taxonomy of available classifiers with CONSTAX.
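Classification power, as defined above, and the idea of merging several classifiers' calls can be illustrated with a short sketch. The abstract does not state CONSTAX's actual merging rules, so the majority vote with ties left unassigned is an assumption for illustration only, and the OTU names and genus calls are invented.

```python
from collections import Counter


def classification_power(assignments):
    """Proportion of OTUs that carry an assignment at one taxonomic rank."""
    if not assignments:
        return 0.0
    assigned = sum(1 for taxon in assignments.values() if taxon)
    return assigned / len(assignments)


def consensus(calls):
    """Majority vote over classifier calls; None means unassigned (assumed rule)."""
    votes = Counter(call for call in calls if call)
    if not votes:
        return None
    ranked = votes.most_common()
    # A tie between different taxa is left unassigned in this sketch.
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


# Invented genus-level calls from three classifiers for four OTUs.
rdp = {"OTU1": "Penicillium", "OTU2": None, "OTU3": None, "OTU4": None}
utax = {"OTU1": "Penicillium", "OTU2": "Aspergillus", "OTU3": None, "OTU4": None}
sintax = {"OTU1": "Penicillium", "OTU2": None, "OTU3": "Fusarium", "OTU4": None}

merged = {otu: consensus([rdp[otu], utax[otu], sintax[otu]]) for otu in rdp}
powers = {
    "rdp": classification_power(rdp),
    "utax": classification_power(utax),
    "sintax": classification_power(sintax),
    "consensus": classification_power(merged),
}
print(merged)
print(powers)
```

In this toy data each classifier assigns a genus to at most half of the OTUs, while the merged table assigns three of four, which is the kind of gain in classification power the abstract attributes to a consensus taxonomy.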
Project description:BACKGROUND: Repeat-induced point mutation (RIP) is a fungal-specific genome defence mechanism that alters the sequences of repetitive DNA, thereby inactivating coding genes. Repeated DNA sequences align between mating and meiosis and both sequences undergo C:G to T:A transitions. In most fungi these transitions preferentially affect CpA di-nucleotides, thus altering the frequency of certain di-nucleotides in the affected sequences. The majority of previously published in silico analyses were limited to the comparison of ratios of pre- and post-RIP di-nucleotides in putatively RIP-affected sequences, so-called RIP indices. The analysis of RIP is significantly more informative when comparing sequence alignments of repeated sequences. There is, however, a dearth of bioinformatics tools available to the fungal research community for alignment-based RIP analysis of repeat families. RESULTS: We present RIPCAL (http://www.sourceforge.net/projects/ripcal), a software tool for the automated analysis of RIP in fungal genomic DNA repeats, which performs both RIP index and alignment-based analyses. We demonstrate the ability of RIPCAL to detect RIP within known RIP-affected sequences of Neurospora crassa and other fungi. We also predict and delineate the presence of RIP in the genome of Stagonospora nodorum, a Dothideomycete pathogen of wheat. We show that RIP has affected different members of the S. nodorum rDNA tandem repeat to different extents depending on their genomic contexts. CONCLUSION: The RIPCAL alignment-based method has considerable advantages over RIP indices for the analysis of whole genomes. We demonstrate its application to the recently published genome assembly of S. nodorum.
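A RIP index of the kind mentioned above is a ratio of di-nucleotide counts. The sketch below computes two ratios that are commonly used in the RIP literature, TpA/ApT and (CpA + TpG)/(ApC + GpT); the abstract does not say which indices RIPCAL reports, so this choice, the function names, and the toy sequence are assumptions, and no decision thresholds are applied.

```python
def dinucleotide_counts(seq):
    """Count overlapping di-nucleotides on the given strand."""
    seq = seq.upper()
    counts = {}
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        counts[pair] = counts.get(pair, 0) + 1
    return counts


def ratio(numerator, denominator):
    """Return numerator/denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


def rip_indices(seq):
    """Return (TpA/ApT, (CpA + TpG)/(ApC + GpT)) for one sequence."""
    counts = dinucleotide_counts(seq)
    product_index = ratio(counts.get("TA", 0), counts.get("AT", 0))
    substrate_index = ratio(
        counts.get("CA", 0) + counts.get("TG", 0),
        counts.get("AC", 0) + counts.get("GT", 0),
    )
    return product_index, substrate_index


# Toy sequence, invented for illustration only.
product_index, substrate_index = rip_indices("TATACAGT")
print(product_index, substrate_index)
```

Because RIP converts CpA (and TpG on the opposite strand) into TpA, affected sequences tend to show a raised first ratio and a lowered second one; an alignment-based analysis, as the abstract argues, carries more information than either number alone.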
Project description:Here, we report the draft genome sequences of three fungal-interactive 10.1601/nm.27008 strains, denoted BS110, BS007 and BS437. Phylogenetic analyses showed that the three strains belong to clade II of the genus 10.1601/nm.1619, which was recently renamed 10.1601/nm.26956. This novel genus primarily contains environmental species, encompassing non-pathogenic plant- as well as fungal-interactive species. The genome of strain BS007 consists of 11,025,273 bp, whereas those of strains BS110 and BS437 have 11,178,081 and 11,303,071 bp, respectively. Analyses of the three annotated genomes revealed the presence of (1) a large suite of substrate capture systems, and (2) a suite of genetic systems required for adaptation to microenvironments in soil and the mycosphere. Thus, genes encoding traits that potentially confer fungal interactivity were found, such as type 4 pili, type 1, 2, 3, 4 and 6 secretion systems, and biofilm formation (PGA, alginate and pel) and glycerol uptake systems. Furthermore, the three genomes also revealed the presence of a highly conserved five-gene cluster that had previously been shown to be upregulated upon contact with fungal hyphae. Moreover, a considerable number of prophage-like and CRISPR spacer sequences was found, next to genetic systems responsible for secondary metabolite production. Overall, the three 10.1601/nm.27008 strains possess the genetic repertoire necessary for adaptation to diverse soil niches, including those influenced by soil fungi.
Project description:BACKGROUND: The submission of DNA sequences to public sequence databases is an essential but insufficiently automated step in the process of generating and disseminating novel DNA sequence data. Despite the centrality of database submissions to biological research, the range of available software tools that facilitate the preparation of sequence data for database submissions is narrow, especially for sequences generated via plant and fungal DNA barcoding. Current submission procedures can be complex and prohibitively time-consuming for any but a small number of input sequences. A user-friendly software tool is needed that streamlines the file preparation for database submissions of DNA sequences that are commonly generated in plant and fungal DNA barcoding. METHODS: A Python package was developed that converts DNA sequences from the common EMBL and GenBank flat file formats to submission-ready, tab-delimited spreadsheets (so-called 'checklists') for a subsequent upload to the annotated sequence section of the European Nucleotide Archive (ENA). The software tool, titled 'EMBL2checklists', automatically converts DNA sequences, their annotation features, and associated metadata into the idiosyncratic format of marker-specific ENA checklists and thus generates files that can be uploaded via the interactive Webin submission system of ENA. RESULTS: EMBL2checklists provides a simple, platform-independent tool that automates the conversion of common DNA barcoding sequences into easily editable spreadsheets that require no further processing other than their upload to ENA via the interactive Webin submission system. The software is equipped with an intuitive graphical interface as well as an efficient command-line interface for its operation. The utility of the software is illustrated by its application in four recent investigations, including plant phylogenetic and fungal metagenomic studies. 
DISCUSSION: EMBL2checklists bridges the gap between common software suites for DNA sequence assembly and annotation and the interactive data submission process of ENA. It represents an easy-to-use solution for plant and fungal biologists without bioinformatics expertise to generate submission-ready checklists from common DNA sequence data. It allows the post-processing of checklists as well as work-sharing during the submission process and solves a critical bottleneck in the effort to increase participation in public data sharing.
Project description:Terminal restriction fragment length polymorphism (TRFLP) profiling of the internal transcribed spacer (ITS) ribosomal DNA of unknown fungal communities is currently unsupported by a broad-range enzyme-choosing rationale. An in silico study of terminal fragment size distribution was therefore performed following virtual digestion (using a set of 135 commercially available type IIP restriction endonucleases) of all published fungal ITS sequences putatively annealing to primers ITS1 and ITS4. Different diversity measurements were used to rank primer-enzyme pairs according to the richness and evenness that they showed. Top-performing pairs were hierarchically clustered to test for data dependency. The enzyme set composed of MaeII, BfaI, and BstNI returned much better results than randomly chosen enzyme sets in computer simulations and is therefore recommended for in vitro TRFLP profiling of fungal ITSs.
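The virtual digestion described above can be sketched as a search for the first recognition site in each amplicon, with the terminal fragment length taken as the distance from the labelled 5' end to the cut. This is a minimal sketch, not the study's method: the recognition sites and cut offsets listed for MaeII, BfaI, and BstNI are assumptions that should be checked against a restriction enzyme database, only the forward-primer end is considered, richness is counted simply as the number of distinct fragment lengths, and the amplicons are invented.

```python
import re

# Assumed recognition sites (as regexes) and top-strand cut offsets; verify before use.
ENZYMES = {
    "MaeII": ("ACGT", 1),      # assumed A^CGT
    "BfaI": ("CTAG", 1),       # assumed C^TAG
    "BstNI": ("CC[AT]GG", 2),  # assumed CC^WGG, W = A or T
}


def terminal_fragment_length(amplicon, enzyme):
    """Length of the 5' terminal fragment; full length if no site is present."""
    pattern, cut_offset = ENZYMES[enzyme]
    match = re.search(pattern, amplicon.upper())
    if match is None:
        return len(amplicon)
    return match.start() + cut_offset


def richness(amplicons, enzyme):
    """Number of distinct terminal fragment lengths the enzyme produces."""
    return len({terminal_fragment_length(a, enzyme) for a in amplicons})


# Toy amplicons, invented for illustration (not real ITS sequences).
amplicons = ["TTTTACGTTTTT", "TTACGTTTTTTT", "AACCAGGTTTTT"]
maeii_lengths = [terminal_fragment_length(a, "MaeII") for a in amplicons]
bstni_lengths = [terminal_fragment_length(a, "BstNI") for a in amplicons]
print(maeii_lengths, richness(amplicons, "MaeII"))
print(bstni_lengths, richness(amplicons, "BstNI"))
```

An enzyme that yields more distinct, evenly distributed fragment lengths across a sequence collection separates more taxa in a TRFLP profile, which is the basis on which the study ranks primer-enzyme pairs.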