Project description:The Mexican axolotl (Ambystoma mexicanum) is a critically endangered species and a fruitful amphibian model for regenerative biology. Despite a growing body of research on the cellular and molecular biology of axolotl limb regeneration, microbiological aspects of this process remain poorly understood. Here, we describe a bacterial 16S rRNA amplicon dataset derived from axolotl limb tissue samples taken over the course of limb regeneration. The raw data were obtained by sequencing the V3-V4 region of the 16S rRNA gene and comprised 14,569,756 paired-end raw reads generated from 21 samples. Initial data analysis using the DADA2 pipeline resulted in an amplicon sequence variant (ASV) table containing a total of ca. 5.9 million chimera-removed, high-quality reads and a median of 296,971 reads per sample. The data constitute a useful resource for research on the microbiological aspects of axolotl limb regeneration and will also broadly facilitate comparative studies in the developmental and conservation biology of this critically endangered species.
Project description:Mangosteen (Garcinia mangostana L.) is known for its delectable taste and contains high amounts of xanthones, which have been reported to possess anti-cancer, anti-inflammatory and other bioactive properties. However, stage-specific regulation of mangosteen fruit ripening has never been studied in detail. We performed a comparative transcriptomic analysis of three ripening stages (Stage 0, 2 and 6) of mangosteen. We obtained raw data from six libraries through Illumina HiSeq 4000. A total of ~40 Gb of raw data were generated. Clean reads of 650,887,650 (bp) were obtained from 656,913,570 (bp) raw reads. The raw transcriptome data were deposited in the SRA database under BioProject accession number PRJNA339916. These data will be beneficial for transcriptome profiling in order to study the regulation of mangosteen fruit ripening. The lack of a complete sequence database for this species impedes protein identification. These data sets provide reference data for the exploration of novel genes or proteins to understand mangosteen fruit ripening behaviour.
Project description:High-throughput parallel sequencing is a powerful tool for the quantification of microbial diversity through the amplification of nuclear ribosomal gene regions. Recent work has extended this approach to the quantification of diversity within otherwise difficult-to-study metazoan groups. However, nuclear ribosomal genes present both analytical challenges and practical limitations that are a consequence of the mutational properties of nuclear ribosomal genes. Here we exploit useful properties of protein-coding genes for cross-species amplification and denoising of 454 flowgrams. We first use experimental mixtures of species from the class Collembola to amplify and pyrosequence the 5' region of the COI barcode, and we implement a new algorithm called PyroClean for the denoising of Roche GS FLX pyrosequences. Using parameter values from the analysis of experimental mixtures, we then analyse two communities sampled from field sites on the island of Tenerife. Cross-species amplification success of target mitochondrial sequences in experimental species mixtures is high; however, there is little relationship between template DNA concentrations and pyrosequencing read abundance. Homopolymer error correction and filtering against a consensus reference sequence reduced the volume of unique sequences to approximately 5% of the original unique raw reads. Filtering of remaining non-target sequences attributed to PCR error, sequencing error, or numts further reduced unique sequence volume to 0.8% of the original raw reads. PyroClean reduces or eliminates the need for an additional, time-consuming step to cluster reads into Operational Taxonomic Units, which facilitates the detection of intraspecific DNA sequence variation. PyroCleaned sequence data from field sites in Tenerife demonstrate the utility of our approach for quantifying evolutionary diversity and its spatial structure. 
Comparison of our sequence data to public databases reveals that we are able to successfully recover both interspecific and intraspecific sequence diversity.
Project description:The loggerhead sea turtle (Caretta caretta) is widely distributed in the oceans of tropical and subtropical latitudes. This turtle is an endangered species due to anthropic and natural factors that have decreased its population levels. In this study, RNA sequencing and de-novo assembly of genes expressed in blood were performed. The raw FASTQ files have been deposited in NCBI's SRA database with accession number SRX2629512. A total of 5.4 Gb raw sequence data were obtained, corresponding to 48,257,019 raw reads. The Trinity pipeline was used to perform a de-novo assembly, and we were able to identify 64,930 transcripts for the female loggerhead turtle transcriptome with an N50 of 1131 bp. The obtained transcriptome data will be useful for further studies of the physiology, biochemistry and evolution in this species.
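The N50 reported for assemblies such as the one above is the length L at which sequences of length L or longer account for at least half of the total assembled bases. The following is a minimal sketch of that calculation under the usual definition; the function name `n50` and the toy lengths are illustrative assumptions and are not taken from any of the datasets described here.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or transcript lengths.

    Usual definition: sort lengths from longest to shortest and return the
    length at which the running total first reaches half of all bases.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical transcript lengths (bp), for illustration only.
toy_lengths = [2000, 1500, 1131, 800, 400, 300]
print(n50(toy_lengths))
```

Because N50 is weighted by length, a few long transcripts can raise it well above the mean or median transcript length, so it is best read alongside the transcript count and total assembly size.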
Project description:BACKGROUND: Relatively recently, the software KB™ Basecaller has replaced phred for identifying the bases from raw sequence data in DNA sequencing employing dideoxy chemistry. We have measured quantitatively the consequences of that change. RESULTS: The high-quality sequence segments of reads derived from the KB™ Basecaller were, on average, 30 to 50 bases longer than those of reads derived from phred. However, microbe identification appeared to have been unaffected by the change in software. CONCLUSIONS: We have demonstrated a modest, but statistically significant, superiority in high-quality read length of the KB™ Basecaller compared to phred. We found no statistically significant difference between the numbers of microbial species identified from the sequence data.
Project description:Metagenomics projects collect DNA from uncharacterized environments that may contain thousands of species per sample. One main challenge facing metagenomic analysis is phylogenetic classification of raw sequence reads into groups representing the same or similar taxa, a prerequisite for genome assembly and for analyzing the biological diversity of a sample. New sequencing technologies have made metagenomics easier, by making sequencing faster, and more difficult, by producing shorter reads than previous technologies. Classifying sequences from reads as short as 100 base pairs has until now been relatively inaccurate, requiring researchers to use older, long-read technologies. We present Phymm, a classifier for metagenomic data, that has been trained on 539 complete, curated genomes and can accurately classify reads as short as 100 base pairs, a substantial improvement over previous composition-based classification methods. We also describe how combining Phymm with sequence alignment algorithms improves accuracy.
Project description:Next-generation technologies for determining genomic and transcriptomic composition have a wide range of applications. Andrias davidianus, a salamander endemic to China, has become an endangered amphibian species. However, molecular information on this species is lacking. In this study, we obtained RNA-Seq data from a pool of A. davidianus tissues including spleen, liver, muscle, kidney, skin, testis, gut and heart using the Illumina HiSeq 2500 platform. A total of 15,398,997,600 bp were obtained, corresponding to 102,659,984 raw reads. A total of 102,659,984 reads were filtered after removing low-quality reads and trimming the adapter sequences. The Trinity program was used to de novo assemble 132,912 unigenes with an average length of 690 bp and an N50 of 1263 bp. Unigenes were annotated against a number of databases. These transcriptomic data of A. davidianus should open the door to molecular evolution studies based on the entire transcriptome or on targeted genes of interest. The raw data from this study are available in the NCBI SRA database under accession number SRP099564.
Project description:The hawksbill sea turtle, Eretmochelys imbricata, is an endangered species of the Caribbean Colombian coast due to anthropic and natural factors that have decreased its population levels. Little is known about the genes that are involved in its immune system, sex determination, aging and other important functions. The data generated represent RNA sequencing and the first de-novo assembly of transcripts expressed in the blood of the hawksbill sea turtle. The raw FASTQ files were deposited in the NCBI SRA database with accession number SRX2653641. A total of 5.7 Gb raw sequence data were obtained, corresponding to 47,555,108 raw reads. Trinity was used to perform a first de-novo assembly, and we were able to identify 47,586 transcripts of the female hawksbill turtle transcriptome with an N50 of 1100 bp. The obtained transcriptome data will be useful for further studies of the physiology, biochemistry and evolution in this species.
Project description:The advent of next generation sequencing has coincided with a growth in interest in using these approaches to better understand the role of the structure and function of the microbial communities in human, animal, and environmental health. Yet, use of next generation sequencing to perform 16S rRNA gene sequence surveys has resulted in considerable controversy surrounding the effects of sequencing errors on downstream analyses. We analyzed 2.7×10^6 reads distributed among 90 identical mock community samples, which were collections of genomic DNA from 21 different species with known 16S rRNA gene sequences; we observed an average error rate of 0.0060. To improve this error rate, we evaluated numerous methods of identifying bad sequence reads, identifying regions within reads of poor quality, and correcting base calls and were able to reduce the overall error rate to 0.0002. Implementation of the PyroNoise algorithm provided the best combination of error rate, sequence length, and number of sequences. Perhaps more problematic than sequencing errors was the presence of chimeras generated during PCR. Because we knew the true sequences within the mock community and the chimeras they could form, we identified 8% of the raw sequence reads as chimeric. After quality filtering the raw sequences and using the Uchime chimera detection program, the overall chimera rate decreased to 1%. The chimeras that could not be detected were largely responsible for the identification of spurious operational taxonomic units (OTUs) and genus-level phylotypes. The number of spurious OTUs and phylotypes increased with sequencing effort indicating that comparison of communities should be made using an equal number of sequences. 
Finally, we applied our improved quality-filtering pipeline to several benchmarking studies and observed that even with our stringent data curation pipeline, biases in the data generation pipeline and batch effects were observed that could potentially confound the interpretation of microbial community data.
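The recommendation above to compare communities using an equal number of sequences is commonly met by subsampling every sample to a shared depth. The sketch below shows one simple way to do this (random subsampling without replacement, often called rarefying); it is an illustration under stated assumptions, not the pipeline used in the study. The function name `rarefy`, the fixed seed, and the OTU counts are all hypothetical.

```python
import random
from collections import Counter


def rarefy(counts, depth, seed=0):
    """Randomly subsample OTU counts, without replacement, down to `depth` reads.

    `counts` maps OTU name -> number of reads observed in one sample.
    """
    total = sum(counts.values())
    if depth > total:
        raise ValueError("depth exceeds the number of reads in the sample")
    # Expand the counts into one entry per read, then draw `depth` of them.
    pool = [otu for otu, n in counts.items() for _ in range(n)]
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return Counter(rng.sample(pool, depth))


# Hypothetical samples with unequal sequencing effort.
sample_a = {"otu1": 50, "otu2": 30, "otu3": 20}
sample_b = {"otu1": 400, "otu2": 90, "otu4": 10}

# Use the smallest sample size as the common depth.
depth = min(sum(sample_a.values()), sum(sample_b.values()))
rarefied_a = rarefy(sample_a, depth)
rarefied_b = rarefy(sample_b, depth)
print(sum(rarefied_a.values()), sum(rarefied_b.values()))
```

Expanding counts into a per-read list is only practical for small examples; at realistic depths a count-based sampler would be preferable.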
Project description:Polymerase chain reaction and different barcoding methods commonly used for plant identification from metagenomic samples are based on the amplification of a limited number of pre-selected barcoding regions. These methods are often inapplicable due to DNA degradation, low amplification success or low species discriminative power of selected genomic regions. Here we introduce a method for the rapid identification of plant taxon-specific k-mers that is applicable for the fast detection of plant taxa directly from raw sequencing reads without aligning, mapping or assembling the reads. We identified more than 800 Solanum lycopersicum specific k-mers (32 nucleotides in length) from 42 different chloroplast genome regions using the developed method. We demonstrated that the identified k-mers are also detectable in raw whole genome sequencing reads from S. lycopersicum. We also demonstrated the usability of taxon-specific k-mers in artificial mixtures of sequences from closely related species. The developed method offers a novel strategy for fast identification of taxon-specific genome regions and offers new perspectives for detection of plant taxa directly from raw sequencing reads.
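The general idea of taxon-specific k-mers can be illustrated with a short sketch: collect the k-mers of a target sequence, discard any that also occur in related sequences, and then scan raw reads for the remaining k-mers without alignment. This is a simplified illustration, not the authors' method. The function names, the toy sequences, and k = 4 (the study used 32) are assumptions, and reverse complements are ignored for brevity.

```python
def kmers(seq, k):
    """Return the set of all substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def specific_kmers(target, others, k):
    """K-mers found in the target sequence but in none of the other sequences."""
    unique = kmers(target, k)
    for seq in others:
        unique -= kmers(seq, k)
    return unique


def count_hits(reads, marker_kmers, k):
    """Count reads containing at least one marker k-mer (no alignment needed)."""
    return sum(1 for read in reads if kmers(read, k) & marker_kmers)


# Hypothetical toy sequences; real markers would come from chloroplast genomes.
k = 4
target = "ACGTTGCAAG"
related = ["ACGTTGGGAG"]
markers = specific_kmers(target, related, k)

reads = ["GGTGCAAT", "TTGGGAGA"]
print(sorted(markers), count_hits(reads, markers, k))
```

Set lookups keep the scan linear in total read length, which is what makes direct detection from raw reads fast compared with mapping or assembly.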