Project description: Soil is the main source of novel biocatalysts and biomolecules of industrial relevance. We searched in silico for hydrolases in four shotgun metagenomes (4,079,223 sequences) obtained from a 13-year field trial in southern Brazil, under no-tillage (NT) or conventional tillage (CT) management, combined with crop succession (CS, soybean/wheat) or crop rotation (CR, soybean/maize/wheat/lupine/oat). By comparison with the KEGG database we identified 42,631 hydrolases belonging to five classes, and by comparison with the NCBI-NR database, 44,928 sequences. Abundance followed the order lipases > laccases > cellulases > proteases > amylases > pectinases. Statistically significant differences were attributed to the tillage system, with NT showing about five times more hydrolases than CT. These marked differences can be attributed to the management of crop residues, which are left on the soil surface under NT but mechanically broken up and incorporated into the soil under CT. Differences between CS and CR were smaller (10% higher for CS) and not statistically significant. Most of the sequences belonged to fungi (Verticillium and Colletotrichum for lipases and laccases, Aspergillus for proteases) and, for amylases, to the archaeon Sulfolobus acidocaldarius. Our results indicate that agricultural soils under conservation management may represent a hotspot for bioprospecting hydrolases.
Project description: The soil ecosystem is critical for human health, affecting the environment in ways ranging from key agricultural and edaphic parameters to climate change. Soil has more unknown biodiversity than any other ecosystem. We applied diverse DNA extraction methods coupled with high-throughput pyrosequencing to explore 4.88 × 10⁹ bp of metagenomic sequence data from the longest continually studied soil environment (the Park Grass experiment at Rothamsted Research in the UK). The results highlight important DNA extraction biases and unexpectedly low seasonal and vertical variation in soil metagenomic functional classes. Clustering-based subsystems and carbohydrate metabolism had the largest numbers of annotated reads assigned, although <50% of reads were assigned at an E-value cutoff of 10⁻⁵. Among the more detailed subsystems, cAMP signaling in bacteria (3.24 ± 0.27% of the annotated reads) and the Ton and Tol transport systems (1.69 ± 0.11%) were relatively highly represented. The most highly represented genome from the database was that of a Bradyrhizobium species. The metagenomic variance created by integrating natural and methodological fluctuations gives a global picture of the Rothamsted soil metagenome that can be used to address specific questions and for future inter-environmental metagenomic comparisons. However, only 1% of annotated sequences correspond to already sequenced genomes at 96% similarity and E values of <10⁻⁵; considerable genome reconstruction efforts therefore remain to be made.
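The read-assignment figures above depend on an E-value cutoff and on how each read's hits are reduced to a single subsystem. As a minimal sketch, assuming each read can have several database hits and that only its best (lowest E-value) significant hit counts toward a subsystem, the two quantities could be computed as below. The `Hit` record layout and both function names are hypothetical and do not describe the annotation pipeline actually used.

```python
from dataclasses import dataclass

E_CUTOFF = 1e-5  # cutoff quoted in the description; a hit must fall below it


@dataclass
class Hit:
    """One database hit for one read (hypothetical record layout)."""
    read_id: str
    subsystem: str
    evalue: float


def assigned_fraction(hits, total_reads, cutoff=E_CUTOFF):
    """Fraction of all sequenced reads with at least one hit below the cutoff."""
    assigned = {h.read_id for h in hits if h.evalue < cutoff}
    return len(assigned) / total_reads


def subsystem_percentages(hits, cutoff=E_CUTOFF):
    """Percentage of annotated reads per subsystem, using each read's best hit only."""
    best = {}
    for h in hits:
        if h.evalue >= cutoff:
            continue  # not significant at this cutoff
        current = best.get(h.read_id)
        if current is None or h.evalue < current.evalue:
            best[h.read_id] = h
    counts = {}
    for h in best.values():
        counts[h.subsystem] = counts.get(h.subsystem, 0) + 1
    n_annotated = len(best)
    return {name: 100.0 * c / n_annotated for name, c in counts.items()}
```

Reducing each read to its best hit keeps the per-subsystem percentages summing to 100% of annotated reads. Counting every significant hit instead would inflate subsystems that share homologous sequences.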
Project description: We present raw sequence reads and genome assemblies derived from 17 accessions of the Ethiopian orphan crop plant enset (Ensete ventricosum (Welw.) Cheesman) using the Illumina HiSeq and MiSeq platforms. Also presented is a catalogue of single-nucleotide polymorphisms inferred from the sequence data at an average density of approximately one per kilobase of genomic DNA.
Project description: Guinea grass (Panicum maximum Jacq.), an important fodder crop of humid and sub-humid tropical regions, reproduces through apomixis, a form of clonal propagation through seeds. Lack of knowledge of the genetic and molecular control of this phenomenon has hindered the genetic improvement of this crop. The dataset provided here represents the first RNA-Seq-based assembly and analysis of florets at the pre-meiotic stage from apomictic and sexual genotypes of guinea grass. The raw sequence files in FASTQ format were deposited in the NCBI SRA database under accession number SRP115883. Paired-end sequencing yielded a total of 24.8 Gb of raw sequence data, corresponding to 179,665,827 raw reads. We used Trinity for de novo assembly and identified 57,647 transcripts in the sexual type and 49,093 transcripts in the apomictic type. These transcriptome data will be useful for identifying and comparing genes that regulate the mode of reproduction in grasses.
Project description: BACKGROUND: Discovering single nucleotide polymorphisms (SNPs) in agricultural crop genome sequences has been a widely used strategy for developing genetic markers for several applications, including marker-assisted breeding, population diversity studies of eco-geographical adaptation, and genotyping of crop germplasm collections, among others. Accurately detecting SNPs in large polyploid crop genomes such as wheat is crucial but challenging. A few variant calling methods have been developed previously, but their variant calls show low concordance. A gold-standard variant set generated from a single human sample has been established for evaluating variant calling tools; however, no gold-standard crop variant set is yet available for wheat. The aim of this study was to evaluate seven SNP variant calling tools (FreeBayes, GATK, Platypus, Samtools/mpileup, SNVer, VarScan, VarDict) combined with the two most popular mapping tools (BWA-mem and Bowtie2) on whole exome capture (WEC) re-sequencing data from allohexaploid wheat.
RESULTS: BWA-mem had both a higher mapping rate and a higher accuracy rate than Bowtie2. With the same mapping quality (MQ) cutoff, BWA-mem detected more variant bases in mapped reads than Bowtie2. Preprocessing reads by quality trimming or duplicate removal did not significantly affect final mapping performance in terms of mapped reads. Based on concordance and receiver operating characteristic (ROC) analysis, Samtools/mpileup with BWA-mem mapping of raw sequence reads outperformed the other combinations in specificity and sensitivity, followed by FreeBayes and GATK. VarDict and VarScan were the poorest-performing variant calling tools on the wheat WEC sequence data.
CONCLUSION: The BWA-mem and Samtools/mpileup pipeline, which requires no preprocessing of raw reads before mapping onto the reference genome, was found to be optimal for SNP calling in re-sequencing of the complex wheat genome. These results also provide useful guidelines for reliable variant identification from deep sequencing of other large polyploid crop genomes.
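The wheat study ranks variant callers by concordance, sensitivity, specificity and ROC behaviour. As a minimal sketch, assuming calls and the reference (truth) set are sets of (chromosome, position, ref, alt) tuples, that each assessed site contributes at most one such tuple, and that the number of assessed sites is known, these metrics could be computed as follows. Using the Jaccard index for concordance, and all function names, are assumptions here, not the study's own definitions.

```python
def confusion_counts(calls, truth, n_sites):
    """TP, FP, FN, TN for a call set against a truth set over n_sites assessed sites."""
    tp = len(calls & truth)
    fp = len(calls - truth)
    fn = len(truth - calls)
    tn = n_sites - tp - fp - fn  # assessed sites neither true nor called
    if tn < 0:
        raise ValueError("n_sites is smaller than the number of distinct variants")
    return tp, fp, fn, tn


def sensitivity_specificity(calls, truth, n_sites):
    tp, fp, fn, tn = confusion_counts(calls, truth, n_sites)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity, specificity


def concordance(calls_a, calls_b):
    """Jaccard index between two callers' variant sets (assumed definition)."""
    union = calls_a | calls_b
    return len(calls_a & calls_b) / len(union) if union else 1.0


def roc_points(scored_calls, truth, n_sites):
    """(false-positive rate, true-positive rate) pairs from sweeping a quality threshold.

    scored_calls maps each called variant to a quality score; thresholds run
    from the strictest (highest score) to the most permissive.
    """
    points = []
    for threshold in sorted(set(scored_calls.values()), reverse=True):
        kept = {v for v, q in scored_calls.items() if q >= threshold}
        sens, spec = sensitivity_specificity(kept, truth, n_sites)
        points.append((1.0 - spec, sens))
    return points
```

Sweeping the threshold from strict to permissive traces the trade-off between sensitivity and false positives. Two callers with similar single-threshold sensitivity can still differ clearly on such a curve.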
Project description: Diaphorina citri (Hemiptera: Psyllidae), the Asian citrus psyllid, is the insect vector of Ca. Liberibacter asiaticus, the causal agent of citrus greening disease. Sequencing of the D. citri metagenome has been initiated to gain a better understanding of the biology of this organism and the potential roles of its bacterial endosymbionts. To corroborate candidate endosymbionts previously identified by rDNA amplification, raw reads from the D. citri metagenome sequence were mapped to reference genome sequences. The read mapping provided the most support for Wolbachia and for an enteric bacterium most similar to Salmonella. Wolbachia-derived reads were extracted using the complete genome sequences of four Wolbachia strains. These reads were assembled into a draft genome sequence, and its annotation was assessed for features potentially involved in host interaction. Genome alignment with the complete sequences places Wolbachia wDi in supergroup B, which is further supported by phylogenetic analysis of FtsZ. FtsZ and Wsp phylogenies additionally indicate that the Wolbachia strain in the Florida D. citri isolate falls into a sub-clade of supergroup B distinct from the Wolbachia present in Chinese D. citri isolates, supporting the hypothesis that the D. citri introduced into Florida did not originate in China.
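The psyllid study maps metagenomic reads to candidate reference genomes, weighs the support for each reference by read counts, and then pulls out the reads that hit the Wolbachia references. As a minimal sketch, assuming the alignments are available as SAM text, the steps could look like this. Only the standard SAM columns (QNAME, FLAG, RNAME) and FLAG bits are used. Ignoring secondary and supplementary alignments is our own choice, and the reference names in the usage example are hypothetical placeholders, not the four strains used in the study.

```python
UNMAPPED = 0x4         # SAM FLAG bit: segment unmapped
SECONDARY = 0x100      # SAM FLAG bit: secondary alignment
SUPPLEMENTARY = 0x800  # SAM FLAG bit: supplementary alignment


def primary_alignments(sam_lines):
    """Yield (read name, reference name) for mapped, primary SAM records."""
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        qname, flag, rname = fields[0], int(fields[1]), fields[2]
        if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY):
            continue
        yield qname, rname


def reads_per_reference(sam_lines):
    """Count primary mapped records per reference (paired mates count separately)."""
    counts = {}
    for _, rname in primary_alignments(sam_lines):
        counts[rname] = counts.get(rname, 0) + 1
    return counts


def extract_read_names(sam_lines, reference_names):
    """Names of reads whose primary alignment hits one of the given references."""
    return {q for q, r in primary_alignments(sam_lines) if r in reference_names}
```

For example, `extract_read_names(lines, {"wolbachia_ref_1", "wolbachia_ref_2"})` would return the read names to pass to an assembler. Restricting to primary alignments ensures that a read which maps equally well to several references is counted only once.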
Project description: Microorganisms are useful environmental indicators that can deliver essential insights into mine land rehabilitation processes. To compare microbial communities along a chronosequence of mine land rehabilitation with pre-disturbance levels at reference sites covered by native vegetation, we sampled non-rehabilitated, rehabilitating, and reference sites in the Urucum Massif, southwestern Brazil. At each study site, three composite soil samples were collected for chemical, physical, and metagenomic analyses. We used paired-end library sequencing (Illumina NextSeq 500), and the reads were assembled with MEGAHIT. Coding DNA sequences (CDS) were identified using Kaiju in combination with non-redundant NCBI BLAST reference sequences containing archaea, bacteria, and viruses. Additionally, a functional classification was performed with EMG v2.3.2. Here we provide the raw data and assemblies (reads and contigs), together with initial functional and taxonomic analyses, as a baseline for further studies of this kind. Further investigation is needed to fully understand the mechanisms of environmental rehabilitation in tropical regions, and we hope this collection will encourage other researchers to use it for hypothesis testing.