Project description:Diaphorina citri (Hemiptera: Psyllidae), the Asian citrus psyllid, is the insect vector of Ca. Liberibacter asiaticus, the causal agent of citrus greening disease. Sequencing of the D. citri metagenome has been initiated to gain better understanding of the biology of this organism and the potential roles of its bacterial endosymbionts. To corroborate candidate endosymbionts previously identified by rDNA amplification, raw reads from the D. citri metagenome sequence were mapped to reference genome sequences. Results of the read mapping provided the most support for Wolbachia and an enteric bacterium most similar to Salmonella. Wolbachia-derived reads were extracted using the complete genome sequences for four Wolbachia strains. Reads were assembled into a draft genome sequence, and the annotation assessed for the presence of features potentially involved in host interaction. Genome alignment with the complete sequences reveals membership of Wolbachia wDi in supergroup B, further supported by phylogenetic analysis of FtsZ. FtsZ and Wsp phylogenies additionally indicate that the Wolbachia strain in the Florida D. citri isolate falls into a sub-clade of supergroup B, distinct from Wolbachia present in Chinese D. citri isolates, supporting the hypothesis that the D. citri introduced into Florida did not originate from China.
Project description:<h4>Background</h4>The human microbiota are complex systems with important roles in our physiological activities and diseases. Sequencing the microbial genomes in the microbiota can help in our interpretation of their activities. The vast majority of the microbes in the microbiota cannot be isolated for individual sequencing. Current metagenomics practices use short-read sequencing to simultaneously sequence a mixture of microbial genomes. However, short reads give rise to ambiguities during genome assembly, leading to unsatisfactory microbial genome completeness and contig continuity. Linked-read sequencing is able to remove some of these ambiguities by attaching the same barcode to the reads from a long DNA fragment (10-100 kb), thus improving metagenome assembly. However, it is not clear how the choices for several parameters in the use of linked-read sequencing affect the assembly quality.<h4>Results</h4>We first examined the effects of read depth (C) on metagenome assembly from linked-reads in simulated data and a mock community. The results showed that C positively correlated with the length of assembled sequences but had little effect on their quality. The latter observation was corroborated by tests using real data from the human gut microbiome, where C had a minor impact on the sequence quality as well as on the proportion of bins annotated as draft genomes. On the other hand, metagenome assembly quality was sensitive to read depth per fragment (C<sub>R</sub>) and DNA fragment physical depth (C<sub>F</sub>). For the same C, deeper C<sub>R</sub> resulted in more draft genomes while deeper C<sub>F</sub> improved the quality of the draft genomes.
We also found that average fragment length (μ<sub>FL</sub>) had a marginal effect on assemblies, while fragments per partition (N<sub>F/P</sub>) affected the off-target reads involved in local assembly: lower N<sub>F/P</sub> values led to better assemblies by reducing the ambiguity introduced by off-target reads. In general, the use of linked-reads improved contig N50 when compared to Illumina short-reads, but not when compared to PacBio CCS (circular consensus sequencing) long-reads.<h4>Conclusions</h4>We comprehensively investigated the influence of linked-read sequencing parameters on metagenome assembly. While the quality of genome assembly from linked-reads cannot rival that from PacBio CCS long-reads, the case for using linked-read sequencing remains persuasive due to its low cost and high base-quality. Our study revealed that the probable best practice in using linked-reads for metagenome assembly is to merge the linked-reads from multiple libraries, each with sufficient C<sub>R</sub> but a smaller amount of input DNA.
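Contig N50, the continuity measure used in the comparison above, can be illustrated with a minimal sketch. It assumes the common definition: the length of the contig at which the running total of contig lengths, sorted from longest to shortest, first reaches half of the total assembly length. The function name and the example lengths are illustrative only and do not come from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Assumed definition: sort contigs from longest to shortest and return the
    length of the contig at which the cumulative length first reaches at
    least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths (bp): total is 300, so half is 150.
# 100 alone is below 150; 100 + 80 = 180 reaches it, so N50 is 80.
print(n50([100, 80, 60, 40, 20]))  # prints 80
```

Because N50 depends only on contig lengths, it says nothing about correctness; a misassembled but long contig raises it, which is why the abstract treats sequence length and sequence quality as separate outcomes.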
Project description:Purpose: We develop an accessible and reliable RNA sequencing (RNA-seq) transcriptome database of healthy human eye tissues and a matching reactive web application to query gene expression in eye and body tissues. Methods: We downloaded the raw sequence data for 1375 RNA-seq samples across 54 tissues in the Genotype-Tissue Expression (GTEx) project as a noneye reference set. We then queried several public repositories to find all healthy, nonperturbed, human eye-related tissue RNA-seq samples. The 916 eye and 1375 GTEx samples were sent into a reproducible Snakemake-based pipeline that we wrote to quantify all known transcripts and genes, remove mislabeled samples and samples with poor sequence quality, normalize expression values across each tissue, perform 882 differential expression tests, calculate GO term enrichment, and output everything as a single SQLite database file: the Eye in a Disk (EiaD) dataset. Furthermore, we rewrote the web application eyeIntegration (available in the public domain at https://eyeIntegration.nei.nih.gov) to display EiaD. Results: The new eyeIntegration portal provides quick visualization of human eye-related transcriptomes published to date by database version, gene/transcript, 19 eye tissues, and 54 body tissues. As a test of the value of this unified pan-eye dataset, we showed that fetal and organoid retina are highly similar at a pan-transcriptome level, but display distinct differences in certain pathways and gene families, such as protocadherin and HOXB family members. Conclusions: The eyeIntegration v1.0 web app serves the pan-human eye and body transcriptome dataset, EiaD. This offers the eye community a powerful and quick means to test hypotheses on human gene and transcript expression across 54 body and 19 eye tissues.
Project description:BACKGROUND: Variation of microorganism communities in the rumen of cattle (Bos taurus) is of great interest because of possible links to economically or environmentally important traits, such as feed conversion efficiency or methane emission levels. The resolution of studies investigating this variation may be improved by utilizing untargeted massively parallel sequencing (MPS), that is, sequencing without targeted amplification of genes. The objective of this study was to develop a method which used MPS to generate "rumen metagenome profiles", and to investigate if these profiles were repeatable among samples taken from the same cow. Given that faecal samples are much easier to obtain than rumen fluid samples, we also investigated whether rumen metagenome profiles were predictive of faecal metagenome profiles. RESULTS: Rather than focusing on individual organisms within the rumen, our method used MPS data to generate quantitative rumen microbiome profiles, regardless of taxonomic classifications. The method requires a previously assembled reference metagenome. A number of such reference metagenomes were considered, including two rumen-derived metagenomes, a human faecal microflora metagenome and a reference metagenome made up of publicly available prokaryote sequences. Sequence reads from each test sample were aligned to these references. The "rumen metagenome profile" was generated from the number of reads that aligned to each contig in the database. We used this method to test the hypothesis that rumen fluid microbial community profiles vary more between cows than within multiple samples from the same cow. Rumen fluid samples were taken from three cows, at three locations within the rumen. DNA from the samples was sequenced on the Illumina GAIIx. When the reads were aligned to a rumen metagenome reference, the rumen metagenome profiles were repeatable (P < 0.00001) by cow regardless of location of sampling rumen fluid.
The repeatability was estimated at 9%, albeit with a high standard error, reflecting the small number of animals in the study. Finally, we compared rumen microbial profiles to faecal microbial profiles. Our hypothesis, that there would be a stronger correlation between faeces and rumen fluid from the same cow than between faeces and rumen fluid from different cows, was not supported by our data (in a mixed linear model, the rumen versus faeces effect was far more significant than the animal effect). CONCLUSIONS: We have presented a simple and high-throughput method of metagenome profiling to assess the similarity of whole metagenomes, and illustrated its use on two novel datasets. This method utilises widely used freeware. The method should be useful in the exploration and comparison of metagenomes.
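The profiling idea described above (count the reads that align to each reference contig, then compare the resulting vectors between samples) can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the alignment step is assumed to have already produced, for each sample, a list of the contig names that reads aligned to; normalising counts to fractions and comparing profiles with a Pearson correlation are choices made here for the example, whereas the study itself reports a mixed linear model. All names and data are hypothetical.

```python
import math
from collections import Counter


def profile(aligned_contigs, reference_contigs):
    """Build a profile: fraction of aligned reads falling on each contig.

    aligned_contigs: one contig name per aligned read (hypothetical input).
    reference_contigs: fixed contig order, so profiles are comparable.
    """
    counts = Counter(aligned_contigs)
    total = sum(counts[c] for c in reference_contigs)
    # Normalising by the total is an assumption made for this sketch.
    return [counts[c] / total if total else 0.0 for c in reference_contigs]


def pearson(x, y):
    """Pearson correlation between two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / (sd_x * sd_y)


# Hypothetical reference and two hypothetical samples.
contigs = ["contig_1", "contig_2", "contig_3"]
sample_a = profile(["contig_1", "contig_1", "contig_2", "contig_3"], contigs)
sample_b = profile(["contig_1", "contig_2", "contig_2", "contig_3"], contigs)
print(sample_a, sample_b, pearson(sample_a, sample_b))
```

Keeping the contig order fixed across samples is what makes the profiles comparable without any taxonomic classification, which matches the taxonomy-free framing of the method.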
Project description:BACKGROUND: Metagenomics, based on culture-independent sequencing, is a well-fitted approach to provide insights into the composition, structure and dynamics of environmental viral communities. Following recent advances in sequencing technologies, new challenges arise for existing bioinformatic tools dedicated to viral metagenome (i.e. virome) analysis as (i) the number of viromes is rapidly growing and (ii) large genomic fragments can now be obtained by assembling the huge amount of sequence data generated for each metagenome. RESULTS: To face these challenges, a new version of Metavir was developed. First, all Metavir tools have been adapted to support comparative analysis of viromes in order to improve the analysis of multiple datasets. In addition to the sequence comparison previously provided, viromes can now be compared through their k-mer frequencies, their taxonomic compositions, recruitment plots and phylogenetic trees containing sequences from different datasets. Second, a new section has been specifically designed to handle assembled viromes made of thousands of large genomic fragments (i.e. contigs). This section includes an annotation pipeline for uploaded viral contigs (gene prediction, similarity search against reference viral genomes and protein domains) and an extensive comparison between contigs and reference genomes. Contigs and their annotations can be explored on the website through specifically developed dynamic genomic maps and interactive networks. CONCLUSIONS: The new features of Metavir 2 allow users to explore and analyze viromes composed of raw reads or assembled fragments through a set of adapted tools and a user-friendly interface.
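As a rough illustration of comparing viromes through their k-mer frequencies, the sketch below turns each dataset into a vector of relative k-mer frequencies and computes a dissimilarity between two such vectors. The choice of k, the Bray-Curtis dissimilarity, the function names and the toy sequences are all assumptions made for this example; the abstract does not state which k or which distance Metavir uses.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(sequences, k=2):
    """Relative frequency of every DNA k-mer across a set of sequences.

    The vector follows a fixed lexicographic order over the alphabet ACGT so
    that vectors from different datasets line up. K-mers containing other
    characters (for example N) are skipped.
    """
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    total = sum(counts.values())
    all_kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    return [counts[m] / total if total else 0.0 for m in all_kmers]


def bray_curtis(p, q):
    """Bray-Curtis dissimilarity: 0 for identical vectors, 1 for disjoint ones."""
    numerator = sum(abs(a - b) for a, b in zip(p, q))
    denominator = sum(a + b for a, b in zip(p, q))
    return numerator / denominator if denominator else 0.0


# Two hypothetical toy "viromes".
virome_a = kmer_frequencies(["ACGTACGT", "ACGGTA"], k=2)
virome_b = kmer_frequencies(["TTTTAAAA", "TTAATT"], k=2)
print(bray_curtis(virome_a, virome_b))
```

A composition-based comparison like this needs no reference database, so it can be applied to raw reads and to assembled contigs alike.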
Project description:An important step in 'metagenomics' analysis is the assembly of multiple genomes from mixed sequence reads of multiple species in a microbial community. Most conventional pipelines use a single-genome assembler with carefully optimized parameters. A limitation of a single-genome assembler for de novo metagenome assembly is that sequences of highly abundant species are likely to be misidentified as repeats in a single genome, resulting in a number of small fragmented scaffolds. We extended a single-genome assembler for short reads, known as 'Velvet', to metagenome assembly of mixed short reads from multiple species; we call the result 'MetaVelvet'. Our fundamental concept was first to decompose a de Bruijn graph constructed from mixed short reads into individual sub-graphs, and second to build scaffolds from each decomposed de Bruijn sub-graph as if it were the genome of an isolated species. We made use of two features, the coverage (abundance) difference and graph connectivity, for the decomposition of the de Bruijn graph. For simulated datasets, MetaVelvet succeeded in generating significantly higher N50 scores than any of the single-genome assemblers. MetaVelvet also reconstructed relatively low-coverage genome sequences as scaffolds. On real datasets of human gut microbial read data, MetaVelvet produced longer scaffolds and increased the number of predicted genes.
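The graph-decomposition idea can be illustrated with a deliberately simplified sketch. It builds a de Bruijn graph from reads (nodes are (k-1)-mers, edges are k-mers) and splits it into connected components, which corresponds only to the "graph connectivity" feature mentioned above. It is not MetaVelvet's algorithm: the coverage-difference criterion, reverse complements, error correction and scaffolding are all omitted, and the function names and toy reads are hypothetical.

```python
from collections import defaultdict


def de_bruijn(reads, k):
    """Build a de Bruijn graph: each k-mer adds an edge prefix -> suffix."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            prefix, suffix = kmer[:-1], kmer[1:]
            graph[prefix].add(suffix)
            graph[suffix]  # touch the key so sink nodes exist in the graph
    return graph


def connected_components(graph):
    """Split the graph into weakly connected components (edge direction ignored)."""
    undirected = defaultdict(set)
    for node, successors in graph.items():
        undirected[node]  # keep isolated nodes
        for succ in successors:
            undirected[node].add(succ)
            undirected[succ].add(node)
    seen, components = set(), []
    for start in list(undirected):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(undirected[node] - seen)
        components.append(component)
    return components


# Two toy reads that share no (k-1)-mer fall into separate sub-graphs.
graph = de_bruijn(["AAAC", "GGGT"], k=3)
print(connected_components(graph))
```

In real metagenomes, related species share k-mers, so connectivity alone rarely separates them; that is presumably why the abstract pairs connectivity with coverage differences.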