Project description: Objectives. The black rhinoceros (Diceros bicornis) is an endangered mammal for which a captive breeding program is part of the conservation effort. Black rhinos in zoos often suffer from chronic infections and haemochromatosis. Furthermore, breeding is hampered by low male fertility. To aid a research project studying these topics, we sequenced and assembled the genome of a captive male black rhino using only Oxford Nanopore Technologies (ONT) sequencing data. Data description. This work produced over 100 Gb of whole-genome sequencing reads from whole blood. These were assembled into a 2.47 Gb draft genome consisting of 834 contigs with an N50 of 29.53 Mb. The genome annotation was lifted over from an existing black rhino genome annotation, which recovered over 99% of gene features. This new genome assembly will be a valuable resource for conservation genetics research in this species.
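The N50 quoted above is the standard contiguity statistic: the contig length L such that contigs of length L or longer together cover at least half of the assembly. A minimal sketch of the calculation, using invented toy lengths rather than the rhino contigs:

```python
def n50(contig_lengths):
    """Return the N50: the length L such that contigs of length >= L
    together cover at least half of the total assembly length."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    raise ValueError("contig_lengths must be non-empty")


# Toy example with made-up contig lengths (in Mb), not the rhino assembly.
print(n50([2, 3, 4, 5, 6]))
```

Here the total is 20 Mb; the two longest contigs (6 + 5 = 11 Mb) are the first to pass the 10 Mb halfway mark, so the N50 is 5 Mb.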
Project description: Background. Detection of genomic inversions remains challenging. Many existing methods primarily target inversions with a non-repetitive breakpoint, leaving inverted repeat (IR)-mediated non-allelic homologous recombination (NAHR) inversions largely unexplored. Results. We present npInv, a novel tool specifically for detecting and genotyping NAHR inversions using sub-alignments of long sequencing reads. We benchmark npInv against other tools on both simulated and real data. We use npInv to generate a whole-genome inversion map for NA12878 consisting of 30 NAHR inversions (15 of them novel), including all previously known NAHR-mediated inversions in NA12878 with flanking IRs shorter than 7 kb. Our genotyping accuracy on this dataset was 94%. We used PCR to confirm the presence of two of the novel inversions. We show that there is a near-linear relationship between the length of the flanking IR and the minimum inversion size, without inverted repeats. Conclusion. npInv shows high accuracy on both simulated and real data, and the results give deeper insight into inversions.
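When a long read crosses an inversion breakpoint, the aligner typically splits it into sub-alignments, and adjacent pieces (in read order) land on opposite strands of the same chromosome. The sketch below shows only that basic signal; it is not npInv's algorithm, which additionally reasons about the flanking IRs and genotypes each event. The `SubAlignment` record, `strand_switches` function and `max_gap` threshold are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SubAlignment:
    """One piece of a split long-read alignment (illustrative layout)."""
    chrom: str
    ref_start: int    # 0-based, inclusive
    ref_end: int      # 0-based, exclusive
    strand: str       # "+" or "-"
    query_start: int  # where this piece starts along the read


def strand_switches(pieces: List[SubAlignment],
                    max_gap: int = 10_000) -> List[Tuple[str, int, int]]:
    """Report reference spans where consecutive pieces of one read (in read
    order) map to the same chromosome, close together, on opposite strands:
    the basic split-read signature of a read crossing an inversion breakpoint."""
    ordered = sorted(pieces, key=lambda p: p.query_start)
    spans = []
    for a, b in zip(ordered, ordered[1:]):
        if a.chrom != b.chrom or a.strand == b.strand:
            continue
        # Reference distance between the two pieces (negative if they overlap).
        distance = max(a.ref_start, b.ref_start) - min(a.ref_end, b.ref_end)
        if distance <= max_gap:
            spans.append((a.chrom, min(a.ref_start, b.ref_start),
                          max(a.ref_end, b.ref_end)))
    return spans
```

A read that fully spans an inversion gives a +, -, + pattern and therefore two strand switches, one per breakpoint.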
Project description: Hierarchical genotyping approaches can provide insights into the source, geography and temporal distribution of bacterial pathogens. Multiple hierarchical SNP genotyping schemes have previously been developed so that new isolates can rapidly be placed within pre-computed population structures, without the need to rebuild phylogenetic trees for the entire dataset. This classification approach has, however, seen limited uptake in routine public health settings due to analytical complexity and the lack of standardized tools that provide clear and easy ways to interpret results. BioHansel was developed as an organism-agnostic tool for hierarchical SNP-based genotyping. The tool identifies split k-mers that distinguish predefined lineages in whole genome sequencing (WGS) data using SNP-based genotyping schemes. BioHansel uses the Aho-Corasick algorithm to type isolates from assembled genomes or raw read sequence data in a matter of seconds, with limited computational resources. This makes BioHansel ideal for use by public health agencies that rely on WGS methods for surveillance of bacterial pathogens. Genotyping results are evaluated using a quality assurance module which identifies problematic samples, such as low-quality or contaminated datasets. Using existing hierarchical SNP schemes for Mycobacterium tuberculosis and Salmonella Typhi, we compare the genotyping results obtained with the k-mer-based tools BioHansel and SKA with those of the organism-specific tools TBProfiler and genotyphi, which use gold-standard reference-mapping approaches. We show that the genotyping results are fully concordant across these different methods, and that the k-mer-based tools are significantly faster. We also test the ability of the BioHansel quality assurance module to detect intra-lineage contamination and demonstrate that it is effective, even in populations with low genetic diversity.
We demonstrate the scalability of the tool using a dataset of ~8,100 public S. Typhi genomes and provide the resulting aggregated geographical distributions as part of the tool's output. BioHansel is an open source Python 3 application available on PyPI and Conda repositories and as a Galaxy tool from the public Galaxy Toolshed. In a public health context, BioHansel enables rapid and high-resolution classification of bacterial pathogens with low genetic diversity.
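Aho-Corasick builds a trie of all scheme k-mers with failure links, so a single pass over each read or contig reports every k-mer it contains, regardless of how many k-mers the scheme defines. The sketch below implements the algorithm in plain Python and tallies hits per lineage on both strands. It is an illustration, not BioHansel's code: the scheme format (a dict from k-mer to lineage label), the toy k-mers, and the names `AhoCorasick` and `lineage_hits` are assumptions.

```python
from collections import deque
from typing import Dict, List


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string (A/C/G/T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


class AhoCorasick:
    """Multi-pattern matcher: one pass over the text finds every pattern."""

    def __init__(self, patterns: List[str]) -> None:
        self.goto: List[Dict[str, int]] = [{}]  # trie transitions per node
        self.fail: List[int] = [0]              # failure link per node
        self.out: List[List[str]] = [[]]        # patterns ending at node
        for pat in patterns:
            node = 0
            for ch in pat:
                if ch not in self.goto[node]:
                    self.goto.append({})
                    self.fail.append(0)
                    self.out.append([])
                    self.goto[node][ch] = len(self.goto) - 1
                node = self.goto[node][ch]
            self.out[node].append(pat)
        # Breadth-first pass sets failure links: the longest proper suffix
        # of each node's string that is also a path in the trie.
        queue = deque(self.goto[0].values())
        while queue:
            node = queue.popleft()
            for ch, child in self.goto[node].items():
                queue.append(child)
                f = self.fail[node]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[child] = self.goto[f].get(ch, 0)
                # Inherit patterns that end at the failure target.
                self.out[child] = self.out[child] + self.out[self.fail[child]]

    def count(self, text: str) -> Dict[str, int]:
        """Count occurrences of every pattern in text (overlaps included)."""
        counts: Dict[str, int] = {}
        node = 0
        for ch in text:
            while node and ch not in self.goto[node]:
                node = self.fail[node]
            node = self.goto[node].get(ch, 0)
            for pat in self.out[node]:
                counts[pat] = counts.get(pat, 0) + 1
        return counts


def lineage_hits(scheme: Dict[str, str], reads: List[str]) -> Dict[str, int]:
    """Count k-mer hits per lineage, searching both strands of each read."""
    origin = {}
    for kmer in scheme:
        origin[kmer] = kmer
        origin[revcomp(kmer)] = kmer  # map reverse complement back to scheme k-mer
    matcher = AhoCorasick(list(origin))
    hits: Dict[str, int] = {}
    for read in reads:
        for pat, n in matcher.count(read).items():
            lineage = scheme[origin[pat]]
            hits[lineage] = hits.get(lineage, 0) + n
    return hits
```

For example, `lineage_hits({"AACCGGTA": "1", "GATTACAG": "1.2"}, reads)` returns a per-lineage hit count; a real typing step would then check hit counts and consistency across the hierarchy before assigning a genotype.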
Project description: We performed genotyping of neuroblastoma primary tumors using Illumina HumanHap 550 (v1, v3 and v3 Duo) and 610 Quad genotyping BeadChips.
Project description: Pharmacogenomics is a field of personalized medicine that aims to tailor drug dosing based on the genetics of an individual. The polymorphic and complex CYP2D6 gene is important to analyze because of its role in the metabolism of approximately a quarter of all drugs. Several bioinformatic tools have been developed to genotype CYP2D6 from short-read sequencing data. Among these, Cyrius, a tool specifically designed for CYP2D6 genotyping, has demonstrated high performance across various datasets. However, Cyrius has not been updated in the past 3 years, during which dozens of new star alleles have been identified and some previously defined ones revised. In this work, we simulated all known CYP2D6 haplotypes to assess the ability of Cyrius to identify them. In that dataset, Cyrius failed to call, or miscalled, 50 of the 360 samples. Given the importance of providing an up-to-date tool, particularly in clinical settings, we present an upgraded version of the tool, named BCyrius, which includes all the missing star alleles as well as the revisions to previously defined ones. BCyrius successfully identified 100% of the currently defined minor star alleles, more than Cyrius (85.6%) and the two other tested tools, Aldy and StellarPGx, which identified 92.2% and 87.8%, respectively. BCyrius also showed slightly improved performance on a dataset of real biological samples, achieving a higher call rate while maintaining accuracy similar to that of Cyrius. In addition to providing genotyping results, BCyrius also reports the predicted phenotype, along with information for each detected haplotype, including population frequencies.
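The abstract does not say how BCyrius derives the predicted phenotype. A common convention for CYP2D6 is the CPIC activity score: each allele gets a function value, copies are summed across the diplotype, and the total is binned into a metabolizer phenotype. The sketch below assumes that convention; the small allele table and thresholds reflect our understanding of current CPIC values and are illustrative only, so check them against the current CPIC/PharmVar tables before relying on them. `ALLELE_FUNCTION`, `activity_score` and `phenotype` are hypothetical names.

```python
# Illustrative allele function values (assumed CPIC-style; not exhaustive).
ALLELE_FUNCTION = {"*1": 1.0, "*2": 1.0, "*4": 0.0, "*5": 0.0,
                   "*10": 0.25, "*41": 0.5}


def activity_score(diplotype: str) -> float:
    """Sum allele function values for a diplotype such as '*1x2/*4'.
    An 'xN' suffix marks N copies of that allele (gene duplication)."""
    total = 0.0
    for haplotype in diplotype.split("/"):
        allele, _, copies = haplotype.partition("x")
        total += ALLELE_FUNCTION[allele] * (int(copies) if copies else 1)
    return total


def phenotype(score: float) -> str:
    """Bin an activity score into a metabolizer phenotype (assumed thresholds)."""
    if score == 0:
        return "poor metabolizer"
    if score < 1.25:
        return "intermediate metabolizer"
    if score <= 2.25:
        return "normal metabolizer"
    return "ultrarapid metabolizer"
```

For instance, `*1/*10` scores 1.25 and falls in the normal range, while `*10/*41` scores 0.75 and is classed as intermediate.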
Project description: We developed ONT-cappable-seq, a specialized long-read RNA sequencing technique that allows end-to-end sequencing of primary prokaryotic transcripts on the Nanopore sequencing platform. We applied ONT-cappable-seq to study the transcriptional landscape of Pseudomonas aeruginosa phage LUZ7, yielding a comprehensive genome-wide map of viral transcription start sites, terminators and complex operon structures that fine-tune gene expression. The approach also provides new insights into the RNA biology of LUZ7 and paves the way for more in-depth transcription studies that can help unveil the complex layers of phage-host interactions.
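Because ONT-cappable-seq sequences primary transcripts end to end, the 5' ends of mapped reads pile up at transcription start sites. A minimal sketch of turning those pile-ups into TSS calls, assuming reads are given as `(ref_start, ref_end, strand)` tuples with 0-based, half-open coordinates; the greedy clustering, `window` and `min_reads` values, and the `call_tss` name are illustrative assumptions, not the authors' pipeline.

```python
from collections import Counter
from typing import List, Tuple


def call_tss(reads: List[Tuple[int, int, str]], window: int = 5,
             min_reads: int = 3) -> List[Tuple[str, int, int]]:
    """Cluster read 5' ends into candidate TSSs: (strand, position, support)."""
    five_prime = Counter()
    for start, end, strand in reads:
        # The 5' end is the leftmost base on '+', the rightmost base on '-'.
        pos = start if strand == "+" else end - 1
        five_prime[(strand, pos)] += 1
    remaining = dict(five_prime)
    tss = []
    # Greedy clustering: take the most supported position still unassigned and
    # absorb every 5' end within `window` bp on the same strand.
    for (strand, pos), _ in five_prime.most_common():
        if (strand, pos) not in remaining:
            continue
        members = [(s, p) for (s, p) in remaining
                   if s == strand and abs(p - pos) <= window]
        support = sum(remaining[m] for m in members)
        for m in members:
            del remaining[m]
        if support >= min_reads:
            tss.append((strand, pos, support))
    return sorted(tss)
```

Terminators could be called the same way from read 3' ends; the greedy step keeps the strongest peak as the reported position and merges nearby, weaker 5' ends into its support.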
Project description: A streaming assembly pipeline that uses real-time Oxford Nanopore Technologies (ONT) sequencing data is important for saving sequencing resources and reducing time-to-result. A previous approach implemented in npScarf provided an efficient streaming algorithm for hybrid assembly but was relatively prone to mis-assemblies compared to other graph-based methods. Here we present npGraph, a streaming hybrid assembly tool that works on the assembly graph instead of on separate pre-assembled contigs. It produces more complete genome assemblies by solving the path-finding problem on the assembly graph, using long reads to guide the traversal. Application to synthetic and real data from bacterial isolate genomes shows improved accuracy while maintaining a low computational cost. npGraph also provides a graphical user interface (GUI) that visualises the progress of the assembly in real time. The tool and source code are available at https://github.com/hsnguyen/assembly.
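The path-finding step can be pictured as follows: a long read anchors two unique contigs and implies a gap length between them, and the assembler looks for graph paths between those contigs whose intermediate nodes fill that gap. A simplified depth-first sketch under stated assumptions: directed adjacency lists, positive node lengths, node overlaps ignored, and a fixed tolerance. It is not npGraph's algorithm, and `paths_matching_gap`, `gap` and `tol` are hypothetical names.

```python
from typing import Dict, List


def paths_matching_gap(graph: Dict[str, List[str]], length: Dict[str, int],
                       source: str, target: str, gap: int,
                       tol: int) -> List[List[str]]:
    """Enumerate source -> ... -> target paths whose intermediate node lengths
    sum to gap +/- tol (the gap being estimated from a bridging long read)."""
    results: List[List[str]] = []

    def dfs(node: str, path: List[str], used: int) -> None:
        for nxt in graph.get(node, []):
            if nxt == target:
                if abs(used - gap) <= tol:
                    results.append(path + [nxt])
                continue
            new_used = used + length[nxt]
            # Prune paths that are already too long; with positive node
            # lengths this also stops the search from looping around cycles.
            if new_used <= gap + tol:
                dfs(nxt, path + [nxt], new_used)

    dfs(source, [source], 0)
    return results
```

When exactly one path survives, the long read resolves that region of the graph; several surviving paths would need more reads, which suits the streaming setting where reads keep arriving.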