Project description:Objectives: The black rhinoceros (Diceros bicornis) is an endangered mammal for which a captive breeding program is part of the conservation effort. Black rhinos in zoos often suffer from chronic infections and haemochromatosis. Furthermore, breeding is hampered by low male fertility. To aid a research project studying these topics, we sequenced and assembled the genome of a captive male black rhino using only ONT sequencing data. Data description: This work produced over 100 Gb of whole-genome sequencing reads from whole blood. These were assembled into a 2.47 Gb draft genome consisting of 834 contigs with an N50 of 29.53 Mb. The genome annotation was lifted over from an available genome annotation for black rhino, which resulted in the retrieval of over 99% of gene features. This new genome assembly will be a valuable resource for conservation genetic research in this species.
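As a reminder of what the N50 figure above measures: N50 is the contig length such that contigs of that length or longer contain at least half of the assembled bases. The sketch below is a minimal illustration only; the function name `n50` and the toy contig lengths are assumptions for the example and are not taken from the rhino assembly.

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths (hypothetical helper)."""
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    total = sum(contig_lengths)
    running = 0
    # Walk contigs from longest to shortest until half the bases are covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy example: lengths sum to 20, and the two longest contigs (6 + 5) cover half.
toy_lengths = [2, 3, 4, 5, 6]
toy_n50 = n50(toy_lengths)
print(toy_n50)
```

For a real assembly, the list of lengths would come from the contig FASTA file rather than being typed in by hand.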
Project description:Background: Detection of genomic inversions remains challenging. Many existing methods primarily target inversions with a non-repetitive breakpoint, leaving inverted repeat (IR)-mediated non-allelic homologous recombination (NAHR) inversions largely unexplored. Results: We present npInv, a novel tool specifically for detecting and genotyping NAHR inversions using sub-alignments of long-read sequencing data. We benchmark npInv against other tools on both simulated and real data. We use npInv to generate a whole-genome inversion map for NA12878 consisting of 30 NAHR inversions (of which 15 are novel), including all previously known NAHR-mediated inversions in NA12878 with flanking IR shorter than 7 kb. Our genotyping accuracy on this dataset was 94%. We used PCR to confirm the presence of two of these novel inversions. We show that there is a near-linear relationship between the length of flanking IR and the minimum inversion size, without inverted repeats. Conclusion: The application of npInv shows high accuracy on both simulated and real data. The results give deeper insight into inversions.
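The general idea behind using sub-alignments can be illustrated as follows: when a long read crosses an inversion breakpoint, consecutive pieces of the read tend to align to opposite strands of the same chromosome. The sketch below is a generic illustration of that signal and is not npInv's actual algorithm; the `SubAlignment` record, the `strand_switches` function and the coordinates are all hypothetical.

```python
from typing import NamedTuple


class SubAlignment(NamedTuple):
    """One aligned piece of a long read (hypothetical record layout)."""
    read_start: int  # offset of this piece within the read
    chrom: str
    ref_start: int
    ref_end: int
    strand: str      # "+" or "-"


def strand_switches(subalignments):
    """Return adjacent sub-alignment pairs that flip strand on one chromosome."""
    ordered = sorted(subalignments, key=lambda a: a.read_start)
    switches = []
    for left, right in zip(ordered, ordered[1:]):
        if left.chrom == right.chrom and left.strand != right.strand:
            switches.append((left, right))
    return switches


# Toy read whose middle piece aligns to the reverse strand.
read = [
    SubAlignment(0, "chr1", 1000, 3000, "+"),
    SubAlignment(2000, "chr1", 3000, 5000, "-"),
    SubAlignment(4000, "chr1", 5000, 7000, "+"),
]
candidates = strand_switches(read)
print(len(candidates))
```

A real caller would additionally need to cluster such signals across reads, handle repeats at the breakpoints, and genotype the event, none of which is attempted here.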
Project description:Hierarchical genotyping approaches can provide insights into the source, geography and temporal distribution of bacterial pathogens. Multiple hierarchical SNP genotyping schemes have previously been developed so that new isolates can rapidly be placed within pre-computed population structures, without the need to rebuild phylogenetic trees for the entire dataset. This classification approach has, however, seen limited uptake in routine public health settings due to analytical complexity and the lack of standardized tools that provide clear and easy ways to interpret results. The BioHansel tool was developed to provide an organism-agnostic tool for hierarchical SNP-based genotyping. The tool identifies split k-mers that distinguish predefined lineages in whole genome sequencing (WGS) data using SNP-based genotyping schemes. BioHansel uses the Aho-Corasick algorithm to type isolates from assembled genomes or raw read sequence data in a matter of seconds, with limited computational resources. This makes BioHansel ideal for use by public health agencies that rely on WGS methods for surveillance of bacterial pathogens. Genotyping results are evaluated using a quality assurance module which identifies problematic samples, such as low-quality or contaminated datasets. Using existing hierarchical SNP schemes for Mycobacterium tuberculosis and Salmonella Typhi, we compare the genotyping results obtained with the k-mer-based tools BioHansel and SKA, with those of the organism-specific tools TBProfiler and genotyphi, which use gold-standard reference-mapping approaches. We show that the genotyping results are fully concordant across these different methods, and that the k-mer-based tools are significantly faster. We also test the ability of the BioHansel quality assurance module to detect intra-lineage contamination and demonstrate that it is effective, even in populations with low genetic diversity. 
We demonstrate the scalability of the tool using a dataset of ~8100 S. Typhi public genomes and provide the aggregated results of geographical distributions as part of the tool's output. BioHansel is an open source Python 3 application available on PyPI and Conda repositories and as a Galaxy tool from the public Galaxy Toolshed. In a public health context, BioHansel enables rapid and high-resolution classification of bacterial pathogens with low genetic diversity.
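To make the k-mer matching step described above more concrete, the following is a minimal sketch of Aho-Corasick multi-pattern search applied to two split k-mers that differ only at a central SNP. It is not BioHansel's implementation: the function names, the lineage labels "1" and "2" and the 9-mers are hypothetical, and reverse-complement matching and quality checks are omitted.

```python
from collections import deque


def build_automaton(patterns):
    """Build Aho-Corasick tables from a {label: k-mer} mapping."""
    goto, fail, out = [{}], [0], [[]]
    for label, kmer in patterns.items():
        node = 0
        for base in kmer:
            if base not in goto[node]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[node][base] = len(goto) - 1
            node = goto[node][base]
        out[node].append(label)
    # Breadth-first pass to set failure links (depth-1 nodes fail to the root).
    queue = deque(goto[0].values())
    while queue:
        node = queue.popleft()
        for base, child in goto[node].items():
            f = fail[node]
            while f and base not in goto[f]:
                f = fail[f]
            fail[child] = goto[f].get(base, 0)
            out[child] = out[child] + out[fail[child]]
            queue.append(child)
    return goto, fail, out


def search(sequence, automaton):
    """Return (end_index, label) for every k-mer occurrence in sequence."""
    goto, fail, out = automaton
    node, hits = 0, []
    for i, base in enumerate(sequence):
        while node and base not in goto[node]:
            node = fail[node]
        node = goto[node].get(base, 0)
        hits.extend((i, label) for label in out[node])
    return hits


# Hypothetical scheme: same flanks, middle base A marks lineage "1", G marks "2".
scheme = {"1": "ACGTATGCA", "2": "ACGTGTGCA"}
automaton = build_automaton(scheme)
hits = search("TTACGTGTGCATT", automaton)
print(hits)
```

Because the automaton is built once per scheme and each sequence is then scanned in a single pass, the search cost grows with sequence length rather than with the number of k-mers in the scheme.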
Project description:We genotyped primary neuroblastoma tumors using Illumina HumanHap 550 (v1, v3, v3duo) and 610 Quad genotyping BeadChips.
Project description:Pharmacogenomics is a field of personalized medicine that aims to tailor drug dosing based on the genetics of an individual. The polymorphic and complex CYP2D6 gene is important to analyze because of its role in the metabolism of approximately a quarter of all drugs. Several bioinformatic tools have been developed to genotype CYP2D6 from short-read sequencing data. Among these, Cyrius, a tool specifically designed for CYP2D6 genotyping, has demonstrated high performance across various datasets. However, Cyrius has not been updated in the past 3 years, during which dozens of new star alleles have been identified and some previously defined ones revised. In this work, we simulated all known CYP2D6 haplotypes to assess the ability of Cyrius to identify them. In that dataset, Cyrius failed to call or misidentified 50 of 360 samples. Given the importance of providing an up-to-date tool, particularly in clinical settings, we present an upgraded version of the tool, named BCyrius, which includes all the missing star alleles as well as revisions to the previously listed ones. BCyrius successfully identified 100% of the currently defined minor star alleles, more than Cyrius (85.6%) and the two other tested tools, Aldy and StellarPGx, which identified 92.2% and 87.8%, respectively. BCyrius also demonstrated slightly improved performance on a dataset of real biological samples, achieving a higher call rate while maintaining accuracy similar to that of Cyrius. In addition to providing genotyping results, BCyrius reports the predicted phenotype, along with information for each detected haplotype, including population frequencies.
Project description:Data from the VLA lyssavirus genotyping microarray. The array platform for this data is GEO accession GPL8066, and consists of 624 oligos representing two viral families. The data set itself consists of 14 arrays: 7 hybridised with RNA from mouse brains infected with 7 genotypes of lyssaviruses, 1 hybridised with RNA from normal mouse brain, and 6 hybridised with RNA from coded samples consisting of infected or control mouse brains. Keywords: Lyssavirus genotyping microarray