Project description: One significant characteristic of Multiple Sclerosis (MS), a chronic inflammatory demyelinating disease of the central nervous system, is the evolution of highly variable patterns of white matter lesions. Based on geostatistical metrics, the MS-Lesion Pattern Discrimination Plot reduces complex three- and four-dimensional configurations of MS white matter lesions to a well-arranged, standardized two-dimensional plot that facilitates follow-up, cross-sectional and medication-impact analyses. Here, we present a script that generates the MS-Lesion Pattern Discrimination Plot using the widespread statistical computing environment R. Input data to the script are NIfTI-1 or Analyze 7.5 files containing individual MS white matter lesion masks in Montreal Neurological Institute (MNI) standard brain geometry. The MS-Lesion Pattern Discrimination Plot, variogram plots and associated fitting statistics are output to the R console and exported to standard graphics and text files. Besides reviewing relevant geostatistical basics and commenting on implementation details for smooth customization and extension, the paper guides the reader through generating MS-Lesion Pattern Discrimination Plots using publicly available synthetic MS lesion patterns. The paper is accompanied by the R script LDPgenerator.r, a small sample data set and associated graphics for comparison.
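As a rough illustration of the geostatistical quantity underlying the plot, the following Python sketch computes an empirical semivariogram of a 3D binary lesion mask for axis-aligned lags. It is a conceptual stand-in only: LDPgenerator.r is written in R, and the function name, lag range and toy mask below are assumptions, not part of the published script.

import numpy as np

def empirical_variogram(mask, max_lag=20):
    # Empirical semivariogram of a 3D binary lesion mask, pooled over the three
    # axis-aligned directions. Conceptual illustration only (hypothetical helper,
    # not part of LDPgenerator.r).
    values = mask.astype(float)
    gammas = []
    for h in range(1, max_lag + 1):
        sq_diffs = []
        for axis in range(3):
            a = np.take(values, np.arange(values.shape[axis] - h), axis=axis)
            b = np.take(values, np.arange(h, values.shape[axis]), axis=axis)
            sq_diffs.append(((a - b) ** 2).ravel())
        # semivariance: half the mean squared difference over all voxel pairs at lag h
        gammas.append(0.5 * np.mean(np.concatenate(sq_diffs)))
    return np.array(gammas)

# toy usage; a real mask would come from a NIfTI-1 file (e.g. loaded with nibabel)
lesion_mask = np.zeros((30, 30, 30), dtype=np.uint8)
lesion_mask[10:14, 12:15, 8:11] = 1
print(empirical_variogram(lesion_mask, max_lag=5))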
Project description: Summary: When investigating gene expression profiles, determining important directed edges between genes can provide valuable insights in addition to identifying differentially expressed genes. In the subsequent functional enrichment analysis (EA), understanding how enriched pathways or the genes within a pathway interact with one another can help infer the gene regulatory network (GRN), which is important for studying the underlying molecular mechanisms. However, packages for easy inference of the GRN based on EA are scarce. Here, we developed an R package, CBNplot, which infers the Bayesian network (BN) from gene expression data, explicitly utilizing EA results obtained from curated biological pathway databases. The core features include convenient wrapping of structure learning, visualization of the BN from EA results, comparison with reference networks, and reflection of gene-related information on the plot. As an example, we demonstrate the analysis of bladder cancer-related datasets using CBNplot, including probabilistic reasoning, which is a unique aspect of BN analysis. We demonstrate the transferability of results obtained from one dataset to another, the validity of the analysis as assessed using established knowledge and literature, and the possibility of facilitating knowledge discovery from gene expression datasets. Availability and implementation: The library, documentation and web server are available at https://github.com/noriakis/CBNplot. Supplementary information: Supplementary data are available at Bioinformatics online.
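CBNplot itself is an R package; as a conceptual illustration of the core step (learning a Bayesian network structure over the genes of one enriched pathway), the hedged Python sketch below uses pgmpy's hill-climbing search with a BIC score on a toy, randomly generated expression matrix. The gene names, discretization and data are hypothetical and do not reflect CBNplot's actual implementation.

import numpy as np
import pandas as pd
from pgmpy.estimators import HillClimbSearch, BicScore

rng = np.random.default_rng(0)
genes = ["TP53", "MDM2", "CDKN1A", "RB1", "E2F1"]   # hypothetical pathway members
# toy expression matrix (samples x genes); real input would be normalized expression values
expr = pd.DataFrame(rng.normal(size=(60, len(genes))), columns=genes)

# discretize each gene into low/mid/high, since this score-based search uses discrete variables
disc = expr.apply(lambda col: pd.qcut(col, 3, labels=["low", "mid", "high"]))

# hill-climbing structure learning with a BIC score, analogous in spirit to what
# CBNplot wraps in R (which uses different libraries and defaults)
search = HillClimbSearch(disc)
dag = search.estimate(scoring_method=BicScore(disc))
print(sorted(dag.edges()))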
Project description: Motivation: With an increasing number of patient-derived xenograft (PDX) models being created and subsequently sequenced to study tumor heterogeneity and to guide therapy decisions, there is a similarly increasing need for methods to separate reads originating from the graft (human) tumor and reads originating from the host species' (mouse) surrounding tissue. Two kinds of methods are in use: on the one hand, alignment-based tools require that reads first be mapped and aligned (by an external mapper/aligner) to the host and graft genomes separately; the tool itself then processes the resulting alignments and quality metrics (typically BAM files) to assign each read or read pair. On the other hand, alignment-free tools work directly on the raw read data (typically FASTQ files). Recent studies compare different approaches and tools, with varying results. Results: We show that alignment-free methods for xenograft sorting are superior in CPU time usage and equivalent in accuracy. We improve upon the state of the art in xenograft sorting by presenting a fast, lightweight approach based on three-way bucketed quotiented Cuckoo hashing. Our hash table requires memory comparable to an FM index typically used for read alignment and less than other alignment-free approaches. It allows extremely fast lookups and uses less CPU time than other alignment-free methods, and than alignment-based methods at similar accuracy. Several engineering steps (e.g., shortcuts for unsuccessful lookups, software prefetching) improve performance even further. Availability: Our software xengsort is available under the MIT license at http://gitlab.com/genomeinformatics/xengsort. It is written in numba-compiled Python and comes with sample Snakemake workflows for hash table construction and dataset processing.
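The sketch below illustrates the general idea of alignment-free xenograft sorting: classify each read by how many of its k-mers are found only in the graft (human) reference versus only in the host (mouse) reference. Plain Python sets stand in for xengsort's three-way bucketed quotiented Cuckoo hash table, and the category names, k value and toy sequences are assumptions for illustration.

def kmers(seq, k):
    # yield every k-mer of a read (canonicalization of reverse complements omitted for brevity)
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]

def classify_read(seq, graft_kmers, host_kmers, k):
    # count k-mers specific to the graft (human) or the host (mouse) reference;
    # plain sets stand in for xengsort's three-way bucketed quotiented Cuckoo hash table
    graft_hits = host_hits = 0
    for km in kmers(seq, k):
        in_graft = km in graft_kmers
        in_host = km in host_kmers
        if in_graft and not in_host:
            graft_hits += 1
        elif in_host and not in_graft:
            host_hits += 1
    if graft_hits > host_hits:
        return "graft"
    if host_hits > graft_hits:
        return "host"
    return "ambiguous"

# toy usage with made-up reference sequences and a small k (a real index would use k around 25)
k = 5
graft_set = set(kmers("ACGTACGTTTGACCAGTAGGCAT", k))
host_set = set(kmers("TTTTGGGCCCAAATTTGGGCCCAA", k))
print(classify_read("ACGTACGTTTGAC", graft_set, host_set, k))   # -> graft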
Project description: Motivation: Recovery of metagenome-assembled genomes (MAGs) from shotgun metagenomic data is an important task for the comprehensive analysis of microbial communities from variable sources. Individual binning tools differ in their ability to leverage specific aspects of MAG reconstruction, the use of ensemble binning refinement tools is often time consuming, and computational demand increases with community complexity. We introduce MAGScoT, a fast, lightweight and accurate implementation for the reconstruction of highest-quality MAGs from the output of multiple genome-binning tools. Results: MAGScoT outperforms popular bin-refinement solutions in terms of quality and quantity of MAGs as well as computation time and resource consumption. Availability and implementation: MAGScoT is available via GitHub (https://github.com/ikmb/MAGScoT) and as an easy-to-use Docker container (https://hub.docker.com/repository/docker/ikmb/magscot). Supplementary information: Supplementary data are available at Bioinformatics online.
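The abstract does not spell out MAGScoT's scoring formula, so the following Python sketch only illustrates the generic kind of marker-gene-based bin scoring and greedy selection that such refinement tools perform: reward unique single-copy markers, penalize duplicated ones, and keep the best-scoring candidate among overlapping bins from different binners. All names, weights and counts below are hypothetical.

def bin_score(marker_counts, n_expected_markers, contamination_weight=0.5):
    # Generic marker-gene score for a candidate bin: reward markers present once,
    # penalize duplicated markers. The weighting is an illustrative assumption,
    # not MAGScoT's published scoring formula.
    unique = sum(1 for c in marker_counts.values() if c == 1)
    duplicated = sum(1 for c in marker_counts.values() if c > 1)
    completeness = (unique + duplicated) / n_expected_markers
    contamination = duplicated / n_expected_markers
    return completeness - contamination_weight * contamination

def pick_best_bin(candidate_bins, n_expected_markers):
    # keep the best-scoring candidate among overlapping bins proposed by different binners
    return max(candidate_bins.items(),
               key=lambda item: bin_score(item[1], n_expected_markers))

# toy example: two binners propose overlapping bins for the same genome
candidates = {
    "binnerA_bin3": {"rpsA": 1, "rpsB": 1, "gyrB": 1},
    "binnerB_bin7": {"rpsA": 2, "rpsB": 1, "gyrB": 0},   # one duplicated, one missing marker
}
print(pick_best_bin(candidates, n_expected_markers=3))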
Project description: BACKGROUND: The use of environmental DNA (eDNA) has become an increasingly important tool in environmental surveys and taxonomic research. High-throughput sequencing of samples from soil, water, sediment, trap alcohol or bulk samples generates large amounts of barcode sequences that can be assigned to a known taxon with a reference sequence. This process can, however, be bioinformatically cumbersome and time consuming, especially for researchers without specialised bioinformatic training. A number of different software packages and pipelines are available, but they require training in data preparation, running the analysis and formatting the results. Comparison of results produced by different packages is often difficult. RESULTS: FACEPAI is an open-source script, dependent on a few open-source applications, that provides a pipeline for rapid analysis and taxonomic assignment of environmental DNA samples. It requires an initial formatting of a reference database, using the script CaPReSe, and a configuration file, and can thereafter be run to process any number of samples in succession using the same settings and references. Both configuration and execution are designed to demand as little hands-on work as possible, while ensuring repeatable results. CONCLUSION: The demonstration using example data from real environmental samples produces results in a time span ranging from less than 3 min to just above 15 min, depending on the number of sequences to process. Memory usage stays below 2 GB on a desktop PC. FACEPAI and CaPReSe provide a pipeline for analysing a large number of eDNA samples on common equipment, with little bioinformatic skill necessary, for subsequent ecological and taxonomical studies.
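As a toy illustration of the central operation such a pipeline performs (assigning each barcode sequence to the taxon of its most similar reference), the Python sketch below uses Jaccard similarity of k-mer sets. FACEPAI itself delegates the sequence search to external open-source applications; the function names, threshold and reference sequences here are made up for illustration.

def kmer_set(seq, k=8):
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def assign_taxon(query, reference_db, k=8, min_similarity=0.7):
    # assign a barcode sequence to the taxon of its most similar reference
    # (Jaccard similarity of k-mer sets); a toy stand-in for the search step
    # that the real pipeline delegates to external applications
    q = kmer_set(query, k)
    best_taxon, best_sim = "unassigned", 0.0
    for taxon, ref_seq in reference_db.items():
        r = kmer_set(ref_seq, k)
        union = q | r
        sim = len(q & r) / len(union) if union else 0.0
        if sim > best_sim:
            best_taxon, best_sim = taxon, sim
    if best_sim < min_similarity:
        return "unassigned", best_sim
    return best_taxon, best_sim

# hypothetical reference database: taxon name -> barcode fragment
refs = {
    "Apis mellifera": "ACTATTTATTTTCGGAGCATGAGC",
    "Bombus terrestris": "TCTTTGCAGGAATAGTAGGAACT",
}
print(assign_taxon("ACTATTTATTTTCGGAGCATGAGT", refs, min_similarity=0.5))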
Project description: Summary: The ability to generate high-quality genome sequences is a cornerstone of modern biological research. Even with recent advancements in sequencing technologies, many genome assemblies still do not reach reference grade. Here, we introduce ntJoin, a tool that leverages structural synteny between a draft assembly and reference sequence(s) to contiguate and correct the former with respect to the latter. Instead of alignments, ntJoin uses a lightweight mapping approach based on a graph data structure generated from ordered minimizer sketches. The tool can be used in a variety of different applications, including improving a draft assembly with a reference-grade genome, a short-read assembly with a draft long-read assembly, and a draft assembly with an assembly from a closely related species. When scaffolding a human short-read assembly using the reference human genome or a long-read assembly, ntJoin improves the NGA50 length 23- and 13-fold, respectively, in under 13 min, using <11 GB of RAM. Compared to existing reference-guided scaffolders, ntJoin generates highly contiguous assemblies faster and using less memory. Availability and implementation: ntJoin is written in C++ and Python and is freely available at https://github.com/bcgsc/ntjoin. Supplementary information: Supplementary data are available at Bioinformatics online.
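The sketch below illustrates ordered minimizer sketching, the lightweight representation ntJoin builds its mapping graph from: in every window of w consecutive k-mers, keep the k-mer with the smallest hash value. It is a hedged Python illustration only; ntJoin's actual implementation differs, and the hash function, k, w and toy sequence are assumptions.

import hashlib

def kmer_hash(km):
    # deterministic stand-in hash (a simple substitute for a dedicated DNA hash)
    return int.from_bytes(hashlib.blake2b(km.encode(), digest_size=8).digest(), "big")

def minimizer_sketch(seq, k, w):
    # ordered (w, k) minimizers: in every window of w consecutive k-mers, keep the
    # k-mer with the smallest hash; consecutive duplicates are collapsed
    hashes = [(i, kmer_hash(seq[i:i + k])) for i in range(len(seq) - k + 1)]
    sketch = []
    for start in range(len(hashes) - w + 1):
        pos, _ = min(hashes[start:start + w], key=lambda t: t[1])
        if not sketch or sketch[-1][0] != pos:
            sketch.append((pos, seq[pos:pos + k]))
    return sketch

# toy usage; real assemblies would be processed contig by contig with much larger k and w
print(minimizer_sketch("ACGTACGTGGATCCTTAGCAATCGGCTAGGTACCGT", k=5, w=4))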
Project description: In early development, the environment triggers mnemonic epigenomic programs through which memory and learning experiences confer cognitive phenotypes that persist into adulthood. To uncover how environmental stimulation impacts the epigenome and genome organization, we used the paradigm of environmental enrichment (EE) in young mice constantly receiving novel stimulation. We profiled the epigenome and chromatin architecture in whole cortex and sorted neurons by deep-sequencing techniques. Specifically, we studied chromatin accessibility, gene and protein regulation, and 3D genome conformation, combined with predicted enhancer and chromatin interactions. We identified increased chromatin accessibility, transcription factor binding including CTCF-mediated insulation, differential occupancy of H3K36me3 and H3K79me2, and changes in transcriptional programs required for neuronal development. EE stimuli led to genome re-organization by inducing increased inter-chromosomal contacts between chromosomes 7 and 17. Our findings support the notion that EE-induced learning and memory processes are directly associated with the epigenome and genome organization.
Project description: Summary: AliView is an alignment viewer and editor designed to meet the requirements of next-generation sequencing era phylogenetic datasets. AliView handles alignments of unlimited size in the formats most commonly used, i.e. FASTA, Phylip, Nexus, Clustal and MSF. The intuitive graphical interface makes it easy to inspect, sort, delete, merge and realign sequences as part of the manual filtering process of large datasets. AliView also works as an easy-to-use alignment editor for small as well as large datasets. Availability and implementation: AliView is released as open-source software under the GNU General Public License, version 3.0 (GPLv3), and is available at GitHub (www.github.com/AliView). The program is cross-platform and extensively tested on Linux, Mac OS X and Windows systems. Downloads and help are available at http://ormbunkar.se/aliview. Contact: anders.larsson@ebc.uu.se. Supplementary information: Supplementary data are available at Bioinformatics online.
Project description: Summary: In recent years, major initiatives such as the International Human Epigenome Consortium have generated thousands of high-quality genome-wide datasets for a large variety of assays and cell types. This data can be used as a reference to assess whether the signal from a user-provided dataset corresponds to its expected experiment, as well as to help reveal unexpected biological associations. We have developed the epiGenomic Efficient Correlator (epiGeEC) tool to enable genome-wide comparisons of very large numbers of datasets. A public Galaxy implementation of epiGeEC allows comparison of user datasets with thousands of public datasets in a few minutes. Availability and implementation: The source code is available at https://bitbucket.org/labjacquespe/epigeec and the Galaxy implementation at http://epigeec.genap.ca. Supplementary information: Supplementary data are available at Bioinformatics online.
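The core pairwise operation behind such comparisons is a correlation of two genome-wide signal tracks that have been averaged into fixed-size bins. The Python sketch below shows that single step with numpy on toy data; epiGeEC's file formats, bin sizes and large-scale machinery are not reproduced here, and the variable names and simulated tracks are assumptions.

import numpy as np

def binned_correlation(track_a, track_b):
    # Pearson correlation of two genome-wide signal tracks already averaged into
    # fixed-size bins (e.g. 1 kb or 10 kb); bins missing in either track are ignored.
    # This is only the core pairwise operation, not epiGeEC's file handling or scaling.
    a = np.asarray(track_a, dtype=float)
    b = np.asarray(track_b, dtype=float)
    keep = ~(np.isnan(a) | np.isnan(b))
    return np.corrcoef(a[keep], b[keep])[0, 1]

# toy tracks: per-bin signal for a user dataset vs. a simulated public reference
rng = np.random.default_rng(1)
reference = rng.gamma(2.0, 1.0, size=1000)
user = 0.8 * reference + rng.normal(0.0, 0.5, size=1000)   # correlated, noisy copy
print(round(binned_correlation(user, reference), 3))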
Project description: Purpose: Commonly employed in polyp segmentation, single-image UNet architectures lack the temporal insight clinicians gain from video data when diagnosing polyps. To mirror clinical practice more faithfully, our proposed solution, PolypNextLSTM, leverages video-based deep learning, harnessing temporal information for superior segmentation performance with the least parameter overhead, making it potentially suitable for edge devices. Methods: PolypNextLSTM employs a UNet-like structure with ConvNeXt-Tiny as its backbone, strategically omitting the last two layers to reduce parameter overhead. Our temporal fusion module, a convolutional long short-term memory (ConvLSTM), effectively exploits temporal features. Our primary novelty is that PolypNextLSTM stands out as the leanest and fastest model, surpassing the performance of five state-of-the-art image-based and video-based deep learning models. The evaluation on the SUN-SEG dataset spans easy-to-detect and hard-to-detect polyp scenarios, along with videos containing challenging artefacts such as fast motion and occlusion. Results: Comparison against five image-based and five video-based models demonstrates PolypNextLSTM's superiority, achieving a Dice score of 0.7898 on the hard-to-detect polyp test set, surpassing image-based PraNet (0.7519) and video-based PNS+ (0.7486). Notably, our model excels in videos featuring complex artefacts such as ghosting and occlusion. Conclusion: PolypNextLSTM, integrating a pruned ConvNeXt-Tiny with ConvLSTM for temporal fusion, not only exhibits superior segmentation performance but also achieves the highest frames per second among the evaluated models. Code can be found here: https://github.com/mtec-tuhh/PolypNextLSTM .
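To make the architectural idea concrete, the hedged PyTorch sketch below wires a truncated ConvNeXt-Tiny feature extractor into a minimal ConvLSTM cell that fuses per-frame features over time. The exact truncation point, channel sizes and wiring are assumptions based only on the abstract; the authors' released code at the repository above is the reference implementation.

import torch
import torch.nn as nn
from torchvision.models import convnext_tiny

class ConvLSTMCell(nn.Module):
    # minimal ConvLSTM cell: LSTM gates computed with a convolution so that the
    # hidden state and cell memory keep their spatial layout
    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, k, padding=k // 2)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        i, f, o, g = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o), torch.tanh(g)
        c = f * c + i * g
        h = o * torch.tanh(c)
        return h, c

# ConvNeXt-Tiny features with the final stages dropped to reduce parameters;
# the exact truncation point is an assumption based on the abstract
backbone = nn.Sequential(*list(convnext_tiny(weights=None).features.children())[:-2])

frames = torch.randn(2, 5, 3, 224, 224)            # batch of two 5-frame clips
b, t = frames.shape[:2]
feats = backbone(frames.flatten(0, 1))             # per-frame features: (b*t, C, H, W)
feats = feats.unflatten(0, (b, t))
cell = ConvLSTMCell(feats.shape[2], 64)
h = torch.zeros(b, 64, *feats.shape[-2:])
c = torch.zeros_like(h)
for step in range(t):                               # temporal fusion across the clip
    h, c = cell(feats[:, step], (h, c))
# 'h' would then feed a UNet-style segmentation decoder (omitted here)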