Project description:Next-Generation Sequencing (NGS) technologies have led to important improvements in the detection of new or unrecognized infective agents related to infectious diseases. In this context, high-throughput NGS can be used to achieve comprehensive and unbiased sequencing of the nucleic acids present in a clinical sample (e.g., tissue). Metagenomic shotgun sequencing has emerged as a powerful high-throughput approach to analyze and survey microbial composition in the field of infectious diseases. By directly sequencing millions of nucleic acid molecules in a sample and matching the sequences to those available in databases, the pathogens of an infectious disease can be inferred. Despite the large amount of metagenomic shotgun data produced, there is a lack of a comprehensive and easy-to-use pipeline for data analysis that spares users complicated bioinformatics steps. Here we present HOME-BIO, a modular and exhaustive pipeline for the analysis of biological entity estimation, specifically designed for shotgun-sequenced clinical samples. HOME-BIO provides comprehensive taxonomy classification by querying different source databases and carries out the main steps of a metagenomic investigation. HOME-BIO is a powerful tool in the hands of biologists without computational experience who are focused on metagenomic analysis. Its ease of use allows users to simply import raw sequenced read files and obtain the taxonomy profiles of their samples.
Project description:Sequence-based deep learning models have become the state of the art for the analysis of the genomic regulatory code. Particularly for transcriptional enhancers, deep learning models excel at deciphering the sequence features and grammar that underlie their spatiotemporal activity. To enable end-to-end enhancer modeling and design, we developed a software and modeling package called CREsted. It combines preprocessing starting from single-cell ATAC-seq data; modeling with a choice of several architectures for training classification and regression models on either topics or pseudobulk peak heights; sequence design using multiple strategies; and downstream analysis through a collection of tools to locate transcription factor (TF) binding sites, infer the effect of a TF (activating or repressing) on enhancer accessibility, decipher enhancer grammar, and score gene loci. We demonstrate CREsted using a mouse cortex model that we validate using the BICCN collection of in vivo validated mouse brain enhancers. Classical enhancers in immune cells, including the IFN-β enhanceosome, are revisited using a PBMC model, and we assess the accuracy of TF binding site predictions with ChIP-seq. Additionally, we use CREsted to compare mesenchymal-like cancer cell states between tumor types, and we investigate different fine-tuning strategies of Borzoi within CREsted, comparing their performance and explainability with CREsted models trained from scratch. Finally, we train a CREsted model on a scATAC-seq atlas of zebrafish development and use it to design and validate in vivo cell type-specific synthetic enhancers in three tissues. Across varying datasets, we demonstrate that CREsted facilitates efficient training and analyses, enabling scrutiny of the enhancer logic and design of synthetic enhancers across tissues and species. CREsted is available at https://crested.readthedocs.io.