Dataset Information

Interpretable detection of novel human viruses from genome sequencing data.

ABSTRACT: Viruses evolve extremely quickly, so reliable methods for viral host prediction are necessary to safeguard biosecurity and biosafety alike. Novel human-infecting viruses are difficult to detect with standard bioinformatics workflows. Here, we predict whether a virus can infect humans directly from next-generation sequencing reads. We show that deep neural architectures significantly outperform both shallow machine learning and standard, homology-based algorithms, cutting the error rates in half and generalizing to taxonomic units distant from those presented during training. Further, we develop a suite of interpretability tools and show that it can be applied also to other models beyond the host prediction task. We propose a new approach for convolutional filter visualization to disentangle the information content of each nucleotide from its contribution to the final classification decision. Nucleotide-resolution maps of the learned associations between pathogen genomes and the infectious phenotype can be used to detect regions of interest in novel agents, for example, the SARS-CoV-2 coronavirus, unknown before it caused a COVID-19 pandemic in 2020. All methods presented here are implemented as easy-to-install packages not only enabling analysis of NGS datasets without requiring any deep learning skills, but also allowing advanced users to easily train and explain new models for genomics.

SUBMITTER: Bartoszewicz JM

PROVIDER: S-EPMC7849996 | biostudies-literature | 2021 Mar

REPOSITORIES: biostudies-literature

Publications

Interpretable detection of novel human viruses from genome sequencing data.

Bartoszewicz JM, Seidel A, Renard BY

NAR Genomics and Bioinformatics, 2021-02-01, issue 1

PMID: 33554119

Similar Datasets

VirHunter: A Deep Learning-Based Method for Detection of Novel RNA Viruses in Plant Sequencing Data.
S-EPMC9580956 | biostudies-literature

Nanopore Sequencing Reveals Novel Targets for Detection and Surveillance of Human and Avian Influenza A Viruses.
S-EPMC7180252 | biostudies-literature

Benchmarking of viruses detection from single-cell RNA sequencing data
2020-05-08 | GSE149443 | GEO

Using Small RNA Deep Sequencing Data to Detect Human Viruses.
S-EPMC4811048 | biostudies-literature

A novel algorithm comprehensively characterizes human RH genes using whole-genome sequencing data.
S-EPMC7509869 | biostudies-literature

Biophysically Interpretable Inference of Cell Types from Multimodal Sequencing Data.
S-EPMC10516047 | biostudies-literature

Rapid genome sequencing of RNA viruses.
S-EPMC2725858 | biostudies-literature

Detection of structural mosaicism from targeted and whole-genome sequencing data.
S-EPMC5630034 | biostudies-literature

Detection of minor variants in Mycobacterium tuberculosis whole genome sequencing data.
S-EPMC8769888 | biostudies-literature

Combined DECS Analysis and Next-Generation Sequencing Enable Efficient Detection of Novel Plant RNA Viruses.
S-EPMC4810260 | biostudies-literature