Dataset Information

Interpretable detection of novel human viruses from genome sequencing data.


ABSTRACT: Viruses evolve extremely quickly, so reliable methods for viral host prediction are necessary to safeguard both biosecurity and biosafety. Novel human-infecting viruses are difficult to detect with standard bioinformatics workflows. Here, we predict whether a virus can infect humans directly from next-generation sequencing (NGS) reads. We show that deep neural architectures significantly outperform both shallow machine learning and standard homology-based algorithms, cutting the error rates in half and generalizing to taxonomic units distant from those presented during training. Further, we develop a suite of interpretability tools and show that it can also be applied to models beyond the host prediction task. We propose a new approach for convolutional filter visualization that disentangles the information content of each nucleotide from its contribution to the final classification decision. Nucleotide-resolution maps of the learned associations between pathogen genomes and the infectious phenotype can be used to detect regions of interest in novel agents, for example the SARS-CoV-2 coronavirus, which was unknown before it caused the COVID-19 pandemic in 2020. All methods presented here are implemented as easy-to-install packages that not only enable analysis of NGS datasets without requiring any deep learning skills, but also allow advanced users to easily train and explain new models for genomics.

SUBMITTER: Bartoszewicz JM 

PROVIDER: S-EPMC7849996 | biostudies-literature | 2021 Mar

REPOSITORIES: biostudies-literature

Publications

Interpretable detection of novel human viruses from genome sequencing data.

Bartoszewicz Jakub M, Seidel Anja, Renard Bernhard Y

NAR Genomics and Bioinformatics, 1 February 2021, issue 1


Viruses evolve extremely quickly, so reliable methods for viral host prediction are necessary to safeguard biosecurity and biosafety alike. Novel human-infecting viruses are difficult to detect with standard bioinformatics workflows. Here, we predict whether a virus can infect humans directly from next-generation sequencing reads. We show that deep neural architectures significantly outperform both shallow machine learning and standard, homology-based algorithms, cutting the error rates in half  ...[more]

Similar Datasets

S-EPMC9580956 | biostudies-literature
S-EPMC7180252 | biostudies-literature
GSE149443 | GEO | 2020-05-08
S-EPMC4811048 | biostudies-literature
S-EPMC7509869 | biostudies-literature
S-EPMC10516047 | biostudies-literature
S-EPMC2725858 | biostudies-literature
S-EPMC5630034 | biostudies-literature
S-EPMC8769888 | biostudies-literature
S-EPMC4810260 | biostudies-literature