Dataset Information

HaploDMF: viral haplotype reconstruction from long reads via deep matrix factorization.

ABSTRACT:

Motivation

Lacking strict proofreading mechanisms, many RNA viruses can generate progeny with slightly changed genomes. Being able to characterize highly similar genomes (i.e. haplotypes) in one virus population helps study the viruses' evolution and their interactions with the host/other microbes. High-throughput sequencing data has become the major source for characterizing viral populations. However, the inherent limitation on read length by next-generation sequencing makes complete haplotype reconstruction difficult.

Results

In this work, we present a new tool named HaploDMF that can construct complete haplotypes using third-generation sequencing (TGS) data. HaploDMF utilizes a deep matrix factorization model with an adapted loss function to learn latent features from aligned reads automatically. The latent features are then used to cluster reads of the same haplotype. Unlike existing tools, whose performance can be affected by the overlap size between reads, HaploDMF achieves highly robust performance on data with different coverage, haplotype numbers and error rates. In particular, it can generate more complete haplotypes even when the sequencing coverage drops in the middle of the genome. We benchmark HaploDMF against state-of-the-art tools on simulated and real TGS data from different viruses. The results show that HaploDMF competes favorably against all others.
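The idea of factorizing a read-by-site matrix and clustering reads on the learned latent features can be illustrated with a small sketch. The code below is not HaploDMF's model or loss: it swaps the deep network for a shallow rank-1 factorization fitted by alternating least squares on covered entries only, and it assumes a toy setting with two haplotypes that differ at every retained SNV site. All function names, parameters and the simulated data are hypothetical and chosen only for illustration.

```python
import numpy as np


def simulate_reads(n_reads=200, n_sites=40, read_len=20, error_rate=0.02, seed=0):
    """Toy data (assumption): two haplotypes that differ at every retained SNV site.

    Alleles are encoded as +1/-1; 0 means the read does not cover the site.
    """
    rng = np.random.default_rng(seed)
    hap_a = rng.choice([-1.0, 1.0], size=n_sites)
    haplotypes = np.stack([hap_a, -hap_a])
    labels = rng.integers(0, 2, size=n_reads)
    reads = np.zeros((n_reads, n_sites))
    for i, label in enumerate(labels):
        start = rng.integers(0, n_sites - read_len + 1)
        window = haplotypes[label, start:start + read_len].copy()
        window[rng.random(read_len) < error_rate] *= -1.0  # simulated sequencing errors
        reads[i, start:start + read_len] = window
    return reads, labels


def masked_rank1_factorization(reads, n_iter=20, reg=0.1):
    """Fit reads[i, j] ~ u[i] * v[j] using covered entries only.

    This is a linear stand-in for a learned latent feature: u holds one
    latent value per read, v one value per site.
    """
    mask = (reads != 0).astype(float)
    # Spectral initialisation from the zero-filled matrix.
    _, _, vt = np.linalg.svd(reads, full_matrices=False)
    v = vt[0]
    for _ in range(n_iter):
        # Ridge least-squares update for each read, then for each site.
        u = (reads @ v) / (mask @ v**2 + reg)
        v = (reads.T @ u) / (mask.T @ u**2 + reg)
    return u, v


reads, labels = simulate_reads()
u, v = masked_rank1_factorization(reads)

# Cluster reads on the latent feature; with two haplotypes its sign is enough.
clusters = (u > 0).astype(int)

# Cluster ids are arbitrary, so score agreement up to a label swap.
agreement = np.mean(clusters == labels)
accuracy = max(agreement, 1.0 - agreement)
print(f"read clustering accuracy: {accuracy:.2f}")
```

In this sketch the mask keeps uncovered sites out of the loss, so reads that barely overlap can still be placed in the same latent space through the sites they share with other reads. With more than two haplotypes, a higher rank and a general clustering step on the latent vectors would be needed.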

Availability and implementation

The source code and the documentation of HaploDMF are available at https://github.com/dhcai21/HaploDMF.

Supplementary information

Supplementary data are available at Bioinformatics online.

SUBMITTER: Cai D

PROVIDER: S-EPMC9750122 | biostudies-literature | 2022 Dec

REPOSITORIES: biostudies-literature

Publications

HaploDMF: viral haplotype reconstruction from long reads via deep matrix factorization.

Cai Dehan, Shang Jiayu, Sun Yanni

Bioinformatics (Oxford, England), 2022-12-01, issue 24

PMID: 36308467

Similar Datasets

Strainline: full-length de novo viral haplotype reconstruction from noisy long reads.
| S-EPMC8771625 | biostudies-literature

Viral quasispecies reconstruction via tensor factorization with successive read removal.
| S-EPMC6022648 | biostudies-literature

HAPHPIPE: Haplotype Reconstruction and Phylodynamics for Deep Sequencing of Intrahost Viral Populations.
| S-EPMC8042772 | biostudies-literature

Haplotype-aware diplotyping from noisy long reads.
| S-EPMC6547545 | biostudies-literature

Haplotype threading: accurate polyploid phasing from long reads.
| S-EPMC7504856 | biostudies-literature

Haplotype-Phased Synthetic Long Reads from Short-Read Sequencing.
| S-EPMC4720449 | biostudies-literature

Knowledge-based gene expression classification via matrix factorization.
| S-EPMC2638868 | biostudies-literature

Detecting haplotype-specific transcript variation in long reads with FLAIR2.
| S-EPMC11218413 | biostudies-literature

Detecting haplotype-specific transcript variation in long reads with FLAIR2.
| S-EPMC10312636 | biostudies-literature

Efficient epistasis inference via higher-order covariance matrix factorization.
| S-EPMC11507688 | biostudies-literature