
Dataset Information


Integrating chromatin conformation information in a self-supervised learning model improves metagenome binning.


ABSTRACT: Metagenome binning is a key step, downstream of metagenome assembly, to group scaffolds by their genome of origin. Although accurate binning has been achieved on datasets containing multiple samples from the same community, the completeness of binning is often low in datasets with a small number of samples due to a lack of robust species co-abundance information. In this study, we exploited the chromatin conformation information obtained from Hi-C sequencing and developed a new reference-independent algorithm, Metagenome Binning with Abundance and Tetra-nucleotide frequencies-Long Range (metaBAT-LR), to improve the binning completeness of these datasets. This self-supervised algorithm builds a model from a set of high-quality genome bins to predict scaffold pairs that are likely to be derived from the same genome. Then, it applies these predictions to merge incomplete genome bins, as well as recruit unbinned scaffolds. We validated metaBAT-LR's ability to bin-merge and recruit scaffolds on both synthetic and real-world metagenome datasets of varying complexity. Benchmarking against similar software tools suggests that metaBAT-LR uncovers unique bins that were missed by all other methods. MetaBAT-LR is open-source and is available at https://bitbucket.org/project-metabat/metabat-lr.
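The abstract describes the approach only at a high level. The Python sketch below illustrates the general idea under simplifying assumptions; it is not metaBAT-LR's implementation. It assumes Hi-C contact scores between scaffold pairs are already normalized. It also swaps the tool's self-supervised model for a single score cutoff learned from pairs inside high-quality bins. Finally, it merges only incomplete (non-high-quality) bins joined by at least `min_links` predicted pairs. The function names `pair`, `learn_threshold`, and `merge_and_recruit`, the `min_links` parameter, and the toy scores are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair(a, b):
    """Canonical, order-independent key for a scaffold (or bin) pair."""
    return (a, b) if a <= b else (b, a)


def learn_threshold(scores, bins, hq_bins):
    """Hypothetical stand-in for the self-supervised model: choose the
    normalized Hi-C score cutoff that best separates same-bin from
    different-bin scaffold pairs inside high-quality bins (balanced accuracy)."""
    hq = sorted(s for s, b in bins.items() if b in hq_bins)
    labelled = [(scores.get(pair(a, b), 0.0), bins[a] == bins[b])
                for a, b in combinations(hq, 2)]
    if not labelled:
        raise ValueError("need at least two scaffolds in high-quality bins")
    pos = sum(1 for _, same in labelled if same)
    neg = len(labelled) - pos
    best_t, best_acc = None, -1.0
    for t in sorted({s for s, _ in labelled}):
        tp = sum(1 for s, same in labelled if same and s >= t)
        tn = sum(1 for s, same in labelled if not same and s < t)
        acc = 0.5 * (tp / pos if pos else 0.0) + 0.5 * (tn / neg if neg else 0.0)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t


def merge_and_recruit(scores, bins, hq_bins, min_links=2):
    """Merge incomplete bins linked by predicted same-genome pairs, then
    recruit unbinned scaffolds to the bin they share the most links with."""
    t = learn_threshold(scores, bins, hq_bins)
    linked = [p for p, s in scores.items() if s >= t]

    # Union-find over bin ids (path halving).
    parent = {b: b for b in set(bins.values())}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Count predicted links between distinct, non-high-quality bins.
    bin_links = Counter()
    for a, b in linked:
        ba, bb = bins.get(a), bins.get(b)
        if ba is None or bb is None or ba == bb:
            continue
        if ba in hq_bins or bb in hq_bins:
            continue
        bin_links[pair(ba, bb)] += 1
    for (ba, bb), n in bin_links.items():
        if n >= min_links:
            parent[find(ba)] = find(bb)

    merged = {s: find(b) for s, b in bins.items()}

    # Each unbinned scaffold votes for bins via its predicted links.
    votes = {}
    for a, b in linked:
        for x, y in ((a, b), (b, a)):
            if x not in bins and y in merged:
                votes.setdefault(x, Counter())[merged[y]] += 1
    for s, c in votes.items():
        (top, n), *rest = c.most_common()
        if not rest or rest[0][1] < n:  # recruit only on a unique winner
            merged[s] = top
    return merged


# Toy example with made-up scores; scaffold u1 starts unbinned.
bins = {"a1": "A", "a2": "A", "a3": "A", "b1": "B", "b2": "B",
        "c1": "C", "c2": "C", "d1": "D"}
hq_bins = {"A", "B"}
raw = {("a1", "a2"): 5.0, ("a1", "a3"): 4.0, ("a2", "a3"): 6.0,
       ("b1", "b2"): 5.0, ("a1", "b1"): 0.5, ("c1", "c2"): 5.0,
       ("c1", "d1"): 4.5, ("c2", "d1"): 4.0, ("u1", "b1"): 4.5,
       ("u1", "b2"): 4.2, ("u1", "a1"): 0.2}
scores = {pair(a, b): s for (a, b), s in raw.items()}
print(merge_and_recruit(scores, bins, hq_bins))
```

On this toy input, the learned cutoff is 4.0. The incomplete bins C and D are merged, and u1 is recruited into bin B. Restricting merges to non-high-quality bins mirrors the abstract's focus on merging incomplete bins. A real tool would also check completeness and contamination before accepting a merge.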

SUBMITTER: Ho H 

PROVIDER: S-EPMC10519199 | biostudies-literature | 2023

REPOSITORIES: biostudies-literature


Publications

Integrating chromatin conformation information in a self-supervised learning model improves metagenome binning.

Harrison Ho, Mansi Chovatia, Rob Egan, Guifen He, Yuko Yoshinaga, Ivan Liachko, Ronan O'Malley, Zhong Wang

PeerJ, 2023-09-22



Similar Datasets

S-EPMC6821242 | biostudies-literature
S-EPMC6881972 | biostudies-literature
S-EPMC7397036 | biostudies-literature
S-EPMC8184636 | biostudies-literature
S-EPMC3970055 | biostudies-other
S-EPMC10849467 | biostudies-literature
S-EPMC11383967 | biostudies-literature
S-EPMC2383919 | biostudies-literature
S-EPMC11346922 | biostudies-literature
GSE244807 | GEO | 2023-11-01