MCRL: using a reference library to compress a metagenome into a non-redundant list of sequences, considering viruses as a case study.


ABSTRACT:

Motivation

Metagenomes offer a glimpse into the total genomic diversity contained within a sample. Currently, however, there is no straightforward way to obtain a non-redundant list of all putative homologs of a set of reference sequences present in a metagenome.

Results

To address this problem, we developed a novel clustering approach called 'metagenomic clustering by reference library' (MCRL), where a reference library containing a set of reference genes is clustered with respect to an assembled metagenome. According to our proposed approach, reference genes homologous to similar sets of metagenomic sequences, termed 'signatures', are iteratively clustered in a greedy fashion, retaining at each step the reference genes yielding the lowest E values, and terminating when signatures of remaining reference genes have a minimal overlap. The outcome of this computation is a non-redundant list of reference genes homologous to minimally overlapping sets of contigs, representing potential candidates for gene families present in the metagenome. Unlike metagenomic clustering methods, there is no need for contigs to overlap to be associated with a cluster, enabling MCRL to draw on more information encoded in the metagenome when computing tentative gene families. We demonstrate how MCRL can be used to extract candidate viral gene families from an oral metagenome and an oral virome that otherwise could not be determined using standard approaches. We evaluate the sensitivity, accuracy and robustness of our proposed method for the viral case study and compare it with existing analysis approaches.
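The greedy clustering step described above can be illustrated with a minimal sketch. This is not the authors' implementation (see the repository linked below for that). It assumes that each reference gene already has a signature (the set of contig IDs it is homologous to) and a best E value, that overlap is measured by the Jaccard index, and that a fixed threshold `max_overlap` defines "minimal overlap". The function names, the overlap measure and the threshold are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap between two signatures (assumed measure, not from the paper)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def greedy_cluster(
    signatures: Dict[str, Set[str]],
    best_evalue: Dict[str, float],
    max_overlap: float = 0.5,  # hypothetical threshold
) -> Dict[str, List[str]]:
    """Map each retained reference gene to the genes it absorbs.

    Genes are visited from lowest to highest E value, so the
    representative of every cluster is its lowest-E-value member.
    """
    # Ignore reference genes with no homologous contigs at all.
    candidates = [g for g in signatures if signatures[g]]
    order = sorted(candidates, key=lambda g: best_evalue[g])

    clusters: Dict[str, List[str]] = {}
    for gene in order:
        for rep in clusters:
            # Absorb the gene if its signature overlaps a retained one.
            if jaccard(signatures[gene], signatures[rep]) > max_overlap:
                clusters[rep].append(gene)
                break
        else:
            # No sufficiently overlapping representative: retain the gene.
            clusters[gene] = [gene]
    return clusters


if __name__ == "__main__":
    example_signatures = {
        "geneA": {"contig1", "contig2", "contig3"},
        "geneB": {"contig2", "contig3"},
        "geneC": {"contig7"},
    }
    example_evalues = {"geneA": 1e-50, "geneB": 1e-20, "geneC": 1e-30}
    print(greedy_cluster(example_signatures, example_evalues))
```

In this toy input, geneB shares two of three contigs with geneA and is absorbed by it, while geneC has a disjoint signature and is retained as its own candidate gene family. The retained keys form the non-redundant list.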

Availability and implementation

https://github.com/a-tadmor/MCRL.

Supplementary information

Supplementary data are available at Bioinformatics online.

SUBMITTER: Tadmor AD 

PROVIDER: S-EPMC10060711 | biostudies-literature | 2022 Jan

REPOSITORIES: biostudies-literature

Publications

MCRL: using a reference library to compress a metagenome into a non-redundant list of sequences, considering viruses as a case study.

Arbel D. Tadmor, Rob Phillips

Bioinformatics (Oxford, England), 2022-01-01, issue 3


Similar Datasets

| S-EPMC8530359 | biostudies-literature
| S-EPMC5054409 | biostudies-literature
| S-EPMC7827920 | biostudies-literature
| S-EPMC2395239 | biostudies-literature
| S-EPMC9256941 | biostudies-literature
| S-EPMC11316123 | biostudies-literature
| S-EPMC5264535 | biostudies-literature
| S-EPMC310897 | biostudies-literature
| S-EPMC2837355 | biostudies-literature
| S-EPMC2383919 | biostudies-literature