
Dataset Information


MetaCerberus: distributed highly parallelized HMM-based processing for robust functional annotation across the tree of life.


ABSTRACT:

Motivation

MetaCerberus is a massively parallel, fast, low-memory, scalable annotation tool for inferring gene function, from single genomes to metacommunities. It provides what has so far been elusive: an HMM/HMMER-based tool that runs rapidly and with low memory. It offers scalable gene annotation against major public databases, including KEGG (KO), COGs, CAZy, and FOAM, as well as virus-specific databases, including VOGs and PHROGs.

Results

MetaCerberus is 1.3× as fast on a single node as eggNOG-mapper v2 while using 5× less memory in an exclusively HMM/HMMER mode. In a direct comparison, MetaCerberus provides better annotation of viruses, phages, and archaeal viruses than DRAM, Prokka, or InterProScan. Compared to DRAM, MetaCerberus annotates more KOs across domains with a 186× smaller database and 63× less memory. MetaCerberus is fully integrated for automatic analysis of statistics and pathways using differential statistics tools (i.e., DESeq2 and edgeR), pathway enrichment (the GAGE R package), and pathway visualization (the Pathview R package). MetaCerberus provides a novel tool for unlocking the biosphere across the tree of life at scale.

Availability and implementation

MetaCerberus is written in Python and distributed under a BSD-3 license. The source code is freely available at https://github.com/raw-lab/metacerberus; it is compatible with Python 3 and works on both Mac OS X and Linux. MetaCerberus can also be installed using bioconda: mamba create -n metacerberus -c bioconda -c conda-forge metacerberus.
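For scripted setups, the bioconda command quoted above can be assembled and, optionally, run from Python. This is only a minimal sketch: the helper names `build_install_command` and `install` and the `dry_run` switch are hypothetical and are not part of MetaCerberus; the command itself is copied unchanged from the abstract.

```python
import shlex
import shutil
import subprocess


def build_install_command(env_name: str = "metacerberus") -> list[str]:
    """Return the bioconda install command from the abstract as an argument list.

    Hypothetical helper: only the environment name is parameterised.
    """
    return [
        "mamba", "create",
        "-n", env_name,
        "-c", "bioconda",
        "-c", "conda-forge",
        "metacerberus",
    ]


def install(env_name: str = "metacerberus", dry_run: bool = True) -> str:
    """Return the command as a shell string; run it only when dry_run is False."""
    command = build_install_command(env_name)
    printable = shlex.join(command)
    if dry_run:
        # Nothing is executed; the caller can inspect or log the command.
        return printable
    if shutil.which("mamba") is None:
        raise RuntimeError("mamba was not found on PATH")
    # mamba may prompt for confirmation; stdin is inherited from the caller.
    subprocess.run(command, check=True)
    return printable


if __name__ == "__main__":
    print(install(dry_run=True))
```

With the default `dry_run=True` nothing is installed, which makes it possible to review the command before committing to creating the environment.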

SUBMITTER: Figueroa III JL

PROVIDER: S-EPMC10955254 | biostudies-literature | 2024 Mar

REPOSITORIES: biostudies-literature


Publications

MetaCerberus: distributed highly parallelized HMM-based processing for robust functional annotation across the tree of life.

Figueroa III JL, Dhungel E, Bellanger M, Brouwer CR, White III RA

Bioinformatics (Oxford, England), 2024 Mar 1, issue 3



Similar Datasets

| S-EPMC6704095 | biostudies-literature
| S-EPMC10060690 | biostudies-literature
| S-EPMC11429614 | biostudies-literature
| S-EPMC58584 | biostudies-literature
| S-EPMC11466848 | biostudies-literature
| S-EPMC7200070 | biostudies-literature
| S-EPMC7579964 | biostudies-literature
| S-EPMC2536674 | biostudies-literature
| S-EPMC6942926 | biostudies-literature
| S-EPMC10729974 | biostudies-literature