
Dataset Information


Knowledge-guided data mining on the standardized architecture of NRPS: Subtypes, novel motifs, and sequence entanglements.


ABSTRACT: Non-ribosomal peptide synthetases (NRPSs) are a diverse family of biosynthetic enzymes for the assembly of bioactive peptides. Despite advances in microbial sequencing, the lack of a consistent standard for annotating NRPS domains and modules has made data-driven discoveries challenging. To address this, we introduced a standardized architecture for NRPS by using known conserved motifs to partition typical domains. This motif-and-intermotif standardization allowed for systematic evaluations of sequence properties from a large number of NRPS pathways, resulting in the most comprehensive cross-kingdom C domain subtype classifications to date, as well as the discovery and experimental validation of novel conserved motifs with functional significance. Furthermore, our coevolution analysis revealed important barriers associated with re-engineering NRPSs and uncovered the entanglement between phylogeny and substrate specificity in NRPS sequences. Our findings provide a comprehensive and statistically insightful analysis of NRPS sequences, opening avenues for future data-driven discoveries.

SUBMITTER: He R 

PROVIDER: S-EPMC10212144 | biostudies-literature | 2023 May

REPOSITORIES: biostudies-literature


Publications

Knowledge-guided data mining on the standardized architecture of NRPS: Subtypes, novel motifs, and sequence entanglements.

He Ruolin, Zhang Jinyu, Shao Yuanzhe, Gu Shaohua, Song Chen, Qian Long, Yin Wen-Bing, Li Zhiyuan

PLoS Computational Biology, 2023-05-15, issue 5



Similar Datasets

S-EPMC3610217 | biostudies-literature
S-EPMC4588551 | biostudies-other
S-EPMC441497 | biostudies-literature
S-EPMC3634064 | biostudies-literature
S-EPMC3082213 | biostudies-literature
S-EPMC1459173 | biostudies-literature
S-EPMC7354552 | biostudies-literature
S-EPMC5435971 | biostudies-literature
S-EPMC4222026 | biostudies-literature
S-EPMC7001330 | biostudies-literature