Dataset Information

Combination of whole genome sequencing and supervised machine learning provides unambiguous identification of eae-positive Shiga toxin-producing Escherichia coli.


ABSTRACT:

Introduction

The objective of this study was to develop, using a genome-wide machine learning approach, an unambiguous model to predict the presence of highly pathogenic STEC in E. coli read assemblies derived from complex samples that may contain multiple E. coli strains. The approach takes into account the high genomic plasticity of E. coli and uses the stratification of STEC and E. coli pathogroups, classified by serotype and virulence factors, to identify specific combinations of biomarkers for improved characterization of eae-positive STEC (also named EHEC, for enterohemorrhagic E. coli), which are associated with bloody diarrhea and hemolytic uremic syndrome (HUS) in humans.

Methods

A machine learning (ML) approach was applied to a large curated dataset composed of 1,493 E. coli genome sequences and 1,178 coding sequences (CDS). Feature selection was performed using eight classification algorithms, reducing the number of CDS to six. The eight ML models were then trained on this reduced dataset with hyper-parameter tuning and cross-validation.
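
The workflow described above (a CDS presence/absence matrix, feature selection down to a handful of CDS, then cross-validated training) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the synthetic data, the frequency-gap ranking, the nearest-centroid classifier and all function names are hypothetical stand-ins, not the eight algorithms or the dataset used in the study, and hyper-parameter tuning is not shown.

```python
import random


def presence_gap(X, y, j):
    """Absolute difference in presence frequency of feature j between classes."""
    pos = [row[j] for row, label in zip(X, y) if label == 1]
    neg = [row[j] for row, label in zip(X, y) if label == 0]
    return abs(sum(pos) / len(pos) - sum(neg) / len(neg))


def select_features(X, y, k):
    """Return the sorted indices of the k features with the largest gap."""
    ranked = sorted(range(len(X[0])),
                    key=lambda j: presence_gap(X, y, j), reverse=True)
    return sorted(ranked[:k])


def fit_centroids(X, y, cols):
    """Compute the per-class mean presence vector over the selected columns."""
    centroids = {}
    for label in (0, 1):
        rows = [[row[j] for j in cols] for row, l in zip(X, y) if l == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict(centroids, row, cols):
    """Assign the class whose centroid is closest (squared Euclidean)."""
    v = [row[j] for j in cols]
    return min(centroids,
               key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])))


def cross_validate(X, y, k_features, n_folds=5, seed=0):
    """Mean held-out accuracy; features are re-selected inside each fold."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
        cols = select_features(X_tr, y_tr, k_features)
        centroids = fit_centroids(X_tr, y_tr, cols)
        correct = sum(predict(centroids, X[i], cols) == y[i] for i in fold)
        scores.append(correct / len(fold))
    return sum(scores) / len(scores)


def make_toy_data(n=200, n_features=50, n_informative=6, seed=1):
    """Synthetic presence/absence matrix (hypothetical, not the study data)."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2  # 1 = target pathogroup, 0 = other E. coli
        row = [int(rng.random() < 0.5) for _ in range(n_features)]
        for j in range(n_informative):
            # Informative CDS are mostly present in class 1, mostly absent in 0.
            row[j] = int(rng.random() < (0.95 if label == 1 else 0.05))
        X.append(row)
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    print("selected CDS columns:", select_features(X, y, 6))
    print("mean CV accuracy:", cross_validate(X, y, k_features=6))
```

Re-selecting features inside each fold keeps the held-out genomes from influencing which CDS are retained, which is one common way to avoid optimistic accuracy estimates; whether the study proceeded this way is not stated in the abstract.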

Results and discussion

Remarkably, using only these six genes, EHEC can be clearly identified in E. coli read assemblies obtained from in silico mixtures and from complex samples such as milk metagenomes. These combinations of discriminative biomarkers can be implemented as novel marker genes for unambiguous EHEC characterization in mixtures of different E. coli strains as well as in raw milk metagenomes.

SUBMITTER: Vorimore F 

PROVIDER: S-EPMC10213463 | biostudies-literature | 2023

REPOSITORIES: biostudies-literature

Publications

Combination of whole genome sequencing and supervised machine learning provides unambiguous identification of eae-positive Shiga toxin-producing Escherichia coli.

Vorimore F, Jaudou S, Tran ML, Richard H, Fach P, Delannoy S

Frontiers in Microbiology, 2023-05-12


Similar Datasets

| S-EPMC7733975 | biostudies-literature
| S-EPMC4542925 | biostudies-literature
| S-EPMC3739510 | biostudies-literature
| S-EPMC7040016 | biostudies-literature
| S-EPMC3966390 | biostudies-literature
| S-EPMC262916 | biostudies-other
| S-EPMC229445 | biostudies-other
| S-EPMC4838603 | biostudies-literature
| S-EPMC11278934 | biostudies-literature
| S-EPMC3502965 | biostudies-literature