Proteomics

Dataset Information


Predictive Stool-Based Protein Biomarkers for the Classification of Crohn's Disease and Ulcerative Colitis Using a Machine Learning Approach


ABSTRACT: Background and Aim: Crohn's disease (CD) and ulcerative colitis (UC) are the two major chronic inflammatory bowel diseases (IBD). Although their symptoms are similar, their pathological features and clinical treatments differ. Currently, distinguishing between these diseases requires invasive procedures such as colonoscopy and histopathology, which cause discomfort and inconvenience to patients. Fecal proteins offer a promising non-invasive alternative as biomarkers because of their stability and proximity to inflamed tissues. This study uses high-throughput data-independent acquisition (DIA) mass spectrometry to develop accurate biomarker signatures from complex stool samples.
Methods: Stool samples from 46 patients with active CD and 23 patients with active UC were analyzed. Using DIA-based SWATH mass spectrometry, we explored the stool proteome, identifying and quantifying approximately 1,250 proteins. The samples were divided into training and testing groups. After data processing, several feature selection algorithms were applied to the training group to identify proteins that differed significantly between the CD and UC groups. Six machine learning (ML) algorithms (k-Nearest Neighbors, Naive Bayes, eXtreme Gradient Boosting, Random Forest, Support Vector Machine, and glmnet) were then evaluated to identify the best-performing classifier.
Results: Sixteen proteins were selected based on several feature selection algorithms, and the six ML models were trained on them. Based on the performance metrics of each algorithm on the training dataset, the Naive Bayes model was selected. For performance validation, the final predictive model was applied to 16 prospective samples as the test dataset. The model achieved an AUC of 0.95 on the training dataset and 0.96 on the test dataset, suggesting robustness and a lack of overfitting.
Conclusion: This study demonstrates the effectiveness of SWATH-based proteomics and machine learning in developing predictive models to classify CD and UC. Further validation on a larger cohort using targeted MRM mass spectrometry would help establish the clinical utility and reliability of this approach.
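
The evaluation step described in the abstract (fit a Naive Bayes classifier on a training group, then compare AUC on the training and test groups) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' pipeline: it assumes a Gaussian Naive Bayes variant, uses randomly generated stand-in data rather than the deposited intensities, and the group sizes, the `shift` parameter, and all function names are hypothetical. Only the count of 16 features is taken from the abstract.

```python
import math
import random


def fit_gaussian_nb(X, y):
    """Estimate a log-prior and per-feature mean/variance for each class."""
    model = {}
    for label in sorted(set(y)):
        rows = [x for x, lab in zip(X, y) if lab == label]
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # A small variance floor avoids division by zero for constant features.
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-9
            for col, m in zip(columns, means)
        ]
        model[label] = (math.log(n / len(y)), means, variances)
    return model


def log_joint(model, x, label):
    """Log prior plus summed Gaussian log-likelihoods (features assumed independent)."""
    log_prior, means, variances = model[label]
    total = log_prior
    for v, m, s2 in zip(x, means, variances):
        total += -0.5 * math.log(2 * math.pi * s2) - (v - m) ** 2 / (2 * s2)
    return total


def score(model, x, positive=1, negative=0):
    """Log-odds of the positive class; higher means more likely positive."""
    return log_joint(model, x, positive) - log_joint(model, x, negative)


def auc(scores, labels, positive=1):
    """AUC as the chance a random positive outranks a random negative (ties count half)."""
    pos = [s for s, lab in zip(scores, labels) if lab == positive]
    neg = [s for s, lab in zip(scores, labels) if lab != positive]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def make_synthetic(n_cd, n_uc, n_features, shift, rng):
    """Stand-in data: label 1 = CD, label 0 = UC; CD features are shifted by `shift`."""
    X, y = [], []
    for label, count in ((1, n_cd), (0, n_uc)):
        for _ in range(count):
            X.append([rng.gauss(shift * label, 1.0) for _ in range(n_features)])
            y.append(label)
    return X, y


rng = random.Random(0)
# Group sizes below are arbitrary placeholders, not the study's actual split.
X_train, y_train = make_synthetic(40, 20, 16, 1.0, rng)
X_test, y_test = make_synthetic(8, 8, 16, 1.0, rng)

model = fit_gaussian_nb(X_train, y_train)
train_auc = auc([score(model, x) for x in X_train], y_train)
test_auc = auc([score(model, x) for x in X_test], y_test)
print(f"train AUC = {train_auc:.2f}, test AUC = {test_auc:.2f}")
```

A test AUC close to the training AUC, as reported in the abstract, is the pattern one would expect when the model is not overfitting. On real data, feature selection would also need to be restricted to the training group, as the abstract describes.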

INSTRUMENT(S):

ORGANISM(S): Homo sapiens (human)

DISEASE(S): Inflammatory Bowel Disease

SUBMITTER: Elmira Shajari  

LAB HEAD: Jean-Francois Beaulieu

PROVIDER: PXD057120 | Pride | 2025-12-01

REPOSITORIES: Pride

Dataset's files

00_Wiff_IBD_SWATH.zip | Other
01_mxMLconverted_DIA_NNinput_4batches.zip | Other
FASTA.2022-10-06-reviewed-isoforms-contam-UP000005640-spikein.fas | Other
Library-IBD_Lib.tsv.speclib | Tabular
SampleAnnotation.xlsx | Xlsx

Publications

Stool-Based Proteomic Signature for the Noninvasive Classification of Crohn's Disease and Ulcerative Colitis Using Machine Learning.

Elmira Shajari, David Gagné, Francis Bourassa, Mandy Malick, Patricia Roy, Jean-François Noël, Hugo Gagnon, Maxime Delisle, François-Michel Boisvert, Marie Brunet, Jean-François Beaulieu

Clinical and Translational Gastroenterology, 2025-11-01, 11


Introduction: Crohn's disease (CD) and ulcerative colitis (UC) have overlapping symptoms, but they differ in pathology and treatment. Currently, distinguishing between these diseases involves invasive procedures such as colonoscopy and histopathology. Fecal proteins, stable and in direct contact with inflammation, offer a noninvasive alternative. This study focuses on using high-throughput data-independent acquisition mass spectrometry and machine learning to develop an accurate biomarker ...

Similar Datasets

2022-04-11 | MSV000089237 | MassIVE
2023-08-07 | GSE202160 | GEO
2022-04-15 | MTBLS688 | MetaboLights
2025-05-28 | GSE293354 | GEO
2013-09-23 | E-GEOD-36807 | biostudies-arrayexpress
2013-09-23 | GSE36807 | GEO
2013-05-09 | E-GEOD-46754 | biostudies-arrayexpress
2010-05-19 | E-GEOD-15370 | biostudies-arrayexpress
2019-02-25 | GSE99573 | GEO
2017-02-07 | MTBLS237 | MetaboLights