
Dataset Information

Hierarchical classification-based pan-cancer methylation analysis to classify primary cancer.


ABSTRACT: Hierarchical classification categorizes data more specifically and breaks large classification problems into subproblems. This improves prediction accuracy and predictive power for undefined categories, and mitigates the impact of poor-quality data. Despite these advantages, it is rarely applied to predicting primary cancer. To leverage the similarity of cancers and the specificity of methylation patterns among them, we developed the Cancer Hierarchy Classification Tool (CHCT), based on hierarchical classification, using methylation data from 30 cancer types and 8239 methylome samples downloaded from publicly available databases (The Cancer Genome Atlas (TCGA) and the Gene Expression Omnibus (GEO)). We used unsupervised clustering to divide the classification into subproblems and screened differentially methylated sites using analysis of variance (ANOVA), the Tukey-Kramer test, and the Boruta algorithm to construct a model for each classifier module. In validation, CHCT correctly classified 1568 of 1660 cases in the test set, an average accuracy of 94.46%. We further curated an independent validation cohort of 677 cancer samples from GEO and assigned each a diagnosis using CHCT, which showed high diagnostic potential with an average accuracy of 91.40%. CHCT also shows predictive capability for cancer types beyond its original classifier scope, as demonstrated on the medulloblastoma and pituitary tumor datasets. In summary, CHCT hierarchically classifies primary cancer by methylation profile, splitting a large-scale classification of 30 cancer types into ten smaller classification problems. These results indicate that hierarchical classification has the potential to be an accurate and robust cancer classification method.
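
The idea of splitting one large classification into smaller subproblems can be illustrated with a minimal two-level sketch. This is not the CHCT implementation: the nearest-centroid rule stands in for the unspecified classifier modules, the group names and the grouping of cancer types are hypothetical, the groups are supplied by hand rather than found by unsupervised clustering, and the three-feature "methylation" vectors are toy values invented for illustration.

```python
from collections import defaultdict
from statistics import fmean


def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    return [fmean(col) for col in zip(*rows)]


def nearest(x, centroids):
    """Return the label whose centroid is closest to x (squared Euclidean)."""
    return min(
        centroids,
        key=lambda k: sum((a - b) ** 2 for a, b in zip(x, centroids[k])),
    )


class TwoLevelClassifier:
    """Toy hierarchy: pick a group first, then a cancer type within it."""

    def __init__(self, group_of):
        # Hypothetical mapping from cancer type to group (assumed, not from CHCT).
        self.group_of = group_of

    def fit(self, X, y):
        by_group = defaultdict(list)
        by_type = defaultdict(lambda: defaultdict(list))
        for x, label in zip(X, y):
            g = self.group_of[label]
            by_group[g].append(x)
            by_type[g][label].append(x)
        # Level 1: one centroid per group.
        self.group_centroids = {g: centroid(r) for g, r in by_group.items()}
        # Level 2: one small classifier (set of centroids) per group.
        self.type_centroids = {
            g: {t: centroid(r) for t, r in types.items()}
            for g, types in by_type.items()
        }
        return self

    def predict(self, x):
        g = nearest(x, self.group_centroids)
        return g, nearest(x, self.type_centroids[g])


# Toy beta values in [0, 1]; labels borrow TCGA-style codes for readability only.
X = [
    [0.90, 0.10, 0.80], [0.85, 0.15, 0.75],  # LGG
    [0.90, 0.10, 0.20], [0.80, 0.20, 0.25],  # GBM
    [0.10, 0.90, 0.80], [0.15, 0.85, 0.70],  # KIRC
    [0.10, 0.90, 0.20], [0.20, 0.80, 0.30],  # KIRP
]
y = ["LGG", "LGG", "GBM", "GBM", "KIRC", "KIRC", "KIRP", "KIRP"]
group_of = {"LGG": "brain", "GBM": "brain", "KIRC": "kidney", "KIRP": "kidney"}

clf = TwoLevelClassifier(group_of).fit(X, y)
print(clf.predict([0.88, 0.12, 0.30]))
print(clf.predict([0.12, 0.88, 0.78]))
```

Because the first level returns a group even when no second-level type fits well, a design of this kind can still give a coarse answer for a sample whose exact type was never trained on, which is the property the abstract reports for the medulloblastoma and pituitary tumor datasets.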

SUBMITTER: Yang Y 

PROVIDER: S-EPMC10709847 | biostudies-literature | 2023 Dec

REPOSITORIES: biostudies-literature

Publications

Hierarchical classification-based pan-cancer methylation analysis to classify primary cancer.

Yang Youpeng, Zeng Qiuhong, Liu Gaotong, Zheng Shiyao, Luo Tianyang, Guo Yibin, Tang Jia, Huang Yi

BMC Bioinformatics, 2023-12-08, issue 1


Similar Datasets

| S-EPMC9275179 | biostudies-literature
| S-EPMC6056027 | biostudies-literature
| S-EPMC8770539 | biostudies-literature
| S-EPMC7485706 | biostudies-literature
| S-EPMC7848437 | biostudies-literature
| S-EPMC7579968 | biostudies-literature
| S-EPMC9559388 | biostudies-literature
| S-EPMC8345047 | biostudies-literature
| S-EPMC11855777 | biostudies-literature
| S-EPMC9984170 | biostudies-literature