ABSTRACT:
SUBMITTER: Thielmann A
PROVIDER: S-EPMC9930816 | biostudies-literature | 2023
REPOSITORIES: biostudies-literature
Anton Thielmann, Christoph Weisser, Astrid Krenz, Benjamin Säfken
Journal of Applied Statistics, 2021-04-27, 3
Unsupervised document classification for imbalanced data sets poses a major challenge. To obtain accurate classification results, training data sets are often created manually, which requires expert knowledge, time and money. Depending on how imbalanced the data set is, this approach either requires human labelling of all of the data or fails to adequately recognize underrepresented categories. We propose an integration of web scraping, one-class Support Vector Machines (SVM) and ...