ABSTRACT:
SUBMITTER: Madatov K
PROVIDER: S-EPMC10126844 | biostudies-literature | 2023 Jun
REPOSITORIES: biostudies-literature
Madatov K, Bekchanov S, Vičič J
Data in Brief, 2023-04-05
The dataset presented in this paper addresses the challenge of automatically extracting stop words in Natural Language Processing (NLP) for Karakalpak, a low-resource language spoken by approximately two million people in Uzbekistan. To this end, we created a corpus of 23 Karakalpak language school textbooks, which we named the Karakalpak Language School Corpus (KAASC). Using KAASC, we constructed lists of stop words using three methods based on Term Freque ...