ABSTRACT:
SUBMITTER: Moussa HN
PROVIDER: S-EPMC10293988 | biostudies-literature | 2023 Jun
REPOSITORIES: biostudies-literature
AUTHORS: Hanane Nour Moussa, Asmaa Mourhir
JOURNAL: Data in Brief, 2023-05-12
DarNERcorp is a manually annotated named entity recognition (NER) dataset in the Moroccan dialect, also called Darija. The dataset consists of 65,905 tokens and their corresponding tags according to the BIO scheme. Of these tokens, 13.8% are named entities spanning four categories: person, location, organization, and miscellaneous. The data were scraped from the Moroccan Dialect section of Wikipedia, then processed and annotated using open-source libraries and tools. The data are useful for the Arabic na ...
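
To make the BIO scheme and the entity-token percentage concrete, here is a minimal Python sketch of how BIO tags group tokens into entity spans and how a share such as 13.8% could be computed. The tag names (`PER`, `LOC`, `ORG`, `MISC`), the helper names, and the toy sentence are assumptions for illustration only. They are not taken from the dataset files, whose actual label strings and file layout are not described here.

```python
from typing import List, Tuple

# Hypothetical label set: the abstract names four categories but not the tag strings.
ASSUMED_LABELS = {"PER", "LOC", "ORG", "MISC"}


def bio_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Group BIO tags into (label, start, end) spans, with end exclusive."""
    spans = []
    start, label = None, None
    for i, tag in enumerate(tags):
        # An I- tag continues the open span only if its label matches.
        if tag.startswith("I-") and label == tag[2:]:
            continue
        # Any other tag closes the span that is currently open.
        if label is not None:
            spans.append((label, start, i))
            start, label = None, None
        # B- opens a new span; a stray I- is treated as a span start too.
        if tag.startswith("B-") or tag.startswith("I-"):
            start, label = i, tag[2:]
    if label is not None:
        spans.append((label, start, len(tags)))
    return spans


def entity_token_fraction(tags: List[str]) -> float:
    """Fraction of tokens whose tag is anything other than 'O'."""
    if not tags:
        return 0.0
    return sum(tag != "O" for tag in tags) / len(tags)


# Toy example, invented for illustration (not a sentence from DarNERcorp).
tokens = ["Fatima", "lives", "in", "Rabat", "near", "Bab", "Rouah"]
tags = ["B-PER", "O", "O", "B-LOC", "O", "B-LOC", "I-LOC"]

for label, start, end in bio_spans(tags):
    print(label, " ".join(tokens[start:end]))
print(f"{entity_token_fraction(tags):.1%} of tokens are entity tokens")
```

Applied to the full corpus, `entity_token_fraction` would be the kind of calculation behind the reported 13.8% figure, assuming that figure counts every non-`O` token.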