ABSTRACT:
SUBMITTER: Shikali CS
PROVIDER: S-EPMC7339006 | biostudies-literature | 2020 Aug
REPOSITORIES: biostudies-literature
Shikali Casper S, Mokhosi Refuoe
Data in Brief, 2020-06-30
Language modelling using neural networks requires adequate data to guarantee quality word representations, which are important for natural language processing (NLP) tasks. However, African languages, Swahili in particular, have been disadvantaged, and most of them are classified as low-resource languages because of inadequate data for NLP. In this article, we derive and contribute an unannotated Swahili dataset, a Swahili syllabic alphabet and a Swahili word analogy dataset to address the need for language ...