Automatic Diacritic Restoration for Northern Sotho
Gabofetswe Alafang Malema, Moffat Motlhanka, Boago Okgetheng
Pages - 78 - 85     |    Revised - 30-11-2022     |    Published - 31-12-2022
Volume - 13   Issue - 4    |    Publication Date - December 2022
Keywords: Diacritic Restoration, Northern Sotho.
For convenience, users often omit diacritic markers when typing text. However, missing diacritics can degrade the quality of text analysis, because they can change how a word is pronounced and what it means, among other things. The number of diacritics, and the impact of omitting them, varies from language to language. Approaches to restoring diacritics in text can be classified as language-dependent or language-independent, and as word-based or syllable-based. The Northern Sotho language uses two diacritic markers to indicate pronunciation and, in some cases, to distinguish between homographs. Very little research has been done on diacritic restoration for Northern Sotho. In this paper, we show that morphological word transformations are consistent in whether they insert diacritics in derived words. We focus on the caron diacritic marker. An input word is first reduced to its root form by a morphological analyzer. The accented form of the root word is then retrieved from a diacritic dictionary. This accented root, together with morphological rules, is used to determine the diacritics of the input word. The implemented tool achieved a recall of 86% on test data. Most errors were due to failures in the morphological analysis of the input word.
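The three steps described in the abstract (reduce the word to its root, look up the accented root, then apply morphological rules) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the one-entry dictionary, the prefix and suffix lists, and the function names `analyze` and `restore` are all hypothetical, and the sketch assumes that the listed affixes leave the caron of the root unchanged.

```python
from typing import Optional, Tuple

# Hypothetical diacritic dictionary: unaccented root -> accented root.
# A real system would hold a full lexicon; this single entry is illustrative.
DIACRITIC_DICT = {"soma": "šoma"}

# Hypothetical affix lists standing in for a real morphological analyzer.
PREFIXES = ("", "mo", "ba")
# Each suffix is assumed to replace the final vowel "a" of the root.
SUFFIXES = ("ile", "i", "o")


def analyze(word: str) -> Optional[Tuple[str, str, str]]:
    """Toy analyzer: return (prefix, unaccented root, suffix), or None."""
    for prefix in PREFIXES:
        if not word.startswith(prefix):
            continue
        rest = word[len(prefix):]
        if rest in DIACRITIC_DICT:
            return prefix, rest, ""
        for suffix in SUFFIXES:
            if rest.endswith(suffix):
                root = rest[: -len(suffix)] + "a"
                if root in DIACRITIC_DICT:
                    return prefix, root, suffix
    return None


def restore(word: str) -> str:
    """Restore carons in `word`; return it unchanged if analysis fails."""
    analysis = analyze(word)
    if analysis is None:
        return word  # analysis failure: no diacritics can be restored
    prefix, root, suffix = analysis
    accented = DIACRITIC_DICT[root]
    # Assumed rule: affixes keep the root's diacritics; a suffix replaces
    # the root's final vowel.
    stem = accented[:-1] if suffix else accented
    return prefix + stem + suffix


for w in ("soma", "somile", "mosomo", "bona"):
    print(w, "->", restore(w))
```

In this sketch a word whose root is not found is returned unmarked, which mirrors the abstract's observation that most errors came from failures in morphological analysis.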
Dr. Gabofetswe Alafang Malema
Department of Computer Science, University of Botswana, Gaborone - Botswana
Dr. Moffat Motlhanka
Department of Computer Science, University of Botswana, Gaborone - Botswana
Mr. Boago Okgetheng
Department of Computer Science, University of Botswana, Gaborone - Botswana
