Automatic Diacritic Restoration for Northern Sotho
Gabofetswe Alafang Malema, Moffat Motlhanka, Boago Okgetheng
Pages - 78 - 85 | Revised - 30-11-2022 | Published - 31-12-2022
KEYWORDS
Diacritic Restoration, Northern Sotho.
ABSTRACT
Users often omit diacritic markers when typing because it is more convenient. However, text
without diacritic markers can degrade the quality of its analysis, since missing diacritics may
change how a word is pronounced and what it means, among other effects. The number of diacritics
and the impact of omitting them vary from language to language. Approaches to restoring diacritics
in text can be classified as language-dependent or language-independent, and as word-based or
syllable-based. The Northern Sotho language uses two diacritic markers to indicate pronunciation
and, in some cases, to distinguish between homographs. Very little research has been done on
diacritic restoration in Northern Sotho. In this paper, we show that morphological
word transformations are consistent in how they insert or do not insert diacritics in derived words.
We focus on the caron diacritic marker. An input word is reduced to its root form by a
morphological analyzer. The accented form of the root word is retrieved from the diacritic
dictionary. This word, together with morphological rules, is used to determine the diacritics of the
input word. The implemented tool achieved a recall of 86% on test data. Most errors
were due to failures in the morphological analysis of the input word.
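
The pipeline described in the abstract (analyze the word, look up the accented root, apply morphological rules) can be illustrated with a minimal sketch. This is not the authors' tool: the names `DIACRITIC_DICT`, `SUFFIX_RULES`, `find_root` and `restore`, the single dictionary entry, and the two suffix rules are all hypothetical and chosen only for illustration. The sketch also assumes that each rule keeps the stem's diacritics and adds none in the suffix.

```python
# Toy diacritic dictionary: unaccented root -> accented root.
# Illustrative entry only; not taken from the paper's dictionary.
DIACRITIC_DICT = {"soma": "\u0161oma"}  # "šoma"

# Hypothetical suffix rules: (surface suffix, root ending it replaces).
SUFFIX_RULES = [("ile", "a"), ("a", "a")]


def find_root(word):
    """Stand-in for a morphological analyzer: return (root, stem_length) or None."""
    for suffix, root_ending in SUFFIX_RULES:
        if word.endswith(suffix):
            stem = word[: len(word) - len(suffix)]
            root = stem + root_ending
            if root in DIACRITIC_DICT:
                return root, len(stem)
    return None


def restore(word):
    """Copy the root's diacritics onto the shared stem; keep the suffix as typed."""
    analysis = find_root(word)
    if analysis is None:
        return word  # analysis failed, so the word is left unchanged
    root, stem_len = analysis
    accented_root = DIACRITIC_DICT[root]
    return accented_root[:stem_len] + word[stem_len:]


if __name__ == "__main__":
    for typed in ("soma", "somile", "bona"):
        print(typed, "->", restore(typed))
```

In this sketch a word whose root cannot be found is returned unchanged, which mirrors the observation above that most errors come from failures in morphological analysis.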
Dr. Gabofetswe Alafang Malema
Department of Computer Science, University of Botswana, Gaborone - Botswana
malemag@mopipi.ub.bw
Dr. Moffat Motlhanka
Department of Computer Science, University of Botswana, Gaborone - Botswana
Mr. Boago Okgetheng
Department of Computer Science, University of Botswana, Gaborone - Botswana