Rule-Based Standard Arabic Phonetization at Phoneme, Allophone, and Syllable Level
Fadi Sindran, Firas Mualla, Tino Haderlein, Khaled Daqrouq, Elmar Nöth
Pages - 23 - 37     |    Revised - 31-10-2016     |    Published - 01-12-2016
Volume - 7   Issue - 2    |    Publication Date - December 2016
KEYWORDS
Phonetization, Standard Arabic, Phonetic Transcription, Pronunciation Dictionaries, Transcription Rules.
ABSTRACT
Phonetization is the transcription of written text into sounds. It is used in many natural language processing tasks, such as speech processing, speech synthesis, and computer-aided pronunciation assessment. A common phonetization approach uses letter-to-sound rules developed by linguists to transcribe graphemes into sounds. In this paper, we address the problem of rule-based phonetization of standard Arabic. The contributions of the paper can be summarized as follows: 1) We discuss the transcription rules of standard Arabic that have been used in the literature at the phonemic and phonetic levels. 2) We suggest improvements to existing rules and introduce new ones. Moreover, we propose a comprehensive algorithm covering the phenomenon of pharyngealization in standard Arabic. Finally, the resulting rule set has been tested on large datasets. 3) We present a reliable automatic phonetic transcription of standard Arabic at five levels: phoneme, allophone, syllable, word, and sentence. An encoding that covers all sounds of standard Arabic is proposed, and several pronunciation dictionaries have been automatically generated. These dictionaries have been manually verified, yielding an accuracy higher than 99% for standard Arabic texts that do not contain dates, numbers, acronyms, abbreviations, or special symbols. The dictionaries are available for research purposes.
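To make the idea of letter-to-sound rules concrete, the following is a minimal sketch of one context-sensitive rule: assimilation of the definite article before the so-called sun letters. It is not the rule set, the algorithm, or the encoding proposed in the paper. The function name `phonetize`, the IPA-like output symbols, and the deliberately partial consonant table are all assumptions made for illustration. The sketch also assumes fully diacritized input and an utterance-initial article.

```python
# Toy letter-to-sound sketch. NOT the paper's rules or encoding:
# the output symbols and the partial consonant table are assumptions.
FATHA, DAMMA, KASRA = "\u064E", "\u064F", "\u0650"
SHADDA, SUKUN = "\u0651", "\u0652"
ARTICLE = "\u0627\u0644"  # alif + lam, the definite article

SHORT_VOWELS = {FATHA: "a", DAMMA: "u", KASRA: "i"}
SUN_LETTERS = set("تثدذرزسشصضطظلن")
# Partial table: only the consonants used in the examples below.
CONSONANTS = {"ش": "ʃ", "م": "m", "س": "s", "ق": "q", "ر": "r", "ل": "l"}


def phonetize(word: str) -> str:
    """Transcribe one fully diacritized word using a handful of toy rules."""
    out = []
    last_consonant = None  # index in `out` of the most recent consonant
    start = 0

    # Context rule: the lam of the article is silent before a sun letter.
    if word.startswith(ARTICLE) and len(word) > 2:
        out.append("ʔa")
        if word[2] not in SUN_LETTERS:
            out.append("l")
            last_consonant = len(out) - 1
        start = 2

    for ch in word[start:]:
        if ch in CONSONANTS:
            out.append(CONSONANTS[ch])
            last_consonant = len(out) - 1
        elif ch in SHORT_VOWELS:
            out.append(SHORT_VOWELS[ch])
        elif ch == SHADDA and last_consonant is not None:
            # Gemination: double the consonant, whether or not its vowel
            # mark has already been read.
            out.insert(last_consonant, out[last_consonant])
        elif ch == SUKUN:
            continue  # sukun marks the absence of a vowel
        else:
            raise ValueError(f"unsupported character: {ch!r}")
    return "".join(out)


# "the sun": article + sun letter with shadda, so the lam is not pronounced.
sun = ARTICLE + "ش" + SHADDA + FATHA + "م" + SUKUN + "س"
# "the moon": article + moon letter, so the lam is pronounced.
moon = ARTICLE + "ق" + FATHA + "م" + FATHA + "ر"
print(phonetize(sun), phonetize(moon))
```

A complete system of the kind the abstract describes would need many more rules, for example for long vowels, hamza, and pharyngealization, as well as handling of connected speech, none of which this sketch attempts.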
Mr. Fadi Sindran
Friedrich-Alexander-Universität Erlangen-Nürnberg/Department of Computer Science 5 - Germany
fadi.sindran@faui51.informatik.uni-erlangen.de
Mr. Firas Mualla
Faculty of Engineering/Department of Computer Science /Pattern Recognition Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg - Germany
Dr. Tino Haderlein
Faculty of Engineering/Department of Computer Science /Pattern Recognition Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg - Germany
Professor Khaled Daqrouq
Department of Electrical and Computer Engineering King Abdulaziz University - Saudi Arabia
Professor Elmar Nöth
Faculty of Engineering/Department of Computer Science /Pattern Recognition Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg - Germany