An Empirical Study On The Holy Quran Based On A Large Classical Arabic Corpus
Maha Alrabiah, Nawal Alhelewh, AbdulMalik Al-Salman, Eric Atwell
Pages - 1 - 13     |    Revised - 31-03-2014     |    Published - 30-04-2014
Volume - 5   Issue - 1    |    Publication Date - April 2014
Distributional Lexical Semantics, Quran, Classical Arabic Corpus, Collocation Extraction, Association Measures.
Distributional semantics is one of the empirical approaches to natural language processing and acquisition; it is mainly concerned with modeling word meaning using word distribution statistics gathered from large corpora. Many distributional semantic models are available in the literature, but none has so far been applied to the Quran or to Classical Arabic in general. This paper reports the construction of a very large corpus of Classical Arabic that will serve as a base for studying the distributional lexical semantics of the Quran and Classical Arabic. It also reports the results of two empirical studies: the first applies a number of probabilistic distributional semantic models to automatically identify lexical collocations in the Quran, and the second applies the same models to the Classical Arabic corpus to test their ability to capture lexical collocations and co-occurrences for a number of the corpus words. Results show that the MI.log_freq association measure performed best at extracting significant co-occurrences and collocations from both small and large Classical Arabic corpora, while the mutual information association measure performed worst.
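To make the two measures named in the abstract concrete, the following is a minimal sketch of how they could be computed for adjacent word pairs. The abstract does not give the formulas, so this assumes the commonly used definitions: MI = log2(f_xy * N / (f_x * f_y)) and MI.log_freq = MI * ln(f_xy + 1). The function name `association_scores` and the toy token list are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def association_scores(tokens):
    """Return {(x, y): (mi, mi_log_freq)} for every adjacent word pair.

    Assumed definitions (not confirmed by the abstract):
      MI          = log2(f_xy * N / (f_x * f_y))
      MI.log_freq = MI * ln(f_xy + 1)
    where N is the number of tokens, f_x and f_y are word frequencies,
    and f_xy is the frequency of the adjacent pair (x, y).
    """
    n = len(tokens)
    word_freq = Counter(tokens)
    pair_freq = Counter(zip(tokens, tokens[1:]))

    scores = {}
    for (x, y), f_xy in pair_freq.items():
        mi = math.log2(f_xy * n / (word_freq[x] * word_freq[y]))
        scores[(x, y)] = (mi, mi * math.log(f_xy + 1))
    return scores


# Toy input: one pair that recurs five times and one pair seen only once.
toy_tokens = ["x", "y"] * 5 + ["p", "q"]
for pair, (mi, mi_log_freq) in association_scores(toy_tokens).items():
    print(pair, round(mi, 3), round(mi_log_freq, 3))
```

Under these assumed definitions, plain MI gives its highest score to the pair that occurs only once, which illustrates its known bias toward rare events; the logarithmic frequency factor in MI.log_freq dampens that bias by rewarding pairs that recur.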
CITED BY (5)  
1 Dzulkifli, M. A., bin Abdul Rahman, A. W., Badi, J. A. B., & Solihu, A. K. H. (2016). Routes to Remembering: Lessons from al Huffaz. Mediterranean Journal of Social Sciences, 7(3 S1), 121.
2 Siddiqui, M. A., Dahab, M. Y., & Batarfi, O. A. (2015). Building A Sentiment Analysis Corpus With Multifaceted Hierarchical Annotation.
3 Alrabiah, M., Al-Salman, A., & Atwell, E. (2014, October). The refined MI: A significant improvement to mutual information. In Asian Language Processing (IALP), 2014 International Conference on (pp. 132-135). IEEE.
4 Atwell, E., & Alfaifi, A. Arabic corpus linguistics research at the University of Leeds.
5 Alrabiah, M., Al-salman, A., & Atwell, E. A New Distributional Semantic Model for Classical Arabic.
Mr. Maha Alrabiah
King Saud University - Saudi Arabia
Associate Professor Nawal Alhelewh
Princess Nora bint Abdul Rahman University - Saudi Arabia
Professor AbdulMalik Al-Salman
King Saud University - Saudi Arabia
Associate Professor Eric Atwell
University of Leeds - United Kingdom