An Empirical Study On The Holy Quran Based On A Large Classical Arabic Corpus
Maha Alrabiah, Nawal Alhelewh, AbdulMalik Al-Salman, Eric Atwell
Pages 1-13 | Revised 31-03-2014 | Published 30-04-2014
Volume 5, Issue 1 | Publication Date: April 2014
KEYWORDS
Distributional Lexical Semantics, Quran, Classical Arabic Corpus, Collocation Extraction, Association Measures.
ABSTRACT
Distributional semantics is one of the empirical approaches to natural language processing and acquisition. It is mainly concerned with modeling word meaning using word distribution statistics gathered from very large corpora. Many distributional semantic models are available in the literature, but none has yet been applied to the Quran or to Classical Arabic in general. This paper reports the construction of a very large corpus of Classical Arabic that will serve as a base for studying the distributional lexical semantics of the Quran and Classical Arabic. It also reports the results of two empirical studies: the first applies a number of probabilistic distributional semantic models to automatically identify lexical collocations in the Quran, and the second applies the same models to the Classical Arabic corpus to test their ability to capture lexical collocations and co-occurrences for a number of the corpus words. Results show that the MI.log_freq association measure achieved the best results in extracting significant co-occurrences and collocations from both small and large Classical Arabic corpora, while the mutual information (MI) association measure achieved the worst.
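The abstract compares association measures without giving their formulas. As a minimal sketch, assuming the common definitions (pointwise MI = log2(f_xy · N / (f_x · f_y)), and MI.log_freq = MI × ln(f_xy + 1) as documented for Sketch Engine), the Python below counts co-occurrences in a right-hand window and ranks the collocates of a node word. The window size, the function names, the use of corpus size N as the single normalizer, and the exact MI.log_freq variant are all assumptions for illustration, not necessarily what the paper uses.

```python
import math
from collections import Counter


def cooccurrence_counts(tokens, window=2):
    """Count unigram frequencies and ordered pairs (w, c) where c occurs
    within `window` tokens to the right of w."""
    unigrams = Counter(tokens)
    pairs = Counter()
    for i, w in enumerate(tokens):
        for j in range(i + 1, min(i + 1 + window, len(tokens))):
            pairs[(w, tokens[j])] += 1
    return unigrams, pairs


def mi(f_xy, f_x, f_y, n):
    """Pointwise mutual information: log2(f_xy * N / (f_x * f_y))."""
    return math.log2(f_xy * n / (f_x * f_y))


def mi_log_freq(f_xy, f_x, f_y, n):
    """MI weighted by log co-occurrence frequency.
    Assumed variant: MI * ln(f_xy + 1)."""
    return mi(f_xy, f_x, f_y, n) * math.log(f_xy + 1)


def rank_collocates(tokens, node, measure, window=2, top=5):
    """Score every right-hand collocate of `node` with `measure` and
    return the `top` highest-scoring (collocate, score) pairs.
    Simplification: corpus size N normalizes both unigram and pair counts."""
    unigrams, pairs = cooccurrence_counts(tokens, window)
    n = len(tokens)
    scores = {
        c: measure(f, unigrams[w], unigrams[c], n)
        for (w, c), f in pairs.items()
        if w == node
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top]


if __name__ == "__main__":
    # A pair of hapaxes versus a frequent pair in a 100-token corpus.
    print("rare pair    MI:", round(mi(1, 1, 1, 100), 3),
          " MI.log_freq:", round(mi_log_freq(1, 1, 1, 100), 3))
    print("common pair  MI:", round(mi(20, 25, 25, 100), 3),
          " MI.log_freq:", round(mi_log_freq(20, 25, 25, 100), 3))
```

Because plain MI divides by the marginal frequencies, a pair formed by two words that each occur only once receives a very high score, so MI tends to promote rare, accidental pairs. Multiplying by ln(f_xy + 1) damps that preference in favour of pairs that recur. In the toy numbers above, MI ranks the rare pair first while MI.log_freq ranks the frequent pair first, which is one plausible reason for the ranking of the two measures reported in the abstract.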
CITED BY (5)  
1 Dzulkifli, M. A., bin Abdul Rahman, A. W., Badi, J. A. B., & Solihu, A. K. H. (2016). Routes to Remembering: Lessons from al Huffaz. Mediterranean Journal of Social Sciences, 7(3 S1), 121.
2 Siddiqui, M. A., Dahab, M. Y., & Batarfi, O. A. (2015). Building A Sentiment Analysis Corpus With Multifaceted Hierarchical Annotation.
3 Alrabiah, M., Al-Salman, A., & Atwell, E. (2014, October). The refined MI: A significant improvement to mutual information. In Asian Language Processing (IALP), 2014 International Conference on (pp. 132-135). IEEE.
4 Atwell, E., & Alfaifi, A. Arabic corpus linguistics research at the University of Leeds.
5 Alrabiah, M., Al-salman, A., & Atwell, E. A New Distributional Semantic Model for Classical Arabic.
Ms. Maha Alrabiah
King Saud University - Saudi Arabia
msrabiah@gmail.com
Associate Professor Nawal Alhelewh
Princess Nora bint Abdul Rahman University - Saudi Arabia
Professor AbdulMalik Al-Salman
King Saud University - Saudi Arabia
Associate Professor Eric Atwell
Leeds University - United Kingdom