This is an Open Access publication published under CSC-OpenAccess Policy.
An Empirical Study On The Holy Quran Based On A Large Classical Arabic Corpus
Maha Alrabiah, Nawal Alhelewh, AbdulMalik Al-Salman, Eric Atwell
Pages - 1 - 13     |    Revised - 31-03-2014     |    Published - 30-04-2014
Volume - 5   Issue - 1    |    Publication Date - April 2014
Distributional Lexical Semantics, Quran, Classical Arabic Corpus, Collocation Extraction, Association Measures.
Distributional semantics is an empirical approach to natural language processing and acquisition that models word meaning using word distribution statistics gathered from very large corpora. Many distributional semantic models are available in the literature, but none has so far been applied to the Quran or to Classical Arabic in general. This paper reports the construction of a very large corpus of Classical Arabic that will serve as a basis for studying the distributional lexical semantics of the Quran and Classical Arabic. It also reports the results of two empirical studies: the first applies a number of probabilistic distributional semantic models to automatically identify lexical collocations in the Quran, and the second applies the same models to the Classical Arabic corpus to test their ability to capture lexical collocations and co-occurrences for a number of the corpus words. Results show that the MI.log_freq association measure performed best at extracting significant co-occurrences and collocations from both small and large Classical Arabic corpora, while the mutual information association measure performed worst.
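The two association measures named in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's exact formulas, so the definitions below are assumptions: mutual information is taken as the usual pointwise form, log2(f_xy * N / (f_x * f_y)), and MI.log_freq is taken as that score multiplied by the natural log of (f_xy + 1), a formulation commonly used in corpus tools. The function names and the toy counts are hypothetical.

```python
import math


def mutual_information(f_xy, f_x, f_y, n):
    """Pointwise MI (assumed form): log2 of observed over expected co-occurrence.

    f_xy: co-occurrence count of the pair, f_x / f_y: counts of each word,
    n: corpus size in tokens. All counts must be positive.
    """
    return math.log2((f_xy * n) / (f_x * f_y))


def mi_log_freq(f_xy, f_x, f_y, n):
    """MI.log_freq (assumed form): MI weighted by ln(f_xy + 1)."""
    return mutual_information(f_xy, f_x, f_y, n) * math.log(f_xy + 1)


# Toy counts (hypothetical): a pair seen once vs. a pair seen 200 times
# in a corpus of one million tokens.
N = 1_000_000
rare_pair = (1, 1, 1, N)
frequent_pair = (200, 400, 500, N)

for name, counts in (("rare", rare_pair), ("frequent", frequent_pair)):
    print(name,
          round(mutual_information(*counts), 2),
          round(mi_log_freq(*counts), 2))
```

Under these assumed definitions, plain MI scores the once-seen pair above the frequent one, while the frequency weighting reverses that ranking. This is a known tendency of MI to favor rare pairs and may be one reason the two measures behave so differently, although the abstract itself does not state the cause.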
CITED BY (5)  
1 Dzulkifli, M. A., bin Abdul Rahman, A. W., Badi, J. A. B., & Solihu, A. K. H. (2016). Routes to Remembering: Lessons from al Huffaz. Mediterranean Journal of Social Sciences, 7(3 S1), 121.
2 Siddiqui, M. A., Dahab, M. Y., & Batarfi, O. A. (2015). Building A Sentiment Analysis Corpus With Multifaceted Hierarchical Annotation.
3 Alrabiah, M., Al-Salman, A., & Atwell, E. (2014, October). The refined MI: A significant improvement to mutual information. In Asian Language Processing (IALP), 2014 International Conference on (pp. 132-135). IEEE.
4 Atwell, E., & Alfaifi, A. Arabic corpus linguistics research at the University of Leeds.
5 Alrabiah, M., Al-salman, A., & Atwell, E. A New Distributional Semantic Model for Classical Arabic.
Mr. Maha Alrabiah
King Saud University - Saudi Arabia
Associate Professor Nawal Alhelewh
Princess Nora bint Abdul Rahman University - Saudi Arabia
Professor AbdulMalik Al-Salman
King Saud University - Saudi Arabia
Associate Professor Eric Atwell
Leeds University - United Kingdom