Building A Sentiment Analysis Corpus With Multifaceted Hierarchical Annotation
Muazzam Ahmed Siddiqui, Mohamed Yehia Dahab, Omar Abdullah Batarfi
Pages - 11 - 25     |    Revised - 31-05-2015     |    Published - 30-06-2015
Volume - 6   Issue - 2    |    Publication Date - May / June 2015
KEYWORDS
Multifaceted Text Categorization, Hierarchical Text Categorization, Sentiment Analysis, Corpus Linguistics, Arabic Natural Language Processing, Text Mining.
ABSTRACT
A corpus is a collection of documents. An annotated corpus consists of documents or entities annotated with task-related labels such as part-of-speech tags or sentiment. While it is customary to annotate a document for a specific task, it is also possible to annotate it for multiple tasks, resulting in a multifaceted annotation scheme. These annotations can be organized hierarchically if such a structure occurs naturally in the data, resulting in a hierarchical text categorization problem. We developed a multifaceted, multilingual corpus for hierarchical sentiment analysis. The facets include hierarchical nominal sentiment labels, a numerical sentiment score, the language, and the dialect. Our corpus consists of 191K reviews of hotels in Saudi Arabia. The reviews are divided into eleven categories. Within each category, the reviews are further divided into positive and negative subcategories. The corpus contains 1.8 million tokens. Reviews are mostly written in Arabic and English, but other languages also occur.
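To make the annotation scheme concrete, the following is a minimal sketch of how one review with its facets might be represented. It is an illustration under stated assumptions, not the corpus's actual file format: the class name Review, the field names, the category names ("staff", "cleanliness"), the score scale, the dialect value, and the sample records are all hypothetical and do not come from the paper.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Review:
    """One review with its facets (all field names are hypothetical)."""
    text: str
    category: str                  # top level of the label hierarchy
    polarity: str                  # second level: "positive" or "negative"
    score: float                   # numerical sentiment score (scale assumed)
    language: str                  # language facet, e.g. "ar" or "en"
    dialect: Optional[str] = None  # dialect facet, if one applies

    def label_path(self) -> Tuple[str, str]:
        """Return the hierarchical label as a (category, polarity) path."""
        return (self.category, self.polarity)


# Made-up placeholder records, not taken from the corpus.
reviews: List[Review] = [
    Review("example review 1", "staff", "positive", 8.0, "en"),
    Review("example review 2", "staff", "negative", 3.5, "ar", dialect="Gulf"),
    Review("example review 3", "cleanliness", "positive", 9.0, "en"),
]

# Counts at the leaf level of the hierarchy (category + polarity).
leaf_counts = Counter(r.label_path() for r in reviews)

# Counts at the top level: each category sums over its polarity children.
category_counts = Counter(r.category for r in reviews)

# A separate facet (language) can be tallied independently of sentiment.
language_counts = Counter(r.language for r in reviews)

print(leaf_counts)
print(category_counts)
print(language_counts)
```

Keeping the sentiment hierarchy as a path and the other facets as independent fields means the same record can feed a hierarchical sentiment classifier, a score regressor, or a language or dialect identifier without restructuring.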
Dr. Muazzam Ahmed Siddiqui
King Abdulaziz University - Saudi Arabia
maasiddiqui@kau.edu.sa
Dr. Mohamed Yehia Dahab
Department of Computer Science, Faculty of Computing and Information Technology, King Abdulaziz University - Saudi Arabia
Dr. Omar Abdullah Batarfi
Department of Information Technology, Faculty of Computing and Information Technology, King Abdulaziz University - Saudi Arabia