Joint Alignment of Segmentation and Labelling for Arabic Morphosyntactic Taggers
Abdulrahman Alosaimy, Eric Atwell
Pages - 1 - 12     |    Revised - 31-01-2018     |    Published - 30-04-2018
Volume - 9   Issue - 1    |    Publication Date - April 2018
KEYWORDS
Arabic, POS-Tagging, Segmentation, Tokenisation, Morphological Alignment.
ABSTRACT
We present and compare three methods for aligning the morphemes produced by four different Arabic POS taggers, plus one baseline method that uses only the provided labels. We combined four Arabic POS taggers: MADAMIRA (MA), Stanford Tagger (ST), AMIRA (AM) and Farasa (FA); as the target output we used two Classical Arabic gold standards: the Quranic Arabic Corpus (QAC) and the SALMA Standard Arabic Linguistics Morphological Analysis (SAL). We justify our choice to align on labels rather than word forms. The problem is not trivial, as it involves six different tokenisation and labelling standards. Supervised learning with a unigram model achieved the best segment alignment accuracy, correctly aligning 97% of morpheme segments. We then evaluated the alignment methods extrinsically, in terms of their effect on the accuracy of ensemble POS taggers that merge different combinations of the four Arabic POS taggers. Using the best approach to align the input POS taggers, the ensemble tagger correctly segmented and tagged 88.09% of morphemes. We show that increasing the number of input taggers raises the accuracy, suggesting that the input taggers make different errors.
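To make the idea of aligning on labels rather than word forms concrete, the following is a minimal sketch of a global sequence alignment (in the style of Needleman and Wunsch) over two label sequences for the same word. It is an illustration under stated assumptions, not the paper's implementation: the function names, the gap penalty, the tag names and the small `COMPATIBLE` table are all hypothetical, and a learned model (such as the unigram model mentioned above) would stand in for the hand-written scoring function.

```python
from typing import Callable, List, Optional, Tuple

Pair = Tuple[Optional[str], Optional[str]]

# Hypothetical cross-tagset compatibility table (illustration only).
COMPATIBLE = {("CC", "CONJ"), ("NN", "NOUN"), ("PRP", "PRON")}


def label_score(src_label: str, tgt_label: str) -> float:
    """Reward label pairs listed as compatible, penalise all others."""
    return 1.0 if (src_label, tgt_label) in COMPATIBLE else -1.0


def align_labels(src: List[str], tgt: List[str],
                 score: Callable[[str, str], float],
                 gap: float = -1.0) -> List[Pair]:
    """Globally align two label sequences; None marks an unmatched segment."""
    n, m = len(src), len(tgt)
    # best[i][j] holds the best score for aligning src[:i] with tgt[:j].
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        best[i][0] = best[i - 1][0] + gap
    for j in range(1, m + 1):
        best[0][j] = best[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best[i][j] = max(
                best[i - 1][j - 1] + score(src[i - 1], tgt[j - 1]),
                best[i - 1][j] + gap,   # source segment left unmatched
                best[i][j - 1] + gap,   # target segment left unmatched
            )
    # Trace back from the bottom-right cell to recover the aligned pairs.
    pairs: List[Pair] = []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and
                best[i][j] == best[i - 1][j - 1] + score(src[i - 1], tgt[j - 1])):
            pairs.append((src[i - 1], tgt[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and best[i][j] == best[i - 1][j] + gap:
            pairs.append((src[i - 1], None))
            i -= 1
        else:
            pairs.append((None, tgt[j - 1]))
            j -= 1
    pairs.reverse()
    return pairs


if __name__ == "__main__":
    # One tagger yields three segments, the other splits off a determiner.
    print(align_labels(["CC", "NN", "PRP"],
                       ["CONJ", "DET", "NOUN", "PRON"],
                       label_score))
```

In this toy case the extra `DET` segment in the second sequence is left unmatched, while the remaining labels pair up one to one. This is the kind of segmentation mismatch that an aligner has to handle when taggers follow different tokenisation standards.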
Mr. Abdulrahman Alosaimy
University of Leeds - United Kingdom
scama@leeds.ac.uk
Professor Eric Atwell
- United Kingdom