Optical Character Recognition System for Urdu (Naskh Font) Using Pattern Matching Technique
Tabassam Nawaz, Syed Ammar Hassan Shah Naqvi, Habib ur Rehman, Anoshia Faiz
Pages - 92 - 104     |    Revised - 05-08-2009     |    Published - 01-09-2009
Volume - 3   Issue - 3    |    Publication Date - June 2009
Pattern matching, chain code creation, morphology, segmentation, training system, recognition system
Offline optical character recognition (OCR) systems have been developed for many languages in recent years. The US Postal Service has used OCR to automate its services since 1965. The range of applications keeps growing because OCR is useful in almost every major area of government and the private sector. In particular, it has helped many large organizations move toward a paper-free environment by digitizing their existing file records. The system proposed here performs offline recognition of isolated characters of the Urdu language, since Urdu forms words by combining isolated characters. Urdu is a cursive language in which connected characters make up words. The main use of an Urdu OCR will be digitizing the large body of literature already held in libraries. Urdu is widely spoken in more than three large countries, including Pakistan, India and Bangladesh, and a great deal of Urdu poetry and literature has been produced up to the recent century. An Urdu OCR will play an important role in moving this work from physical libraries to electronic libraries. Most of the Urdu material already on the internet is stored as images of text, which take a lot of space to transfer and even to read online, so an Urdu OCR is needed. The proposed system is training-based and has two parts. The training system performs image preprocessing, line and character segmentation, and creation of an XML file for training purposes. The recognition system takes the XML file and the image to be recognized, segments the image, creates chain codes for the character images, and matches them against the codes already stored in the XML file. The system has been implemented and achieves 89% recognition accuracy at a recognition rate of 15 characters per second.
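The abstract gives the pipeline only in outline, so the following is a minimal Python sketch of the chain-code step under stated assumptions, not the authors' implementation. It assumes a character has already been segmented into a small binary image (a list of rows of 0/1), traces its boundary into a Freeman 8-direction chain code, stores the codes in an XML string, and recognizes a new image by picking the most similar stored code. The XML schema (`training`/`char` elements with `label` and `code` attributes), the function names, the toy labels, and the use of `difflib` similarity as the matching rule are all hypothetical choices made for illustration.

```python
import difflib
import xml.etree.ElementTree as ET

# Freeman 8-direction offsets as (row, col), rows growing downward:
# 0=E, 1=NE, 2=N, 3=NW, 4=W, 5=SW, 6=S, 7=SE.
DIRS = [(0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1), (1, 0), (1, 1)]


def chain_code(img):
    """Return the boundary chain code of the first blob in a binary image."""
    h, w = len(img), len(img[0])

    def fg(r, c):
        return 0 <= r < h and 0 <= c < w and img[r][c] == 1

    # Start at the first foreground pixel in raster order (topmost, leftmost).
    start = next(((r, c) for r in range(h) for c in range(w) if img[r][c] == 1), None)
    if start is None:
        return ""
    pts, codes, d = [start], [], 7
    while True:
        r, c = pts[-1]
        # Standard 8-connected boundary following: begin the search just
        # behind the previous move and sweep through the eight neighbours.
        first = (d + 7) % 8 if d % 2 == 0 else (d + 6) % 8
        for k in range(8):
            nd = (first + k) % 8
            dr, dc = DIRS[nd]
            if fg(r + dr, c + dc):
                break
        else:
            return ""  # isolated pixel: no boundary moves to record
        d = nd
        pts.append((r + dr, c + dc))
        codes.append(d)
        # Stop once the first move (start -> second pixel) is repeated.
        if len(pts) > 2 and pts[-1] == pts[1] and pts[-2] == pts[0]:
            return "".join(map(str, codes[:-1]))


def train(samples):
    """Build a training XML string from {label: binary image} (hypothetical schema)."""
    root = ET.Element("training")
    for label, img in samples.items():
        ET.SubElement(root, "char", label=label, code=chain_code(img))
    return ET.tostring(root, encoding="unicode")


def recognize(xml_text, img):
    """Return the stored label whose chain code is most similar to img's code."""
    code = chain_code(img)
    best_label, best_score = None, -1.0
    for node in ET.fromstring(xml_text).iter("char"):
        score = difflib.SequenceMatcher(None, code, node.get("code")).ratio()
        if score > best_score:
            best_label, best_score = node.get("label"), score
    return best_label


# Toy shapes standing in for segmented character images.
model = train({"vertical": [[1], [1], [1]], "horizontal": [[1, 1, 1]]})
print(recognize(model, [[1], [1], [1], [1]]))  # vertical
```

Similarity-based matching is used here because an exact string comparison would fail on any difference in glyph size; the abstract does not state which matching rule or normalization the system itself applies.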
CITED BY (23)  
1 Dahi, M., Semary, N. A., & Hadhoud, M. M. (2015, December). A comparative study of different approaches of primitive printed Arabic Optical Character Recognition. In 2015 11th International Computer Engineering Conference (ICENCO) (pp. 105-110). IEEE.
2 Batool, E., Mustafa, H. O., Fatima, M., & Khan, A. A. (2015). A Road Map of Urdu Layout and Recognizing its Handwritten Digits, Table of Contents and Multi-font Numerals from Scanned and Handwritten Text Images Using Different Techniques. International Journal of Computer Science and Information Security, 13(6), 85.
3 Khan, K., Khan, R. U., Alkhalifah, A., & Ahmad, N. (2015, December). Urdu text classification using decision trees. In 2015 12th International Conference on High-capacity Optical Networks and Enabling/Emerging Technologies (HONET) (pp. 1-4). IEEE.
4 Dahi, M., Semary, N. A., & Hadhoud, M. M. (2015, December). Primitive printed Arabic Optical Character Recognition using statistical features. In 2015 IEEE Seventh International Conference on Intelligent Computing and Information Systems (ICICIS) (pp. 567-571). IEEE.
5 Mir, S., Zaman, S., & Anwar, M. W. (2015, December). Printed Urdu Nastalique Script Recognition Using Analytical Approach. In 2015 13th International Conference on Frontiers of Information Technology (FIT) (pp. 334-340). IEEE.
6 Nazir, S., & Javed, A. (2014). Diacritics Recognition Based Urdu Nastalique OCR System. Nucleus, 51(3), 361-367.
7 Naz, S., Hayat, K., Razzak, M. I., Anwar, M. W., Madani, S. A., & Khan, S. U. (2014). The optical character recognition of Urdu-like cursive scripts. Pattern Recognition, 47(3), 1229-1248.
8 Shaffie, A. M., & Elkobrosy, G. A. (2013). A Fast Recognition System for Isolated Printed Characters Using Center of Gravity and Principal Axis. Applied Mathematics, 4(9), 1313.
9 Satti, D. A. (2013). Offline Urdu Nastaliq OCR for printed text using analytical approach (Doctoral dissertation, Master’s thesis, Quaid-i-Azam University Islamabad, Pakistan).
10 Elzobi, M., Al-Hamadi, A., Al Aghbari, Z., & Dings, L. (2013). IESK-ArDB: a database for handwritten Arabic and an optimized topological segmentation approach. International Journal on Document Analysis and Recognition (IJDAR), 16(3), 295-308.
11 Khan, N., Adnan, A., & Basar, S. Geometric Feature Extraction from Urdu Ligatures.
12 Khan, K., Ullah, R., Khan, N. A., & Naveed, K. (2012). Urdu character recognition using principal component analysis. International Journal of Computer Applications, 60(11).
13 Khan, K., Siddique, M., Aamir, M., & Khan, R. (2012). An Efficient Method for Urdu Language Text Search in Image Based Urdu Text. International Journal of Computer Science Issues, 9(2), 523-527.
14 Fareen, N., Khan, M. A., & Durrani, A. (2012). Survey of urdu OCR: an offline approach. In Proceedings of the Conference on Language & Technology (pp. 67-72).
15 Satti, D. A., & Saleem, K. (2012, November). Complexities and implementation challenges in offline urdu Nastaliq OCR. In Proceedings of the Conference on Language & Technology (pp. 85-91).
16 Khan, N. H., Adnan, A., & Basar, S. An Analysis of Off-line and On-line Approaches in Urdu Character Recognition.
17 Khan, M. A., & Durrani, A. Urdu Naskh typography pattern recognition of type foundries.
18 Rana, N., & Kumar, D. (2011). A Hybrid Recognition & Speech Synthesis System for Handwritten Punjabi Words. International Journal of Advanced Research in Computer Science, 2(2).
19 Nizamani, A. M., & Janjua, N. U. H. (2011). Sindhi OCR using Back propagation Neural Network. International Journal of Computer Science and Security (IJCSS), 1(3), 1.
20 Kumar, D., & Rana, N. (2011). Speech Synthesis System for Online Handwritten Punjabi Word: An Implementation of SVM & Concatenative TTS. International Journal of Computer Applications (0975–8887) Volume.
21 Wahab, A., & ul Haque, S. N. (2010). Optical Character Recognition System for Urdu. Journal of Independent Studies and Research, 8(2).
22 Sardar, S., & Wahab, A. (2010, June). Optical character recognition system for Urdu. In Information and Emerging Technologies (ICIET), 2010 International Conference on (pp. 1-5). IEEE.
23 Rastegar, S., Ghaderi, R., Ardeshipr, G., & Asadi, N. (2009). An intelligent control system using an efficient License Plate Location and Recognition Approach. International Journal of Image Processing (IJIP) Volume (3), (5), 252-264.
Mr. Tabassam Nawaz
- Pakistan
Mr. Syed Ammar Hassan Shah Naqvi
- Pakistan
Mr. Habib ur Rehman
- Pakistan
Mr. Anoshia Faiz
- Pakistan