A Novel Approach for Bilingual (English - Oriya) Script Identification and Recognition in a Printed Document
Sanghamitra Mohanty, Himadri Nandini Das Bebartta
Pages - 175 - 191     |    Revised - 30-4-2010     |    Published - 10-06-2010
Volume - 4   Issue - 2    |    Publication Date - May 2010
Script separation, Indian script, Bilingual (English-Oriya) OCR, Horizontal profiles
In most official papers and school textbooks, English words are interspersed within text in Indian languages. There is therefore a need for an Optical Character Recognition (OCR) system that can recognize such bilingual documents and store them for future use. In this paper we present an OCR system developed to recognize an Indian script, Oriya, together with Roman script in printed documents. For this purpose, the different scripts must be separated before they are fed to their individual OCR systems. First, the skew is corrected, and then the document is segmented. We propose line-wise script differentiation. We emphasize the upper and lower matras, which are associated with Oriya and absent in English. We use the horizontal histogram to distinguish lines belonging to different scripts. After separation, the scripts are sent to their individual recognition engines.
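The line-wise separation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a deskewed, binarized page given as a list of rows with 1 for ink and 0 for background. The function names, the "half the peak" rule for locating the core band of a line, and the 0.15 decision threshold are all hypothetical choices made for illustration. The idea is that a line with a noticeable share of ink above and below its dense core band (where upper and lower matras would fall) is labelled Oriya, and a line without it is labelled English.

```python
def horizontal_profile(image):
    """Count ink pixels (value 1) in each row of a binary image."""
    return [sum(row) for row in image]


def segment_lines(profile):
    """Return (top, bottom) row ranges, bottom exclusive, one per run of non-empty rows."""
    lines, start = [], None
    for y, count in enumerate(profile):
        if count > 0 and start is None:
            start = y                      # a text line begins
        elif count == 0 and start is not None:
            lines.append((start, y))       # a blank row ends the line
            start = None
    if start is not None:
        lines.append((start, len(profile)))
    return lines


def outer_zone_share(profile, top, bottom):
    """Fraction of a line's ink lying above or below its dense core band.

    Assumption of this sketch: the core band is the span of rows whose
    ink count reaches at least half of the line's peak count.
    """
    rows = profile[top:bottom]
    peak = max(rows)
    core = [i for i, count in enumerate(rows) if 2 * count >= peak]
    first, last = core[0], core[-1]
    outer = sum(rows[:first]) + sum(rows[last + 1:])
    return outer / sum(rows)


def label_lines(image, threshold=0.15):
    """Label each text line 'oriya' or 'english'; the threshold is hypothetical."""
    profile = horizontal_profile(image)
    labels = []
    for top, bottom in segment_lines(profile):
        share = outer_zone_share(profile, top, bottom)
        labels.append("oriya" if share > threshold else "english")
    return labels
```

In practice English ascenders and descenders also put some ink outside the core band, so a fixed threshold like this would need tuning on real scans; the sketch only shows how a horizontal histogram yields both the line boundaries and a per-line zone feature.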
CITED BY (14)  
1 Nayak, M., & Nayak, A. K. (2015). Odia Running Text Recognition Using Moment-Based Feature Extraction and Mean Distance Classification Technique. In Intelligent Computing, Communication and Devices (pp. 497-506). Springer India.
2 Singh, P. K., Sarkar, R., & Nasipuri, M. (2015). Offline Script Identification from multilingual Indic-script documents: A state-of-the-art. Computer Science Review, 15, 1-28.
3 Bandyopadhyay, S. (2014). Maximum Common Sub-graph Based Approach For Handwritten Oriya Digits (Doctoral dissertation, Jadavpur University Kolkata).
4 Bhattacharjee, D., Tripathi, D., Debnath, R., Hanumante, V., & Roy, S. A Novel Approach for Character Recognition. International Journal of Engineering Trends and Technology (IJETT)–Volume, 10.
5 Anand, R., Khanna, R., Student, N. C., & Israna, P. (2013). International Journal of Emerging Technologies in Computational and Applied Sciences (IJETCAS) www. iasir. net. CYBERNETICS: SYSTEMS, 43(4).
6 Abel, K. (2013). Benefits of shifting freight delivery to night time, considering routing and environmental effects for addis ababa city (doctoral dissertation, aau).
7 Bhattacharjee, D., Tripathi, D., Debnath, R., Hanumante, V., Roy, S., & Roy, S. Nav view search.
8 ABEBAYEHU, S. (2012). Amharic-English Script Identification in Real-Life Document Images (Doctoral dissertation, aau).
9 SAMUEL, A. (2012). school of graduate studies school of information science (Doctoral dissertation, Addis Ababa University).
10 Sarangi, P. K., Sahoo, A. K., & Ahmed, P. (2012). Recognition of Isolated Handwritten Oriya Numerals using Hopfield Neural Network. International Journal of Computer Applications, 40(8), 36-42.
11 Senapati, D., Rout, S., & Nayak, M. (2012, July). A novel approach to text line and word segmentation on odia printed documents. In Computing Communication & Networking Technologies (ICCCNT), 2012 Third International Conference on (pp. 1-6). IEEE.
12 Mohanty, S., Himadri, N., & Bebartta, I. D. (2011). A Comparative Analysis of Classifiers Accuracies for Bilingual Printed Documents (Oriya-English). International Journal of Computer Science and Information Technologies, 2(2), 18.
13 Senapati, D., Rout, S., Padhi, D., & Mishra, S. (2011). Text Line Segmentation on Odiya Printed Documents. International Journal of Advanced Research in Computer Science, 2(6).
14 Patil, S. B. (2011). Neural Network based Bilingual OCR System: Experiment with English and Kannada Bilingual Documents. International Journal of Computer Applications, 13(8), 6-14.
Mr. Sanghamitra Mohanty
- India
Himadri Nandini Das Bebartta