Using Prominent Regions in Search Space Reduction for Recognition of Printed Farsi Subwords
Subject Areas: Electrical and Computer Engineering
1 - Tarbiat Modares University
2 - Tarbiat Modares University
Keywords: verification, shape descriptor, printed subwords, word shape recognition, prominent regions
Abstract:
In most common lexicon reduction methods, lexicon words are clustered by their holistic shape features, and each query word image is then assigned to the closest cluster. Because errors at this stage propagate to subsequent stages, the relevant clusters must be selected with high accuracy. In this paper, we present a novel verification method that decides on the validity of the recognized clusters based on a proposed confidence measure. In the verification phase, the confidence in a selected cluster is measured using local shape features, and this determines whether the cluster is accepted. For this purpose, local shape features of the input subword image are compared with the “prominent regions” of the corresponding cluster. The prominent regions of a cluster are local regions that discriminate the members of that cluster from the members of other clusters. The proposed verification method, along with some predefined rules, is used to reduce the lexicon size for Farsi subwords. Experiments on a set of 6895 common Farsi subwords show that the proposed method significantly reduces the search space while keeping accuracy at an acceptable level.
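The two-stage idea described in the abstract (select the closest cluster from holistic features, then verify it against the cluster's prominent regions) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the Euclidean distance, the similarity-based `confidence` function, the `threshold` value, and the fallback rule that widens the candidate set to the `fallback_k` nearest clusters when verification fails. The cluster data are toy values.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_clusters(holistic, clusters):
    """Sort cluster ids by distance from the query's holistic features."""
    return sorted(clusters, key=lambda cid: euclidean(holistic, clusters[cid]["centre"]))


def confidence(local_features, prominent):
    """Hypothetical confidence measure (an assumption, not the paper's).

    Averages a similarity in (0, 1] over the cluster's prominent regions.
    A region missing from the query contributes a similarity of 0.
    """
    sims = []
    for region, reference in prominent.items():
        query = local_features.get(region)
        if query is None:
            sims.append(0.0)
        else:
            sims.append(1.0 / (1.0 + euclidean(query, reference)))
    return sum(sims) / len(sims) if sims else 0.0


def reduce_lexicon(holistic, local_features, clusters, threshold=0.5, fallback_k=2):
    """Return (selected cluster ids, candidate subwords) for one query image."""
    ranked = rank_clusters(holistic, clusters)
    best = ranked[0]
    if confidence(local_features, clusters[best]["prominent"]) >= threshold:
        selected = [best]               # verification accepted the closest cluster
    else:
        selected = ranked[:fallback_k]  # assumed rule: widen the search space
    candidates = []
    for cid in selected:
        candidates.extend(clusters[cid]["members"])
    return selected, candidates


# Toy lexicon with two clusters; all values are illustrative only.
clusters = {
    "A": {"centre": [0.0, 0.0], "prominent": {"r1": [1.0, 0.0]}, "members": ["sw1", "sw2"]},
    "B": {"centre": [1.0, 1.0], "prominent": {"r2": [0.0, 1.0]}, "members": ["sw3"]},
}

accepted = reduce_lexicon([0.1, 0.1], {"r1": [1.0, 0.0]}, clusters)
rejected = reduce_lexicon([0.1, 0.1], {"r1": [5.0, 0.0]}, clusters)
```

In this sketch, a query whose local features match the prominent region of its closest cluster keeps only that cluster's members as candidates, while a mismatch causes the candidate set to grow. This mirrors the trade-off the abstract describes between search space reduction and preserved accuracy.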