A Two Step Method for the Recognition of Printed Subwords

Subject Areas : electrical and computer engineering

1 - Tarbiat Modares University
2 - Tarbiat Modares University

Received: 2004-08-15 Accepted : 2005-03-19 Published : 2004-09-21

Keywords: Printed text, Farsi subword, clustering, classification, recognition, characteristic loci, k-means, Fourier descriptors

Abstract :

This paper proposes a two-step method for the recognition of printed subwords. Using characteristic loci features, the set of printed subwords is clustered into 300 clusters by the k-means algorithm, and each cluster is represented by its mean. In the first step, the input is compared with the 300 cluster centers by Euclidean distance, and the 10 closest clusters are found. In the second step, Fourier descriptors of the subword contour are used to classify the input subword among the members of these 10 clusters. The training set consists of 12700 Farsi subwords in four fonts (Lotus, Mitra, Yagut and Zar) and three sizes (10, 12 and 14). In a test on a set of 500 subwords, the correct class was the first candidate for 71.4% of the subwords, among the top five for 95%, and among the top ten for 98.2%. In post-processing, the dots of the subword and their positions were used to improve the recognition results, which raised the recognition rate to 92.6%.
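The two steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the data layout (`centers`, `members`), and the toy vectors are all hypothetical, feature extraction is assumed to have been done beforehand, and the use of Euclidean distance in the second step is an assumption, since the abstract names the distance measure only for the first step.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def closest_clusters(loci_vec, centers, n_closest=10):
    """Step 1: indices of the n_closest cluster centers (means) to the input,
    measured on characteristic loci features."""
    order = sorted(range(len(centers)),
                   key=lambda i: euclidean(loci_vec, centers[i]))
    return order[:n_closest]


def two_step_classify(loci_vec, fd_vec, centers, members, n_closest=10):
    """Step 2: rank the members of the shortlisted clusters by distance
    between Fourier descriptor vectors (distance measure is an assumption).

    members[i] is a list of (label, fourier_descriptor_vector) pairs
    for cluster i. Returns labels ordered from best to worst match.
    """
    shortlist = closest_clusters(loci_vec, centers, n_closest)
    candidates = [m for i in shortlist for m in members[i]]
    ranked = sorted(candidates, key=lambda m: euclidean(fd_vec, m[1]))
    return [label for label, _ in ranked]


# Toy data: 3 clusters instead of 300, 2-D vectors instead of real features.
centers = [[0.0, 0.0], [10.0, 10.0], [20.0, 20.0]]
members = {
    0: [("a", [1.0, 0.0]), ("b", [0.0, 1.0])],
    1: [("c", [1.0, 1.0])],
    2: [("d", [0.9, 0.1])],
}

ranked = two_step_classify([1.0, 1.0], [0.9, 0.1], centers, members,
                           n_closest=2)
print(ranked)  # ['a', 'c', 'b']
```

In this toy run, subword "d" would be an exact match on the second feature vector, but it is never considered because its cluster is not among the closest ones in the first step. This shows how the first step limits the candidates for the second, and why the number of retained clusters affects whether the correct class can appear in the ranked list at all.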

