Concept Detection in Images Using SVD Features and Multi-Granularity Partitioning and Classification
Authors: Kamran Farajzadeh 1, Esmail Zarezadeh 2, Jafar Mansouri 3
1 - Islamic Azad University, North Tehran Branch
2 - Amir Kabir University
3 - Ferdowsi University of Mashhad
Keywords: high-dimensional data, multi-granularity partitioning and classification, multiplicative distance, semantic concept detection, static visual features, SVD
Abstract:
New static visual features, namely the right singular feature vector, the left singular feature vector, and the singular value feature vector, are proposed for semantic concept detection in images. These features are derived by applying singular value decomposition (SVD) directly to the raw images. The SVD features integrate edge, color, and texture information simultaneously and order it by its importance for concept detection. Feature extraction is performed with multi-granularity partitioning. In contrast to existing systems, classification is carried out separately for each grid partition at each granularity, so that classification results on partitions containing the target concept and on partitions without it do not influence each other. Because SVD features are high-dimensional, classification is carried out with the K-nearest neighbor (K-NN) algorithm using a new, stable distance function, the multiplicative distance. Experimental results on the PASCAL VOC and TRECVID datasets show the effectiveness of the proposed SVD features and of the multi-granularity partitioning and classification method.
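The feature-extraction step can be illustrated with a minimal NumPy sketch. It assumes a single-channel image stored as a 2-D array (the abstract does not say how color channels are combined), that all images have been resized to the same size so block shapes match across images, granularities of 1x1, 2x2 and 4x4, and that only the k largest singular triplets of each block are kept. The names `svd_features`, `partitions` and `multi_granularity_features`, the value of `k`, the granularities and the sign convention are hypothetical choices for illustration, not the authors' settings.

```python
import numpy as np


def svd_features(block: np.ndarray, k: int) -> dict:
    """SVD of one raw pixel block, truncated to its k largest singular triplets.

    Hypothetical helper: k and the sign convention below are illustrative choices.
    """
    # full_matrices=False gives the thin SVD; singular values come back sorted
    # in descending order, so slicing keeps the strongest components.
    u, s, vt = np.linalg.svd(block.astype(np.float64), full_matrices=False)
    u, s, vt = u[:, :k], s[:k], vt[:k, :]
    # Singular vectors are only defined up to sign. Assumed convention: flip each
    # (u_i, v_i) pair so the largest-magnitude entry of v_i is positive.
    # Flipping both vectors of a pair leaves u @ diag(s) @ vt unchanged.
    idx = np.argmax(np.abs(vt), axis=1)
    signs = np.sign(vt[np.arange(vt.shape[0]), idx])
    signs[signs == 0] = 1.0
    u = u * signs             # scales column i of u by signs[i]
    vt = vt * signs[:, None]  # scales row i of vt by signs[i]
    return {
        "singular": s,        # singular value feature vector
        "left": u.T.ravel(),  # left singular vectors, concatenated
        "right": vt.ravel(),  # right singular vectors, concatenated
    }


def partitions(image: np.ndarray, grid: int):
    """Yield ((grid, row, col), block) for a grid x grid split of the image."""
    for r, band in enumerate(np.array_split(image, grid, axis=0)):
        for c, block in enumerate(np.array_split(band, grid, axis=1)):
            yield (grid, r, c), block


def multi_granularity_features(image: np.ndarray, granularities=(1, 2, 4), k: int = 3) -> dict:
    """SVD features for every partition of every granularity, keyed by position."""
    return {
        key: svd_features(block, k)
        for g in granularities
        for key, block in partitions(image, g)
    }
```

For an 8x8 image with the default granularities this yields 1 + 4 + 16 = 21 partition feature sets. Because NumPy returns singular values in descending order, truncating to the first k entries keeps the components that carry most of each block's energy, which is one possible reading of the statement that the information is ordered by importance.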
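Per-partition classification can likewise be sketched, assuming each partition key (for example the `(grid, row, col)` position of a block) has its own training matrix and its own 0/1 labels saying whether that partition contains the concept. Euclidean distance is used only as a stand-in: the multiplicative distance proposed for high-dimensional SVD features is not reproduced here, and any function with the same signature can be passed as `dist`. The helper names, `k`, the score definition (fraction of positive neighbors) and the max fusion rule are hypothetical.

```python
from typing import Callable, Dict, Hashable, Iterable, Tuple

import numpy as np

# A distance takes an (n, d) matrix of training vectors and one (d,) query
# vector and returns n distances.
Distance = Callable[[np.ndarray, np.ndarray], np.ndarray]


def euclidean(train: np.ndarray, query: np.ndarray) -> np.ndarray:
    """Placeholder distance only; this is NOT the paper's multiplicative distance."""
    return np.linalg.norm(train - query, axis=1)


def knn_score(train_x: np.ndarray, train_y: np.ndarray, query: np.ndarray,
              k: int, dist: Distance = euclidean) -> float:
    """Fraction of the k nearest training vectors labeled as containing the concept."""
    d = dist(train_x, query)
    nearest = np.argsort(d, kind="stable")[:k]
    return float(np.mean(train_y[nearest]))


def detect_concept(train_feats: Dict[Hashable, np.ndarray],
                   train_labels: Dict[Hashable, np.ndarray],
                   query_feats: Dict[Hashable, np.ndarray],
                   k: int = 5,
                   dist: Distance = euclidean,
                   fuse: Callable[[Iterable[float]], float] = max,
                   ) -> Tuple[float, Dict[Hashable, float]]:
    """Run a separate K-NN for each partition key, then fuse the scores.

    The fusion rule (max by default) is a hypothetical choice; the abstract
    does not state how per-partition decisions are combined.
    """
    scores = {
        key: knn_score(x, train_labels[key], query_feats[key], k, dist)
        for key, x in train_feats.items()
    }
    return fuse(scores.values()), scores
```

Keeping one neighbor search per partition means a partition without the concept never votes in another partition's search; with the max rule an image is flagged as soon as any single partition looks positive. If only image-level labels are available, the same label array could be passed for every key, which is again an assumption.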