Human Activity Recognition based on Deep Belief Network Classifier and Combination of Local and Global Features
Subject area: Image Processing
1 - Islamic Azad University Shiraz
Keywords: BoW, DBN, GIST, HOG, Human Activity Recognition, SIFT
Abstract:
Over the past decades, human activity recognition has attracted the attention of many researchers because of its important applications, including smart homes, health care, and the monitoring of private and public places. This paper proposes a hybrid method, applied to video frames, that combines features extracted from the images with the scale-invariant feature transform (SIFT), the histogram of oriented gradients (HOG) and the global GIST scene descriptor, and classifies the activities with a deep belief network (DBN). First, each image in the dataset is pre-processed to avoid ineffective features. The three descriptors then extract features from the image. Because working with a large number of features is problematic, the bag-of-words (BoW) technique is used to produce a small, discriminative feature set. Finally, these reduced features are fed to the DBN to recognize the human activities. Simulation results on the standard PASCAL VOC Challenge 2010 dataset with nine activity classes show that the proposed approach outperforms the other compared methods in accuracy, precision and recall, reaching 96.39%, 85.77% and 86.72%, respectively.
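The bag-of-words reduction step can be illustrated with a minimal sketch. It assumes the local descriptors (for example 128-dimensional SIFT vectors) have already been extracted, and uses plain k-means to build the visual vocabulary and an L1-normalised word histogram per image. The codebook size `k`, the helper names and the random stand-in data are hypothetical choices for illustration, not the paper's settings.

```python
import numpy as np


def assign_words(descriptors: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Index of the nearest visual word (squared Euclidean distance) for each descriptor."""
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def build_codebook(descriptors: np.ndarray, k: int, iters: int = 20, seed: int = 0) -> np.ndarray:
    """Plain k-means (Lloyd's algorithm) over descriptors pooled from training images."""
    rng = np.random.default_rng(seed)
    # Initialise the k centres with distinct, randomly chosen descriptors.
    codebook = descriptors[rng.choice(len(descriptors), size=k, replace=False)].astype(float)
    for _ in range(iters):
        labels = assign_words(descriptors, codebook)
        for j in range(k):
            members = descriptors[labels == j]
            if len(members) > 0:  # an empty cluster keeps its previous centre
                codebook[j] = members.mean(axis=0)
    return codebook


def bow_histogram(descriptors: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Fixed-length, L1-normalised visual-word count vector for one image."""
    counts = np.bincount(assign_words(descriptors, codebook), minlength=len(codebook))
    return counts / max(counts.sum(), 1)


# Usage with random stand-ins for 128-D SIFT-like descriptors (hypothetical data).
rng = np.random.default_rng(1)
train_descriptors = rng.normal(size=(500, 128))  # pooled over training images
codebook = build_codebook(train_descriptors, k=50)
image_descriptors = rng.normal(size=(80, 128))   # one image with 80 keypoints
h = bow_histogram(image_descriptors, codebook)
print(h.shape)  # (50,)
```

Whatever its number of keypoints, each image maps to a vector of length `k`, which is what keeps the feature set small and fixed-size. The abstract does not say how the three descriptors are fused; since GIST already yields one global vector per image, one assumed option would be a concatenation such as `np.concatenate([bow_sift, bow_hog, gist])`, where all three names are hypothetical.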
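A deep belief network is commonly built by greedily pre-training a stack of restricted Boltzmann machines (RBMs) with contrastive divergence, then adding a softmax output layer and fine-tuning the whole network with backpropagation. The sketch below covers only the unsupervised pre-training stage, assuming Bernoulli units, one-step contrastive divergence (CD-1), full-batch updates and hypothetical layer sizes; it is not the paper's configuration.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


class RBM:
    """Bernoulli-Bernoulli restricted Boltzmann machine trained with CD-1."""

    def __init__(self, n_visible: int, n_hidden: int, seed: int = 0):
        self.rng = np.random.default_rng(seed)
        self.W = 0.01 * self.rng.normal(size=(n_visible, n_hidden))  # small random weights
        self.b_v = np.zeros(n_visible)  # visible biases
        self.b_h = np.zeros(n_hidden)   # hidden biases

    def hidden_probs(self, v: np.ndarray) -> np.ndarray:
        return sigmoid(v @ self.W + self.b_h)  # P(h = 1 | v)

    def visible_probs(self, h: np.ndarray) -> np.ndarray:
        return sigmoid(h @ self.W.T + self.b_v)  # P(v = 1 | h)

    def cd1_step(self, v0: np.ndarray, lr: float = 0.1) -> float:
        """One full-batch CD-1 update; returns the reconstruction error for monitoring."""
        p_h0 = self.hidden_probs(v0)                              # positive phase
        h0 = (self.rng.random(p_h0.shape) < p_h0).astype(float)   # sample binary hidden states
        p_v1 = self.visible_probs(h0)                             # one Gibbs step down...
        p_h1 = self.hidden_probs(p_v1)                            # ...and back up (negative phase)
        n = v0.shape[0]
        self.W += lr * (v0.T @ p_h0 - p_v1.T @ p_h1) / n
        self.b_v += lr * (v0 - p_v1).mean(axis=0)
        self.b_h += lr * (p_h0 - p_h1).mean(axis=0)
        return float(((v0 - p_v1) ** 2).mean())


def pretrain_dbn(data: np.ndarray, layer_sizes, epochs: int = 50, lr: float = 0.1):
    """Greedy layer-wise pre-training: each RBM models the previous layer's features."""
    rbms, x = [], data
    for n_hidden in layer_sizes:
        rbm = RBM(x.shape[1], n_hidden)
        for _ in range(epochs):
            rbm.cd1_step(x, lr)
        rbms.append(rbm)
        x = rbm.hidden_probs(x)  # hidden probabilities become the next layer's input
    return rbms, x


# Usage with stand-in data: 100 "images", 50-D feature vectors in [0, 1]
# (real-valued inputs are treated as Bernoulli probabilities, an approximation).
rng = np.random.default_rng(2)
features = rng.random((100, 50))
rbms, top = pretrain_dbn(features, layer_sizes=[32, 16])
print(top.shape)  # (100, 16)
```

In a full classifier, `top` (or the pre-trained weights themselves) would feed a softmax layer over the activity classes, and the stack would then be fine-tuned end to end with labelled data.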
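The abstract reports accuracy, precision and recall but does not state how they are averaged over the nine activity classes. The following sketch, assuming a multi-class confusion matrix, computes macro-averaged precision and recall together with two common definitions of accuracy. The function name and the toy 3-class matrix are made up for illustration.

```python
import numpy as np


def classification_metrics(cm):
    """cm[i, j] = number of samples whose true class is i and predicted class is j."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp  # predicted as class c, actually another class
    fn = cm.sum(axis=1) - tp  # actually class c, predicted as another class
    total = cm.sum()
    tn = total - tp - fp - fn
    precision = tp / np.maximum(tp + fp, 1e-12)  # guard against empty columns
    recall = tp / np.maximum(tp + fn, 1e-12)     # guard against empty rows
    return {
        "macro_precision": float(precision.mean()),
        "macro_recall": float(recall.mean()),
        # fraction of all samples classified correctly
        "overall_accuracy": float(tp.sum() / total),
        # per-class (TP + TN) / total, averaged; true negatives are counted
        "mean_one_vs_rest_accuracy": float(((tp + tn) / total).mean()),
    }


# Toy 3-class confusion matrix (made-up numbers).
cm = [[8, 2, 0],
      [1, 6, 3],
      [0, 1, 9]]
m = classification_metrics(cm)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
# macro_precision: 0.769
# macro_recall: 0.767
# overall_accuracy: 0.767
# mean_one_vs_rest_accuracy: 0.844
```

Because true negatives dominate each one-vs-rest count as the number of classes grows, the mean one-vs-rest accuracy tends to sit well above precision and recall, while the overall accuracy does not; naming the definition makes reported figures comparable across methods.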