Recognition of Facial and Vocal Emotional Expressions by SOAR Model
Authors: Matin Ramzani Shahrestani 1, Sara Motamed 2, Mohammadreza Yamaghani 3
1 - Department of Computer Engineering, Rasht Branch, Islamic Azad University, Rasht, Iran
2 - Department of Computer Engineering, Fouman & Shaft Branch, Islamic Azad University, Fouman, Iran
3 - Department of Computer Engineering, Lahijan Branch, Islamic Azad University, Lahijan, Iran
Keywords: Emotion Recognition, Facial and Vocal Emotional Expressions, Cognitive Model, SOAR Model
Abstract:
Today, the recognition of facial and vocal emotional expressions is considered one of the most important channels of human communication and response to the environment, and one of the most attractive fields of machine vision, with applications that include emotion analysis. This article uses the six basic emotional expressions (anger, disgust, fear, happiness, sadness, and surprise), and its main goal is to present a new method in cognitive science based on the functioning of the human brain system. The proposed model consists of four main stages: pre-processing, feature extraction, feature selection, and classification. In the pre-processing stage, facial images and speech signals are extracted from videos in the eNTERFACE'05 dataset, and noise removal and resizing are performed on them. In the feature extraction stage, PCA is applied to the images and a 3D-CNN is used to find the best image features; likewise, MFCC is applied to the emotional speech signals and a CNN is used to find the best audio features. The two sets of features are then fused, and finally SOAR classification is applied to the fused features to compute the recognition rate of emotional expressions based on face and speech. The proposed model is compared with competing models in order to examine its performance. The highest audio-visual recognition rate was obtained for the emotional expression of disgust, at 88.1%, and the lowest for fear, at 73.8%.
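To make the pipeline concrete, the following minimal Python sketch (PyTorch and scikit-learn) mirrors the stages the abstract describes: PCA over face frames, a small 3D-CNN for visual features, a CNN over MFCC maps for speech features, and feature-level fusion. All network sizes, dimensions, and the synthetic inputs are illustrative assumptions rather than the authors' configuration, and the final SOAR-based classifier is not reproduced here.

# A minimal sketch of the audio-visual feature pipeline, under the
# assumptions stated above; synthetic arrays stand in for preprocessed
# eNTERFACE'05 clips.
import numpy as np
import torch
import torch.nn as nn
from sklearn.decomposition import PCA

class VisualNet(nn.Module):
    # Toy 3D-CNN over a clip of grayscale face frames.
    def __init__(self, feat_dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv3d(1, 8, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(8, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1))
        self.fc = nn.Linear(16, feat_dim)

    def forward(self, x):  # x: (batch, 1, frames, height, width)
        return self.fc(self.conv(x).flatten(1))

class AudioNet(nn.Module):
    # Toy 2D-CNN over an MFCC map (coefficients x time).
    def __init__(self, feat_dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1))
        self.fc = nn.Linear(8, feat_dim)

    def forward(self, x):  # x: (batch, 1, n_mfcc, time)
        return self.fc(self.conv(x).flatten(1))

# Synthetic stand-ins: 8 clips of 16 grayscale 64x64 face frames, and
# 13 MFCC coefficients over 100 audio frames per clip.
frames = np.random.rand(8, 16, 64, 64).astype(np.float32)
mfccs = np.random.rand(8, 13, 100).astype(np.float32)

# PCA dimensionality reduction over flattened frames (in the paper, PCA
# is applied to the images before the 3D-CNN; shown independently here).
reduced = PCA(n_components=4).fit_transform(frames.reshape(8, -1))

v_feat = VisualNet()(torch.from_numpy(frames).unsqueeze(1))  # (8, 128)
a_feat = AudioNet()(torch.from_numpy(mfccs).unsqueeze(1))    # (8, 128)

# Feature-level fusion: concatenate the two modality embeddings; the
# fused vector would then go to the SOAR-based classifier (not shown).
fused = torch.cat([v_feat, a_feat], dim=1)
print(fused.shape)  # torch.Size([8, 256])

In the full system, the MFCC maps would be computed from the eNTERFACE'05 audio tracks rather than generated randomly, and the fused vector would be passed to the SOAR classification stage; the shape check above stands in for that final step.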