Comparing the Semantic Segmentation of High-Resolution Images Using Deep Convolutional Networks: SegNet, HRNet, CSE-HRNet and RCA-FCN
Subject area: Machine learning
Nafiseh Sadeghi 1, Homayoun Mahdavi-Nasab 2, Mansoor Zeinali 3, Hossein Pourghasem 4
1 - Department of Electrical Engineering, Najafabad Branch, Islamic Azad University, Najafabad, Iran
2 - Department of Electrical Engineering, Najafabad Branch, Islamic Azad University, Najafabad, Iran
3 - Department of Electrical Engineering, Najafabad Branch, Islamic Azad University, Najafabad, Iran
4 - Department of Electrical Engineering, Najafabad Branch, Islamic Azad University, Najafabad, Iran
Keywords: Semantic Segmentation, Convolutional Neural Network, Deep Neural Network, High-Resolution Image Processing
Abstract:
Semantic segmentation is a branch of computer vision that is used extensively in image search engines, automated driving, intelligent agriculture, disaster management, and other human-machine interaction applications. Its aim is to assign each pixel of an image a label from a given label set according to the pixel's semantic content. Among the proposed methods and architectures, deep learning algorithms have attracted the most attention from researchers because of their strong feature-learning ability, and many studies have therefore explored the structure of deep neural networks, especially convolutional neural networks. Most modern semantic segmentation models are based on fully convolutional networks (FCNs), which replace the fully connected layers of common classification networks with convolutional layers in order to produce pixel-level predictions. Many methods have since been proposed to improve on the results of the basic FCN. As the data to be processed grow more complex and varied, more powerful neural networks and further development of existing networks are needed. This study segments a high-resolution (HR) image dataset into six classes. It presents an overview of several important deep learning architectures, focusing on methods that achieve high scores on segmentation metrics such as accuracy and F1-score. Finally, their segmentation results are discussed, showing that the methods with the best overall accuracy and overall F1-score are not necessarily the best for every class. The results therefore suggest that a segmentation algorithm should be chosen according to the application and the relative importance of each class.
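The observation that the method with the best overall accuracy or overall F1-score need not be the best for every class follows directly from how these metrics are computed. The following is a minimal pure-Python sketch, not the authors' evaluation code. It assumes flattened integer label maps and takes overall accuracy to be the fraction of correctly labelled pixels. It also assumes that "overall F1" is the unweighted mean of per-class F1 scores, although benchmarks differ on this definition. The function names, the two "models" and their predictions are hypothetical toy data invented for illustration. The same functions handle the six-class case by passing `num_classes=6`.

```python
from typing import List, Sequence

Matrix = List[List[int]]


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int], num_classes: int) -> Matrix:
    """Count pixels; rows index the reference label, columns the predicted label."""
    if len(y_true) != len(y_pred):
        raise ValueError("reference and prediction must contain the same number of pixels")
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def overall_accuracy(cm: Matrix) -> float:
    """Fraction of pixels whose predicted label equals the reference label."""
    correct = sum(cm[k][k] for k in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total if total else 0.0


def per_class_f1(cm: Matrix) -> List[float]:
    """F1 = 2*TP / (2*TP + FP + FN), computed one class at a time."""
    n = len(cm)
    scores = []
    for k in range(n):
        tp = cm[k][k]
        fp = sum(cm[i][k] for i in range(n)) - tp  # predicted k, reference is another class
        fn = sum(cm[k]) - tp                       # reference k, predicted as another class
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


def mean_f1(cm: Matrix) -> float:
    """Assumed definition of 'overall F1': unweighted mean over classes."""
    scores = per_class_f1(cm)
    return sum(scores) / len(scores)


# Hypothetical toy data: 10 flattened pixels, 3 classes, class 2 is rare.
reference = [0, 0, 0, 0, 0, 0, 1, 1, 1, 2]
model_a = [0, 0, 0, 0, 0, 0, 1, 1, 1, 0]  # misses the rare class entirely
model_b = [0, 0, 0, 0, 1, 1, 1, 1, 1, 2]  # finds the rare class but confuses 0 with 1

cm_a = confusion_matrix(reference, model_a, 3)
cm_b = confusion_matrix(reference, model_b, 3)

for name, cm in (("A", cm_a), ("B", cm_b)):
    print(name, "OA =", round(overall_accuracy(cm), 3),
          "per-class F1 =", [round(s, 3) for s in per_class_f1(cm)],
          "mean F1 =", round(mean_f1(cm), 3))
# A OA = 0.9 per-class F1 = [0.923, 1.0, 0.0] mean F1 = 0.641
# B OA = 0.8 per-class F1 = [0.8, 0.75, 1.0] mean F1 = 0.85
```

In this toy case, model A wins on overall accuracy and on classes 0 and 1. Model B has the higher mean F1 and is the only model that detects class 2. No single model is best on every measure, so the ranking depends on which classes matter for the application.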