Energy-Efficient Fixed-Point Hardware Accelerator for Embedded DNNs
Subject Areas: AI and Robotics
Marzie Mastalizade 1, Ali Ansarmohammadi 2, Najme Nazari 3, Mostafa Salehi 4
1 - Computer Architecture, Faculty of Electrical and Computer Engineering, University of Tehran, Tehran, Iran
2 - Ph.D. student, Faculty of Electrical and Computer Engineering, University of Tehran, Tehran, Iran
3 - Ph.D. student, Faculty of Electrical and Computer Engineering, University of Tehran, Tehran, Iran
4 - University of Tehran
Keywords: Deep Neural Network, Embedded Systems, Energy-Efficiency, Fixed-point Quantization
Abstract :
Deep Neural Networks (DNNs) have demonstrated remarkable performance in application domains such as computer vision, pattern recognition, and natural language processing. However, their extensive memory requirements and computational complexity make them difficult to deploy on low-power, resource-constrained edge-computing devices. One promising technique for addressing this challenge is quantization, particularly fixed-point quantization. Previous studies have shown that reducing the bit-width of weights and activations to 3 or 4 bits through fixed-point quantization can preserve the classification accuracy of full-precision neural networks. Although the compression efficiency of fixed-point quantization techniques has been studied extensively, their energy efficiency, a critical metric for embedded systems, has not been thoroughly explored. This research therefore assesses the energy efficiency of fixed-point quantization techniques while maintaining accuracy. To this end, we model each quantization method and design a corresponding hardware architecture, then compare their area and energy efficiency at the same accuracy level. Our experimental results indicate that incorporating scaling factors and offsets into LSQ, a well-known quantization method, improves DNN accuracy by 0.1%; however, this improvement comes at the cost of a 3× decrease in hardware energy efficiency. These findings highlight the importance of evaluating fixed-point quantization techniques not only in terms of compression efficiency but also in terms of energy efficiency when targeting edge-computing devices.
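To make the trade-off discussed above concrete, the following is a minimal NumPy sketch contrasting the two quantization forms the abstract compares: a symmetric LSQ-style quantizer with a single step size, and an LSQ+-style variant that adds a learnable offset. This is an illustrative sketch only, not the paper's implementation; the function names, the 4-bit default, and the specific clipping ranges are assumptions chosen for clarity. The extra subtraction and addition of the offset in the second function hint at why the offset variant costs more in hardware.

```python
import numpy as np

def quantize_lsq(x, s, n_bits=4):
    """Symmetric LSQ-style fake quantization with a single step size s.

    Illustrative sketch (not the paper's code): values are scaled by s,
    rounded, clipped to the signed n-bit range, and rescaled.
    """
    qn = -(2 ** (n_bits - 1))        # e.g. -8 for 4 bits
    qp = 2 ** (n_bits - 1) - 1       # e.g. +7 for 4 bits
    q = np.clip(np.round(x / s), qn, qp)
    return q * s                     # dequantized value

def quantize_lsq_plus(x, s, beta, n_bits=4):
    """LSQ+-style fake quantization: step size s plus a learnable offset beta.

    The offset shifts the quantization grid (useful for asymmetric activation
    distributions) at the cost of an extra subtract/add per value, which is
    one source of the hardware overhead discussed in the abstract.
    """
    qn, qp = 0, 2 ** n_bits - 1      # unsigned grid, e.g. 0..15 for 4 bits
    q = np.clip(np.round((x - beta) / s), qn, qp)
    return q * s + beta

# Example: with s = 0.25, a value of 0.26 snaps to the nearest grid point 0.25,
# and an out-of-range value such as -3.0 is clipped to the lowest level.
print(quantize_lsq(np.array([0.26, -3.0]), 0.25))
print(quantize_lsq_plus(np.array([1.0]), 0.25, 0.1))
```

In a training loop, `s` (and `beta` for the offset variant) would be learned jointly with the weights via a straight-through estimator; the sketch shows only the forward-pass arithmetic.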