Investigating Hardware Overheads and Energy Efficiency of Implementing Fixed-Point Quantization Variants in a Deep Neural Network Accelerator
Mostafa Ersali Salehi Nasab (University of Tehran)
Marzieh Mastalizadeh (Computer Architecture Engineering, School of Electrical and Computer Engineering, University of Tehran, Tehran, Iran)
Seyed Ali Ansarmohammadi (PhD student, School of Electrical and Computer Engineering, University of Tehran, Tehran, Iran)
Najmeh Nazari (Computer Architecture Engineering, School of Electrical and Computer Engineering, University of Tehran, Tehran, Iran)
Keywords: deep neural networks, embedded systems, energy efficiency, fixed-point quantization
Abstract:
One of the most effective ways to compress deep neural networks and reduce their energy consumption on embedded devices is quantization using a fixed-point number representation. In recent years, a variety of techniques have been proposed to improve the accuracy of quantized networks, but these techniques often impose substantial computational overhead on the network, a cost that has so far remained largely hidden from deep neural network designers. In this work, different fixed-point quantization methods are classified and modeled according to the components that drive their hardware overhead. The hardware architectures proposed for each model are then examined and compared fairly, taking into account the trade-off between network accuracy and hardware energy efficiency. The results show that the techniques used to reduce quantization error, while improving neural network accuracy, also reduce the energy efficiency of the hardware. Based on the simulation results, adding a scale factor and an offset to LSQ fixed-point quantization increases network accuracy by about 0.1%, but lowers hardware energy efficiency by roughly a factor of 3. This underscores, especially for embedded systems, the need to pay close attention to hardware overheads.
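For concreteness, the scale-and-offset (affine) fixed-point quantizer discussed above can be written in its common generic form; this is the standard parameterization, not necessarily the exact formulation of every method surveyed:

$$
q = \operatorname{clip}\!\left(\left\lfloor \frac{x - z}{s} \right\rceil,\ q_{\min},\ q_{\max}\right),
\qquad
\hat{x} = s\, q + z
$$

where $s$ is the scale (step size), $z$ is the offset, $\lfloor\cdot\rceil$ denotes rounding to the nearest integer, and $[q_{\min}, q_{\max}]$ is the integer range implied by the bit-width. Setting $z = 0$ recovers the purely symmetric fixed-point case (as in LSQ), while a learnable nonzero $z$ introduces the extra arithmetic whose hardware cost the study quantifies.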
English Abstract:
Deep Neural Networks (DNNs) have demonstrated remarkable performance in various application domains, such as computer vision, pattern recognition, and natural language processing. However, deploying these models on edge-computing devices poses a challenge due to their extensive memory requirements and computational complexity. These factors make it difficult to deploy DNNs on low-power and limited-resource devices. One promising technique to address this challenge is quantization, particularly fixed-point quantization. Previous studies have shown that reducing the bit-width of weights and activations, such as to 3 or 4 bits, through fixed-point quantization can preserve the classification accuracy of full-precision neural networks. Despite extensive research on the compression efficiency of fixed-point quantization techniques, their energy efficiency, a critical metric in evaluating embedded systems, has not been thoroughly explored. Therefore, this research aims to assess the energy efficiency of fixed-point quantization techniques while maintaining accuracy. To accomplish this, we present a model and design an architecture for each quantization method. Subsequently, we compare their area and energy efficiency at the same accuracy level. Our experimental results indicate that incorporating scaling factors and offsets into LSQ, a well-known quantization method, improves DNN accuracy by 0.1%. However, this improvement comes at the cost of a 3× decrease in hardware energy efficiency. This research highlights the significance of evaluating fixed-point quantization techniques not only in terms of compression efficiency but also in terms of energy efficiency when applied to edge-computing devices.
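To illustrate where the extra hardware cost comes from, the NumPy sketch below contrasts a symmetric LSQ-style quantizer (step size only) with an asymmetric quantizer that adds an offset, in the spirit of LSQ+. The function names and the fixed step/offset values are illustrative assumptions; in the actual methods these parameters are learned during training, and this sketch is not the accelerator implementation evaluated in the paper.

```python
import numpy as np

def lsq_quantize(x, step, num_bits=4, signed=True):
    """Symmetric LSQ-style quantizer: only a step size (scale), no offset.
    Dequantized value = step * integer code."""
    if signed:
        qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    else:
        qmin, qmax = 0, 2 ** num_bits - 1
    q = np.clip(np.round(x / step), qmin, qmax)   # integer code
    return step * q                               # dequantized value

def offset_quantize(x, step, offset, num_bits=4):
    """Asymmetric quantizer with a step size AND an offset (LSQ+-style).
    The extra subtract/add per element is the kind of overhead that shows
    up as additional area and energy in the hardware datapath."""
    qmin, qmax = 0, 2 ** num_bits - 1
    q = np.clip(np.round((x - offset) / step), qmin, qmax)
    return step * q + offset

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    acts = rng.normal(loc=0.5, scale=0.3, size=8).astype(np.float32)
    print("float   :", np.round(acts, 3))
    print("LSQ     :", np.round(lsq_quantize(acts, step=0.1, signed=False), 3))
    print("LSQ+off :", np.round(offset_quantize(acts, step=0.1, offset=-0.05), 3))
```

The second variant needs an extra subtraction before rounding and an extra addition after dequantization for every element; once mapped onto a MAC datapath, this per-element work is the sort of overhead the hardware models in this study are meant to capture.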