Ternary Non-Uniform Quantization for Improving the Sparsity and Computation of Deep Neural Networks in Embedded Applications
Authors: Hosna Manavi Mofrad 1, Seyed Ali Ansarmohammadi 2, Mostafa Ersali Salehi Nasab 3
1 - Student
2 - PhD student, School of Electrical and Computer Engineering, University of Tehran, Tehran, Iran
3 - University of Tehran
Keywords: deep neural networks, ternary non-uniform quantization, sparse neural network, pruning, embedded devices
Abstract:
Deep neural networks have become remarkably popular due to their success in a wide range of applications. However, their computational complexity and memory footprint are major obstacles to deploying them on many embedded devices. Quantization and pruning are among the most important optimization techniques proposed in recent years to overcome these obstacles. One well-known quantization approach uses a non-uniform binary number representation, which exploits bit-level computation while reducing the accuracy loss of binary networks relative to full-precision networks. However, since this representation cannot encode the value zero, it forfeits the benefits of data sparsity. Deep neural networks, on the other hand, are inherently sparse: sparsifying their parameters reduces the volume of data held in memory, and suitable techniques can further accelerate computation. In this paper, we aim to exploit both the advantages of non-uniform quantization and those of data sparsity. To this end, we propose a ternary non-uniform quantization for number representation that improves network accuracy over the binary non-uniform network and, in addition, makes the network prunable. We then increase the sparsity of the quantized network via pruning. The results show that the potential speedup of our network at the bit and word levels can increase by factors of 15 and 45, respectively, compared to the baseline binary non-uniform network.
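The abstract does not spell out the quantization function itself, so the sketch below is only a rough illustration of the general idea it describes: a ternary code with an explicit zero level makes weights prunable, and the induced zeros translate directly into sparsity. The function name `ternarize`, the `delta_ratio` threshold heuristic, and the symmetric levels {-α, 0, +α} are illustrative assumptions in the spirit of threshold-based ternary weight networks, not the paper's own non-uniform ternary scheme.

```python
import numpy as np

def ternarize(weights, delta_ratio=0.7):
    """Map full-precision weights to the three levels {-alpha, 0, +alpha}.

    A minimal, assumption-laden sketch of threshold-based ternarization;
    the paper's non-uniform ternary quantization and training procedure
    are not detailed in the abstract and may differ substantially.
    """
    # Weights whose magnitude falls below the threshold are pruned to exact zero.
    delta = delta_ratio * np.mean(np.abs(weights))
    mask = np.abs(weights) > delta                      # surviving (non-zero) positions
    # Scale factor: mean magnitude of the surviving weights.
    alpha = np.abs(weights[mask]).mean() if mask.any() else 0.0
    ternary = np.sign(weights) * mask * alpha
    return ternary, mask

# Usage example: the sparsity of the quantized layer is simply the
# fraction of weights that were mapped to the zero level.
rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4)).astype(np.float32)
q, mask = ternarize(w)
print("sparsity:", 1.0 - mask.mean())
```

Because the zero level is represented explicitly, any weight below the threshold can be skipped entirely at inference time, which is the source of the bit-level and word-level speedups claimed above.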