Design of a Low-Power Approximate Accelerator on FPGA Chips for Artificial Intelligence Applications

Subjects: Electrical and Computer Engineering

Nadia Sohrabi ¹, Amir Bavafa Toosi ², Mehdi Sedighi ³

1 - Faculty of Computer Engineering, Amirkabir University of Technology, Tehran, Iran
2 - Faculty of Computer, Sadjad University, Mashhad, Iran
3 - Faculty of Computer Engineering, Amirkabir University of Technology, Tehran, Iran

Submitted: Thursday, 16 Rabi' al-Awwal 1446. Accepted: Monday, 04 Sha'ban 1446. Published: Tuesday, 18 Safar 1447.

Keywords: approximate adder, convolutional neural network, design of a neural network for handwritten digit recognition, approximate computing

Abstract:

Neural networks are a machine learning method used in applications such as image processing. One of their main challenges is the large volume of computation they require, and many architectures have been proposed to handle it. Neural network algorithms are usually accelerated on reconfigurable hardware such as FPGA chips, but the main drawback of these chips is their relatively high power consumption. Approximate computing can be used to reduce power consumption on FPGAs. Its central idea is to modify the circuit or the code so that accuracy is traded for energy savings. In this work, a convolutional neural network for handwritten digit recognition is designed and implemented in both an exact and an approximate version, with the goal of reducing power consumption. The approximation is applied to the adder computations of the network: carry propagation is blocked in the low-order bits of the adder, which lowers power consumption. Comparing the exact and approximate networks shows that approximating the 6 least significant bits of the adder reduces power consumption by 43.75% without introducing any errors.
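The idea of blocking carry propagation in the low-order bits can be illustrated with a small behavioral model. The sketch below is only an assumption about how such an adder might behave, not the circuit used in the paper: the abstract does not say how the low bits are computed, so here they are modeled as a carry-free bitwise XOR, and the upper bits as an exact addition with no carry-in from the lower part. The function name `approx_add` and the parameters `k` and `width` are hypothetical.

```python
def approx_add(a: int, b: int, k: int = 6, width: int = 16) -> int:
    """Behavioral model of an adder whose k low-order bits do not propagate carries.

    Assumption (not taken from the paper): the low k bits are a per-bit XOR,
    and the upper bits are added exactly with carry-in fixed at 0.
    Operands are treated as unsigned width-bit integers.
    """
    low_mask = (1 << k) - 1
    full_mask = (1 << width) - 1
    # Lower part: per-bit sum with every carry dropped.
    low = (a ^ b) & low_mask
    # Upper part: exact addition, receiving no carry from the lower part.
    high = ((a >> k) + (b >> k)) << k
    return (high | low) & full_mask


def exact_add(a: int, b: int, width: int = 16) -> int:
    """Reference exact adder for comparison."""
    return (a + b) & ((1 << width) - 1)


if __name__ == "__main__":
    for a, b in [(64, 128), (3, 4), (63, 1)]:
        print(a, b, exact_add(a, b), approx_add(a, b))
```

In this model the error equals twice the bitwise AND of the two lower parts, so it is zero whenever the operands never have a 1 in the same low-order bit position, and it stays below 2^(k+1) otherwise. Whether a network's output is affected depends on how those small deviations accumulate, which is what the comparison reported in the abstract measures.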

