مدلی مبتنی بر آنتروپی و اتوماتاهاي یادگیر برای حل بازی‌های تصادفی

الموضوعات : مهندسی برق و کامپیوتر

1 - دانشگاه آزاد اسلامی واحد قزوین
2 - دانشگاه صنعتی امیرکبیر

تاريخ الإرسال : 13 الأربعاء , صفر, 1437 تاريخ التأكيد : 16 السبت , صفر, 1437 تاريخ الإصدار : 09 الإثنين , رجب, 1431

الکلمات المفتاحية: آنتروپی اتوماتاهاي يادگير بازی‌های تصادفی سيستم‌هاي چندعامله,

ملخص المقالة :

بازی‌های غیر قطعی (تصادفی) به‌عنوان توسعه‌ای از فرآیندهای تصادفی مارکوف با چندین عامل در سیستم‌های چندعامله و مدل‌سازی آنها حائز اهمیت بوده و به‌عنوان چارچوبی مناسب در تحقیقات یادگیری تقویتی چند‌عامله به‌کار رفته‌اند. در حال حاضر اتوماتاهای یادگیر به‌عنوان ابزاری ارزشمند در طراحی الگوریتم‌های یادگیری چندعامله به‌کار رفته‌اند. در این مقاله مدلی مبتنی بر اتوماتای یادگیر و مفهوم آنتروپی برای حل بازی‌های غیر قطعی و پیداکردن سیاست بهینه در این بازی‌ها ارائه شده است. در مدل پیشنهادی به‌ازای هر عامل در هر حالت از محیط بازی یک اتوماتای یادگیر با ساختار متغیر از نوع S قرار داده شده است که اعمال بهینه را در هر حالت یاد می‌گیرند. تعداد اعمال هر اتوماتا با توجه به همسایگان مجاور هر حالت تعیین شده و ترکیب اعمال اتوماتاها حالت بعدی محیط را انتخاب می‌کند. در مدل پیشنهادی از آنتروپی بردار احتمالات اتوماتای یادگیر حالت جدید برای کمک به پاداش‌دهی اتوماتاها و بهبود یادگیری استفاده شده است. برای بررسی و تحلیل رفتار الگوریتم یادگیری پارامتری به‌نام آنتروپی کلی تعریف گردیده که میزان همگرایی را در الگوریتم یادگیری بیان می‌کند. در نهایت الگوریتمی اصلاح‌یافته با ایجاد تعادل بین جستجو و استناد بر تجربیات پیشنهاد شده است. نتایج آزمایش‌ها نشان می‌دهد الگوريتم ارائه‌شده از کارایی مناسبی از هر دو جنبه هزينه و سرعت رسيدن به راه حل بهينه برخوردار است.

المصادر:

[1] G. Weiss, Multi - Agent Systems: A Modern Approach to Distributed Artificial Intelligence, Cambridge, MA: MIT Press, 1999.
[2] H. Van Dyke Parunak, "A practitioners' review of industrial agent applications, autonomous agents and multi-agent systems," Autonomous Agents and Multi-Agent Systems, vol. 3, no. 4, pp. 389-407, Dec. 2000.
[3] A. K. Goel, V. Kummar, and S. Sirivasan, "Application of multi -agent system & agent coordination," in Proc. 2nd National Conf. Mathematical Techiniques: Emerging Paradigms for Electronics and IT Industries, MATEIT''2008, pp. 328-337, New Dehli, India, Sep. 2008.
[4] L. Busniu, R. Babuska, and B. Schutter, "A comprehensive survey of multi-agent reinforcement learning," IEEE Trans. on System, Man, Cybern., vol. 38, no. 2, pp. 156-171, Mar. 2008.
[5] M. L. Littman, "Markov games as a framework for multi-agent reinforcement learning," in Proc. ICML'94, pp. 157-163, Mar. 1994.
[6] J. Osborne and A. Rubinstein, A Course in Game Theory, Cambridge, MA: MIT Press, 1994.
[7] J. Hu and M. P. Wellman, "Online learning about other agents in a dynamic multi-agent system," J. of Cognitive System Research, vol. 2, no. 1, pp. 67-79, Apr. 2001.
[8] J. Hu and M. P. Wellman, "Nash Q-learning for general - sum stochastic games," J. of Machine Learning Research, vol. 4, pp. 1039-1069, Nov. 2003.
[9] A. Greenwald and K. Hall, "Correlated Q - learning," in Proc. of the Twentieth Int. Conf. on Machine Learning, ICML'2003, pp. 242-249, Aug. 2003.
[10] M. Song, G. Gu, and G. Zhang, "Pareto-Q learning algorithm for cooperative agents in general-sum games," in Proc. CEEMAS 2005, pp. 576-578, Sep. 2005.
[11] M. Song, J. Bai, and R. Chen, "A new learning algorithm for cooperative agents in general - sum games," in Proc. of the Sixth Int. Conf. on Machine Learning and Cybernetics, pp. 50-54, Hong Kong, Aug. 2007.
[12] M. R. Khojasteh and M. R. Meybodi, "Evaluating learning automata as a model for cooperation in complex multi - agent domains," Lecture Notes in Artificial Intelligence, Springer Verlag, LNAI 4434, pp. 409-416, ???. 2007.
[13] A. Now´e, K. Verbeeck, and M. Peeters, "Learning automata as a basis for multi - agent reinforcement learning," Lecture Notes in Computer Science, vol. 3898, pp. 71-85, ???. 2006.
[14] P. Vrancx, K. Verbeeck, and A. Nowe, "Decentralized learning in markov games," IEEE Trans. on Systems, Man and Cybernetics- pt. B: Cybernetics, vol. 38, no. 4, pp. 976-981, Aug.. 2008.
[15] P. S. Sastry, V. Phansalkar, and M. A. L. Thathachar, "Decentralized learning of Nash equilibria in multi-person stochastic games with incomplete information," IEEE Trans. on Systems, Man and Cybernetics, vol. 24, no. 5, pp. 769-777, May 1994.
[16] R. A. Howard, Dynamic Programming and Markov Processes, MIT Press, 1960.
[17] R. S. Sutton and A. G. Barto, Reinforcement Learning I: an Introduction, MIT Press, 1998.
[18] F. Thusijsman, Optimality and Equilibria in Stochastic Games, CWI-tract 82, Centre for Mathematics and Computer Science, Amsterdam, 1992.
[19] J. F. Nash Jr., "Non-cooperative games," Annals of Mathematics, vol. 54, no. 2, pp. 286-295, Sep. 1951.
[20] A. M. Fink, "Equilibrium in a stochastic N-person game," J. of Science in Hiroshima University, Series A - I, vol. 28, pp. 89-93, 1964.
[21] H. Qio, F. Szidarovszky, J. Rozenblit and L. Yong, "Multi-agent learning model with bargaining," in Proc. of the 38th Conf. on Winter Simulation, pp. 934-940, Dec. 2006.
[22] K. S. Narendra and M. A. L Thathachar, Learning Automata: an Introduction, Prentice Hall, 1989.
[23] M. A. L. Thathachar and P. S. Sastry, "Varieties of learning automata: an overview," IEEE Trans. on Systems, Man, and Cybernetics - Pt B.: Cybernetics, vol. 32, no. 6, pp. 711-722, Dec. 2002.
[24] M. A. L. Thathachar and P. S. Sastry, Networks of Learning Automata: Techniques for Online Stochastic Optimization, Kluwer Academic Publishers, 2004.
[25] M. Costa, A. L. Goldberger, and C. K. Peng, "Multi-scale entropy analysis of complex physiologic time series," Physical Review Letters, vol. 89, pp. 68101-68104, Aug. 2002.
[26] Z. Dianhu, F. Shaohui, and D. Xiaojun, "Entropy - a measure of uncertainty of random variable," Systems Engineering and Electronics, vol. 11, pp. 1-3, Dec. 1997.

شارک

عنوان URL للمقالة

مدلی مبتنی بر آنتروپی و اتوماتاهاي یادگیر برای حل بازی‌های تصادفی

رایمگ

الروابط

المراكز ذات الصلة

دعامة

الصفحات الرسمية