A New Method for Skill Acquisition in Reinforcement Learning Using Graph Clustering
Authors: Marzieh Davoodabadi Farahani 1, Nasser Mozayani 2
1 - Iran University of Science and Technology
2 - Iran University of Science and Technology
Keywords: hierarchical reinforcement learning, option, temporal abstraction, skill, skill evaluation, graph clustering
Abstract:
Reinforcement learning is a branch of machine learning in which an agent learns about its environment and improves its behavior by interacting with it. A major shortcoming of standard reinforcement learning algorithms such as Q-learning is that they cannot solve large problems within an acceptable time. Automatic skill acquisition can help break a problem into smaller subproblems and solve it hierarchically. Despite the promising results of using skills in hierarchical reinforcement learning, other studies have shown that, depending on the task at hand, the effect of skills on learning performance can be entirely positive or entirely negative, and poorly chosen skills can even increase the complexity of solving the problem. A key weakness of previous automatic skill-acquisition methods is therefore the lack of any evaluation of the individual skills that are acquired. This paper presents new graph-clustering-based methods for extracting subgoals and acquiring skills. It also introduces new criteria for evaluating skills, which are used to discard skills that are unsuitable for solving the problem. Experiments in several test environments show that these methods considerably increase the speed of learning.
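To make the general idea concrete, the sketch below illustrates the family of techniques the abstract describes, not the paper's exact algorithm: a state-transition graph is built from the agent's trajectories, partitioned with an off-the-shelf modularity-based clustering routine from networkx, and states on the border between clusters are taken as candidate subgoals for skills (options). The grid-world trajectories and the "hit rate" scoring function are hypothetical stand-ins for the paper's skill-evaluation criteria.

```python
# Illustrative sketch only: graph-clustering-based subgoal discovery with a
# toy skill-evaluation score. Assumes a discrete state space and networkx.
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities


def build_transition_graph(trajectories):
    """Build an undirected graph whose nodes are visited states and whose
    edges connect states observed in consecutive time steps."""
    g = nx.Graph()
    for episode in trajectories:
        for s, s_next in zip(episode, episode[1:]):
            g.add_edge(s, s_next)
    return g


def find_subgoals(graph):
    """Cluster the graph and return border states, i.e. states with at least
    one neighbour in a different cluster; these serve as candidate subgoals."""
    communities = greedy_modularity_communities(graph)
    label = {s: i for i, community in enumerate(communities) for s in community}
    subgoals = {
        s for s in graph.nodes
        if any(label[n] != label[s] for n in graph.neighbors(s))
    }
    return subgoals, label


def evaluate_skill(subgoal, trajectories):
    """Toy evaluation criterion (a stand-in, not the paper's measure): the
    fraction of episodes that pass through the subgoal. Low-scoring subgoals
    would be discarded before building options around them."""
    hits = sum(subgoal in episode for episode in trajectories)
    return hits / max(len(trajectories), 1)


if __name__ == "__main__":
    # Two hypothetical trajectories in a two-room grid world; state (2, 1)
    # plays the role of the doorway between the rooms.
    trajectories = [
        [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (3, 2)],
        [(0, 1), (1, 1), (2, 1), (2, 2), (3, 2), (4, 2)],
    ]
    graph = build_transition_graph(trajectories)
    subgoals, _ = find_subgoals(graph)
    scores = {s: evaluate_skill(s, trajectories) for s in subgoals}
    print("candidate subgoals and scores:", scores)
```

In this sketch, an option would then be defined for each retained subgoal, with an initiation set equal to the subgoal's cluster and a termination condition at the subgoal state; the scoring step mirrors the paper's idea of filtering out skills that do not help solve the task.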