Extracting Bottlenecks Using Object Detection in Reinforcement Learning
Authors: Behzad Ghazanfari (1), Nasser Mozayani (2), Mohammad Reza Jahed Motlagh (3)
1 - Iran University of Science and Technology
2 - Iran University of Science and Technology
3 - Iran University of Science and Technology
Keywords: reinforcement learning, object clustering, hierarchical reinforcement learning, temporally extended actions
Abstract:
This paper presents a new method that automatically extracts bottlenecks for a reinforcement learning agent. The proposed method is inspired by biological systems, specifically animal behavior and navigation, and operates through the agent's interactions with its environment. Using clustering and hierarchical object detection, the agent identifies landmarks; if these landmarks are close to one another in the action space, bottlenecks are extracted from the states that lie between them. Experimental results show a considerable improvement in the reinforcement learning process compared with similar methods.
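The following is a minimal Python sketch of the pipeline the abstract describes, not the authors' implementation: the function names (extract_gateways, _bfs_path) are invented for illustration, k-means stands in for the paper's hierarchical clustering and object-detection step, and "closeness in the action space" is approximated here as the number of primitive actions separating two landmarks.

```python
# Hypothetical sketch: cluster visited states to find landmarks, then, for
# pairs of landmarks that are a few primitive actions apart, collect the
# intermediate states as bottlenecks that could seed temporally extended
# actions (options). All names and parameter choices are illustrative.
from collections import deque
import numpy as np
from sklearn.cluster import KMeans

def extract_gateways(states, transitions, n_landmarks=4, max_action_dist=3):
    """states: (N, d) array of visited state features.
    transitions: dict mapping state index -> successor indices, one
    primitive action per edge. Returns a set of bottleneck state indices."""
    km = KMeans(n_clusters=n_landmarks, n_init=10).fit(states)
    # For each cluster, take the visited state nearest its centroid as
    # the landmark (a stand-in for the paper's detected "objects").
    landmarks = [
        int(np.argmin(np.linalg.norm(states - c, axis=1)))
        for c in km.cluster_centers_
    ]
    gateways = set()
    for i in landmarks:
        for j in landmarks:
            if i == j:
                continue
            path = _bfs_path(transitions, i, j)
            # "Close in the action space": reachable within a few actions.
            if path is not None and len(path) - 1 <= max_action_dist:
                gateways.update(path[1:-1])  # states between the landmarks
    return gateways

def _bfs_path(transitions, src, dst):
    """Shortest path between two states, measured in primitive actions."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            path = [u]
            while parent[path[-1]] is not None:
                path.append(parent[path[-1]])
            return path[::-1]
        for v in transitions.get(u, []):
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return None
```

In a grid-world setting, transitions would be built from the agent's experienced (state, action, next state) tuples, and the returned bottleneck states could serve as subgoals for options in a hierarchical reinforcement learning framework.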