Extracting Bottlenecks Using Object Detection in Reinforcement Learning
Authors: Behzad Ghazanfari (1), Nasser Mozayani (2), Mohammad Reza Jahed Motlagh (3)
1 - Iran University of Science and Technology
2 - Iran University of Science and Technology
3 - Iran University of Science and Technology
Keywords: reinforcement learning, object clustering, hierarchical reinforcement learning, temporally extended actions
Abstract:
This paper presents a new method that automatically extracts bottlenecks for a reinforcement learning agent. The proposed method is inspired by biological systems, specifically animal behavior and navigation, and operates through the agent's interactions with its environment. Using clustering and hierarchical object detection, the agent identifies landmarks; if these landmarks are close to one another in the action space, bottlenecks are extracted from the states that lie between them. Experimental results show a considerable improvement in the reinforcement learning process compared with similar methods.
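The following is a minimal Python sketch of the pipeline the abstract describes, not the authors' implementation: the function names (extract_gateways, _bfs_path) are invented for illustration, k-means stands in for the paper's hierarchical clustering and object-detection step, and "closeness in the action space" is approximated here as the number of primitive actions separating two landmarks.

```python
# Hypothetical sketch: cluster visited states to find landmarks, then, for
# pairs of landmarks that are a few primitive actions apart, collect the
# intermediate states as bottlenecks that could seed temporally extended
# actions (options). All names and parameter choices are illustrative.
from collections import deque
import numpy as np
from sklearn.cluster import KMeans

def extract_gateways(states, transitions, n_landmarks=4, max_action_dist=3):
    """states: (N, d) array of visited state features.
    transitions: dict mapping state index -> successor indices, one
    primitive action per edge. Returns a set of bottleneck state indices."""
    km = KMeans(n_clusters=n_landmarks, n_init=10).fit(states)
    # For each cluster, take the visited state nearest its centroid as
    # the landmark (a stand-in for the paper's detected "objects").
    landmarks = [
        int(np.argmin(np.linalg.norm(states - c, axis=1)))
        for c in km.cluster_centers_
    ]
    gateways = set()
    for i in landmarks:
        for j in landmarks:
            if i == j:
                continue
            path = _bfs_path(transitions, i, j)
            # "Close in the action space": reachable within a few actions.
            if path is not None and len(path) - 1 <= max_action_dist:
                gateways.update(path[1:-1])  # states between the landmarks
    return gateways

def _bfs_path(transitions, src, dst):
    """Shortest path between two states, measured in primitive actions."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            path = [u]
            while parent[path[-1]] is not None:
                path.append(parent[path[-1]])
            return path[::-1]
        for v in transitions.get(u, []):
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return None
```

In a grid-world setting, transitions would be built from the agent's experienced (state, action, next state) tuples, and the returned bottleneck states could serve as subgoals for options in a hierarchical reinforcement learning framework.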