Compulsory Flow Q-Learning: an RL algorithm for robot navigation based on partial-policy and macro-states

Abstract

Reinforcement Learning is carried out on-line, through trial-and-error interactions of the agent with the environment, which can be very time consuming when robots are involved. In this paper we contribute a new learning algorithm, CFQ-Learning, which uses macro-states, a low-resolution discretisation of the state space, and a partial-policy to get around obstacles, both of them based on the complexity of the environment structure. The use of macro-states can prevent learning algorithms from converging, but it can also accelerate the learning process. On the other hand, partial-policies can guarantee that the agent fulfils its task even when it learns over macro-states. Experiments show that CFQ-Learning achieves a good balance between policy quality and learning rate.
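
To make the abstract's description more concrete, the following is a minimal, illustrative sketch (not the authors' implementation, which is not reproduced on this page) of tabular Q-learning over macro-states: fine-grained grid positions are grouped into coarse cells, and the Q-table is indexed by those macro-states. The grid size, reward values, learning parameters, and all function names below are assumptions made for illustration, and the partial-policy that CFQ-Learning uses to force the robot around obstacles is omitted.

    import numpy as np

    # Illustrative only: tabular Q-learning over macro-states (coarse cells
    # grouping several fine-grained grid positions). Sizes, rewards and
    # learning parameters are assumptions, not values from the paper, and the
    # partial-policy component of CFQ-Learning is omitted.

    GRID = 16                                      # fine grid is GRID x GRID
    MACRO = 4                                      # each macro-state covers a MACRO x MACRO block
    N_MACRO = GRID // MACRO                        # macro-grid is N_MACRO x N_MACRO
    ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]   # right, left, down, up

    def macro_state(pos):
        """Map a fine-grained (row, col) position to its macro-state index."""
        r, c = pos
        return (r // MACRO) * N_MACRO + (c // MACRO)

    def step(pos, action, obstacles, goal):
        """One step on the fine grid; the robot stays in place when blocked."""
        nr, nc = pos[0] + action[0], pos[1] + action[1]
        if not (0 <= nr < GRID and 0 <= nc < GRID) or (nr, nc) in obstacles:
            nr, nc = pos                           # wall or obstacle: no movement
        reward = 10.0 if (nr, nc) == goal else -0.1
        return (nr, nc), reward, (nr, nc) == goal

    def q_learning(obstacles, goal, episodes=500, max_steps=400,
                   alpha=0.1, gamma=0.95, eps=0.1):
        q = np.zeros((N_MACRO * N_MACRO, len(ACTIONS)))   # one row per macro-state
        rng = np.random.default_rng(0)
        for _ in range(episodes):
            pos = (0, 0)
            for _ in range(max_steps):
                s = macro_state(pos)
                # epsilon-greedy action selection over the macro-state Q-table
                a = rng.integers(len(ACTIONS)) if rng.random() < eps else int(np.argmax(q[s]))
                pos, r, done = step(pos, ACTIONS[a], obstacles, goal)
                s2 = macro_state(pos)
                # standard Q-learning update, applied to macro-states
                q[s, a] += alpha * (r + gamma * np.max(q[s2]) - q[s, a])
                if done:
                    break
        return q

    if __name__ == "__main__":
        q_table = q_learning(obstacles={(5, 5), (5, 6), (6, 5)},
                             goal=(GRID - 1, GRID - 1))
        print(q_table.shape)                       # (16, 4): 16 macro-states, 4 actions

In CFQ-Learning proper, as the abstract states, both the coarseness of the discretisation and the partial-policy used near obstacles are derived from the complexity of the environment structure; the sketch above only shows the macro-state idea over a fixed coarse grid.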



Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License ( https://creativecommons.org/licenses/by/2.0 ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


About this article

Cite this article

Silva, V.F.d., Costa, A.H.R. Compulsory Flow Q-Learning: an RL algorithm for robot navigation based on partial-policy and macro-states. J Braz Comp Soc 15, 65–75 (2009). https://doi.org/10.1007/BF03194507
