Bailey T and Durrant-Whyte H. Simultaneous localisation and mapping (SLAM): Part II, state of the art. Robotics and Automation Magazine 2006; 13(3):1–10.
Bianchi RAC. Uso de heurísticas para a aceleração do aprendizado por reforço [Use of heuristics to accelerate reinforcement learning]. [PhD thesis]. São Paulo, SP: Universidade de São Paulo; 2004.
Durrant-Whyte H and Bailey T. Simultaneous localisation and mapping (SLAM): Part I, the essential algorithms. Robotics and Automation Magazine 2006; 13(2):1–9.
Foster D and Dayan P. Structure in the space of value functions. Machine Learning 2002; 49(2/3):325–346.
Jarvis R. Robot path planning: complexity, flexibility and application scope. In: Proceedings of the 2006 International Symposium on Practical Cognitive Agents and Robots; 2006; Perth, Australia. New York, NY: ACM; 2006. p. 3–14.
Lee H, Shen Y, Yu CH, Singh G and Ng AY. Quadruped robot obstacle negotiation via reinforcement learning. In: Proceedings of the IEEE International Conference on Robotics and Automation; 2006; Orlando, FL. Los Alamitos, CA: IEEE Computer Society Press; 2006. p. 3003–3010.
Marthi B, Russell SJ, Latham D and Guestrin C. Concurrent hierarchical reinforcement learning. In: Kaelbling LP and Saffiotti A (Eds.). Proceedings of the Nineteenth International Joint Conference on Artificial Intelligence; 2005; Edinburgh. San Francisco, CA: Morgan Kaufmann; 2005. p. 779–785.
McGovern A, Sutton RS and Fagg AH. Roles of macro-actions in accelerating reinforcement learning. In: Proceedings of the Grace Hopper Celebration of Women in Computing; 1997; San Jose, CA. Palo Alto, CA: Anita Borg Institute for Women and Technology; 1997. p. 13–18.
Mitchell TM. Machine learning. San Francisco: WCB/McGraw-Hill; 1997.
Moore AW and Atkeson CG. Prioritized sweeping: reinforcement learning with less data and less real time. Machine Learning 1993; 13(1):237–285.
Munos R and Moore A. Variable resolution discretization in optimal control. Machine Learning 2002; 49(2/3):291–323.
Murarka A, Sridharan M and Kuipers B. Detecting obstacles and drop-offs using stereo and motion cues for safe local motion. In: Proceedings of the International Conference on Intelligent Robots and Systems; 2008; Nice, France. Los Alamitos, CA: IEEE Computer Society Press; 2008. p. 702–708.
Parr R and Russell S. Reinforcement learning with hierarchies of machines. In: Advances in Neural Information Processing Systems 10; 1998; Denver, CO. Cambridge, MA: The MIT Press; 1998.
Precup D, Sutton RS and Singh SP. Theoretical results on reinforcement learning with temporally abstract behaviors. In: Proceedings of the Tenth European Conference on Machine Learning; 1998; Berlin. New York: Springer; 1998. p. 382–393.
Ramon J, Driessens K and Croonenborghs T. Transfer learning in reinforcement learning problems through partial policy recycling. In: Proceedings of the 18th European Conference on Machine Learning; 2007; Warsaw. New York, NY: Springer; 2007. p. 699–707.
Reynolds SI. Decision boundary partitioning: variable resolution model-free reinforcement learning. In: Proceedings of the 17th International Conference on Machine Learning; 2000; Palo Alto, CA. San Francisco, CA: Morgan Kaufmann; 2000. p. 783–790.
Ross SM. Applied probability models with optimization applications. San Francisco: Holden-Day; 1970.
Rummery GA and Niranjan M. On-line Q-learning using connectionist systems. Cambridge: Cambridge University; 1994. (Technical report CUED/F-INFENG/TR 166).
Selvatici AHP and Costa AHR. A hybrid adaptive architecture for mobile robots based on reactive behaviors. In: Proceedings of the 5th International Conference on Hybrid Intelligent Systems; 2005; Rio de Janeiro. Los Alamitos, CA: IEEE Computer Society; 2005. p. 29–34.
Strandberg M. Robot path planning: an object oriented approach. [PhD thesis]. Stockholm: Royal Institute of Technology; 2004.
Sutton RS and Barto AG. Reinforcement learning: an introduction. Cambridge: MIT Press; 1998.
Sutton RS. Learning to predict by the methods of temporal differences. Machine Learning 1988; 3(1):9–44.
Sutton RS. Integrated architectures for learning, planning and reacting based on approximating dynamic programming. In: Proceedings of the 7th International Conference on Machine Learning; 1990; Austin, TX. San Francisco, CA: Morgan Kaufmann; 1990. p. 216–224.
Watkins CJCH. Learning from delayed rewards. [PhD thesis]. Cambridge: University of Cambridge; 1989.