
Performance of reinforcement learning algorithms under conditions of sensory ambiguity in mobile robotics

We analyzed the performance variation of reinforcement learning algorithms in ambiguous-state situations, which are commonly caused by the limited sensing capability of mobile robots. This variation results from violation of the Markov condition, which is important to guarantee the convergence of these algorithms. The practical consequences of this violation in real systems are not firmly established in the literature. The algorithms assessed in this study were Q-learning, Sarsa and Q(lambda), and the experiments were performed on a Magellan Pro™ robot. A method to build variable-resolution cognitive maps of the environment was implemented in order to provide realistic data for the experiments. The implemented learning algorithms presented satisfactory performance on real systems, with a graceful degradation of efficiency due to state ambiguity. The Q-learning algorithm achieved the best performance, followed by the Sarsa algorithm. The Q(lambda) algorithm had its performance limited by the experimental parameters. The cognitive map learning method proved to be quite efficient, allowing an adequate assessment of the algorithms.
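For reference, Q-learning, the best-performing algorithm in the study above, is based on the one-step update Q(s,a) ← Q(s,a) + α[r + γ max_a' Q(s',a') − Q(s,a)]. The sketch below illustrates that update in tabular form on a hypothetical toy environment (LineWorld); the environment, learning rate, discount factor, and exploration rate are illustrative assumptions, not the experimental settings used in the paper.

```python
import random
from collections import defaultdict

class LineWorld:
    """Toy 1-D corridor: start at cell 0, goal at the right end (illustrative only)."""
    actions = ("left", "right")

    def __init__(self, length=6):
        self.length = length

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):
        self.pos += 1 if action == "right" else -1
        self.pos = max(0, min(self.length - 1, self.pos))
        done = self.pos == self.length - 1
        reward = 1.0 if done else -0.01  # small step cost, reward at the goal
        return self.pos, reward, done

def q_learning(env, episodes=300, alpha=0.1, gamma=0.95, epsilon=0.1):
    """Tabular one-step Q-learning with an epsilon-greedy behaviour policy."""
    Q = defaultdict(float)  # Q[(state, action)] -> estimated return
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # epsilon-greedy action selection over the current estimates
            if random.random() < epsilon:
                action = random.choice(env.actions)
            else:
                action = max(env.actions, key=lambda a: Q[(state, a)])
            next_state, reward, done = env.step(action)
            # bootstrap on the greedy successor value (off-policy target)
            best_next = max(Q[(next_state, a)] for a in env.actions)
            Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
            state = next_state
    return Q

if __name__ == "__main__":
    Q = q_learning(LineWorld())
    print(sorted(Q.items()))
```

Sarsa differs only in that the update bootstraps on the action actually taken next (an on-policy target) rather than on the greedy successor value; Q(lambda) additionally propagates errors backward through eligibility traces.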

Autonomous mobile robots; reinforcement learning; map learning; neural networks

