Reinforcement Learning for robot navigation applications in constrained environments

Marta Barbero
Presentation MSc presentation
Date 2018-09-04
Time 16:00
Location Carré 2G

Enabling a robot arm to reach a target position with its end-effector in a constrained environment requires finding a trajectory from the initial configuration of the robot joints to the goal configuration while avoiding collisions with the obstacles present.

A practical example of this situation is the environment in which the PIRATE robot (Pipe Inspection Robot for AuTonomous Exploration) operates. Although the manipulator can detect the environment and its obstacles using laser sensors (or a camera), this knowledge is only approximate. One way to obtain a robust motion planner under these conditions is to learn a movement policy by applying reinforcement learning algorithms. Reinforcement learning is a machine learning technique that tries to determine how an agent should select the actions to perform, given the current state of the environment in which it is located, with the aim of maximizing a predefined cumulative reward.
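The agent-state-action-reward loop described above can be sketched in a few lines of tabular reinforcement learning. The following is a minimal illustration only, using a hypothetical one-dimensional corridor world rather than the thesis setup; all names, rewards, and parameters here are assumptions for the sake of the example.

```python
# Minimal sketch of the reinforcement learning loop: an agent repeatedly
# observes a state, picks an action, receives a reward, and updates its
# value estimates. Illustrative corridor world (states 0..4, goal at 4);
# not the PIRATE environment.
import random

N_STATES, GOAL = 5, 4
ACTIONS = [-1, +1]  # move left / move right

def step(state, action):
    """Environment: returns (next_state, reward)."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    return nxt, (1.0 if nxt == GOAL else -0.1)  # small step cost, goal bonus

random.seed(0)
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, eps = 0.5, 0.9, 0.1  # learning rate, discount, exploration

for episode in range(200):
    s = 0
    while s != GOAL:
        # epsilon-greedy selection: mostly exploit, occasionally explore
        a = (random.choice(ACTIONS) if random.random() < eps
             else max(ACTIONS, key=lambda act: Q[(s, act)]))
        s2, r = step(s, a)
        # tabular Q-learning update toward the reward-maximizing estimate
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in ACTIONS)
                              - Q[(s, a)])
        s = s2

# The greedy policy extracted after learning moves right toward the goal.
policy = {s: max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(N_STATES)}
print(policy)
```

After enough episodes the greedy policy selects the rightward action in every non-goal state, i.e. the agent has learned to maximize its cumulative reward purely from trial-and-error interaction.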

This project therefore focuses on verifying whether an agent, in this case a planar manipulator, can independently learn how to navigate in a constrained environment with obstacles by applying reinforcement learning techniques. The studied algorithms are SARSA and Q-learning. To this end, a MATLAB-based simulation environment and a physical setup have been implemented, and tests were performed with different configurations.

A thorough analysis of the results shows that both algorithms allow the agent to autonomously learn the motion actions required to navigate inside constrained, pipe-like environments. However, SARSA proved to be a more "conservative" approach than Q-learning: if there is a risk along the shortest path towards the goal (e.g. an obstacle), Q-learning will probably collide with it and then learn a policy exactly along that risky trajectory, minimizing the number of actions needed to reach the target. SARSA, on the other hand, will tend to avoid this path completely, preferring a longer but safer trajectory. Once a full path has been learned, the acquired knowledge can easily be applied to a similar but not identical pipe configuration from a transfer learning perspective. In this way, the algorithms have been shown to adapt quickly to different pipe layouts and goal locations.
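The conservative-versus-greedy behavior contrasted above stems from the two update rules themselves, which differ in a single term. The sketch below shows the standard tabular forms; the state/action names are illustrative and not tied to the thesis implementation.

```python
# SARSA vs Q-learning, tabular update rules. The only difference is the
# bootstrap target: Q-learning uses the BEST next action (off-policy),
# so exploration mishaps along the shortest path do not lower its
# estimate of that path; SARSA uses the action ACTUALLY taken next
# (on-policy), so the risk incurred while exploring leaks into its
# values, pushing it toward longer but safer routes.

def q_learning_update(Q, s, a, r, s2, actions, alpha=0.1, gamma=0.9):
    """Off-policy: target assumes the greedy next action."""
    target = r + gamma * max(Q[(s2, b)] for b in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])

def sarsa_update(Q, s, a, r, s2, a2, alpha=0.1, gamma=0.9):
    """On-policy: target uses a2, the action the agent will actually take."""
    target = r + gamma * Q[(s2, a2)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])
```

When the behavior policy occasionally explores into an obstacle near the shortest path, SARSA's target includes those penalized exploratory actions while Q-learning's target ignores them, which is exactly why the two algorithms converge to the different trajectories observed in the experiments.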

Posted on Monday, July 9, 2018