Marta Barbero

Active student


Master's Thesis Project: Reinforcement Learning for robot navigation applications in constrained environments

Making a robot arm reach a target position with its end-effector in a constrained environment requires finding an optimal path from the initial configuration of the robot joints to the goal configuration while avoiding collisions with possible obstacles. A practical example is the environment in which the PIRATE robot (Pipe Inspection Robot for AuTonomous Exploration) must operate: it does not know the environment in advance, and it cannot detect obstacles with a camera due to low-light conditions. Much research has been done on this problem, and some of the most interesting results have been obtained by applying reinforcement learning algorithms. Reinforcement learning is a machine learning technique that determines how an agent should select the actions to be performed, given the current state of the environment it is in, with the aim of maximizing a predefined cumulative reward.
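To make the idea of an agent learning from reward concrete, the following is a minimal sketch of tabular Q-learning on a hypothetical toy problem (not the actual PIRATE setting): a 4x4 grid standing in for a discretized configuration space, with one obstacle cell and one goal cell. The grid size, rewards, and hyperparameters are illustrative assumptions.

```python
import random

random.seed(0)

# Hypothetical toy setting: each cell of a 4x4 grid stands in for one
# discretized joint configuration; one cell is an obstacle, one the goal.
SIZE = 4
OBSTACLE, GOAL = (1, 1), (3, 3)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

def step(state, action):
    r, c = state[0] + action[0], state[1] + action[1]
    if not (0 <= r < SIZE and 0 <= c < SIZE) or (r, c) == OBSTACLE:
        return state, -1.0, False          # blocked: stay put, penalty
    if (r, c) == GOAL:
        return (r, c), 10.0, True          # target reached
    return (r, c), -0.1, False             # step cost favors short paths

# Q-table: one list of 4 action values per grid cell.
Q = {(r, c): [0.0] * 4 for r in range(SIZE) for c in range(SIZE)}
alpha, gamma, eps = 0.5, 0.9, 0.2          # learning rate, discount, exploration

for episode in range(500):
    s, done = (0, 0), False
    while not done:
        # epsilon-greedy action selection
        a = random.randrange(4) if random.random() < eps \
            else max(range(4), key=lambda i: Q[s][i])
        s2, reward, done = step(s, ACTIONS[a])
        # Q-learning update: move Q(s,a) toward reward + gamma * max_a' Q(s',a')
        Q[s][a] += alpha * (reward + gamma * max(Q[s2]) - Q[s][a])
        s = s2

# After training, follow the greedy policy from the start state.
s, path = (0, 0), [(0, 0)]
for _ in range(20):
    s, _, done = step(s, ACTIONS[max(range(4), key=lambda i: Q[s][i])])
    path.append(s)
    if done:
        break
```

Note that no model of the environment is given to the agent: the collision penalty and goal reward alone shape the learned path around the obstacle.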

Thus, this project will focus on verifying whether an agent, i.e. a planar manipulator, is able to independently learn how to navigate a constrained environment with obstacles by applying reinforcement learning. To reach this goal, two types of reinforcement learning algorithms will be compared. First, a straightforward Q-learning algorithm will be applied, discretizing the environment in which the manipulator moves with a grid so that all the required state-action pairs can be stored in memory. Afterwards, a Deep Q-learning algorithm will be employed to operate directly on the continuous state space and avoid the loss of information caused by discretization, combining the cited Q-learning approach with value-function approximation, i.e. deep neural networks that estimate state-action values from the current state of the environment.
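The key change from the tabular case is that the Q-table is replaced by a parametric function of a continuous state. The sketch below, a simplification under assumed dimensions (2 joint angles, 4 discrete actions) and with a hand-rolled one-hidden-layer network instead of a deep-learning framework, shows the core DQN-style semi-gradient update: nudge the predicted Q-value of the taken action toward the TD target r + gamma * max_a' Q(s', a').

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed dimensions for illustration: 2 continuous joint angles as the
# state, 4 discrete actions, one hidden layer of 32 tanh units.
STATE_DIM, N_ACTIONS, HIDDEN = 2, 4, 32
GAMMA, LR = 0.9, 1e-2

W1 = rng.normal(0, 0.1, (STATE_DIM, HIDDEN)); b1 = np.zeros(HIDDEN)
W2 = rng.normal(0, 0.1, (HIDDEN, N_ACTIONS)); b2 = np.zeros(N_ACTIONS)

def q_values(s):
    """Network forward pass: continuous state -> one Q-value per action."""
    h = np.tanh(s @ W1 + b1)
    return h, h @ W2 + b2

def td_update(s, a, r, s2, done):
    """One semi-gradient step toward the target r + gamma * max_a' Q(s',a').
    The target is treated as a constant, as in DQN."""
    global W1, b1, W2, b2
    _, q_next = q_values(s2)
    target = r + (0.0 if done else GAMMA * np.max(q_next))
    h, q = q_values(s)
    err = q[a] - target                    # TD error for the taken action only
    dq = np.zeros(N_ACTIONS); dq[a] = err  # gradient of 0.5*err^2 w.r.t. q
    dW2 = np.outer(h, dq); db2 = dq        # backprop through output layer
    dh = (W2 @ dq) * (1 - h**2)            # tanh derivative
    dW1 = np.outer(s, dh); db1 = dh        # backprop through hidden layer
    W2 -= LR * dW2; b2 -= LR * db2
    W1 -= LR * dW1; b1 -= LR * db1
    return 0.5 * err**2                    # squared TD error (loss)

# Repeated updates on one fixed terminal transition should shrink the loss.
s, s2 = np.array([0.3, -0.7]), np.array([0.1, -0.5])
losses = [td_update(s, a=2, r=1.0, s2=s2, done=True) for _ in range(200)]
```

A practical Deep Q-learning implementation would add an experience replay buffer and a periodically updated target network to stabilize training; those components are omitted here to keep the update rule itself visible.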