Curiosity-Driven Reinforcement Learning in POMDPs

Colloquium

Speaker(s)
Nassim Mafi
Date
Thursday, April 19, 2012
Time
4:15 PM – 5:30 PM
Location
Rose Hills Theater

Robots, like biological organisms, are typically equipped with multiple sensors that acquire different types of information (e.g., visible light, audio, touch). Each sensor may have its own noise level, effective range, and field of view, and its accuracy may degrade with distance from the robot. Additionally, it is often not possible to run all sensors at all locations simultaneously, either because of limited processing power or because certain sensors are mutually exclusive. Acquiring information about the world therefore often requires navigating the environment and carefully selecting sensing actions. At the same time, robots are usually built to perform specific tasks, so moving around and using sensors effectively is not enough on its own. The problem of choosing both sensing and task-specific actions can be viewed as a sequential decision problem under uncertainty, in which the agent has only partial knowledge of the current state of the world. We formulate this as a ‘partially observable Markov decision process’ (POMDP), for which ‘reinforcement learning’ (RL) can be used to learn policies from experience.

We then propose a mechanism for speeding up RL in POMDPs using an information-based shaping reward that can be derived automatically from the belief distribution. This reward acts as an intrinsic curiosity that drives the agent to reduce uncertainty even before it has become skilled at a task. We show through several experiments in a virtual Market world that curiosity significantly speeds up learning and that the learned policies improve on those learned using extrinsic rewards only.
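
As a rough illustration of how a shaping reward could be derived from the belief distribution, the Python sketch below updates a discrete belief with a Bayes filter and rewards the resulting drop in Shannon entropy. The abstract does not specify the exact formulation used in the talk, so everything here is an assumption made for illustration: the entropy-reduction form of the reward, the function names (belief_update, entropy, curiosity_reward), the array layouts of T and O, and the two-state toy model.

```python
import numpy as np

def belief_update(belief, T, O, action, observation):
    """Discrete Bayes filter: b'(s') is proportional to O[a][s', o] * sum_s T[a][s, s'] * b(s)."""
    predicted = belief @ T[action]                      # prediction step, shape (S,)
    unnormalized = O[action][:, observation] * predicted  # weight by observation likelihood
    total = unnormalized.sum()
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return unnormalized / total

def entropy(belief):
    """Shannon entropy of a belief vector, in bits (zero entries contribute nothing)."""
    p = belief[belief > 0]
    return float(-(p * np.log2(p)).sum())

def curiosity_reward(belief, next_belief, scale=1.0):
    """Hypothetical intrinsic reward: the reduction in belief entropy after one step."""
    return scale * (entropy(belief) - entropy(next_belief))

# Toy model (assumed): two hidden states, one sensing action that leaves the
# state unchanged and reports the true state with probability 0.9.
T = np.array([np.eye(2)])                    # T[a][s, s']
O = np.array([[[0.9, 0.1], [0.1, 0.9]]])     # O[a][s', o]

b = np.array([0.5, 0.5])                     # maximally uncertain prior
next_b = belief_update(b, T, O, action=0, observation=0)
r_int = curiosity_reward(b, next_b)

# A learner would then train on the shaped signal, e.g. r_ext + r_int,
# where r_ext is the task (extrinsic) reward.
print(next_b, r_int)
```

In this toy case the belief moves from [0.5, 0.5] to [0.9, 0.1], so the intrinsic term is positive (roughly 0.53 bits) even though no task reward has been received, which matches the intuition of an agent being paid for reducing uncertainty before it is skilled at the task.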


Bio: Nassim Mafi is currently a Ph.D. student in the computer science department at the University of Arizona and holds an M.S. degree from the same department. She is a researcher in the Arizona Robotics Research Group (ARRG). Her research interests include machine learning, autonomous agents, reinforcement learning (RL), and partially observable Markov decision processes (POMDPs). In her research, she uses information-based intrinsic reward mechanisms to learn efficient policies in POMDPs using RL while also speeding up learning. The results of this research can be applied to any system of autonomous agents performing exploration and discovery tasks (such as Mars explorers and household assistant robots) and to other problems that can be formulated as a POMDP. So far, the results of this ongoing research have been published in the proceedings of IEEE ICDL 2010 and 2011.