Q-error as a Selection Mechanism in Modular Reinforcement-Learning Systems
Mark Ring and Tom Schaul
This paper introduces a novel multi-modular method for reinforcement learning. A multi-modular system is one that attempts to partition the learning task among a set of experts (modules), where each expert is incapable of solving the entire task by itself. There are many advantages to splitting up large tasks in this way, but existing methods face difficulties when choosing which module(s) should contribute to the agent's actions at any particular moment. We introduce a selection mechanism in which every module, besides calculating a set of action values, also estimates its own error for the current input. The selection mechanism combines each module's estimate of long-term reward and its estimate of its own error to produce a score by which the next module is chosen. As a result, the agent's modules can use their resources effectively and divide up the task efficiently. The system is demonstrated to learn complex tasks even when the individual modules are instantiated by simple linear function approximators.
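The selection step described above can be illustrated with a minimal sketch. The abstract does not state how the two estimates are combined, so the scoring rule used here (highest action value minus predicted self-error) is an assumption made purely for illustration, as are the names `LinearModule`, `predicted_error`, and `select_module`. Learning updates are omitted; only the selection of a module and its greedy action is shown, with each module instantiated as a linear function approximator.

```python
import numpy as np


class LinearModule:
    """Hypothetical expert: linear action values plus a linear self-error estimate."""

    def __init__(self, n_inputs, n_actions, rng):
        # Weights mapping the input vector to one value per action.
        self.w_q = rng.normal(scale=0.1, size=(n_actions, n_inputs))
        # Weights mapping the input vector to the module's predicted own error.
        self.w_e = np.zeros(n_inputs)

    def action_values(self, x):
        return self.w_q @ x

    def predicted_error(self, x):
        return float(self.w_e @ x)


def select_module(modules, x):
    """Return (index of the chosen module, that module's greedy action)."""
    scores = []
    for m in modules:
        q = m.action_values(x)
        # ASSUMPTION: score = best action value minus predicted self-error.
        # The abstract only says the two quantities are combined into a score.
        scores.append(q.max() - m.predicted_error(x))
    winner = int(np.argmax(scores))
    action = int(np.argmax(modules[winner].action_values(x)))
    return winner, action


rng = np.random.default_rng(0)
modules = [LinearModule(n_inputs=4, n_actions=3, rng=rng) for _ in range(2)]
x = np.array([1.0, 0.0, 0.5, 0.0])
winner, action = select_module(modules, x)
```

Under this assumed rule, a module that predicts a large error for the current input is passed over even when its action values are high, which is one way the modules could come to divide the input space among themselves.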