A Real-time Opponent Modeling System for Rush Football
Kennard Laviers and Gita Sukthankar
One drawback with using plan recognition in adversarial games is that often players must commit to a plan before it is possible to infer the opponent's intentions. In such cases, it is valuable to couple plan recognition with plan repair, particularly in multi-agent domains where complete replanning is not computationally feasible. This paper presents a method for learning plan repair policies in real-time using Upper Confidence Bounds for Trees (UCT). We demonstrate how these policies can be coupled with plan recognition in an American football game (Rush 2008) to create an autonomous offensive team capable of responding to unexpected changes in defensive strategy. Our real-time version of UCT learns play modifications that result in a significantly higher average yardage and fewer intercepts than either the baseline game or domain-specific heuristics. Although it is possible to use the actual game simulator to measure reward offline, to execute UCT in real-time demands a different approach; here we describe two modules for reusing data from offline UCT searches to learn accurate state and reward estimators.