Noam Brown

Equilibrium Finding for Large Adversarial Imperfect-Information Games
Degree Type: Ph.D. in Computer Science
Advisor(s): Tuomas Sandholm
Graduated: December 2020

Abstract:

Imperfect-information games model strategic interactions involving multiple agents with private information. A typical goal in this setting is to approximate an equilibrium in which all agents' strategies are optimal. This thesis describes a number of advances in the computation of equilibria in large adversarial imperfect-information games. Together, these new techniques made it possible for the first time for an AI agent to defeat top human professionals in full-scale poker, which had for decades served as a grand challenge problem in the fields of AI and game theory.

We begin by introducing faster equilibrium-finding algorithms based on counterfactual regret minimization (CFR), an iterative algorithmic framework that converges to a Nash equilibrium in two-player zero-sum games. We describe new variants of CFR that use discounting to dramatically speed up convergence. These discounting variants are now the state-of-the-art equilibrium-finding algorithms for large adversarial imperfect-information games. We also describe theoretically sound pruning techniques that can speed up CFR by orders of magnitude in large games. Additionally, we introduce the first general technique for warm starting CFR.
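To make the CFR framework concrete, below is a minimal self-play sketch of regret matching with linear-CFR-style discounting, illustrated on rock-paper-scissors. This is a toy illustration under simplifying assumptions, not the thesis's discounted CFR algorithms: the game, the asymmetric initialization, and the `t/(t+1)` discount schedule are all choices made here for brevity.

```python
# Regret matching with linear-CFR-style discounting on rock-paper-scissors.
# A toy sketch: the game, initialization, and discount schedule are
# illustrative simplifications.

PAYOFF = [[0, -1, 1],   # row player's payoff: rock vs (rock, paper, scissors)
          [1, 0, -1],   # paper
          [-1, 1, 0]]   # scissors

def current_strategy(regrets):
    """Regret matching: play actions in proportion to positive regret."""
    pos = [max(r, 0.0) for r in regrets]
    total = sum(pos)
    return [p / total for p in pos] if total > 0 else [1.0 / 3] * 3

def train(iterations=100_000):
    # Small asymmetric initial regrets so the dynamics are non-trivial.
    regrets = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    strategy_sum = [[0.0] * 3, [0.0] * 3]
    for t in range(1, iterations + 1):
        sigma = [current_strategy(regrets[p]) for p in range(2)]
        for p in range(2):
            opp = sigma[1 - p]
            # Expected value of each pure action vs the opponent's mix.
            util = [sum((PAYOFF[a][b] if p == 0 else -PAYOFF[b][a]) * opp[b]
                        for b in range(3)) for a in range(3)]
            ev = sum(sigma[p][a] * util[a] for a in range(3))
            for a in range(3):
                regrets[p][a] += util[a] - ev
                strategy_sum[p][a] += sigma[p][a]
        # Discounting: multiplying the accumulators by t/(t+1) each step
        # leaves iteration t with a final weight proportional to t,
        # i.e. linear weighting of iterations.
        w = t / (t + 1)
        for p in range(2):
            regrets[p] = [r * w for r in regrets[p]]
            strategy_sum[p] = [s * w for s in strategy_sum[p]]
    # The *average* strategy, not the current one, converges to equilibrium.
    return [[s / sum(strategy_sum[p]) for s in strategy_sum[p]]
            for p in range(2)]
```

Running `train()` yields average strategies for both players that approach (1/3, 1/3, 1/3), the unique Nash equilibrium of rock-paper-scissors; the down-weighting of early, high-regret iterations is the intuition behind why discounting speeds convergence.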

Next, we describe new ways to scale CFR to extremely large games via automated abstraction and function approximation. In particular, we introduce the first provably locally optimal algorithm for discretizing continuous action spaces in imperfect-information games. We extend this into an algorithm that converges to an equilibrium even in games with continuous action spaces. We also introduce Deep CFR, a form of CFR that uses neural network function approximation rather than bucketing-based abstractions. Deep CFR was the first non-tabular form of CFR to scale to large games, and it enables CFR to be deployed in settings with little domain knowledge.

We also present new search techniques for imperfect-information games that ensure the agent's search strategy cannot be exploited by an adversary. These new forms of search outperform past approaches both in theory and in practice. We describe how these search techniques were used in the construction of Libratus to defeat top humans in two-player no-limit poker for the first time. Additionally, we introduce a method for depth-limited search that is orders of magnitude more efficient than prior approaches. We describe how this depth-limited search technique was incorporated into Pluribus, which defeated top human professionals in multiplayer poker for the first time. Finally, we present an algorithm that combines deep reinforcement learning with search at both training and test time, which takes a major step toward bridging the gap between research on perfect-information games and imperfect-information games.

Thesis Committee:
Tuomas Sandholm (Chair)
Geoff Gordon
Ariel Procaccia
Satinder Singh (University of Michigan)
Michael Wellman (University of Michigan)

Srinivasan Seshan, Head, Computer Science Department
Martial Hebert, Dean, School of Computer Science

Keywords:
Artificial Intelligence, Machine Learning, Game Theory, Equilibrium Finding, Regret Minimization, No-Regret Learning, Imperfect-Information Games, Reinforcement Learning, Deep Reinforcement Learning, Multi-Agent Reinforcement Learning

CMU-CS-20-132.pdf (7.65 MB, 234 pages)