Nathaniel D. Daw

Reinforcement Learning Models of the Dopamine System and their Behavioral Implications

Degree Type: Ph.D. in Computer Science
Advisor(s): David Touretzky
Graduated: August 2003

Abstract:

This thesis aims to improve theories of how the brain functions and to provide a framework to guide future neuroscientific experiments by making use of theoretical and algorithmic ideas from computer science. The work centers around the detailed understanding of the dopamine system, an important and phylogenetically venerable brain system that is implicated in such general functions as motivation, decision-making and motor control, and whose dysfunction is associated with disorders such as schizophrenia, addiction, and Parkinson’s disease. A series of influential models have proposed that the responses of dopamine neurons recorded from behaving monkeys can be identified with the error signal from temporal difference (TD) learning, a reinforcement learning algorithm for learning to predict rewards in order to guide decision-making.
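
The abstract states this identification informally; as a point of reference, a minimal TD(0) sketch of the prediction-error signal in question is below. The toy chain task, step size alpha, and discount gamma are illustrative choices, not values from the thesis.

```python
import numpy as np

# Toy chain: states 0..4; reward arrives only on entering the final state.
V = np.zeros(5)

def td_update(V, s, s_next, r, alpha=0.1, gamma=0.95):
    """One TD(0) update; delta is the prediction error that the models
    identify with the phasic dopamine response."""
    delta = r + gamma * V[s_next] - V[s]  # TD error
    V[s] += alpha * delta                 # move the prediction toward the target
    return delta

# Repeated passes propagate the reward prediction backward along the chain,
# just as the modeled dopamine response transfers to earlier predictive cues.
for _ in range(200):
    for s in range(4):
        td_update(V, s, s + 1, r=1.0 if s + 1 == 4 else 0.0)
```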

Here I propose extensions to these theories that improve them along a number of dimensions simultaneously. The new models that result eliminate several unrealistic simplifying assumptions from the original accounts; explain many sorts of dopamine responses that had previously seemed anomalous; flesh out nascent suggestions that these neurophysiological mechanisms can also explain animal behavior in conditioning experiments; and extend the theories’ reach to incorporate proposals about the computational function of several other brain systems that interact with the dopamine neurons.

Chapter 3 relaxes the assumption from previous models that the system tracks only short-term predictions about rewards expected within a single experimental trial. It introduces a new model based on average-reward TD learning that suggests that long-run reward predictions affect the slow-timescale, tonic behavior of dopamine neurons. This account resolves a seemingly paradoxical finding that the dopamine system is excited by aversive events such as electric shock, which had fueled several published attacks on the TD theories. These investigations also provide a basis for proposals about the functional role of interactions between the dopamine and serotonin systems, and about behavioral data on animal decision-making.
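
For concreteness, a minimal rendering of the average-reward TD error follows: predictions are measured relative to an estimate of the long-run reward rate rather than discounted. The learning rates and the simple running-average estimate of the rate are illustrative assumptions, not the thesis's exact formulation.

```python
def avg_reward_td_update(V, rbar, s, s_next, r, alpha=0.1, eta=0.01):
    """Average-reward TD(0). Under the model, the fast error delta maps
    onto phasic dopamine, while the slowly varying reward-rate estimate
    rbar maps onto the tonic signal."""
    delta = r - rbar + V[s_next] - V[s]  # average-adjusted TD error
    V[s] += alpha * delta
    rbar += eta * (r - rbar)             # slow running estimate of reward rate
    return delta, rbar
```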

Chapter 4 further revises the theory to account for animals’ uncertainty about the timing of events and about the moment-to-moment state of an experimental task. These issues are handled in the context of a TD algorithm incorporating partial observability and semi-Markov dynamics; a number of other new or extant models are shown to follow from this one in various limits. The new theory is able to explain a number of previously puzzling results about dopamine responses to events whose timing is variable, and provides an appropriate framework for investigating behavioral results concerning variability in animals’ temporal judgments and timescale invariance properties in animal learning.
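
One simple way to render the two ingredients named here (belief tracking under partial observability, and dwell-time-dependent updates from semi-Markov dynamics) is sketched below. The linear value function and the discounted, rather than average-reward, error are simplifications for illustration and do not reproduce the thesis's exact model.

```python
import numpy as np

def belief_update(b, obs, T, O):
    """Bayes filter over hidden task states: predict with the transition
    matrix T, then reweight by the likelihood O[state, obs] of what was
    actually observed."""
    b_pred = T.T @ b
    b_new = b_pred * O[:, obs]
    return b_new / b_new.sum()

def belief_td_update(w, b, b_next, r, tau, alpha=0.1, gamma=0.95):
    """TD over belief states with a semi-Markov twist: the discount is
    applied over the variable dwell time tau between events. Values are
    linear in the belief, V(b) = w . b."""
    delta = r + (gamma ** tau) * (w @ b_next) - (w @ b)
    w += alpha * delta * b  # gradient of V(b) with respect to w is b
    return delta
```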

Chapter 5 departs from the thesis's primary methodology of computational modeling to present a complementary attempt to address the same issues empirically. The chapter reports the results of an experiment recording from the striatum of behaving rats (a brain area that is one of the major inputs and outputs of the dopamine system) during a task designed to probe the functional organization of decision-making in the brain. The results broadly support the contention of most versions of the TD models that the functions of action selection and reward prediction are segregated in the brain, as in "actor/critic" reinforcement learning systems.
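
As background for that contention, a schematic actor/critic update is sketched below: one TD error (the putative dopamine signal) trains two separate structures. The tabular representation and softmax action selection are standard textbook choices, not details taken from the chapter.

```python
import numpy as np

def softmax(prefs):
    e = np.exp(prefs - prefs.max())
    return e / e.sum()

def actor_critic_step(V, prefs, s, a, r, s_next,
                      alpha_critic=0.1, alpha_actor=0.1, gamma=0.95):
    """The critic's reward predictions V and the actor's action
    preferences are stored separately but learn from the same error,
    mirroring the proposed segregation of prediction and action selection."""
    delta = r + gamma * V[s_next] - V[s]  # critic's prediction error
    V[s] += alpha_critic * delta          # critic: improve the prediction
    prefs[s, a] += alpha_actor * delta    # actor: reinforce the chosen action
    return delta

# Actions would be drawn from the actor alone, e.g.:
#   a = np.random.choice(prefs.shape[1], p=softmax(prefs[s]))
```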

Thesis Committee:
David S. Touretzky (Chair)
James L. McClelland
Andrew W. Moore
William E. Skaggs (University of Arizona)
Peter Dayan (University College London)

Randy Bryant, Head, Computer Science Department
James Morris, Dean, School of Computer Science

Keywords:
Computational neuroscience, dopamine, reinforcement learning

CMU-CS-03-177.pdf (3 MB)