Barak A. Pearlmutter

An Investigation of the Gradient Descent Process in Neural Networks
Degree Type: Ph.D. in Computer Science
Advisor(s): David Touretzky
Graduated: May 1996

Abstract:

Usually gradient descent is regarded merely as a way to find a minimum, to be abandoned if a more efficient technique is available. Here we investigate the detailed properties of the gradient descent process, and the related topics of how gradients can be computed, what the limitations on gradient descent are, and how the second-order information that governs the dynamics of gradient descent can be probed. To develop our intuitions, gradient descent is applied to a simple robot arm dynamics compensation problem, using backpropagation on a temporal windows architecture. The results suggest that smooth filters can be easily learned, but that the deterministic gradient descent process can be slow and can exhibit oscillations. Algorithms to compute the gradient of recurrent networks are then surveyed in a general framework, leading to some unifications, a deeper understanding of recurrent networks, and some algorithmic extensions. By regarding deterministic gradient descent as a dynamic system we obtain results concerning its convergence, and a quantitative theory of its behavior when a saturating or "robust" error measure is used. Since second-order structure enters into the dynamics, an efficient exact technique for multiplying a vector by the Hessian of a gradient system is derived and applied.
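The final point of the abstract, exact multiplication of a vector by the Hessian without forming the Hessian, can be illustrated in modern automatic-differentiation terms: differentiating the gradient in a chosen direction yields the Hessian-vector product at roughly the cost of one extra gradient evaluation. The JAX sketch below is only an illustration of that idea under assumed toy choices (the loss function, data, and parameter shapes are hypothetical); it is not the thesis's own implementation.

```python
# Sketch: Hessian-vector product via forward-over-reverse autodiff,
# computed without ever materializing the Hessian.
# The loss, data, and shapes here are hypothetical placeholders.
import jax
import jax.numpy as jnp

def loss(w, x, y):
    # Toy single-layer network with squared error (illustrative only).
    pred = jnp.tanh(x @ w)
    return jnp.mean((pred - y) ** 2)

def hvp(w, v, x, y):
    # H v = directional derivative of the gradient:
    #   d/da [ grad loss(w + a v) ] evaluated at a = 0.
    return jax.jvp(lambda w_: jax.grad(loss)(w_, x, y), (w,), (v,))[1]

# Hypothetical data and parameters, just to exercise the sketch.
w = jax.random.normal(jax.random.PRNGKey(0), (5,))
v = jax.random.normal(jax.random.PRNGKey(1), (5,))
x = jax.random.normal(jax.random.PRNGKey(2), (8, 5))
y = jax.random.normal(jax.random.PRNGKey(3), (8,))

print(hvp(w, v, x, y))  # same shape as w
```

The forward-over-reverse composition is one common way to realize this product; its cost is a small constant multiple of a single gradient computation, which is what makes second-order probes of gradient descent dynamics practical.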

Thesis Committee:
David S. Touretzky (Chair)
Scott E. Fahlman
James L. McClelland
Geoffrey E. Hinton (University of Toronto)

James Morris, Head, Computer Science Department
Raj Reddy, Dean, School of Computer Science

Keywords:
Neural networks, control, optimization, gradient descent, Newton method, convergence rate, Hessian

CMU-CS-96-114.pdf (12.39 MB, 140 pages)