[Colloq] Colloquium Talk: What does CPS have to do with deep learning?
Smith, Greg
g.smith at northeastern.edu
Thu Sep 7 16:33:06 EDT 2017
Speaker: Jeffrey M. Siskind
Date: September 11, 2017
Time: 1:30pm - 3:00pm
Location: 366 West Village H
Title: What does CPS have to do with deep learning?
Abstract
Deep learning is formulated around backpropagation, a method for computing gradients of a particular class of data-flow graphs in order to train models to minimize error by gradient descent. Backpropagation is a special case of a more general method, reverse-mode automatic differentiation (AD), for computing gradients of functions expressed as programs. Reverse-mode AD imposes only a small constant-factor overhead in operation count over the original computation, but its storage requirements grow, in the worst case, in proportion to the time consumed by the original computation. This storage blowup can be ameliorated by checkpointing, a process that reorders the application of classical reverse-mode AD over an execution interval to trade space for time. Applying checkpointing in a divide-and-conquer fashion to strategically chosen nested execution intervals breaks classical reverse-mode AD into stages and can reduce the worst-case growth in storage from linear to logarithmic. Doing this has been fully automated only for computations of particularly simple form, with checkpoints spanning execution intervals that result from a limited set of program constructs. We show how the technique can be automated for arbitrary computations. Doing so relies on general-purpose mechanisms for counting the number of instructions executed by a program, interrupting execution after a specified number of steps, and resuming execution with a nonstandard interpretation. We implement these mechanisms with a compiler that converts programs to continuation-passing style (CPS). Efficiently computing gradients with checkpointing requires running, and rerunning, small portions of the program out of order; this is made easier by applying the technique to functional programs. There is a deeper, higher-level message in this talk: machine learning can benefit from advanced techniques in programming-language theory.
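For readers who want a concrete picture of the ideas above, here are three small illustrative sketches in Python. They are not taken from the talk or from the speaker's system (which uses a compiler that CPS-converts whole programs); every name in them is invented for illustration.

First, a minimal tape-based reverse-mode AD sketch. Each arithmetic operation records itself on a tape, and a backward sweep replays the tape in reverse to accumulate adjoints. Note that the tape, i.e. the storage, grows with the running time of the original computation, which is exactly the blowup the abstract describes:

    # Illustrative tape-based reverse-mode AD (all names invented).
    tape = []

    class Var:
        def __init__(self, value):
            self.value = value
            self.adjoint = 0.0

    def add(a, b):
        out = Var(a.value + b.value)
        tape.append((out, [(a, 1.0), (b, 1.0)]))    # local partials
        return out

    def mul(a, b):
        out = Var(a.value * b.value)
        tape.append((out, [(a, b.value), (b, a.value)]))
        return out

    def grad(output):
        output.adjoint = 1.0
        for out, parents in reversed(tape):          # reverse sweep
            for parent, partial in parents:
                parent.adjoint += out.adjoint * partial

    x, y = Var(3.0), Var(4.0)
    z = add(mul(x, x), mul(x, y))                    # z = x^2 + x*y
    grad(z)
    print(x.adjoint, y.adjoint)                      # dz/dx = 10.0, dz/dy = 3.0

Second, a sketch of divide-and-conquer checkpointing in the simplest setting: a loop of n identical steps. To backpropagate over an interval, run forward to its midpoint, save only that one state as a checkpoint, recurse over the second half, then recurse over the first half. The recursion keeps O(log n) checkpoints live at the cost of recomputing forward work (real treeverse-style schedules are more refined):

    # Illustrative divide-and-conquer checkpointing over n identical steps.

    def step(state):
        return state * state + 1.0       # stand-in for one step of the program

    def step_vjp(state, adjoint):
        return adjoint * 2.0 * state     # pull the adjoint back through one step

    def run(state, n):
        for _ in range(n):
            state = step(state)
        return state

    def backprop(state, n, adjoint):
        # Pulls `adjoint` back through n steps starting from `state`,
        # holding only O(log n) checkpoints live at any moment.
        if n == 1:
            return step_vjp(state, adjoint)
        mid = n // 2
        midpoint = run(state, mid)                      # the one saved checkpoint
        adjoint = backprop(midpoint, n - mid, adjoint)  # second half first
        return backprop(state, mid, adjoint)            # then first half

    print(backprop(1.0, 8, 1.0))         # gradient of run(1.0, 8) w.r.t. state

Third, a sketch of the general-purpose mechanisms the abstract lists: counting steps, interrupting after a budget is exhausted, and resuming. In CPS each operation receives its continuation explicitly, so it can thread a "fuel" counter and, when the fuel runs out, hand the pending continuation back to the caller, who can resume it later (in the real system, possibly under a nonstandard interpretation):

    # Illustrative CPS "fuel" mechanism: run for k steps, then capture
    # the continuation so the caller can resume later.

    class Paused:
        def __init__(self, resume):
            self.resume = resume                          # call with fresh fuel

    def cps_add(a, b, fuel, k):
        if fuel == 0:                                     # budget exhausted:
            return Paused(lambda f: cps_add(a, b, f, k))  # hand back continuation
        return k(a + b, fuel - 1)                         # one step consumed

    def cps_mul(a, b, fuel, k):
        if fuel == 0:
            return Paused(lambda f: cps_mul(a, b, f, k))
        return k(a * b, fuel - 1)

    def program(x, y, fuel, k):                           # x*x + x*y in CPS
        return cps_mul(x, x, fuel,
            lambda xx, f1: cps_mul(x, y, f1,
                lambda xy, f2: cps_add(xx, xy, f2, k)))

    result = program(3.0, 4.0, 2, lambda v, f: v)  # budget of 2 steps: pauses
    while isinstance(result, Paused):
        result = result.resume(1)                  # resume one step at a time
    print(result)                                  # 21.0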
Biography
Jeffrey M. Siskind received the B.A. degree in Computer Science from the Technion, Israel Institute of Technology, Haifa, in 1979, the S.M. degree in Computer Science from the Massachusetts Institute of Technology (M.I.T.), Cambridge, in 1989, and the Ph.D. degree in Computer Science from M.I.T. in 1992. He did a postdoctoral fellowship at the University of Pennsylvania Institute for Research in Cognitive Science from 1992 to 1993. He was an assistant professor at the University of Toronto Department of Computer Science from 1993 to 1995, a senior lecturer at the Technion Department of Electrical Engineering in 1996, a visiting assistant professor at the University of Vermont Department of Computer Science and Electrical Engineering from 1996 to 1997, and a research scientist at NEC Research Institute, Inc. from 1997 to 2001. He joined the Purdue University School of Electrical and Computer Engineering in 2002, where he is currently an associate professor. His research interests include computer vision, robotics, artificial intelligence, neuroscience, cognitive science, computational linguistics, child language acquisition, automatic differentiation, and programming languages and compilers.
Thank you.
________________________________________
Greg Smith
Assistant Director of Operations & Space Planning
College of Computer and Information Science
Northeastern University
617.373.3237 ■ g.smith at northeastern.edu
Facebook <https://www.facebook.com/ccisatnu/> ■ Instagram <https://www.instagram.com/ccisatnu/> ■ LinkedIn <https://www.linkedin.com/company/nuccis> ■ Twitter <https://twitter.com/CCISatNU>