[Pl-seminar] Semantics Seminar Schedule

Mitchell Wand wand at ccs.neu.edu
Mon Aug 2 12:54:11 EDT 2004


NU Programming Languages Seminar
Monday, August 9, 2004
Room 366 West Village H (http://www.ccs.neu.edu/home/wand/directions.html)
11:45am-1:30pm

	      Algorithmic Differentiation of Functional Programs

			     Jeffrey Mark Siskind
		School of Electrical and Computer Engineering
			      Purdue University

Algorithmic Differentiation (AD) is an established enterprise that seeks to
take the derivatives of functions specified as programs through symbolic
manipulation rather than finite differencing.  AD has traditionally been
applied to imperative programs.  We have developed a novel framework for
performing AD within modern functional programming languages, treating AD
operators as first-class higher-order functions that take first-class function
objects as arguments and produce first-class function objects as results.
Doing so offers seven important advantages over the traditional imperative
framework for AD.
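As a minimal sketch of the idea (in Python with invented names, not the speaker's actual system, which targets functional languages), a forward-mode AD operator can itself be an ordinary higher-order function: it takes a function and returns its derivative as another function.

```python
# Sketch: an AD operator as a first-class higher-order function,
# implemented with forward-mode dual numbers.  Illustrative only.

class Dual:
    """A number paired with its derivative (tangent) component."""
    def __init__(self, primal, tangent):
        self.primal, self.tangent = primal, tangent

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other, 0.0)
        return Dual(self.primal + other.primal, self.tangent + other.tangent)
    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other, 0.0)
        # Product rule: (fg)' = f'g + fg'
        return Dual(self.primal * other.primal,
                    self.primal * other.tangent + self.tangent * other.primal)
    __rmul__ = __mul__

def deriv(f):
    """AD operator: takes a function object, returns its derivative
    as another function object -- both first-class values."""
    def df(x):
        y = f(Dual(x, 1.0))
        return y.tangent if isinstance(y, Dual) else 0.0
    return df

cube = lambda x: x * x * x
dcube = deriv(cube)      # the result is an ordinary function
print(dcube(2.0))        # 3 * 2^2 = 12.0
```

Because `deriv` consumes and produces plain function objects, its results can be passed to other functions, stored in data structures, or differentiated again, just like any user-written function.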

 - Functional programs represent the underlying mathematical notions more
   closely than imperative programs.
 - This formulation is modular and allows a library of functions to be built
   compositionally: root finders built on a derivative-taker, line search
   built on root finders, multivariate optimizers built on line search, other
   multivariate optimizers (with identical APIs) built on Hessian-vector
   multipliers, themselves built on AD operators, and so forth.
 - By allowing the callee to specify the necessary AD, rather than insisting
   that the caller provide appropriately transformed functions, internals can
   be hidden and changed, maintaining a modular structure previously not
   possible in AD codes.
 - It is straightforward to generate higher-order derivatives, i.e.
   derivatives of derivatives.
 - Differential forms become first-class higher-order functions that can be
   passed to optimizers or PDE solvers as part of an API.  This allows one to
   easily express programming patterns, i.e. algorithm templates, that can be
   instantiated with different components as fillers.  For example, one can
   construct an algorithm that needs an optimizer and leave the choice of
   optimizer unspecified, to be filled in later by passing the particular
   optimizer as a function parameter.
 - Gradients can even be taken through processes that themselves involve
   AD-based optimization or PDE solution.
 - In traditional AD formulations, the output of an AD transformation is a
   'tape', a different kind of entity from user-written functions, which
   must be interpreted or run-time compiled.  In contrast, in our approach,
   user-written functions, and the input and output of AD operators, are all
   the same kind of entity.  Standard compilation techniques for functional
   programs can eliminate the need for interpretation or run-time compilation
   of derivatives and generate, at compile-time, code for derivatives that is
   as efficient as code for the primal calculation.
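The compositional layering and the higher-order derivatives mentioned above can be sketched as follows (again in Python with invented names, under the assumption of a forward-mode dual-number `deriv` operator; this is not the speaker's code): a Newton root finder is built directly on the derivative-taker, and a second derivative is obtained simply by applying the operator twice.

```python
# Sketch: building a root finder compositionally on an AD operator,
# and taking a derivative of a derivative.  Illustrative only.

class Dual:
    """Forward-mode dual number: primal value plus tangent."""
    def __init__(self, primal, tangent):
        self.primal, self.tangent = primal, tangent

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other, 0.0)
        return Dual(self.primal + other.primal, self.tangent + other.tangent)
    __radd__ = __add__

    def __sub__(self, other):
        other = other if isinstance(other, Dual) else Dual(other, 0.0)
        return Dual(self.primal - other.primal, self.tangent - other.tangent)

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other, 0.0)
        return Dual(self.primal * other.primal,
                    self.primal * other.tangent + self.tangent * other.primal)
    __rmul__ = __mul__

def deriv(f):
    """AD operator: function in, derivative function out."""
    def df(x):
        y = f(Dual(x, 1.0))
        return y.tangent if isinstance(y, Dual) else 0.0
    return df

def newton(f, x0, steps=20):
    """Root finder built on the derivative-taker: the caller supplies
    only f; the callee requests the AD it needs internally."""
    df = deriv(f)
    x = x0
    for _ in range(steps):
        x = x - f(x) / df(x)
    return x

root = newton(lambda x: x * x - 2.0, 1.0)   # converges to sqrt(2)

# Higher-order derivative: just apply the operator twice.
d2cube = deriv(deriv(lambda x: x * x * x))
print(d2cube(2.0))                           # (x^3)'' = 6x, so 12.0
```

Note that `newton` never asks its caller for a pre-transformed function: it invokes `deriv` itself, which is the callee-specifies-AD modularity the abstract describes.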

We illustrate the above advantages with examples that run in an implemented
system.  Our techniques can be applied to problems in such diverse fields as
engineering design, scientific computing, and machine learning, where
researchers regularly need to take derivatives of complex models specified as
programs.

Joint work with Barak Pearlmutter.

Upcoming Events:

No events scheduled.  Want to volunteer?

--Mitch



