[Pl-seminar] Semantics Seminar Schedule
Mitchell Wand
wand at ccs.neu.edu
Mon Aug 9 09:32:51 EDT 2004
NU Programming Languages Seminar
Monday, August 9, 2004
Room 366 West Village H (http://www.ccs.neu.edu/home/wand/directions.html)
11:45am-1:30pm
Algorithmic Differentiation of Functional Programs
Jeffrey Mark Siskind
School of Electrical and Computer Engineering
Purdue University
Algorithmic Differentiation (AD) is an established enterprise that seeks to
take the derivatives of functions specified as programs through symbolic
manipulation rather than finite differencing. AD has traditionally been
applied to imperative programs. We have developed a novel framework for
performing AD within modern functional programming languages, treating AD
operators as first-class higher-order functions that take first-class function
objects as arguments and produce first-class function objects as results.
Doing so offers seven important advantages over the traditional imperative
framework for AD.
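To make the idea of AD operators as first-class higher-order functions concrete, here is a minimal forward-mode sketch in Python using dual numbers. This is an illustrative stand-in, not the authors' implementation (which targets functional languages), and the names `Dual` and `derivative` are hypothetical:

```python
# Minimal sketch of forward-mode AD with dual numbers.
# `Dual` and `derivative` are illustrative names, not the authors' API.

class Dual:
    """A primal value paired with a tangent (derivative) component."""
    def __init__(self, primal, tangent=0.0):
        self.primal, self.tangent = primal, tangent

    @staticmethod
    def lift(x):
        return x if isinstance(x, Dual) else Dual(x)

    def __add__(self, other):
        other = Dual.lift(other)
        return Dual(self.primal + other.primal, self.tangent + other.tangent)
    __radd__ = __add__

    def __mul__(self, other):
        # Product rule: (uv)' = u'v + uv'
        other = Dual.lift(other)
        return Dual(self.primal * other.primal,
                    self.tangent * other.primal + self.primal * other.tangent)
    __rmul__ = __mul__

def derivative(f):
    """A first-class AD operator: takes a function, returns a function."""
    return lambda x: f(Dual(x, 1.0)).tangent

# The derivative of x*x + 3*x is 2x + 3, so its value at x = 2 is 7.
df = derivative(lambda x: x * x + 3 * x)
# df(2.0) == 7.0
```

Note that `derivative` is an ordinary value: it can be passed around, stored, and applied like any other function, which is the property the abstract emphasizes.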
- Functional programs represent the underlying mathematical notions more
closely than imperative programs.
- This formulation is modular and allows a library of functions to be built
compositionally: root finders built on a derivative-taker, line search
built on root finders, multivariate optimizers built on line search, other
multivariate optimizers (with identical APIs) built on Hessian-vector
multipliers, themselves built on AD operators, and so forth.
- By allowing the callee to specify the necessary AD, rather than insisting
that the caller provide appropriately transformed functions, internals can
be hidden and changed, maintaining a modular structure previously not
possible in AD codes.
- It is straightforward to generate higher-order derivatives, i.e.,
derivatives of derivatives.
- Differential forms become first-class higher-order functions that can be
passed to optimizers or PDE solvers as part of an API. This allows one to
easily express programming patterns, i.e., algorithm templates, that can be
instantiated with different components as fillers. For example, one can
construct an algorithm that needs an optimizer and leave the choice of
optimizer unspecified, to be filled in later by passing the particular
optimizer as a function parameter.
- Gradients can even be taken through processes that themselves involve
AD-based optimization or PDE solution.
- In traditional AD formulations, the output of an AD transformation is a
'tape' that is a different kind of entity than user-written functions.
It must be interpreted or run-time compiled. In contrast, in our approach,
user-written functions, and the input and output of AD operators, are all
the same kind of entity. Standard compilation techniques for functional
programs can eliminate the need for interpretation or run-time compilation
of derivatives and generate, at compile-time, code for derivatives that is
as efficient as code for the primal calculation.
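The compositional layering described in the second point, and the nesting described in the fourth, can be sketched as follows. This is a self-contained toy in Python, not the authors' API; all names are hypothetical:

```python
# Sketch of building a library compositionally on top of an AD operator:
# a root finder built on a derivative-taker, and an optimizer built on
# the root finder. Illustrative names only, not the authors' API.

class Dual:
    """Forward-mode dual number: a primal value plus a tangent."""
    def __init__(self, p, t=0.0):
        self.p, self.t = p, t

    @staticmethod
    def lift(x):
        return x if isinstance(x, Dual) else Dual(x)

    def __add__(self, o):
        o = Dual.lift(o)
        return Dual(self.p + o.p, self.t + o.t)
    __radd__ = __add__

    def __sub__(self, o):
        o = Dual.lift(o)
        return Dual(self.p - o.p, self.t - o.t)

    def __rsub__(self, o):
        return Dual.lift(o) - self

    def __mul__(self, o):
        o = Dual.lift(o)
        return Dual(self.p * o.p, self.t * o.p + self.p * o.t)
    __rmul__ = __mul__

def derivative(f):
    """AD operator: maps a function to its derivative, also a function."""
    return lambda x: f(Dual(x, 1.0)).t

def find_root(f, x, steps=50):
    """Newton root finder, built on the derivative-taker."""
    df = derivative(f)
    for _ in range(steps):
        x = x - f(x) / df(x)
    return x

def minimize(f, x):
    """Optimizer built on the root finder: a stationary point of f
    is a root of f's derivative (a nested use of the AD operator)."""
    return find_root(derivative(f), x)

# Minimizing (x - 3)^2 from x = 0 converges to x = 3.
xmin = minimize(lambda x: (x - 3) * (x - 3), 0.0)

# Higher-order derivatives fall out by nesting the operator:
# the second derivative of x^3 is 6x, i.e. 12 at x = 2.
d2 = derivative(derivative(lambda x: x * x * x))
```

A caveat: an untagged `Dual` like this one is known to risk "perturbation confusion" when derivatives are nested in general; it suffices for these simple same-variable cases, but a sound framework must distinguish the nested perturbations.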
We illustrate the above advantages with examples that run in an implemented
system. Our techniques can be applied to problems in such diverse fields as
engineering design, scientific computing, and machine learning, where
researchers regularly need to take derivatives of complex models specified as
programs.
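As a toy instance of taking a derivative through a process that is itself an iterative solve, one can push a tangent through a Newton iteration for the square root. This self-contained Python sketch is hypothetical, not the authors' implementation:

```python
# Sketch: differentiating *through* an iterative solver. The square root
# of `a` is computed by Newton iteration, and forward-mode AD carries the
# tangent through the loop, yielding d(sqrt a)/da = 1/(2 sqrt a).
# Illustrative names only, not the authors' implementation.

class Dual:
    """Forward-mode dual number: a primal value plus a tangent."""
    def __init__(self, p, t=0.0):
        self.p, self.t = p, t

    @staticmethod
    def lift(x):
        return x if isinstance(x, Dual) else Dual(x)

    def __add__(self, o):
        o = Dual.lift(o)
        return Dual(self.p + o.p, self.t + o.t)
    __radd__ = __add__

    def __mul__(self, o):
        o = Dual.lift(o)
        return Dual(self.p * o.p, self.t * o.p + self.p * o.t)
    __rmul__ = __mul__

    def __truediv__(self, o):
        # Quotient rule: (u/v)' = (u'v - uv') / v^2
        o = Dual.lift(o)
        return Dual(self.p / o.p,
                    (self.t * o.p - self.p * o.t) / (o.p * o.p))

def newton_sqrt(a, steps=40):
    """Newton's method for x^2 = a: x <- (x + a/x) / 2."""
    x = Dual.lift(1.0)
    for _ in range(steps):
        x = 0.5 * (x + a / x)
    return x

# Seeding a = 2 with tangent 1 pushes d/da through the whole iteration:
root = newton_sqrt(Dual(2.0, 1.0))
# root.p ~= sqrt(2) ~= 1.41421, root.t ~= 1/(2*sqrt(2)) ~= 0.35355
```

The solver itself never mentions derivatives of its output; the tangent simply flows through the arithmetic of the loop, which is the essence of the sixth advantage above.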
Joint work with Barak Pearlmutter.
Upcoming Events:
No events scheduled. Want to volunteer?
--Mitch