[Pl-seminar] Talk April 21: Kathleen Fisher, Hancock

Norman Ramsey nr at eecs.harvard.edu
Wed Apr 9 16:56:08 EDT 2003


     Hancock: A language for analyzing transactional data streams

                           Kathleen Fisher
                            AT&T Research

Massive transaction streams present a number of opportunities for data
mining techniques. The transactions in such streams might represent
calls on a telephone network, commercial credit card purchases, stock
market trades, or HTTP requests to a web server.  While historically
such data have been collected for billing or security purposes, they
are now being used to discover how "customers" use the associated
services, where the notion of a customer might be a telephone number,
a credit card number, a trade account number, or an IP address.

For several years, we have computed evolving profiles (called
signatures) of the customers mentioned in large data streams.  The
signature for each customer captures the salient features of his
transactions through time.  Programs for processing signatures must be
highly optimized because of the size of the data stream (several
gigabytes per day) and the number of signatures to maintain (hundreds
of millions).  Originally, we wrote such programs directly in C, but
because these programs often sacrificed readability for performance, they
were difficult to verify and maintain.

Hancock is a domain-specific language we created to express
computationally efficient signature programs cleanly.  In this talk,
I will describe the obstacles to computing signatures from massive streams
and explain how Hancock addresses these problems.  For expository
purposes, I will present Hancock using a running example from the
telecommunications industry; however, the language itself is general
and applies equally well to other data sources.

Joint work with Anne Rogers, Fred Smith, Corinna Cortes, Daryl Pregibon,
and Karin Hogstedt.
----------------------------------------------------------------
Monday, April 21, 12:30 PM
Harvard University
Maxwell Dworkin 319

Pizza will be served; kindly RSVP to nr at eecs.harvard.edu if you plan to attend.

