[Colloq] Hiring talk by Ariel Rabkin, March 11, 10:30-11:45, 366 WVH

Tue Mar 4 12:54:22 EST 2014

Host: Amal Ahmed

----

Growing up: How Big Data processing can cope with limited bandwidth and complex code

ABSTRACT:

We are now entering an era in which organizations collect and process
unprecedented data volumes. This "big data" is handled using
distributed systems of correspondingly large scale. My work
addresses problems in effectively deploying and managing these
systems. In this talk, I will focus on two parts of my research.
First, I describe my work building wide-area data collection and
analytics pipelines to cope with large and variable bandwidth demands.
Second, I describe my work on better configuration management for the
increasingly complex software seen in these environments.

In wide-area contexts, available bandwidth can vary over time. Current
analytics systems require users to specify in advance the data to be
collected. As a consequence, systems are provisioned for the worst
case, which is costly and inflexible. We are building a distributed
analytics system, JetStream, designed for the wide area. JetStream
lets users specify explicit policy for how the system should respond
to varying data volumes and bandwidth availability. As a result, the
system can make optimal use of available resources at each point in
time.

As data grows, so does the complexity of the software used to manage
it. Modern software stacks are increasingly complex and
correspondingly difficult to configure. Users and administrators are
left resorting to trial and error or Internet searches when
difficulties arise. My research in this area tames system
configuration by applying static analysis. Analysis can determine the
dependencies between configuration options and error messages. As a
result, system failures can be quickly traced to a small set of
potentially responsible options. Users thus get immediate feedback on
how to resolve configuration errors.

BIO:

Ariel Rabkin is interested in techniques for building and debugging
complex software systems. He is currently a postdoctoral researcher at
Princeton University, working with Michael Freedman and Vivek Pai. He
received his PhD in Computer Science from UC Berkeley in May 2012,
where he was advised by Randy Katz. He previously attended Cornell
University (AB 2006, MEng 2007). He is a contributor to several open
source projects, including Hadoop, the Chukwa log collection
framework, and the JChord program analysis toolset.