[Colloq] HIRING TALK TODAY 11:45AM 149CN
Rachel Bates
rachelb at ccs.neu.edu
Mon, 29 Apr 2002 11:22:05 -0400
College of Computer Science Colloquium
presents
Donghui Zhang
who will speak on
Efficient Aggregation over Objects with Extent
Monday, April 29, 2002
11:45am
149 Cullinane Hall
Northeastern University
ABSTRACT
Aggregation is an important but costly operator for analyzing large
datasets. Recent work on computing multi-dimensional aggregates has
concentrated on point objects (i.e. objects with zero extent). However,
many spatial and/or spatio-temporal applications create objects that have
extent in various dimensions. In this talk we examine the problem of
efficiently computing SUM/COUNT/AVG aggregates over objects with extent. An
example query is: “find the total volume of rainfall in Orange County during
1999”. The aggregation predicate is typically described by a
multi-dimensional query box. We examine two variations of the problem. In
the simple case an object’s value contributes to the aggregation result as
long as the object intersects the query box. For various applications,
however, this aggregation may not provide an intuitive result; objects that
have large values but marginally touch the query box can drastically affect
the aggregation result. More natural is the “functional” aggregation, where
objects participate in the computation proportionally to the size of their
intersection with the query box. One straightforward approach to solve both
these problems is to use a traditional multi- dimensional index (e.g., an
R-tree) and compute the aggregate by locating all objects intersecting the
query box. This approach can be further optimized by adding aggregate
information on the index nodes. Nevertheless, the query performance remains
linear to the number of objects intersecting the query box. Instead we take
a different approach and design a specialized aggregation index. We first
show that both problems can be reduced to “dominance-sum” queries. We then
introduce the BA-tree, a novel dynamic, disk-based index that efficiently
solves dominance-sums. We run experiments comparing the performance of the
BA-tree and a traditional R-tree (augmented with aggregation information)
for simple and functional aggregations. Our evaluation reaffirms that the
BA-tree has more robust performance: it offers drastic improvement
(ten-fold) in query performance at the expense of some limited space
overhead.
Host:
Betty Salzberg