[Colloq] Thesis Defense: Processing Theta-Joins on Shared-Nothing Systems - Wednesday, March 5 at 2pm, 166 WVH

Andrew W. Fong awfong at ccs.neu.edu
Mon Mar 3 14:46:10 EST 2014


Date: Wednesday, March 5, 2014
Time: 2pm
Place: 166 WVH

Thesis Defense:
Title: Processing Theta-Joins on Shared-Nothing Systems
Speaker: Alper Okcan

Abstract:
Joins are essential for many large-scale data analysis tasks, and a variety of join conditions must be supported for applications such as advertising, marketing, social networks, and data-driven science. Efficient parallel execution of joins is crucial to cope with the large volumes of data being collected and generated in many disciplines.

We study how to efficiently process theta-joins in distributed shared-nothing systems when the goal is to minimize response time. We propose a join model that simplifies creation of and reasoning about parallel join algorithms. Using this model, we introduce a randomized algorithm whose response time is provably within a small constant factor of the lower bound for a variety of join problems. For other popular classes of joins where this does not apply, we develop efficient heuristics.

Certain join queries may incur a high input replication rate, depending on the join condition, input data distribution, available statistics, and cluster properties. We propose lightweight encoding and decoding strategies to reduce the amount of data transferred across the network. These strategies are also applicable to a large and diverse spectrum of applications executed using the MapReduce programming model. Data transfer reduction is achieved by dynamically and adaptively performing mapper-side tasks on the reducers.

We integrate our optimization techniques into Scolopax, a novel system that supports exploratory analysis for data-driven science. Scolopax supports flexible join predicates used by scientists to compute relationships and correlations in high-dimensional scientific data.


Thesis committee:
Mirek Riedewald - Advisor
Rajmohan Rajaraman
Alan Mislove
Yanlei Diao - External member, University of Massachusetts Amherst


-- 
Andrew W. Fong 
Program Assistant 

Northeastern University 
College of Computer and Information Science 
360 Huntington Avenue 
202 West Village H 
Boston, MA 02115 
617-373-8493 
awfong at ccs.neu.edu 



