[Colloq] Thesis Defense: Processing Theta-Joins on Shared-Nothing Systems - Wednesday, March 5 at 2pm - 366 WVH

Fong, Andy a.fong at neu.edu
Wed Mar 5 12:35:43 EST 2014


Date: Wednesday, March 5, 2014

Time: 2pm

Place: 366 WVH



Thesis Defense:

Title: Processing Theta-Joins on Shared-Nothing Systems

Speaker: Alper Okcan



Abstract:

Joins are essential for many large-scale data analysis tasks, and a variety of join conditions must be supported in applications such as advertising, marketing, social networks, and data-driven science. Efficient parallel execution of joins is crucial to cope with the large volumes of data being collected and generated in many disciplines.



We study how to efficiently process theta-joins in distributed shared-nothing systems when the goal is to minimize response time. We propose a join model that simplifies the creation of, and reasoning about, parallel join algorithms. Using this model, we introduce a randomized algorithm whose response time is provably within a small constant factor of the lower bound for a variety of join problems. For other popular classes of joins where this does not apply, we develop efficient heuristics.



Certain join queries may incur a high input replication rate, depending on the join condition, input data distribution, available statistics, and cluster properties. We propose lightweight encoding and decoding strategies to reduce the amount of data transferred across the network. These strategies are also applicable to a large and diverse spectrum of applications executed using the MapReduce programming model. Data transfer reduction is achieved by dynamically and adaptively performing mapper-side tasks on the reducers.



We integrate our optimization techniques into Scolopax, a novel system that supports exploratory analysis for data-driven science. Scolopax supports flexible join predicates used by scientists to compute relationships and correlations in high-dimensional scientific data.





Thesis committee:

Mirek Riedewald - Advisor

Rajmohan Rajaraman

Alan Mislove

Yanlei Diao - External member, University of Massachusetts Amherst




Andrew W. Fong
Program Assistant

Northeastern University
College of Computer and Information Science
360 Huntington Avenue
202 West Village H
Boston, MA 02115
617-373-8493
a.fong at neu.edu


