[Colloq] Thesis Proposal by Alper Okcan "Processing Theta-Joins on Large Clusters" 5/4/12 at 1:00pm Room 366 WVH

Nicole Bekerian nicoleb at ccs.neu.edu
Fri Apr 27 14:02:04 EDT 2012


The College of Computer and Information Science presents:

Thesis Proposal by Alper Okcan
Date/Time: Friday, May 4th, 2012 at 1:00pm
Location: 366 WVH

Thesis title: "Processing Theta-Joins on Large Clusters"

Abstract:
Joins are essential for large-scale data analysis in many applications such as advertising, marketing, social networks and data-driven science. 
We present techniques that enable efficient parallel execution of theta-joins on large clusters.

In the first part of the proposal, we show how to process theta-joins efficiently in parallel when the goal is to minimize response time. We propose a cost model that is applicable to any theta-join condition and use it to derive lower bounds. Then we introduce a simple randomized algorithm whose response time is provably within a small constant factor of the lower bound for a variety of join problems. For other popular classes of joins where this guarantee does not hold, we develop efficient heuristics.

Parallel computation often requires input replication, which is essential for effective parallel join execution and many other applications such as search-log analysis. In the second part of the proposal, we present a replication strategy for highly scalable data-intensive computing platforms such as MapReduce. Our goal is to reduce the total cost by increasing input sharing among the tasks processed on the same node and by developing appropriate local data processing techniques. We show how to distribute the input across the cluster in order to increase data sharing, while maintaining load balance among the nodes.

We integrate our results into Scolopax, a novel system that supports exploratory analysis for data-driven science. Our techniques are used for correlation analysis on massive high-dimensional scientific data.

Thesis committee:
Mirek Riedewald - Advisor
Rajmohan Rajaraman
Alan Mislove
Yanlei Diao - External member, University of Massachusetts Amherst


-- 




Best, 
Nicole 

______________________________________________________________ 

Nicole Bekerian 
Administrative Assistant 

Northeastern University 
College of Computer and Information Science 
360 Huntington Ave. 
202 West Village H 
Boston, MA 02115 

Phone: 617.373.2462 
Fax: 617.373.5121 



