[Colloq] Tomorrow - Hiring Talk - Eugene Wu - Closing the Loop on Data Analysis - March 25, 10:30am, 366 WVH

Jessica Biron bironje at ccs.neu.edu
Mon Mar 24 13:45:41 EDT 2014


Closing the Loop on Data Analysis 

Tuesday March 25th, 2014 10:30am - 11:30am 

366 WVH 

Eugene Wu 

Although data processing systems now execute queries faster than ever before, 
they only address the first half of the data analysis cycle. The latter half 
(presenting and interpreting the results in order to clean the data, formulating 
new queries, generating hypotheses, and summarizing and presenting results) 
is currently ill-served by existing systems. In this talk, I will 
describe two examples of systems that "close the loop" by letting 
users query the results of their data analysis. 

The first, Scorpion, answers "why are these results outliers?" in 
the context of aggregation queries. Aggregation is commonly used to reduce large 
data sets to a manageable size, but it also hides which input records are 
correlated with outliers and which are not. Scorpion identifies 
the input records that most contributed to an outlier value and generates 
predicates that describe their common properties. 

The second, SubZero, answers "what records generated this result?" 
in the context of scientific workflows. For example, astronomers want to know 
which pixels in the set of all input images were used to detect an interesting 
star. Naively storing input-output relationships (lineage) for every pixel in 
each step of the workflow can incur significant storage and runtime costs. 
SubZero is a workflow system that efficiently tracks lineage information while 
also meeting user-specified storage and runtime overhead constraints. 


Eugene Wu is a Ph.D. student in the database group at MIT, advised by Samuel 
Madden and Michael Stonebraker. He is broadly interested in building systems for 
data management and has contributed to research in a wide variety of areas 
including data cleaning, core database performance, human computation, and 
complex event processing. 

