[Colloq] Thesis Defense - Exploratory Analysis with Imprecise Queries - Bahar Qarabaqi - Thursday July 30th 3:30 pm WVH 366

DiFazio, Danielle d.difazio at neu.edu
Thu Jul 23 14:47:04 EDT 2015


Title: Exploratory Analysis with Imprecise Queries

Abstract:
Very large amounts of data are available in various domains, including scientific research disciplines, online communities and marketing. Unfortunately, the ability to analyze big data and extract interesting information is limited by the user's knowledge and expertise in databases. In particular, she has to be able to compose a precise query that specifies exactly what she wants. We have developed Merlin, a new framework for querying big data. Merlin provides new functionality for exploratory search in large databases to find entities of interest even when the user is not sure about some of their properties.

The user interacts with Merlin by specifying probability distributions over attributes, which express imprecise conditions. Merlin helps the user home in on the right query conditions by addressing three key challenges: (1) efficiently computing results for an imprecise query, (2) providing feedback about the sensitivity of the result to changes in individual conditions, and (3) suggesting new conditions. We formally introduce the notion of sensitivity and prove structural properties that enable efficient algorithms for quantifying the effect of uncertainty in user-specified conditions. To support interactive responses, we also develop techniques that can deliver probability estimates within a given real-time limit and are able to adapt automatically as interactive query refinement proceeds. Because the process is interactive, query conditions are added incrementally. We show that using a prediction model that is specifically trained for the given query delivers better probability estimates. This creates a trade-off between the cost of storing models and the quality of predicted probabilities. We show how to overcome this trade-off by training the appropriate models incrementally on the fly without violating the response-time requirement.

Committee:
- Mirek Riedewald (advisor)
- Jay Aslam
- Yizhou Sun
- Miyoko Chu (external member, Cornell University)




