[Colloq] PhD Defense - Peter Golbus - Frameworks for Evaluating and Meta-Evaluating Search Engines - Jun 19th, 11am - 366 WVH
Jessica Biron
bironje at ccs.neu.edu
Tue May 20 11:39:14 EDT 2014
Title: Frameworks for Evaluating and Meta-Evaluating Search Engines
June 19th, 11am, 366 WVH
Committee:
Javed A. Aslam (advisor)
David A. Smith
Guevara Noubir
Charles L.A. Clarke (University of Waterloo)
Abstract:
In this work, we address information retrieval evaluation and the methods and metrics it employs. We consider the relative lack of understanding in this area to be the crucial obstacle to advancing information retrieval. To that end, we introduce several frameworks for meta-evaluation and describe how their unification with evaluation measures can lead to improvements in assessing the quality of information retrieval systems.
For example, many queries, especially in the context of the web, have multiple interpretations; they are ambiguous or underspecified. To account for this, much recent research has focused on creating systems that produce diverse ranked lists, covering as many interpretations as possible in as few documents as possible. Ideally, measures that evaluate these systems would distinguish between them by how many interpretations they cover and how quickly. Unfortunately, diversity is also a function of the collection over which the system is run and of a system’s ability to retrieve documents relevant to any interpretation. To ensure that we are assessing systems by their diversity, we develop (1) a family of evaluation measures that take into account the diversity of the collection and (2) a meta-evaluation measure that explicitly controls for a system’s ability to retrieve relevant documents. We demonstrate experimentally that our new measures can achieve substantial improvements in sensitivity to diversity without reducing discriminative power.
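To make the setting concrete, the sketch below computes alpha-DCG (Clarke et al., SIGIR 2008), a standard novelty-biased diversity measure of the kind these collection-aware measures refine. The judgment format, alpha value, and function name are illustrative assumptions; this is not one of the dissertation's new measures.

    # Illustrative sketch of alpha-DCG (Clarke et al., SIGIR 2008), a
    # standard novelty-biased diversity measure; NOT the collection-aware
    # measures introduced in this work. Judgment format and alpha are
    # assumptions made for illustration.
    import math

    def alpha_dcg(ranking, judgments, alpha=0.5, depth=10):
        """ranking: list of doc ids, best first.
        judgments: dict mapping doc id -> set of interpretations
        (subtopics) the document is relevant to."""
        covered = {}  # interpretation -> number of earlier docs covering it
        score = 0.0
        for rank, doc in enumerate(ranking[:depth], start=1):
            gain = 0.0
            for subtopic in judgments.get(doc, ()):
                # Repeated coverage of an interpretation is geometrically
                # discounted, rewarding novelty over redundancy.
                gain += (1 - alpha) ** covered.get(subtopic, 0)
                covered[subtopic] = covered.get(subtopic, 0) + 1
            score += gain / math.log2(rank + 1)  # standard rank discount
        return score

Normalizing by the ideal ranking's alpha-DCG yields alpha-nDCG; computing that ideal exactly is NP-hard, so it is usually approximated greedily.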
Furthermore, we propose a probabilistic framework whose utility encompasses both evaluation and meta-evaluation. This allows us to develop new information-theoretic evaluation and meta-evaluation metrics that will, we hope, be easier to unify in a fashion similar to our family of diversity measures. We demonstrate that these new metrics are powerful and generalizable, enabling evaluations heretofore not possible.
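As a toy illustration of what an information-theoretic metric can look like, the sketch below measures the mutual information between a binary "retrieved in the top k" indicator and binary relevance. This simple formulation is an assumption made for illustration; it is not the probabilistic framework proposed in the dissertation.

    # Toy information-theoretic evaluation: mutual information between a
    # binary "retrieved in top k" indicator and binary relevance. This
    # formulation is illustrative only, not the framework proposed here.
    import math
    from collections import Counter

    def mutual_information(pairs):
        """pairs: list of (in_top_k, relevant) boolean tuples, one per
        query-document pair."""
        n = len(pairs)
        joint = Counter(pairs)
        px = Counter(x for x, _ in pairs)
        py = Counter(y for _, y in pairs)
        mi = 0.0
        for (x, y), count in joint.items():
            # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), summed over outcomes
            mi += (count / n) * math.log2(count * n / (px[x] * py[y]))
        return mi

A higher score means that membership in a system's top k carries more information about relevance; meta-evaluation in the same framework asks analogous questions about the metrics themselves.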