[Colloq] Thesis Defense: Keshi Dai on Monday April 2nd at 10:00am
Nicole Bekerian
nicoleb at ccs.neu.edu
Wed Mar 28 13:21:37 EDT 2012
The College of Computer and Information Science presents a PhD thesis defense:
Speaker: Keshi Dai
Date: Monday, April 2, 2012
Time: 10:00am
Location: 366 WVH
Title:
Modeling Score Distributions for Information Retrieval
Abstract:
When a user submits a query to a search engine, the search engine computes a score for each document according to its relevance to the query and ranks the documents by score. Because modern search engines are complex, the score alone is not sufficient for information retrieval applications that require combining different ranked lists. Inferring the score distributions for relevant and non-relevant documents and estimating the probability of relevance therefore becomes imperative. In this thesis, we address two major research questions: (1) How can score distributions for relevant and non-relevant documents be modeled more accurately? (2) How can score distributions be better inferred in practice when relevance information is absent?
In the first part of the thesis, we show the problems with today’s most widely used score distribution model, and propose to model the relevant document scores by a mixture of Gaussians and the non-relevant scores by a Gamma distribution. Score distributions are then modeled in a more systematic manner. Given a basic assumption about the distribution of terms in a document, the distribution of the scores produced for retrieved documents can be derived through the transformations applied to term frequencies. The score distribution of relevant documents can in turn be derived through a general mathematical framework, given the score distribution for all retrieved documents.
The second part of the thesis presents a new framework for inferring score distributions when relevance information is unavailable. The new inference process extends the expectation maximization algorithm by simultaneously considering the ranked lists of documents returned by multiple retrieval systems, and encodes the constraint that the same document retrieved by multiple systems should have the same, global, probability of relevance. We demonstrate that the combined approach is more effective when applied to the task of metasearch.
Committee:
* Javed Aslam (Advisor)
* Harriet Fell
* Rajmohan Rajaraman
* Avi Arampatzis (External Member)
--
Best,
Nicole
______________________________________________________________
Nicole Bekerian
Administrative Assistant
Northeastern University
College of Computer and Information Science
360 Huntington Ave.
202 West Village H
Boston, MA 02115
Phone: 617.373.2462
Fax: 617.373.5121