[Colloq] Thesis Defense: Keshi Dai on Monday April 2nd at 10:00am

Nicole Bekerian nicoleb at ccs.neu.edu
Wed Mar 28 13:21:37 EDT 2012



The College of Computer and Information Science presents a PhD thesis defense:

Speaker: Keshi Dai

Date: Monday, April 2, 2012
Time: 10:00am
Location: 366 WVH

Title: 
Modeling Score Distributions for Information Retrieval

Abstract: 
When a user submits a query to a search engine, the search engine computes a score for each document according to its relevance to the query, and ranks the documents based on their scores. Due to the complexity of modern search engines, the score alone is not sufficient for information retrieval applications that require combining different ranked lists. Inferring the score distributions for relevant and non-relevant documents and estimating the probability of relevance therefore become imperative. In this thesis, we address two major research questions: (1) How can score distributions for relevant and non-relevant documents be modeled more accurately? (2) How can score distributions be better inferred in practice when relevance information is absent?

In the first part of the thesis, we show the problems with today’s most widely used score distribution model, and propose to model the relevant document scores by a mixture of Gaussians and the non-relevant scores by a Gamma distribution. Score distributions are then modeled in a more systematic manner. Given a basic assumption about the distribution of terms in a document, the distribution of the scores produced for retrieved documents can be derived through the transformations applied to term frequencies. The score distribution of relevant documents can likewise be derived through a general mathematical framework, given the score distribution for all retrieved documents.

The second part of the thesis presents a new framework for inferring score distributions when relevance information is unavailable. The new inference process extends the expectation maximization algorithm by simultaneously considering the ranked lists of documents returned by multiple retrieval systems, and encodes the constraint that the same document retrieved by multiple systems should have the same, global, probability of relevance. We demonstrate that the combined approach is more effective when applied to the task of metasearch.

Committee:
* Javed Aslam (Advisor)
* Harriet Fell
* Rajmohan Rajaraman
* Avi Arampatzis (External Member)




-- 




Best, 
Nicole 

______________________________________________________________ 

Nicole Bekerian 
Administrative Assistant 

Northeastern University 
College of Computer and Information Science 
360 Huntington Ave. 
202 West Village H 
Boston, MA 02115 

Phone: 617.373.2462 
Fax: 617.373.5121 



