[Colloq] Master's thesis defense - Sam Scarano - Applying Unsupervised Grammar Induction to OCR Error Correction - Wednesday December 17 - 12:45 PM - 164 WVH

Fong, Andy a.fong at neu.edu
Wed Dec 17 10:57:27 EST 2014


Topic: Master's thesis defense
Title: Applying Unsupervised Grammar Induction to OCR Error Correction
Speaker: Sam Scarano

Date: 12/17 12:45 PM
Location: 164 WVH

Committee: David Smith, Javed Aslam

Abstract:

OCR software is typically a commercial black box that applies crude language knowledge, if any, to minimize errors. This thesis presents a system for correcting OCR errors. Unlike previous work, which uses n-gram models, this system uses a structured language model. Also, unlike previous uses of structured language models in noisy-channel correction tasks, the present model is trained without structure annotation. The system is evaluated using a corpus of pre-modern English. The results are competitive with previous findings in supervised structured language modeling for automatic speech recognition. However, targeted experimentation indicates that most of the improvement is attributable to the use of induced word classes in the model rather than to structure. Comparisons to previous work in language modeling and relevant properties of the model's parses will be discussed.


Andrew W. Fong
Assistant Director for Graduate Admissions and Enrollment

Northeastern University
College of Computer and Information Science
360 Huntington Avenue
202 West Village H
Boston, MA 02115
617-373-8493
a.fong at neu.edu



