[Colloq] PhD Proposal by Mingyan Shao - Friday, October 30, 1:30pm

Rachel Kalweit rachelb at ccs.neu.edu
Wed Oct 28 16:20:23 EDT 2009


Ph.D. Candidate: Mingyan Shao

When: Oct. 30th, 1:30PM - 3:30PM

Where: West Village H 366

PhD Thesis Proposal 

Title: Diagrams: Feature Analysis and Classification

Abstract

Diagrams are an important part of scientific articles because of the large amount of information they carry. However, most research on diagrams concerns the relationship between diagrams and the text of the documents. Our research instead focuses on the diagrams themselves, in particular vector diagrams, which consist of a list of geometric primitives such as lines and rectangles, in contrast to raster images, which are represented as arrays of pixels. We approach diagrams from the perspectives of feature analysis and classification.

We define and identify a novel kind of content feature for diagrams, the grapheme: an elementary yet meaningful unit of a diagram. Graphemes bridge the semantic gap in diagram research, where only low-level content features such as color and texture have been studied. A variety of graphemes are defined and extracted from diagrams and serve to distinguish five major diagram classes. Our research will give us insight into diagrams from the point of view of machine learning and will build a solid foundation for a diagram retrieval system, which has valuable potential in both research and commercial applications.

We start by acquiring vector diagrams from PDF documents and representing their graphic primitives as independent, self-contained graphic objects. Using constraint-based specifications, graphemes are identified and extracted from the graphic objects. The collection of graphemes in a diagram forms the feature set characterizing the diagram and is passed to machine learning algorithms for unsupervised and supervised learning. Our published results and ongoing work show that this approach succeeds in quantitative terms, with results at least equal to those achieved by raster (non-vector) image retrieval systems.
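
As a rough illustration of the idea of constraint-based grapheme extraction, the following minimal Python sketch treats a diagram as a list of line segments and counts one hypothetical grapheme, a "corner" (two segments that share an endpoint and meet at a right angle). The Segment class, the corner grapheme, the tolerances, and the function names are all assumptions made for illustration; they are not taken from the proposal and do not represent the actual grapheme definitions or system.

```python
from dataclasses import dataclass
from itertools import combinations
import math


@dataclass(frozen=True)
class Segment:
    """A hypothetical graphic object: one straight line segment."""
    x1: float
    y1: float
    x2: float
    y2: float

    def endpoints(self):
        return [(self.x1, self.y1), (self.x2, self.y2)]

    def direction(self):
        return (self.x2 - self.x1, self.y2 - self.y1)


def share_endpoint(a, b, tol=1e-6):
    """Constraint 1: some endpoint of a coincides with some endpoint of b."""
    return any(math.dist(p, q) <= tol
               for p in a.endpoints() for q in b.endpoints())


def perpendicular(a, b, tol=1e-6):
    """Constraint 2: the segments meet at a right angle (cosine near zero)."""
    (ax, ay), (bx, by) = a.direction(), b.direction()
    norm = math.hypot(ax, ay) * math.hypot(bx, by)
    return norm > 0 and abs(ax * bx + ay * by) / norm <= tol


def count_corners(segments):
    """Count segment pairs that satisfy both constraints."""
    return sum(1 for a, b in combinations(segments, 2)
               if share_endpoint(a, b) and perpendicular(a, b))


def feature_vector(segments):
    """Summarize a diagram as counts that a learner could consume."""
    return {"segments": len(segments), "corners": count_corners(segments)}


# A unit square drawn as four segments has four right-angle corners.
square = [
    Segment(0, 0, 1, 0),
    Segment(1, 0, 1, 1),
    Segment(1, 1, 0, 1),
    Segment(0, 1, 0, 0),
]
print(feature_vector(square))
```

In a sketch like this, each grapheme type would contribute one entry to the feature vector, and the vectors for many diagrams could then be handed to any standard clustering or classification routine.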


Proposal Committee: 

Prof. Jay Aslam
Computer Science, Northeastern University

Prof. Margrit Betke
Computer Science, Boston University

Prof. Harriet Fell
Computer Science, Northeastern University

Prof. Robert Futrelle (Advisor)
Computer Science, Northeastern University

Prof. Ronald Williams
Computer Science, Northeastern University
