[Colloq] Master's thesis defense - Poonam Bhide - Exploiting Implicit and Explicit Network Structures for Text Classification - Thursday, April 23rd - 3 PM - 164 WVH

DiFazio, Danielle d.difazio at neu.edu
Wed Apr 22 16:33:30 EDT 2015


Topic: Master's thesis defense
Title: Exploiting Implicit and Explicit Network Structures for Text Classification
Speaker: Poonam Bhide

Date: 4/23 3 pm
Location: 164 WVH

Committee: David Smith, Alan Mislove

Abstract:

The problem of text classification is of great importance and has been studied extensively in computer science and machine learning. The proliferation of digital media has led to the collection of massive amounts of data in the form of documents, articles, reviews, and news. While these documents contain much important information, it is difficult to manually examine such massive document sets and extract the relevant information from each. Every dataset has different properties and different correlations among its attributes. Network structures formed by correlating useful attributes can be very helpful for categorizing text.
This thesis aims to exploit the implicit and explicit network structures in legislative bills of the U.S. Congress to aid in categorizing bill sections. An investigation of legislative bills makes clear that they share a great deal of common language. Text reuse has produced many implicit local alignments among bills, where the authors do not explicitly mark the content they copy from previous bills. Many policies also contain explicit references to the United States Code. Such implicit and explicit structures in the corpus can aid the text classification of bill sections.
An important problem for political scientists aiming to analyze the legislative process is detecting instances of policy proposals. A bill section that conveys either a new policy or changes to an older version of a policy can be classified as a policy idea. On the other hand, statements such as those made in adherence to laws, and references to historical events or laws, are non-policy ideas. The primary evaluation in this thesis is performed on a database of bill sections annotated by political scientists for this kind of policy content.
We present an analysis of the implicit and explicit network structures that may affect text classification. In the future, the successful development of efficient classification algorithms for complex text cases will enable rapid data search and information retrieval. This project aims to contribute to the groundwork for ongoing research by studying a classification problem together with its network structures.


