[Colloq] Talk: Jeff Satterley - NLP-NG: A New NLP System for Biomedical Text Analysis - Wed. 6/16

Rachel Kalweit rachelb at ccs.neu.edu
Tue Jun 15 12:33:14 EDT 2010


The College of Computer and Information Science Presents:

Jeff Satterley, CCIS PhD Student

Date: Wednesday, June 16
Time: 1:00pm
Location: 366 West Village H

Title: NLP-NG: A New NLP System for Biomedical Text Analysis

Abstract:

Most language processing in the biomedical research domain (BioNLP) has focused on identifying entities and the facts that relate them.  However, biologists reading research articles are interested in the entire scientific process, which includes developing new hypotheses and experiments from previous knowledge, setting up and measuring experiments, and inferring knowledge about the natural world from experimental results.  These processes cannot be found by looking at single sentences in isolation.  Instead, we propose to identify relationships between adjacent or nearly adjacent sentences, called inter-clausal relations, that capture the reasoning process of the scientists.  We find these relationships through a process called Normalization, which simplifies the structure of sentences in biology texts to expose common constructions used by scientists.

We have developed a new language processing system, called NLP-NG, that focuses on normalizing a large corpus of text and finding inter-clausal relations.  NLP-NG consists of three major components.  NG-CORE is an object-oriented language processing system that preprocesses, annotates, and performs statistical analysis on corpora of documents.  It is designed to handle both unstructured and semi-structured (e.g., XML) text.  NG-DB stores data in relational databases for long-term persistence.  NG-SEE is a web-based client for interacting with and visualizing data from corpora in NG-CORE.  It was created using the Google Web Toolkit (GWT) and Ajax technologies.





