[Colloq] MONDAY - Distinguished Lecture Series - October 20 - Andrew McCallum - 20 WVF - Lunch Served
Biron, Jessica
j.biron at neu.edu
Fri Oct 17 09:19:36 EDT 2014
Northeastern University
College of Computer and Information Science Distinguished Lecture Series
Andrew McCallum
University of Massachusetts Amherst
Monday, October 20, 2014
12:00pm
20 West Village F
Lunch served after event
Title:
Probabilistic Databases for Large-scale Knowledge-base Construction
Abstract:
When building large-scale knowledge bases, we want to account for uncertainty in order to perform joint inference and accurately integrate new evidence. However, reasoning about data at this scale quickly involves more random variables than can fit in machine memory. For this reason we have become interested in probabilistic databases. We use them not only for storing and querying the results of an information extraction (IE) system, but also for carrying out IE joint inference itself, by managing the many random variables and intermediate results of IE. In this approach, only raw textual and tabular evidence is presented to the database, and IE inference is performed "inside the database." We have therefore taken to calling this an Epistemological Database, indicating that the database does not directly observe the truth about entities and relations; it must infer the truth from the available evidence [VLDB 2010; AKBC 2012]. After describing these ideas, I will present two pieces of recent work. The first is large-scale, non-greedy Monte Carlo entity resolution running on distributed processing, which also supports probabilistic reasoning about crowd-sourced human edits. The second is an approach to "schema-less" relation extraction based on tensor factorization, which we call "universal schema." All of the above is implemented on top of our probabilistic programming framework FACTORIE, a Scala library for factor graphs and natural language processing.
Joint work with Michael Wick, Sameer Singh, Karl Schultz, Sebastian Riedel, Limin Yao, Ari Kobren, Luke Vilnis and Gerome Miklau.
Bio:
Andrew McCallum is a Professor and Director of the Information Extraction and Synthesis Laboratory in the School of Computer Science at the University of Massachusetts Amherst. He has published over 250 papers in many areas of AI, including natural language processing, machine learning, data mining and reinforcement learning, and his work has received over 37,000 citations. He obtained his PhD from the University of Rochester in 1995 with Dana Ballard, and then held a postdoctoral fellowship at CMU with Tom Mitchell and Sebastian Thrun. In the early 2000s he was Vice President of Research and Development at WhizBang Labs, a 170-person start-up company that used machine learning for information extraction from the Web. He is an AAAI Fellow and the recipient of the UMass Chancellor's Award for Research and Creative Activity, the UMass NSM Distinguished Research Award, the UMass Lilly Teaching Fellowship, and research awards from Google, IBM and Microsoft. He was the General Chair for the International Conference on Machine Learning (ICML) 2012, and is the current president of the International Machine Learning Society, as well as a member of the editorial board of the Journal of Machine Learning Research. For the past ten years, McCallum has been active in research on statistical machine learning applied to text, especially information extraction, entity resolution, semi-supervised learning, topic models, and social network analysis. His work on open peer review can be found at http://openreview.net. McCallum's web page is http://www.cs.umass.edu/~mccallum.
Host:
Carla Brodley