TITLE: Liberal Information Extraction on the Fly
SPEAKER: Prof. Heng Ji (Rensselaer Polytechnic Institute)
CHAIR: Prof. Jun Zhao
TIME: Monday, March 21, 2016, 9:30 AM
VENUE: No. 1 Conference Room (3rd floor), Intelligence Building
ABSTRACT:
We propose a brand new "Liberal" Information Extraction (IE) paradigm that combines the merits of traditional IE (high quality and fine granularity) with those of Open IE (high scalability). Liberal IE aims to discover schemas and extract facts from any input corpus, without any annotated training data or predefined schema. Using event extraction as a case study, we present a pilot Liberal IE framework that incorporates symbolic semantics (Abstract Meaning Representation) and distributional semantics to detect and represent rich event structure, and adopts a joint typing framework to simultaneously discover the types of events and participants, as well as a schema customized for the input corpus. Experiments demonstrate that Liberal IE can construct high-quality schemas, discover a high proportion of the fine-grained typed events in manually defined schemas, achieve performance comparable to supervised models trained on a large amount of labeled data for predefined event types, and accurately extract many new event types and argument roles.
We will then discuss applying this paradigm to low-resource incident languages (ILs). We will use the name tagging problem in an emergent setting as a case study: the tagger must be built within a few hours for a new IL using very few resources. Inspired by observing how human annotators attack this challenge, we propose a new expectation-driven learning framework. In this framework we rapidly acquire, categorize, structure and zoom in on IL-specific expectations (rules, features, patterns, gazetteers, etc.) from various non-traditional sources: consulting and encoding linguistic knowledge from native speakers, mining and projecting patterns from both monolingual and cross-lingual corpora, and typing based on cross-lingual entity linking. We also propose a cost-aware combination approach to compose expectations. Experiments on seven low-resource languages demonstrate the effectiveness and generality of this framework: we are able to set up a name tagger for a new IL within two hours and achieve promising results.
BIOGRAPHY:
Heng Ji is the Edward P. Hamilton Development Chair Associate Professor in the Computer Science Department of Rensselaer Polytechnic Institute. She received her Ph.D. in Computer Science from New York University. Her research interests focus on Natural Language Processing and its connections with Vision, Data Mining, Network Science, Social Cognitive Science and Security. She received the "AI's 10 to Watch" Award from IEEE Intelligent Systems in 2013, the NSF CAREER award in 2009, Google Research Awards in 2009 and 2014, the Sloan Junior Faculty Award in 2012, the IBM Watson Faculty Award in 2012 and 2014, the PACLIC2012 Best Paper Runner-up, and the "Best of SDM2013" and "Best of ICDM2013" paper awards. She coordinated the NIST TAC Knowledge Base Population task in 2010, 2011, 2014, 2015 and 2016, and served as the Information Extraction area chair for NAACL2012, ACL2013, EMNLP2013, NLPCC2014, EMNLP2015, NAACL2016 and ACL2016, the vice Program Committee Chair for IEEE/WIC/ACM WI2013 and CCL2015, the Content Analysis Track Chair of WWW2015, the Financial Chair of IJCAI2016 and the Program Committee Chair of NLPCC2015. Her research is funded by the U.S. government (DARPA, ARL, AFRL, NSF, IARPA and DHS) and industry (Google, Disney, IBM and Bosch).