Towards Generalizable Information Extraction with Limited Supervision
Abstract
Supervised approaches to information extraction (IE), especially those employing deep neural networks, have achieved impressive performance, but they rely on large volumes of manual annotation. Their effectiveness degrades when generalizing to new languages, domains, or types, particularly when sufficient annotations are unavailable, and current methods fall short in addressing IE under limited supervision. In this dissertation, we approach IE with limited supervision from three perspectives. First, we refine the conventional classification-based extraction paradigm with a query-and-extract framework, which formulates target information as natural language queries to extract candidate information from the input text. Second, we leverage the generation capability of large language models (LLMs) to produce high-quality annotation data, enriching IE semantics under limited annotation, and we exploit LLMs' instruction-following capability to iteratively refine and optimize solutions through a debating process. Third, moving beyond text-only IE, we define a new multimodal IE task that links an entity mention in heterogeneous information sources to a knowledge base with limited annotation data, and we demonstrate that strong multimodal IE performance can be achieved, even with limited annotations, by leveraging monomodal external information. Together, these efforts aim to make optimal use of limited supervision, yielding more robust and generalizable IE solutions.