Towards Generalizable Information Extraction with Limited Supervision

Wang, Sijia

dc.contributor.author	Wang, Sijia	en
dc.contributor.committeechair	Huang, Lifu	en
dc.contributor.committeemember	Zhou, Dawei	en
dc.contributor.committeemember	Reddy, Chandan K.	en
dc.contributor.committeemember	Wang, Xuan	en
dc.contributor.committeemember	Yu, Mo	en
dc.contributor.committeemember	Lourentzou, Ismini	en
dc.contributor.department	Computer Science & Applications	en
dc.date.accessioned	2024-09-19T08:00:12Z	en
dc.date.available	2024-09-19T08:00:12Z	en
dc.date.issued	2024-09-18	en
dc.description.abstract	Supervised approaches, especially those employing deep neural networks, have achieved impressive performance, but they rely on a significant volume of manual annotations. Their effectiveness drops when they must generalize to new languages, domains, or types, particularly when sufficient annotations are unavailable. Current methods fall short in addressing information extraction (IE) under limited supervision. In this dissertation, we approach information extraction with limited supervision from three perspectives. First, we refine the previous classification-based extraction paradigm by introducing a query-and-extract framework, which uses target information as natural language queries to extract candidate information from the input text. Additionally, we leverage the generation capability of large language models (LLMs) to produce high-quality annotation data, enriching IE semantics within limited annotation data. We also utilize LLMs' instruction-following capability to iteratively refine and optimize solutions through a debating process. Beyond text-only IE, we define a new multimodal IE task that links an entity mention within heterogeneous information sources to a knowledge base with limited annotation data. We demonstrate that excellent multimodal IE performance can be achieved, even with limited annotation data, by leveraging monomodal external information. These combined efforts aim to make optimal use of limited knowledge, ensuring more robust and generalizable solutions.	en
dc.description.abstractgeneral	This dissertation explores the development of information extraction (IE) algorithms and systems that work effectively with limited supervision. Information extraction is a complex and challenging task that involves extracting structured data from plain text. Traditional IE systems are often tailored to specific tasks and domains where ample annotated data is available, limiting their ability to adapt to new domains. This research focuses on developing IE systems that can generalize to new domains with limited supervision, reducing the reliance on extensive annotations. The proposed solutions demonstrate the potential to transfer knowledge from existing annotations to new tasks and domains, emphasizing the importance of learning from limited data and improving knowledge transfer to previously unknown domains.	en
dc.description.degree	Doctor of Philosophy	en
dc.format.medium	ETD	en
dc.identifier.other	vt_gsexam:41269	en
dc.identifier.uri	https://hdl.handle.net/10919/121157	en
dc.language.iso	en	en
dc.publisher	Virginia Tech	en
dc.rights	In Copyright	en
dc.rights.uri	http://rightsstatements.org/vocab/InC/1.0/	en
dc.subject	Information Extraction	en
dc.subject	Limited Supervision	en
dc.subject	Event Extraction	en
dc.subject	Entity Linking	en
dc.title	Towards Generalizable Information Extraction with Limited Supervision	en
dc.type	Dissertation	en
thesis.degree.discipline	Computer Science & Applications	en
thesis.degree.grantor	Virginia Polytechnic Institute and State University	en
thesis.degree.level	doctoral	en
thesis.degree.name	Doctor of Philosophy	en

Files

Original bundle

Name: Wang_S_D_2024.pdf
Size: 8.4 MB
Format: Adobe Portable Document Format

Name: Wang_S_D_2024_support_1.docx
Size: 13.96 KB
Format: Microsoft Word XML
Description: Supporting documents

Collections

Doctoral Dissertations