Uncertainty Estimation on Natural Language Processing

Files

TR Number

Date

2024-05-15

Journal Title

Journal ISSN

Volume Title

Publisher

Virginia Tech

Abstract

Text plays a pivotal role in our daily lives, encompassing various forms such as social media posts, news articles, books, reports, and more. Consequently, Natural Language Processing (NLP) has garnered widespread attention. This technology empowers us to undertake tasks like text classification, entity recognition, and even crafting responses within a dialogue context. However, despite the expansive utility of NLP, it frequently necessitates a critical decision: whether to place trust in a model's predictions. To illustrate, consider a state-of-the-art (SOTA) model entrusted with diagnosing a disease or assessing the veracity of a rumor. An incorrect prediction in such scenarios can have dire consequences, impacting individuals' health or tarnishing their reputation. Consequently, it becomes imperative to establish a reliable method for evaluating the reliability of an NLP model's predictions, which is our focus-uncertainty estimation on NLP.

Though many works have researched uncertainty estimation or NLP, the combination of these two domains is rare. This is because most NLP research emphasizes model prediction performance but tends to overlook the reliability of NLP model predictions. Additionally, current uncertainty estimation models may not be suitable for NLP due to the unique characteristics of NLP tasks, such as the need for more fine-grained information in named entity recognition. Therefore, this dissertation proposes novel uncertainty estimation methods for different NLP tasks by considering the NLP task's distinct characteristics.

The NLP tasks are categorized into natural language understanding (NLU) and natural language generation (NLG, such as text summarization). Among the NLU tasks, the understanding could be on two views, global-view (e.g. text classification at document level) and local-view (e.g. natural language inference at sentence level and named entity recognition at token level). As a result, we research uncertainty estimation on three tasks: text classification, named entity recognition, and text summarization. Besides, because few-shot text classification has captured much attention recently, we also research the uncertainty estimation on few-shot text classification.

For the first topic, uncertainty estimation on text classification, few uncertainty models focus on improving the performance of text classification where human resources are involved. In response to this gap, our research focuses on enhancing the accuracy of uncertainty scores by bolstering the confidence associated with winning scores. we introduce MSD, a novel model comprising three distinct components: 'mix-up,' 'self-ensembling,' and 'distinctiveness score.' The primary objective of MSD is to refine the accuracy of uncertainty scores by mitigating the issue of overconfidence in winning scores while simultaneously considering various categories of uncertainty. seamlessly integrate with different Deep Neural Networks. Extensive experiments with ablation settings are conducted on four real-world datasets, resulting in consistently competitive improvements.

Our second topic focuses on uncertainty estimation on few-shot text classification (UEFTC), which has few or even only one available support sample for each class. UEFTC represents an underexplored research domain where, due to limited data samples, a UEFTC model predicts an uncertainty score to assess the likelihood of classification errors. However, traditional uncertainty estimation models in text classification are ill-suited for UEFTC since they demand extensive training data, while UEFTC operates in a few-shot scenario, typically providing just a few support samples, or even just one, per class. To tackle this challenge, we introduce Contrastive Learning from Uncertainty Relations (CLUR) as a solution tailored for UEFTC. CLUR exhibits the unique capability to be effectively trained with only one support sample per class, aided by pseudo uncertainty scores. A distinguishing feature of CLUR is its autonomous learning of these pseudo uncertainty scores, in contrast to previous approaches that relied on manual specification. Our investigation of CLUR encompasses four model structures, allowing us to evaluate the performance of three commonly employed contrastive learning components in the context of UEFTC. Our findings highlight the effectiveness of two of these components.

Our third topic focuses on uncertainty estimation on sequential labeling. Sequential labeling involves the task of assigning labels to individual tokens in a sequence, exemplified by Named Entity Recognition (NER). Despite significant advancements in enhancing NER performance in prior research, the realm of uncertainty estimation for NER (UE-NER) remains relatively uncharted but is of paramount importance. This topic focuses on UE-NER, seeking to gauge uncertainty scores for NER predictions. Previous models for uncertainty estimation often overlook two distinctive attributes of NER: the interrelation among entities (where the learning of one entity's embedding depends on others) and the challenges posed by incorrect span predictions in entity extraction. To address these issues, we introduce the Sequential Labeling Posterior Network (SLPN), designed to estimate uncertainty scores for the extracted entities while considering uncertainty propagation from other tokens. Additionally, we have devised an evaluation methodology tailored to the specific nuances of wrong-span cases.

Our fourth topic focuses on an overlooked question that persists regarding the evaluation reliability of uncertainty estimation in text summarization (UE-TS). Text summarization, a key task in natural language generation (NLG), holds significant importance, particularly in domains where inaccuracies can have serious consequences, such as healthcare. UE-TS has garnered attention due to the potential risks associated with erroneous summaries. However, the reliability of evaluating UE-TS methods raises concerns, stemming from the interdependence between uncertainty model metrics and the wide array of NLG metrics. To address these concerns, we introduce a comprehensive UE-TS benchmark incorporating twenty-six NLG metrics across four dimensions. This benchmark evaluates the uncertainty estimation capabilities of two large language models and one pre-trained language model across two datasets. Additionally, it assesses the effectiveness of fourteen common uncertainty estimation methods. Our study underscores the necessity of utilizing diverse, uncorrelated NLG metrics and uncertainty estimation techniques for a robust evaluation of UE-TS methods.

Description

Keywords

Uncertainty Estimation, Bayesian Neural Network, Evidential Neural Network, Text Classification, Few-Shot, Named Entity Recognition, Text Summarization

Citation