Browsing by Author "Huang, Lifu"
Now showing 1 - 20 of 26
Results Per Page
Sort Options
- Advancing Chart Question Answering with Robust Chart Component RecognitionZheng, Hanwen (Virginia Tech, 2024-08-13)The task of comprehending charts [1, 2, 3] presents significant challenges for machine learning models due to the diverse and intricate shapes of charts. The chart extraction task ensures the precise identification of key components, while the chart question answering (ChartQA) task integrates visual and textual information, facilitating accurate responses to queries based on the chart's content. To approach ChartQA, this research focuses on two main aspects. Firstly, we introduce ChartFormer, an integrated framework that simultaneously identifies and classifies every chart element. ChartFormer extends beyond traditional data visualization by identifying descriptive components such as the chart title, legend, and axes, providing a comprehensive understanding of the chart's content. ChartFormer is particularly effective for complex instance segmentation tasks that involve a wide variety of class objects with unique visual structures. It utilizes an end-to-end transformer architecture, which enhances its ability to handle the intricacies of diverse and distinct object features. Secondly, we present Question-guided Deformable Co-Attention (QDCAt), which facilitates multimodal fusion by incorporating question information into a deformable offset network and enhancing visual representation from ChartFormer through a deformable co-attention block.
- Analyzing and Navigating Electronic Theses and DissertationsAhuja, Aman (Virginia Tech, 2023-07-21)Electronic Theses and Dissertations (ETDs) contain valuable scholarly information that can be of immense value to the scholarly community. Millions of ETDs are now publicly available online, often through one of many digital libraries. However, since a majority of these digital libraries are institutional repositories with the objective being content archiving, they often lack end-user services needed to make this valuable data useful for the scholarly community. To effectively utilize such data to address the information needs of users, digital libraries should support various end-user services such as document search and browsing, document recommendation, as well as services to make navigation of long PDF documents easier. In recent years, with advances in the field of machine learning for text data, several techniques have been proposed to support such end-user services. However, limited research has been conducted towards integrating such techniques with digital libraries. This research is aimed at building tools and techniques for discovering and accessing the knowledge buried in ETDs, as well as to support end-user services for digital libraries, such as document browsing and long document navigation. First, we review several machine learning models that can be used to support such services. Next, to support a comprehensive evaluation of different models, as well as to train models that are tailored to the ETD data, we introduce several new datasets from the ETD domain. To minimize the resources required to develop high quality training datasets required for supervised training, a novel AI-aided annotation method is also discussed. Finally, we propose techniques and frameworks to support the various digital library services such as search, browsing, and recommendation. The key contributions of this research are as follows: - A system to help with parsing long scholarly documents such as ETDs by means of object-detection methods trained to extract digital objects from long documents. The parsed documents can be used for further downstream tasks such as long document navigation, figure and/or table search, etc. - Datasets to support supervised training of object detection models on scholarly documents of multiple types, such as born-digital and scanned. In addition to manually annotated datasets, a framework (along with the resulting dataset) for AI-aided annotation also is proposed. - A web-based system for information extraction from long PDF theses and dissertations, into a structured format such as XML, aimed at making scholarly literature more accessible to users with disabilities. - A topic-modeling based framework to support exploration tasks such as searching and/or browsing documents (and document portions, e.g., chapters) by topic, document recommendation, topic recommendation, and describing temporal topic trends.
- Behind the Scenes: Evaluating Computer Vision Embedding Techniques for Discovering Similar Photo BackgroundsDodson, Terryl Dwayne (Virginia Tech, 2023-07-11)Historical photographs can generate significant cultural and economic value, but often their subjects go unidentified. However, if analyzed correctly, visual clues in these photographs can open up new directions in identifying unknown subjects. For example, many 19th century photographs contain painted backdrops that can be mapped to a specific photographer or location, but this research process is often manual, time-consuming, and unsuccessful. AI-based computer vision algorithms could be used to automatically identify painted backdrops or photographers or cluster photos with similar backdrops in order to aid researchers. However, it is unknown which computer vision algorithms are feasible for painted backdrop identification or which techniques work better than others. We present three studies evaluating four different types of image embeddings – Inception, CLIP, MAE, and pHash – across a variety of metrics and techniques. We find that a workflow using CLIP embeddings combined with a background classifier and simulated user feedback performs best. We also discuss implications for human-AI collaboration in visual analysis and new possibilities for digital humanities scholarship.
- Commonsense for Zero-Shot Natural Language Video LocalizationHolla, Meghana (Virginia Tech, 2023-07-07)Zero-shot Natural Language-Video Localization (NLVL) has shown promising results in training NLVL models solely with raw video data through dynamic video segment proposal generation and pseudo-query annotations. However, existing pseudo-queries lack grounding in the source video and suffer from a lack of common ground due to their unstructured nature. In this work, we investigate the effectiveness of commonsense reasoning in zero-shot NLVL. Specifically, we present CORONET, a zero-shot NLVL framework that utilizes commonsense information to bridge the gap between videos and generated pseudo-queries through a commonsense enhancement module. Our approach employs Graph Convolutional Networks (GCN) to encode commonsense information extracted from a knowledge graph, conditioned on the video, and cross-attention mechanisms to enhance the encoded video and pseudo-query vectors prior to localization. Through empirical evaluations on two benchmark datasets, we demonstrate that our model surpasses both zero-shot and weakly supervised baselines. These results underscore the significance of leveraging commonsense reasoning abilities in multimodal understanding tasks.
- A Deep Learning Approach to Predict Full-Field Stress Distribution in Composite MaterialsSepasdar, Reza (Virginia Tech, 2021-05-17)This thesis proposes a deep learning approach to predict stress at various stages of mechanical loading in 2-D representations of fiber-reinforced composites. More specifically, the full-field stress distribution at elastic and at an early stage of damage initiation is predicted based on the microstructural geometry. The required data set for the purposes of training and validation are generated via high-fidelity simulations of several randomly generated microstructural representations with complex geometries. Two deep learning approaches are employed and their performances are compared: fully convolutional generator and Pix2Pix translation. It is shown that both the utilized approaches can well predict the stress distributions at the designated loading stages with high accuracy.
- End-to-End Multimodal Fact-Checking and Explanation Generation: A Challenging Dataset and ModelsYao, Barry; Shah, Aditya; Sun, Lichao; Cho, Jin-Hee; Huang, Lifu (ACM, 2023-07-19)We propose end-to-end multimodal fact-checking and explanation generation, where the input is a claim and a large collection of web sources, including articles, images, videos, and tweets, and the goal is to assess the truthfulness of the claim by retrieving relevant evidence and predicting a truthfulness label (e.g., support, refute or not enough information), and to generate a statement to summarize and explain the reasoning and ruling process. To support this research, we construct Mocheg, a large-scale dataset consisting of 15,601 claims where each claim is annotated with a truthfulness label and a ruling statement, and 33,880 textual paragraphs and 12,112 images in total as evidence. To establish baseline performances on Mocheg, we experiment with several state-of-the-art neural architectures on the three pipelined subtasks: multimodal evidence retrieval, claim verification, and explanation generation, and demonstrate that the performance of the state-of-the-art end-to-end multimodal factchecking does not provide satisfactory outcomes. To the best of our knowledge, we are the first to build the benchmark dataset and solutions for end-to-end multimodal fact-checking and explanation generation. The dataset, source code and model checkpoints are available at https://github.com/VT-NLP/Mocheg.
- Explainable Interactive Projections for Image DataHan, Huimin (Virginia Tech, 2023-01-12)Making sense of large collections of images is difficult. Dimension reductions (DR) assist by organizing images in a 2D space based on similarities, but provide little support for explaining why images were placed together or apart in the 2D space. Additionally, they do not provide support for modifying and updating the 2D space to explore new relationships and organizations of images. To address these problems, we present an interactive DR method for images that uses visual features extracted by a deep neural network to project the images into 2D space and provides visual explanations of image features that contributed to the 2D location. In addition, it allows people to directly manipulate the 2D projection space to define alternative relationships and explore subsequent projections of the images. With an iterative cycle of semantic interaction and explainable-AI feedback, people can explore complex visual relationships in image data. Our approach to human-AI interaction integrates visual knowledge from both human mental models and pre-trained deep neural models to explore image data. Two usage scenarios are provided to demonstrate that our method is able to capture human feedback and incorporate it into the model. Our visual explanations help bridge the gap between the feature space and the original images to illustrate the knowledge learned by the model, creating a synergy between human and machine that facilitates a more complete analysis experience.
- Explainable Neural Claim Verification Using RationalizationGurrapu, Sai Charan (Virginia Tech, 2022-06-15)The dependence on Natural Language Processing (NLP) systems has grown significantly in the last decade. Recent advances in deep learning have enabled language models to generate high-quality text at the same level as human-written text. If this growth continues, it can potentially lead to increased misinformation, which is a significant challenge. Although claim verification techniques exist, they lack proper explainability. Numerical scores such as Attention and Lime and visualization techniques such as saliency heat maps are insufficient because they require specialized knowledge. It is inaccessible and challenging for the nonexpert to understand black-box NLP systems. We propose a novel approach called, ExClaim for explainable claim verification using NLP rationalization. We demonstrate that our approach can predict a verdict for the claim but also justify and rationalize its output as a natural language explanation (NLE). We extensively evaluate the system using statistical and Explainable AI (XAI) metrics to ensure the outcomes are valid, verified, and trustworthy to help reinforce the human-AI trust. We propose a new subfield in XAI called Rational AI (RAI) to improve research progress on rationalization and NLE-based explainability techniques. Ensuring that claim verification systems are assured and explainable is a step towards trustworthy AI systems and ultimately helps mitigate misinformation.
- Few-Shot and Zero-Shot Learning for Information ExtractionGong, Jiaying (Virginia Tech, 2024-05-31)Information extraction aims to automatically extract structured information from unstructured texts. Supervised information extraction requires large quantities of labeled training data, which is time-consuming and labor-intensive. This dissertation focuses on information extraction, especially relation extraction and attribute-value extraction in e-commerce, with few labeled (few-shot learning) or even no labeled (zero-shot learning) training data. We explore multi-source auxiliary information and novel learning techniques to integrate semantic auxiliary information with the input text to improve few-shot learning and zero-shot learning. For zero-shot and few-shot relation extraction, the first method explores the existing data statistics and leverages auxiliary information including labels, synonyms of labels, keywords, and hypernyms of name entities to enable zero-shot learning for the unlabeled data. We build an automatic hypernym extraction framework to help acquire hypernyms of different entities directly from the web. The second method explores the relations between seen classes and new classes. We propose a prompt-based model with semantic knowledge augmentation to recognize new relation triplets under the zero-shot setting. In this method, we transform the problem of zero-shot learning into supervised learning with the generated augmented data for new relations. We design the prompts for training using the auxiliary information based on an external knowledge graph to integrate semantic knowledge learned from seen relations. The third work utilizes auxiliary information from images to enhance few-shot learning. We propose a multi-modal few-shot relation extraction model that leverages both textual and visual semantic information to learn a multi-modal representation jointly. To supplement the missing contexts in text, this work integrates both local features (object-level) and global features (pixel-level) from different modalities through image-guided attention, object-guided attention, and hybrid feature attention to solve the problem of sparsity and noise. We then explore the few-shot and zero-shot aspect (attribute-value) extraction in the e-commerce application field. The first work studies the multi-label few-shot learning by leveraging the auxiliary information of anchor (label) and category description based on the prototypical networks, where the hybrid attention helps alleviate ambiguity and capture more informative semantics by calculating both the label-relevant and query-related weights. A dynamic threshold is learned by integrating the semantic information from support and query sets to achieve multi-label inference. The second work explores multi-label zero-shot learning via semi-inductive link prediction of the heterogeneous hypergraph. The heterogeneous hypergraph is built with higher-order relations (generated by the auxiliary information of user behavior data and product inventory data) to capture the complex and interconnected relations between users and the products.
- Generative Chatbot Framework for Cybergrooming PreventionWang, Pei (Virginia Tech, 2021-12-20)Cybergrooming refers to the crime of establishing personal close relationships with potential victims, commonly teens, for the purpose of sexual exploitation or abuse via online social media platforms. Cybergrooming has been recognized as a serious social problem. However, there have been insufficient programs to provide proactive prevention to protect the youth users from cybergrooming. In this thesis, we present a generative chatbot framework, called SERI (Stop cybERgroomIng), that can generate simulated conversations between a perpetrator chatbot and a potential victim chatbot. To realize the simulation of authentic conversations in the context of cybergrooming, we take deep reinforcement learning (DRL)-based dialogue generation to simulate the authentic conversations between a perpetrator and a potential victim. The design and development of the SERI are motivated to provide a safe and authentic chatting environment to enhance the youth's precautionary awareness and sensitivity of cybergrooming while any unnecessary ethical issues (e.g., the potential misuse of the SERI) are removed or minimized. We developed the SERI as a preliminary platform that the perpetrator chatbot can be deployed in social media environments to interact with human users (i.e., youth) and observe the conversations that the youth users respond to strangers or acquaintances when they are asked for private or sensitive information by the perpetrator. We evaluated the quality of conversations generated by the SERI based on open-source, referenced, and unreferenced metrics as well as human evaluation. The evaluation results show that the SERI can generate authentic conversations between two chatbots compared to the original conversations from the used datasets in perplexity and MaUde scores.
- Generative models meet similarity search: efficient, heuristic-free and robust retrievalDoan, Khoa Dang (Virginia Tech, 2021-09-23)The rapid growth of digital data, especially visual and textual contents, brings many challenges to the problem of finding similar data. Exact similarity search, which aims to exhaustively find all relevant items through a linear scan in a dataset, is impractical due to its high computational complexity. Approximate-nearest-neighbor (ANN) search methods, especially the Learning-to-hash or Hashing methods, provide principled approaches that balance the trade-offs between the quality of the guesses and the computational cost for web-scale databases. In this era of data explosion, it is crucial for the hashing methods to be both computationally efficient and robust to various scenarios such as when the application has noisy data or data that slightly changes over time (i.e., out-of-distribution). This Thesis focuses on the development of practical generative learning-to-hash methods and explainable retrieval models. We first identify and discuss the various aspects where the framework of generative modeling can be used to improve the model designs and generalization of the hashing methods. Then we show that these generative hashing methods similarly enjoy several appealing empirical and theoretical properties of generative modeling. Specifically, the proposed generative hashing models generalize better with important properties such as low-sample requirement, and out-of-distribution and data-corruption robustness. Finally, in domains with structured data such as graphs, we show that the computational methods in generative modeling have an interesting utility beyond estimating the data distribution and describe a retrieval framework that can explain its decision by borrowing the algorithmic ideas developed in these methods. Two subsets of generative hashing methods and a subset of explainable retrieval methods are proposed. For the first hashing subset, we propose a novel adversarial framework that can be easily adapted to a new problem domain and three training algorithms that learn the hash functions without several hyperparameters commonly found in the previous hashing methods. The contributions of our work include: (1) Propose novel algorithms, which are based on adversarial learning, to learn the hash functions; (2) Design computationally efficient Wasserstein-related adversarial approaches which have low computational and sample efficiency; (3) Conduct extensive experiments on several benchmark datasets in various domains, including computational advertising, and text and image retrieval, for performance evaluation. For the second hashing subset, we propose energy-based hashing solutions which can improve the generalization and robustness of existing hashing approaches. The contributions of our work for this task include: (1) Propose data-synthesis solutions to improve the generalization of existing hashing methods; (2) Propose energy-based hashing solutions which exhibit better robustness against out-of-distribution and corrupted data; (3) Conduct extensive experiments for performance evaluations on several benchmark datasets in the image retrieval domain. Finally, for the last subset of explainable retrieval methods, we propose an optimal alignment algorithm that achieves a better similarity approximation for a pair of structured objects, such as graphs, while capturing the alignment between the nodes of the graphs to explain the similarity calculation. The contributions of our work for this task include: (1) Propose a novel optimal alignment algorithm for comparing two sets of bag-of-vectors embeddings; (2) Propose a differentiable computation to learn the parameters of the proposed optimal alignment model; (3) Conduct extensive experiments, for performance evaluation of both the similarity approximation task and the retrieval task, on several benchmark graph datasets.
- GraphDHT: Scaling Graph Neural Networks' Distributed Training on Edge Devices on a Peer-to-Peer Distributed Hash Table NetworkGupta, Chirag (Virginia Tech, 2024-01-03)This thesis presents an innovative strategy for distributed Graph Neural Network (GNN) training, leveraging a peer-to-peer network of heterogeneous edge devices interconnected through a Distributed Hash Table (DHT). As GNNs become increasingly vital in analyzing graph-structured data across various domains, they pose unique challenges in computational demands and privacy preservation, particularly when deployed for training on edge devices like smartphones. To address these challenges, our study introduces the Adaptive Load- Balanced Partitioning (ALBP) technique in the GraphDHT system. This approach optimizes the division of graph datasets among edge devices, tailoring partitions to the computational capabilities of each device. By doing so, ALBP ensures efficient resource utilization across the network, significantly improving upon traditional participant selection strategies that often overlook the potential of lower-performance devices. Our methodology's core is weighted graph partitioning and model aggregation in GNNs, based on partition ratios, improving training efficiency and resource use. ALBP promotes inclusive device participation in training, overcoming computational limits and privacy concerns in large-scale graph data processing. Utilizing a DHT-based system enhances privacy in the peer-to-peer setup. The GraphDHT system, tested across various datasets and GNN architectures, shows ALBP's effectiveness in distributed GNN training and its broad applicability in different domains and structures. This contributes to applied machine learning, especially in optimizing distributed learning on edge devices.
- ImageSI: Interactive Deep Learning for Image Semantic InteractionLin, Jiayue (Virginia Tech, 2024-06-04)Interactive deep learning frameworks are crucial for effectively exploring and analyzing complex image datasets in visual analytics. However, existing approaches often face challenges related to inference accuracy and adaptability. To address these issues, we propose ImageSI, a framework integrating deep learning models with semantic interaction techniques for interactive image data analysis. Unlike traditional methods, ImageSI directly incorporates user feedback into the image model, updating underlying embeddings through customized loss functions, thereby enhancing the performance of dimension reduction tasks. We introduce three variations of ImageSI, ImageSI$_{text{MDS}^{-1}}$, prioritizing explicit pairwise relationships from user interaction, and ImageSI$_{text{DRTriplet}}$ and ImageSI$_{text{PHTriplet}}$, emphasizing clustering by defining groups of images based on user input. Through usage scenarios and quantitative analyses centered on algorithms, we demonstrate the superior performance of ImageSI$_{text{DRTriplet}}$ and ImageSI$_{text{MDS}^{-1}}$ in terms of inference accuracy and interaction efficiency. Moreover, ImageSI$_{text{PHTriplet}}$ shows competitive results. The baseline model, WMDS$^{-1}$, generally exhibits lower performance metrics.
- Integrated Predictive Modeling and Analytics for Crisis ManagementAlhamadani, Abdulaziz Abdulrhman (Virginia Tech, 2024-05-15)The surge in the application of big data and predictive analytics in fields of crisis management, such as pandemics and epidemics, highlights the vital need for advanced research in these areas, particularly in the wake of the COVID-19 pandemic. Traditional methods, which typically rely on historical data to forecast future trends, fall short in addressing the complex and ever-changing nature of challenges like pandemics and public health crises. This inadequacy is further underscored by the pandemic's significant impact on various sectors, notably healthcare, government, and the hotel industry. Current models often overlook key factors such as static spatial elements, socioeconomic conditions, and the wealth of data available from social media, which are crucial for a comprehensive understanding and effective response to these multifaceted crises. This thesis employs spatial forecasting and predictive analytics to address crisis management in several distinct but interrelated contexts: the COVID-19 pandemic, the opioid crisis, and the impact of the pandemic on the hotel industry. The first part of the study focuses on using big data analytics to explore the relationship between socioeconomic factors and the spread of COVID-19 at the zip code level, aiming to predict high-risk areas for infection. The second part delves into the opioid crisis, utilizing semi-supervised deep learning techniques to monitor and categorize drug-related discussions on Reddit. The third part concentrates on developing spatial forecasting and providing explanations of the rising epidemic of drug overdose fatalities. The fourth part of the study extends to the realm of the hotel industry, aiming to optimize customer experience by analyzing online reviews and employing a localized Large Language Model to generate future customer trends and scenarios. Across these studies, the thesis aims to provide actionable insights and comprehensive solutions for effectively managing these major crises. For the first work, the majority of current research in pandemic modeling primarily relies on historical data to predict dynamic trends such as COVID-19. This work makes the following contributions in spatial COVID-19 pandemic forecasting: 1) the development of a unique model solely employing a wide range of socioeconomic indicators to forecast areas most susceptible to COVID-19, using detailed static spatial analysis, 2) identification of the most and least influential socioeconomic variables affecting COVID-19 transmission within communities, 3) construction of a comprehensive dataset that merges state-level COVID-19 statistics with corresponding socioeconomic attributes, organized by zip code. For the second work, we make the following contributions in detecting drug Abuse crisis via social media: 1) enhancing the Dynamic Query Expansion (DQE) algorithm to dynamically detect and extract evolving drug names in Reddit comments, utilizing a list curated from government and healthcare agencies, 2) constructing a textual Graph Convolutional Network combined with word embeddings to achieve fine-grained drug abuse classification in Reddit comments, identifying seven specific drug classes for the first time, 3) conducting extensive experiments to validate the framework, outperforming six baseline models in drug abuse classification and demonstrating effectiveness across multiple types of embeddings. The third study focuses on developing spatial forecasting and providing explanations of the escalating epidemic of drug overdose fatalities. Current research in this field has shown a deficiency in comprehensive explanations of the crisis, spatial analyses, and predictions of high-risk zones for drug overdoses. Addressing these gaps, this study contributes in several key areas: 1) Establishing a framework for spatially forecasting drug overdose fatalities predominantly affecting U.S. counties, 2) Proposing solutions for dealing with scarce and heterogeneous data sets, 3) Developing an algorithm that offers clear and actionable insights into the crisis, and 4) Conducting extensive experiments to validate the effectiveness of our proposed framework. In the fourth study, we address the profound impact of the pandemic on the hotel industry, focusing on the optimization of customer experience. Traditional methodologies in this realm have predominantly relied on survey data and limited segments of social media analytics. Those methods are informative but fall short of providing a full picture due to their inability to include diverse perspectives and broader customer feedback. Our study aims to make the following contributions: 1) the development of an integrated platform that distinguishes and extracts positive and negative Memorable Experiences (MEs) from online customer reviews within the hotel industry, 2) The incorporation of an advanced analytical module that performs temporal trend analysis of MEs, utilizing sophisticated data mining algorithms to dissect customer feedback on a monthly and yearly scale, 3) the implementation of an advanced tool that generates prospective and unexplored Memorable Experiences (MEs) by utilizing a localized Large Language Model (LLM) with keywords extracted from authentic customer experiences to aid hotel management in preparing for future customer trends and scenarios. Building on the integrated predictive modeling approaches developed in the earlier parts of this dissertation, this final section explores the significant impacts of the COVID-19 pandemic on the airline industry. The pandemic has precipitated substantial financial losses and operational disruptions, necessitating innovative crisis management strategies within this sector. This study introduces a novel analytical framework, EAGLE (Enhancing Airline Groundtruth Labels and Review rating prediction), which utilizes Large Language Models (LLMs) to improve the accuracy and objectivity of customer sentiment analysis in strategic airline route planning. EAGLE leverages LLMs for zero-shot pseudo-labeling and zero-shot text classification, to enhance the processing of customer reviews without the biases of manual labeling. This approach streamlines data analysis, and refines decision-making processes which allows airlines to align route expansions with nuanced customer preferences and sentiments effectively. The comprehensive application of LLMs in this context underscores the potential of predictive analytics to transform traditional crisis management strategies by providing deeper, more actionable insights.
- Joint Biomedical Event Extraction and Entity Linking via Iterative Collaborative TrainingLi, Xiaochu (Virginia Tech, 2023-05)Biomedical entity linking and event extraction are two crucial tasks to support text understanding and retrieval in the biomedical domain. These two tasks intrinsically benefit each other: entity linking disambiguates the biomedical concepts by referring to external knowledge bases and the domain knowledge further provides additional clues to understand and extract the biological processes, while event extraction identifies a key trigger and entities involved to describe each biological process which also captures the structural context to better disambiguate the biomedical entities. However, previous research typically solves these two tasks separately or in a pipeline, leading to error propagation. What's more, it's even more challenging to solve these two tasks together as there is no existing dataset that contains annotations for both tasks. To solve these challenges, we propose joint biomedical entity linking and event extraction by regarding the event structures and entity references in knowledge bases as latent variables and updating the two task-specific models in an iterative training framework: (1) predicting the missing variables for each partially annotated dataset based on the current two task-specific models, and (2) updating the parameters of each model on the corresponding pseudo completed dataset. Experimental results on two benchmark datasets: Genia 2011 for event extraction and BC4GO for entity linking, show that our joint framework significantly improves the model for each individual task and outperforms the strong baselines for both tasks. We will make the code and model checkpoints publicly available once the paper is accepted.
- M3D: Multimodal MultiDocument Fine-Grained Inconsistency DetectionTang, Chia-Wei (Virginia Tech, 2024-06-10)Validating claims from misinformation is a highly challenging task that involves understanding how each factual assertion within the claim relates to a set of trusted source materials. Existing approaches often make coarse-grained predictions but fail to identify the specific aspects of the claim that are troublesome and the specific evidence relied upon. In this paper, we introduce a method and new benchmark for this challenging task. Our method predicts the fine-grained logical relationship of each aspect of the claim from a set of multimodal documents, which include text, image(s), video(s), and audio(s). We also introduce a new benchmark (M^3DC) of claims requiring multimodal multidocument reasoning, which we construct using a novel claim synthesis technique. Experiments show that our approach significantly outperforms state-of-the-art baselines on this challenging task on two benchmarks while providing finer-grained predictions, explanations, and evidence.
- Multimodal Representation Learning for Textual Reasoning over Knowledge GraphsChoudhary, Nurendra (Virginia Tech, 2023-05-18)Knowledge graphs (KGs) store relational information in a flexible triplet schema and have become ubiquitous for information storage in domains such as web search, e-commerce, social networks, and biology. Retrieval of information from KGs is generally achieved through logical reasoning, but this process can be computationally expensive and has limited performance due to the large size and complexity of relationships within the KGs. Furthermore, to extend the usage of KGs to non-expert users, retrieval over them cannot solely rely on logical reasoning but also needs to consider text-based search. This creates a need for multi-modal representations that capture both the semantic and structural features from the KGs. The primary objective of the proposed work is to extend the accessibility of KGs to non-expert users/institutions by enabling them to utilize non-technical textual queries to search over the vast amount of information stored in KGs. To achieve this objective, the research aims to solve four limitations: (i) develop a framework for logical reasoning over KGs that can learn representations to capture hierarchical dependencies between entities, (ii) design an architecture that can effectively learn the logic flow of queries from natural language text, (iii) create a multi-modal architecture that can capture inherent semantic and structural features from the entities and KGs, respectively, and (iv) introduce a novel hyperbolic learning framework to enable the scalability of hyperbolic neural networks over large graphs using meta-learning. The proposed work is distinct from current research because it models the logical flow of textual queries in hyperbolic space and uses it to perform complex reasoning over large KGs. The models developed in this work are evaluated on both the standard research setting of logical reasoning, as well as, real-world scenarios of query matching and search, specifically, in the e-commerce domain. In summary, the proposed work aims to extend the accessibility of KGs to non-expert users by enabling them to use non-technical textual queries to search vast amounts of information stored in KGs. To achieve this objective, the work proposes the use of multi-modal representations that capture both semantic and structural features from the KGs, and a novel hyperbolic learning framework to enable scalability of hyperbolic neural networks over large graphs. The work also models the logical flow of textual queries in hyperbolic space to perform complex reasoning over large KGs. The models developed in this work are evaluated on both the standard research setting of logical reasoning and real-world scenarios in the e-commerce domain.
- N-ary Cross-sentence Relation Extraction: From Supervised to Unsupervised LearningYuan, Chenhan (Virginia Tech, 2021-05-19)Relation extraction is the problem of extracting relations between entities described in the text. Relations identify a common "fact" described by distinct entities. Conventional relation extraction approaches focus on supervised binary intra-sentence relations, where the assumption is relations only exist between two entities within the same sentence. These approaches have two key limitations. First, binary intra-sentence relation extraction methods can not extract a relation in a fact that is described by more than two entities. Second, these methods cannot extract relations that span more than one sentence, which commonly occurs as the number of entities increases. Third, these methods assume a supervised setting and are therefore not able to extract relations in the absence of sufficient labeled data for training. This work aims to overcome these limitations by developing n-ary cross-sentence relation extraction methods for both supervised and unsupervised settings. Our work has three main goals and contributions: (1) two unsupervised binary intra-sentence relation extraction methods, (2) a supervised n-ary cross-sentence relation extraction method, and (3) an unsupervised n-ary cross-sentence relation extraction method. To achieve these goals, our work includes the following contributions: (1) an automatic labeling method for n-ary cross-sentence data, which is essential for model training, (2) a reinforcement learning-based sentence distribution estimator to minimize the impact of noise on model training, (3) a generative clustering-based technique for intra-sentence unsupervised relation extraction, (4) a variational autoencoder-based technique for unsupervised n-ary cross-sentence relation extraction, and (5) a sentence group selector that identifies groups of sentences that form relations.
- Narrative Characteristics in Refugee Discourse: An Analysis of American Public Opinion on Afghan Refugee Crisis After the Taliban TakeoverDogan, Hulya (Virginia Tech, 2023-06-22)The United States (U.S.) military withdrawal from Afghanistan in August 2021 was met with turmoil as Taliban regained control of most of the country, including Kabul. These events have affected many and were widely discussed on social media, especially in the U.S. In this work, we focus on Twitter discourse regarding these events, especially potential opinion shifts over time and the effect social media posts by established U.S. legislators might have had on online public perception. To this end, we investigate two datasets on the war in Afghanistan, consisting of Twitter posts by self-identified U.S. accounts and conversation threads initiated by U.S. politicians. We find that Twitter users' discussions revolve around the Kabul airport event, President Biden's handling of the situation, and people affected by the U.S. withdrawal. Microframe analysis indicates that discourse centers the humanitarianism underlying these occurrences and politically leans liberal, focusing on care and fairness. Lastly, network analysis shows that Republicans are far more active on Twitter compared to Democrats and there is more positive sentiment than negative in their conversations.
- Non-linguistic Notions in Language Modeling: Learning, Retention, and ApplicationsSharma, Mandar (Virginia Tech, 2024-09-11)Language modeling, especially through the use of transformer-based large language models (LLMs), has drastically changed how we view and use artificial intelligence (AI) and machine learning (ML) in our daily lives. Although LLMs have showcased remarkable linguistic proficiency in their abilities to write, summarize, and phrase, these model have yet to achieve the same remarkability in their ability to quantitatively reason. This deficiency is specially apparent in smaller models (less than 1 Billion parameters) than can run natively on-device. Between the complementary capabilities of qualitative and quantitative reasoning, this thesis focuses on the latter, where the goal is to devise mechanisms to instill quantitative reasoning capabilities into these models. However, instilling this notion is not as straight forward as traditional end-to-end learning. The learning of quantitative notions include the ability of the model to discern between regular linguistic tokens and magnitude/scale-oriented non-linguistic tokens. The learning of these notions, specially after pre-training, comes at a cost for these models: catastrophic forgetting. Thus, learning needs to be followed with retention - making sure these models do not forget what they have learned. Thus, we first motivate the need for numeracy-enhanced models via their potential applications in field of data-to-text generation (D2T), showcasing how these models behave as quantitative reasoners as-is. Then, we devise both token-level training interventions and information-theoretic training interventions to numerically enhance these models, with the latter specifically focused on combating catastrophic forgetting. Our information-theoretic interventions not only lead to numerically-enhanced models but lend us critical insights into the learning behavior of these models, especially when it comes to adapting these models to the target task distribution from their pretraining distribution. Finally, we extrapolate these insights to devise more effective strategies transfer learning and unlearning for language modeling.