M3D: Multimodal MultiDocument Fine-Grained Inconsistency Detection

Tang, Chia-Wei

Files

Tang_C_T_2024.pdf (48.81 MB)

Downloads: 135

Date

2024-06-10

Authors

Tang, Chia-Wei

Publisher

Virginia Tech

Abstract

Validating claims that may contain misinformation is a highly challenging task that requires understanding how each factual assertion within the claim relates to a set of trusted source materials. Existing approaches often make coarse-grained predictions but fail to identify the specific aspects of the claim that are troublesome and the specific evidence relied upon. In this paper, we introduce a method and a new benchmark for this challenging task. Our method predicts the fine-grained logical relationship between each aspect of the claim and a set of multimodal documents, which include text, images, videos, and audio. We also introduce a new benchmark (M^3DC) of claims requiring multimodal multidocument reasoning, which we construct using a novel claim synthesis technique. Experiments on two benchmarks show that our approach significantly outperforms state-of-the-art baselines while providing finer-grained predictions, explanations, and evidence.

Keywords

multi-modality reasoning, fine-grained reasoning, multi-document understanding, text, image, video, audio

Persistent link

https://hdl.handle.net/10919/119382

Collections

Masters Theses
