M3D: Multimodal MultiDocument Fine-Grained Inconsistency Detection

Date

2024-06-10

Publisher

Virginia Tech

Abstract

Validating claims that may contain misinformation is a highly challenging task that requires understanding how each factual assertion within a claim relates to a set of trusted source materials. Existing approaches often make coarse-grained predictions but fail to identify which specific aspects of the claim are problematic and which specific evidence the prediction relies on. In this paper, we introduce a method and a new benchmark for this task. Our method predicts the fine-grained logical relationship between each aspect of the claim and a set of multimodal documents, which include text, images, videos, and audio. We also introduce a new benchmark (M^3DC) of claims requiring multimodal multidocument reasoning, which we construct using a novel claim synthesis technique. Experiments show that our approach significantly outperforms state-of-the-art baselines on two benchmarks while providing finer-grained predictions, explanations, and evidence.

Keywords

multi-modality reasoning, fine-grained reasoning, multi-document understanding, text, image, video, audio
