Advancing Chart Question Answering with Robust Chart Component Recognition

Date

2024-08-13

Publisher

Virginia Tech

Abstract

Chart comprehension [1, 2, 3] poses significant challenges for machine learning models because charts take diverse and intricate forms. Chart extraction aims to identify a chart's key components precisely, while chart question answering (ChartQA) integrates visual and textual information to answer queries about the chart's content accurately. This research approaches ChartQA through two main contributions. First, we introduce ChartFormer, an integrated framework that simultaneously identifies and classifies every chart element. ChartFormer goes beyond the plotted data by also identifying descriptive components such as the chart title, legend, and axes, providing a comprehensive understanding of the chart's content. ChartFormer is particularly effective for complex instance segmentation tasks that involve many object classes with distinct visual structures. It uses an end-to-end transformer architecture, which improves its ability to handle diverse and distinct object features. Second, we present Question-guided Deformable Co-Attention (QDCAt), which performs multimodal fusion by incorporating question information into a deformable offset network and enhancing the visual representation from ChartFormer through a deformable co-attention block.

Keywords

Multimodal Learning, Instance Segmentation, Visual Question Answering
