Zero-Shot Scene Graph Relationship Prediction using VLMs

Date

2025-03-24

Publisher

Virginia Tech

Abstract

Scene Graph Relationship Prediction (SGRP) aims to predict the interactions between objects in an image. Despite the recent surge of interest in open-vocabulary and zero-shot scene graph generation (SGG), most approaches still require some form of training or adaptation on the target dataset, even when using Vision-Language Models (VLMs). In this work, we propose a training-free framework in which VLMs predict scene graph relationships. Our approach plugs VLMs into the pipeline without any fine-tuning, focusing on how to formulate relationship queries and how to aggregate predictions across object pairs. To this end, we introduce two model-agnostic frameworks: SGRP-MC, a multiple-choice question answering (MCQA) approach, and SGRP-Open, an open-ended formulation. Evaluations on the PSG dataset reveal that well-scaled VLMs not only achieve competitive recall scores but also surpass most trained baselines by over 7% in mean recall, showcasing their strength in long-tail predicate prediction. Nonetheless, we identify several practical challenges: the large number of potential relationship candidates and the susceptibility of VLMs to choice ordering can affect consistency. Through our comparison of SGRP-MC and SGRP-Open, we highlight trade-offs in structured prediction performance between multiple-choice constraints and open-ended flexibility. Our findings establish that zero-shot scene graph relationship prediction is feasible with a fully training-free VLM pipeline, laying the groundwork for leveraging large-scale foundation models for SGG without any additional fine-tuning.
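
As a rough, hypothetical illustration of the two query formulations described above, the Python sketch below shows how a single (subject, object) pair might be turned into a multiple-choice query (SGRP-MC) or an open-ended query (SGRP-Open), and how per-pair answers could be aggregated into triplets. The predicate list and the query_vlm callable are illustrative placeholders, not the paper's actual prompts or implementation.

# Minimal sketch (assumed names, not the authors' code) of training-free
# relationship querying with an off-the-shelf VLM.

PREDICATES = ["on", "beside", "holding", "riding", "looking at"]  # illustrative subset

def build_mc_prompt(subject, obj, predicates):
    """SGRP-MC style: a multiple-choice question over candidate predicates."""
    choices = "\n".join(f"({chr(65 + i)}) {p}" for i, p in enumerate(predicates))
    return (
        f"In the image, what is the relationship between the {subject} "
        f"and the {obj}? Answer with one letter.\n{choices}"
    )

def build_open_prompt(subject, obj):
    """SGRP-Open style: an open-ended question with no fixed answer set."""
    return (
        f"In the image, describe the relationship between the {subject} "
        f"and the {obj} in one short phrase."
    )

def predict_relationships(image, object_pairs, query_vlm):
    """Aggregate per-pair VLM answers into (subject, predicate, object) triplets."""
    triplets = []
    for subject, obj in object_pairs:
        answer = query_vlm(image, build_mc_prompt(subject, obj, PREDICATES))
        index = ord(answer.strip()[0].upper()) - 65  # map the answer letter back to a predicate
        if 0 <= index < len(PREDICATES):
            triplets.append((subject, PREDICATES[index], obj))
    return triplets

Because the abstract reports that VLMs are sensitive to choice ordering, a practical MCQA pipeline would likely query each pair under several permutations of the choices and aggregate the answers (for example, by majority vote) rather than trust a single ordering.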

Keywords

Vision Language Models, Scene Graphs, Zero-Shot
