GlitchAgent: Detecting Video Game Glitches from Gameplay Videos

Date

2025-09-16

Publisher

Virginia Tech

Abstract

The increasing complexity of modern video games has made Quality Assurance (QA) a critical yet challenging bottleneck in the video game development and maintenance lifecycle, which relies heavily on expensive, labor-intensive, and inefficient manual testing. Automated glitch detection from gameplay videos offers a promising alternative, but it is hampered by a scarcity of annotated datasets, the ambiguity of identifying glitches without temporal context, and the need for precise temporal localization of anomalies. In this thesis, we propose a novel approach to address these challenges. First, we introduce VideoGlitch, a new video-based benchmark dataset for video game glitch detection that features diverse gameplay videos. The videos are annotated with detailed natural-language glitch descriptions and precise timestamps, created through a semi-automated pipeline that combines Multimodal Large Language Models (MLLMs) with human validation. Second, we propose GlitchAgent, a multi-stage framework for open-ended glitch detection with precise timestamps. GlitchAgent first preprocesses the video, then generates glitch hypotheses with the Local Glitch Detector, traces the full duration of each anomaly via a novel temporal propagation mechanism, and finally synthesizes a single, temporally grounded description with corresponding timestamps for each unique glitch. To evaluate our system, we introduce the LLM-as-the-judge Glitch Detection Score (GDS), a novel metric that uses an LLM for semantic scoring and couples it with temporal Intersection over Union (IoU) for a more robust assessment than traditional metrics. Experiments demonstrate that GlitchAgent significantly enhances the performance of various MLLM backbones, substantially improving detection precision and temporal grounding accuracy compared to baseline approaches.
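
The temporal IoU component mentioned in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the GDS formula, so the function names (`temporal_iou`, `coupled_score`) and the multiplicative coupling of a semantic score with the IoU are assumptions made purely for illustration, not the thesis's definition. The semantic score is taken as a given number in [0, 1]; no LLM is called here.

```python
from typing import Tuple

# A time interval in seconds: (start, end). Hypothetical representation.
Interval = Tuple[float, float]


def temporal_iou(pred: Interval, truth: Interval) -> float:
    """Intersection over Union of two time intervals, in [0, 1]."""
    p_start, p_end = pred
    t_start, t_end = truth
    if p_end < p_start or t_end < t_start:
        raise ValueError("interval end must not precede its start")
    # Overlap length is zero when the intervals are disjoint.
    intersection = max(0.0, min(p_end, t_end) - max(p_start, t_start))
    union = (p_end - p_start) + (t_end - t_start) - intersection
    return intersection / union if union > 0 else 0.0


def coupled_score(semantic_score: float, pred: Interval, truth: Interval) -> float:
    """Assumed coupling: scale a semantic match score by the temporal IoU.

    This product is an illustrative choice, not the GDS formula.
    """
    return semantic_score * temporal_iou(pred, truth)
```

For example, a predicted glitch spanning 2 s to 6 s compared against an annotation spanning 4 s to 8 s overlaps for 2 s out of a 6 s union, so `temporal_iou((2.0, 6.0), (4.0, 8.0))` evaluates to one third. Under the assumed product coupling, a semantically correct description with poor timestamps is still penalized.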

Keywords

Multimodal Large Language Model, Video Understanding, Glitch Detection
