Reinforcement Learning–Based Discrete Prompt Optimization for Neuro-Symbolic Structured Simplification of Complex Game Descriptions with Large Language Models
Abstract
This thesis investigates how large language models can be guided, via reinforcement-learned discrete prompt optimization, to perform structured simplification of complex, free-form game descriptions for the GameChangineer platform. The work formalizes simplification as a discrete prompt optimization problem and introduces a neuro-symbolic pipeline that maps raw natural language into controlled GameChangineer sentences via scenario normalization, retrieval-augmented code generation, and AST-based FACTS extraction. A reinforcement learning framework based on Proximal Policy Optimization optimizes discrete prompt edits using task-specific rewards that combine grammar compliance, semantic agreement with the FACTS contract, and compiler validity of the resulting games. Experiments on diverse arcade-style game descriptions show that the proposed GC-Repair and sentence correction agents significantly improve grammar-constrained generation, robustness to noisy user input, and end-to-end code correctness compared to direct LLM rewriting baselines.
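One natural reading of the composite reward described above is a weighted sum of the three signals: grammar compliance, semantic agreement with the FACTS contract, and compiler validity. The sketch below is an illustrative assumption only; the function name, weights, and signal encodings are hypothetical and do not reproduce the thesis's actual reward formulation.

```python
def composite_reward(grammar_ok: bool,
                     semantic_agreement: float,
                     compiles: bool,
                     w_grammar: float = 0.3,
                     w_semantic: float = 0.4,
                     w_compile: float = 0.3) -> float:
    """Hypothetical weighted-sum reward for a prompt-editing policy.

    grammar_ok:          whether the output parses under the controlled grammar
    semantic_agreement:  score in [0, 1] against the extracted FACTS contract
    compiles:            whether the generated game passes the compiler
    """
    assert 0.0 <= semantic_agreement <= 1.0, "agreement must be normalized"
    return (w_grammar * float(grammar_ok)
            + w_semantic * semantic_agreement
            + w_compile * float(compiles))
```

With the default weights, a fully correct output scores 1.0 and a fully failing one scores 0.0; intermediate semantic agreement yields a graded signal suitable for policy-gradient methods such as PPO.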