Discrete Diffusion for Text Infilling
dc.contributor.author | Zhang, Andrew Xinghua | en |
dc.contributor.committeechair | Thomas, Christopher Lee | en |
dc.contributor.committeemember | Wang, Xuan | en |
dc.contributor.committeemember | Yanardag Delul, Pinar | en |
dc.contributor.department | Computer Science & Applications | en |
dc.date.accessioned | 2025-07-11T08:00:32Z | en |
dc.date.available | 2025-07-11T08:00:32Z | en |
dc.date.issued | 2025-07-10 | en |
dc.description.abstract | Generative modeling of text is a fundamental challenge in natural language processing. While autoregressive models have achieved remarkable success, they face limitations in parallelizability and flexible control. Discrete diffusion models offer a promising alternative paradigm, leveraging iterative refinement and potentially enabling bidirectional context use, parallel generation, and flexible prompting. However, existing discrete text diffusion models typically assume fixed token positions, hindering their application to tasks requiring dynamic sequence lengths, such as unconstrained text infilling where ground-truth positional information is absent. This thesis introduces Discrete Diffusion with Optimal Transport Position Coupling (DDOT) to overcome this critical limitation. DDOT is presented as the first discrete diffusion framework capable of handling flexible-length text infilling. At its core, DDOT employs a novel diffusion process that jointly models discrete token identities and continuous token positions. To maintain sequence coherence during the iterative generation process, a sample-level optimal transport (OT) coupling is integrated, ensuring consistent relative ordering of tokens. The methodology developed in this thesis is designed to be compatible with various underlying discrete diffusion techniques and pretrained denoising models. Comprehensive experimental validation on challenging constrained text generation benchmarks demonstrates DDOT's effectiveness. Results show that DDOT achieves performance competitive with state-of-the-art non-autoregressive methods, nears the quality of autoregressive models, and provides significant gains in training efficiency and flexibility for position-aware generation tasks. This research thus advances the capabilities of discrete diffusion models for complex text generation scenarios. | en |
dc.description.abstractgeneral | This thesis addresses the challenge of automatically filling in missing text segments of arbitrary length, from a single word to entire passages, without prior knowledge of token positions. Traditional generation methods proceed in a fixed order, selecting one token at a time and assuming positions are given. The proposed framework, Discrete Diffusion with Optimal Transport Position Coupling (DDOT), treats both token identity and placement as part of a unified iterative refinement process. At each step, DDOT refines its guesses for the words in the blank regions and determines their positions, allowing it to perform variable-length infilling without constraints. To ensure that token arrangements remain coherent and respect natural word order, DDOT uses a sample-level optimal transport coupling. This mechanism softly aligns tentative token placements with plausible relative positions, guiding elements toward a correct spatial configuration. Integrating this transport-based guidance into the discrete diffusion denoising steps preserves sentence fluency even when reconstructing heavily disrupted inputs. Extensive experiments on standard infilling benchmarks show that DDOT matches leading non-autoregressive methods and approaches the performance of strong autoregressive baselines. At the same time, it offers advantages in training efficiency and flexibility for tasks with incomplete or variable positional information. These results demonstrate that DDOT is a significant advance in position-aware text generation, with potential applications in research and real-world text editing tools. | en |
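As an informal illustration of the sample-level coupling described in the abstracts above, the Python sketch below pairs noisy token positions with target positions using one-dimensional optimal-transport matching (sorted-order assignment), which is the property that preserves relative token order. The function and variable names are illustrative assumptions, not code from the thesis.

# Illustrative sketch (not the thesis implementation): a sample-level
# optimal-transport coupling between noisy and target token positions.
# In one dimension, the OT plan between equal-sized point sets under a
# convex cost is the monotone (sorted-order) matching, which keeps the
# relative ordering of tokens intact during denoising.
import numpy as np

def ot_couple_positions(noise_positions: np.ndarray,
                        target_positions: np.ndarray) -> np.ndarray:
    """Pair each noisy position with a target position via 1D OT.

    Returns `pairing` such that noise_positions[i] is coupled to
    target_positions[pairing[i]]; sorted-order matching never swaps order.
    """
    noise_rank = np.argsort(np.argsort(noise_positions))  # rank of each noisy position
    target_order = np.argsort(target_positions)           # target indices in ascending order
    return target_order[noise_rank]

# Example: three infilled tokens with randomly initialized positions.
rng = np.random.default_rng(0)
noise = rng.uniform(0.0, 1.0, size=3)      # positions sampled from noise
target = np.array([0.20, 0.45, 0.80])      # normalized ground-truth positions
pairing = ot_couple_positions(noise, target)
# Interpolating each noisy position toward its coupled target preserves ordering.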
dc.description.degree | Master of Science | en |
dc.format.medium | ETD | en |
dc.identifier.other | vt_gsexam:44258 | en |
dc.identifier.uri | https://hdl.handle.net/10919/135961 | en |
dc.language.iso | en | en |
dc.publisher | Virginia Tech | en |
dc.rights | Creative Commons Attribution 4.0 International | en |
dc.rights.uri | http://creativecommons.org/licenses/by/4.0/ | en |
dc.subject | Discrete Diffusion | en |
dc.subject | Text Modeling | en |
dc.subject | Text Infilling | en |
dc.subject | Masked Diffusion | en |
dc.title | Discrete Diffusion for Text Infilling | en |
dc.type | Thesis | en |
thesis.degree.discipline | Computer Science & Applications | en |
thesis.degree.grantor | Virginia Polytechnic Institute and State University | en |
thesis.degree.level | masters | en |
thesis.degree.name | Master of Science | en |