Discrete Diffusion for Text Infilling

dc.contributor.author: Zhang, Andrew Xinghua
dc.contributor.committeechair: Thomas, Christopher Lee
dc.contributor.committeemember: Wang, Xuan
dc.contributor.committeemember: Yanardag Delul, Pinar
dc.contributor.department: Computer Science & Applications
dc.date.accessioned: 2025-07-11T08:00:32Z
dc.date.available: 2025-07-11T08:00:32Z
dc.date.issued: 2025-07-10
dc.description.abstract: Generative modeling of text is a fundamental challenge in natural language processing. While autoregressive models have achieved remarkable success, they face limitations in parallelizability and flexible control. Discrete diffusion models offer a promising alternative paradigm, leveraging iterative refinement and potentially enabling bidirectional context use, parallel generation, and flexible prompting. However, existing discrete text diffusion models typically assume fixed token positions, hindering their application to tasks requiring dynamic sequence lengths, such as unconstrained text infilling where ground-truth positional information is absent.

This thesis introduces Discrete Diffusion with Optimal Transport Position Coupling (DDOT) to overcome this critical limitation. DDOT is presented as the first discrete diffusion framework capable of handling flexible-length text infilling. At its core, DDOT employs a novel diffusion process that jointly models discrete token identities and continuous token positions. To maintain sequence coherence during the iterative generation process, a sample-level optimal transport (OT) coupling is integrated, ensuring consistent relative ordering of tokens.

The methodology developed in this thesis is designed to be compatible with various underlying discrete diffusion techniques and pretrained denoising models. Comprehensive experimental validation on challenging constrained text generation benchmarks demonstrates DDOT's effectiveness. Results show that DDOT achieves performance competitive with state-of-the-art non-autoregressive methods, nears the quality of autoregressive models, and provides significant gains in training efficiency and flexibility for position-aware generation tasks. This research thus advances the capabilities of discrete diffusion models for complex text generation scenarios.
dc.description.abstractgeneral: This thesis addresses the challenge of automatically filling missing text segments of arbitrary length, from a single word to entire passages, without prior knowledge of token positions. Traditional generation methods proceed in a fixed order, selecting one token at a time and assuming positions are given. The proposed framework, Discrete Diffusion with Optimal Transport Position Coupling (DDOT), treats both token identity and placement as part of a unified iterative refinement process. At each step, DDOT refines its guesses for words in the blank regions and determines their positions, allowing it to perform variable-length infilling without constraints.

To ensure that token arrangements remain coherent and respect natural word order, DDOT uses a sample-level optimal transport coupling. This mechanism softly aligns tentative token placements with plausible positions relative to each other by guiding elements toward correct spatial configurations. Integrating this transport-based guidance into the discrete diffusion denoising steps preserves sentence fluency even when reconstructing heavily disrupted inputs.

Extensive experiments on standard infilling benchmarks show that DDOT matches leading non-autoregressive methods and approaches the performance of strong autoregressive baselines. At the same time, it offers advantages in training efficiency and flexibility for tasks with incomplete or variable positional information. These results demonstrate that DDOT is a significant advancement in position-aware text generation with potential applications in research and real-world text editing tools.
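To make the sample-level optimal transport coupling described above concrete, the sketch below illustrates the underlying idea in isolation: for scalar (one-dimensional) positions with equal weights, the OT matching under squared cost reduces to the sorted (monotone) pairing, which is exactly what preserves the relative left-to-right order of tokens while their positions are moved during denoising. This is a minimal illustration of the concept only, not the thesis implementation; the function names and the linear interpolation schedule are assumptions.

```python
import numpy as np

def ot_couple_positions(noisy_pos, target_pos):
    """Couple two equal-length sets of scalar positions by 1-D optimal
    transport: under squared cost this is the sorted (monotone) matching,
    which preserves the relative ordering of tokens."""
    coupling = np.empty(len(noisy_pos), dtype=int)
    # The i-th smallest noisy position is paired with the i-th smallest target.
    coupling[np.argsort(noisy_pos)] = np.argsort(target_pos)
    return coupling  # coupling[j] = index of the target matched to noisy_pos[j]

def denoise_positions(noisy_pos, target_pos, t):
    """Move noisy positions toward their OT-coupled targets at time t in [0, 1].
    Linear interpolation is an illustrative choice, not the thesis schedule."""
    match = ot_couple_positions(noisy_pos, target_pos)
    return (1.0 - t) * noisy_pos + t * target_pos[match]

# Example: three infill tokens whose noisy positions drift toward target slots
# while their left-to-right order stays consistent at every step.
noisy = np.array([0.9, 0.1, 0.5])
target = np.array([2.0, 6.0, 4.0])
for t in (0.0, 0.5, 1.0):
    print(t, denoise_positions(noisy, target, t))
```

Because the pairing is monotone, tokens never swap order at intermediate steps, which is the coherence property the coupling is meant to provide.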
dc.description.degree: Master of Science
dc.format.medium: ETD
dc.identifier.other: vt_gsexam:44258
dc.identifier.uri: https://hdl.handle.net/10919/135961
dc.language.iso: en
dc.publisher: Virginia Tech
dc.rights: Creative Commons Attribution 4.0 International
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/
dc.subject: Discrete Diffusion
dc.subject: Text Modeling
dc.subject: Text Infilling
dc.subject: Masked Diffusion
dc.title: Discrete Diffusion for Text Infilling
dc.type: Thesis
thesis.degree.discipline: Computer Science & Applications
thesis.degree.grantor: Virginia Polytechnic Institute and State University
thesis.degree.level: masters
thesis.degree.name: Master of Science

Files

Original bundle
Name: Zhang_AX_T_2025.pdf
Size: 1.8 MB
Format: Adobe Portable Document Format