New Approaches to Synthetic Tabular Data Generation

Xu, Shengzhe

Files

Xu_S_D_2025.pdf (13.84 MB)

Date

2025-07-29

Publisher

Virginia Tech

Abstract

Synthetic data generation, although already well known as part of Generative AI (GenAI), has focused primarily on images, voice, and text, which mostly have homogeneous data formats. This dissertation focuses on modeling and generating synthetic tables, which exhibit a range of characteristics: numerous variables, diverse attribute types, functional dependencies across columns, and temporal dependencies across rows. We explore how to generate higher-quality synthetic tabular data through five subproblems: (1) auto-regressive DNNs for synthetic table generation (STG), (2) large language models (LLMs) for adaptive STG with higher fidelity, (3) reducing the in-context learning burden in STG via LLM priors, (4) embedding isotropy as a trust indicator for STG with LLMs, and (5) STG for next-generation wireless as a telecom application. Problems 1 and 2 aim to improve the quality of the generated synthetic tables; Problem 3 reduces computational cost while maintaining quality; Problem 4 proposes a trust indicator that evaluates synthetic data quality by analyzing the isotropy of the model's internal embeddings; and Problem 5 demonstrates an application in wireless telecommunications.
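To make the notion of embedding isotropy in Problem 4 more concrete, here is a minimal sketch of one widely used measure, the partition-function ratio from Mu and Viswanath's "all-but-the-top" work: I(W) = min_c Z(c) / max_c Z(c), where Z(c) = sum_i exp(c . w_i) and c ranges over the eigenvectors of W^T W. This is an illustrative assumption, not necessarily the indicator defined in the dissertation; the function name `isotropy_score` is hypothetical, and applying it to LLM embeddings of generated rows is likewise assumed. Scores near 1 mean the embeddings spread evenly across directions, while scores near 0 mean they collapse toward a few dominant directions.

```python
import numpy as np


def isotropy_score(embeddings: np.ndarray) -> float:
    """Hypothetical helper: partition-function isotropy I(W) in [0, 1].

    I(W) = min_c Z(c) / max_c Z(c), with Z(c) = sum_i exp(c . w_i) and
    c ranging over the eigenvectors of W^T W (one common formulation,
    assumed here; not taken from the dissertation).
    """
    W = np.asarray(embeddings, dtype=np.float64)  # shape (n, d)
    # Candidate directions c: eigenvectors of the d x d matrix W^T W.
    _, eigvecs = np.linalg.eigh(W.T @ W)
    # Column j holds the projections c_j . w_i for all n embeddings.
    proj = W @ eigvecs
    # Log-sum-exp per direction, computed stably, gives log Z(c_j).
    peak = proj.max(axis=0)
    log_z = peak + np.log(np.exp(proj - peak).sum(axis=0))
    # Ratio of smallest to largest Z, formed in log space to avoid overflow.
    return float(np.exp(log_z.min() - log_z.max()))


rng = np.random.default_rng(0)
iso = rng.standard_normal((5000, 8))      # roughly direction-uniform
aniso = iso + 5.0 * np.eye(8)[0]          # one dominant shared direction
print(isotropy_score(iso), isotropy_score(aniso))
```

Adding a large shared offset to every vector, as in `aniso`, is a simple way to simulate the anisotropy often reported in language-model embedding spaces; the score drops sharply compared with the isotropic Gaussian sample.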

Keywords

Generative Neural Models, Tabular Data, Large Language Models

Persistent link

https://hdl.handle.net/10919/136928

Collections

Doctoral Dissertations
