Understanding Machine Learning Models through a Data-Centric Lens

Date

2026-06-08

Publisher

Virginia Tech

Abstract

Foundation models acquire their capabilities by training on internet-scale corpora that are too large to fully inspect, filter, or audit. As a result, problematic data, such as copyrighted material, personal records, and sensitive attributes, can readily enter training and create privacy risks. Training data also shapes what a deployed model can do. This dissertation takes a data-centric view of foundation models and studies training data from both perspectives, because the privacy risks and the levers for controlling capability stem from the same data.

Part I studies two privacy risks that training data can create. First, we develop a practical membership inference attack against large-scale multimodal models under realistic constraints that preclude both shadow training and access to the target training pipeline. Second, we identify a sustained spike in token-level prediction entropy as a precursor to the emission of memorized text, and we develop Confusion-Inducing Attacks, a principled extraction framework that systematically triggers this signal without privileged access to the training data.
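The entropy signal described above can be made concrete with a small sketch. It is not the Confusion-Inducing Attacks implementation. It only shows one way to operationalize a "sustained spike": it assumes access to the model's next-token probability distribution at each position, computes per-token Shannon entropy, and flags runs that stay above a threshold for a minimum number of consecutive tokens. The function names and the `threshold` and `min_len` parameters are hypothetical, chosen for illustration.

```python
import math
from typing import List, Optional, Sequence, Tuple


def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def sustained_spikes(
    dists: Sequence[Sequence[float]],
    threshold: float,
    min_len: int,
) -> List[Tuple[int, int]]:
    """Return (start, end) token ranges (end exclusive) where entropy stays
    above `threshold` for at least `min_len` consecutive positions.

    `threshold` and `min_len` are hypothetical knobs, not values from the source.
    """
    spans: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, dist in enumerate(dists):
        high = token_entropy(dist) > threshold
        if high and start is None:
            start = i  # a high-entropy run begins here
        elif not high and start is not None:
            if i - start >= min_len:  # keep only runs that were sustained
                spans.append((start, i))
            start = None
    # Close a run that lasts until the final token.
    if start is not None and len(dists) - start >= min_len:
        spans.append((start, len(dists)))
    return spans


# Toy distributions over a 4-token vocabulary.
confident = [0.97, 0.01, 0.01, 0.01]  # low entropy (about 0.17 nats)
uniform = [0.25, 0.25, 0.25, 0.25]    # maximal entropy (ln 4, about 1.39 nats)
seq = [confident, confident, uniform, uniform, uniform, confident, uniform]
print(sustained_spikes(seq, threshold=1.0, min_len=2))  # [(2, 5)]
```

The lone high-entropy token at the end is ignored because it does not meet `min_len`. Requiring a run rather than a single high-entropy token is what distinguishes a sustained spike from ordinary momentary uncertainty.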

Part II studies how training data shapes model capability and what model developers can do about it through four practical levers: trace, remove, audit, and repair. We introduce the Mirrored Influence Hypothesis, which reformulates influence estimation around computation dominated by forward passes and thereby enables scalable data attribution at foundation-model scale. We then develop an unlearning framework based on the restricted gradient that removes targeted influence from text-to-image diffusion models while preserving text-image alignment on the remaining data. Because removal can quietly damage benign capabilities that static benchmarks fail to reveal, we develop an adaptive probing framework that exposes knowledge holes: unintended capability losses that emerge after unlearning. Finally, we develop Diagnosis-Driven Synthesis (DDS), which converts trace-level diagnoses of model failures into targeted training data and uses a diagnostic crossover operator to repair interacting weaknesses. Together, these four levers let model developers understand, audit, control, and improve foundation models through their training data.
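To illustrate what "forward-pass-heavy" influence estimation can look like, the toy sketch below reverses the usual direction of attribution. Rather than asking how each training example affects a test prediction, it takes a single gradient step on the test example and then scores every training example by how much its loss changes, which requires only forward passes over the training set. This is an assumption-laden illustration on a linear model with squared loss, not the dissertation's estimator. The names `mirrored_influence` and `lr` are hypothetical.

```python
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], float]  # (features, target)


def predict(w: Sequence[float], x: Sequence[float]) -> float:
    """Linear model output w . x."""
    return sum(wi * xi for wi, xi in zip(w, x))


def loss(w: Sequence[float], x: Sequence[float], y: float) -> float:
    """Squared error 0.5 * (w . x - y)^2."""
    return 0.5 * (predict(w, x) - y) ** 2


def mirrored_influence(
    w: Sequence[float],
    train: Sequence[Example],
    test_point: Example,
    lr: float = 0.1,
) -> List[float]:
    """Score each training example by its loss drop after one gradient step
    on the test example. Positive scores mean fitting the test point also
    helped that training example (a hypothetical 'proponent' signal).
    """
    x_t, y_t = test_point
    residual = predict(w, x_t) - y_t
    # The only gradient computation: one step on the single test example.
    w_new = [wi - lr * residual * xi for wi, xi in zip(w, x_t)]
    # Forward passes only: loss before minus loss after, per training example.
    return [loss(w, x, y) - loss(w_new, x, y) for x, y in train]


train = [
    ((1.0, 0.0), 1.0),   # agrees with the test example
    ((0.0, 1.0), 1.0),   # unrelated feature
    ((1.0, 0.0), -1.0),  # contradicts the test example
]
scores = mirrored_influence([0.0, 0.0], train, ((1.0, 0.0), 1.0))
print([round(s, 3) for s in scores])  # [0.095, 0.0, -0.105]
```

The cost profile is the point of the design: the backward pass is paid once per test example, while the per-training-example work is a forward pass, which is the cheaper operation at large scale.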

Keywords

Data Extraction, Membership Inference Attacks, Unlearning, Data Influence Estimation, Data Synthesis, AI Safety