Understanding Machine Learning Models through a Data-Centric Lens
Abstract
Foundation models acquire their capabilities by training on internet-scale corpora that are too large to fully inspect, filter, or audit. As a result, problematic data (copyrighted material, personal records, and sensitive attributes) can readily enter training and create privacy risks. At the same time, training data shapes what a deployed model can do. This dissertation studies training data from these two perspectives, taking a data-centric view of foundation models: both the privacy risks and the capability levers come from the same training data.
Part I studies two privacy risks that training data can create. First, we develop a practical membership inference attack against large-scale multi-modal models under realistic constraints that preclude shadow training or access to the target training pipeline. Second, we identify a sustained spike in token-level prediction entropy as a precursor to memorized text emission and develop Confusion-Inducing Attacks, a principled extraction framework that systematically triggers this signal without privileged access to the training data.
Part II studies how training data shapes model capability and what model developers can do about it, through four practical levers: trace, remove, audit, and repair. We introduce the Mirrored Influence Hypothesis, which reformulates influence estimation around forward-pass-heavy computation and enables scalable data attribution at foundation-model scale. We then develop an unlearning framework based on the restricted gradient that removes targeted influence from text-to-image diffusion models while preserving text-image alignment on the remainder. Because removal can quietly damage benign capabilities that static benchmarks fail to reveal, we develop an adaptive probing framework that exposes knowledge holes, that is, unintended capability losses that emerge after unlearning. Finally, we develop Diagnosis-Driven Synthesis (DDS), which converts trace-level diagnoses of model failures into targeted training data and uses a diagnostic crossover operator to repair interacting weaknesses. Together, these four levers let model developers understand, audit, control, and improve foundation models through their training data.