ModelPred: A Framework for Predicting Trained Model from Training Data

Zeng, Yingyan

Files

Zeng_Y_T_2024.pdf (10.67 MB)

Date

2024-06-06

Authors

Zeng, Yingyan

Publisher

Virginia Tech

Abstract

In this work, we propose ModelPred, a framework for understanding how changes in the training data affect the trained model. Such understanding is critical for building trust at every stage of a machine learning (ML) pipeline: cleaning poor-quality samples and identifying important samples to collect during data preparation, calibrating the uncertainty of model predictions, and interpreting why certain model behaviors emerge during deployment. Specifically, ModelPred learns a parameterized function that takes a dataset S as input and predicts the model obtained by training on S. Our work differs from the recent work on Datamodels in that we aim to predict the trained model parameters directly rather than the trained model's behaviors. We demonstrate that a neural network-based set function class is capable of learning the complex relationships between the training data and the model parameters. We introduce novel global and local regularization techniques to prevent overfitting, and we rigorously characterize the expressive power of neural networks (NN) in approximating the end-to-end training process. Through extensive empirical investigations, we show that ModelPred enables a variety of applications that boost the interpretability and accountability of ML, such as data valuation, data selection, memorization quantification, and model calibration.
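The idea of a function that takes a dataset S and returns predicted model parameters can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the thesis: the class name `SetToParams`, the helpers `make_layer` and `dense`, the layer sizes, and the choice of a per-sample encoder followed by mean pooling (one common way to build a permutation-invariant set function). The weights are random and untrained, so the output only demonstrates the interface, not a useful prediction.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Build a random dense layer (weights, biases). Untrained; illustration only."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(layer, vec, activation=None):
    """Apply a dense layer to a vector, optionally followed by tanh."""
    weights, biases = layer
    out = [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weights, biases)]
    if activation == "tanh":
        out = [math.tanh(o) for o in out]
    return out


class SetToParams:
    """Hypothetical permutation-invariant map from a dataset S to a parameter vector."""

    def __init__(self, x_dim, hidden_dim, param_dim, seed=0):
        rng = random.Random(seed)
        # Per-sample encoder: input is the feature vector x concatenated with the label y.
        self.phi = make_layer(x_dim + 1, hidden_dim, rng)
        # Decoder: maps the pooled dataset embedding to predicted model parameters.
        self.rho = make_layer(hidden_dim, param_dim, rng)
        self.hidden_dim = hidden_dim

    def predict(self, dataset):
        """Return a predicted parameter vector for a non-empty list of (x, y) pairs."""
        pooled = [0.0] * self.hidden_dim
        for x, y in dataset:
            h = dense(self.phi, list(x) + [y], activation="tanh")
            pooled = [p + hi for p, hi in zip(pooled, h)]
        # Mean pooling makes the result independent of sample order (up to rounding).
        pooled = [p / len(dataset) for p in pooled]
        return dense(self.rho, pooled)


# Usage: a toy dataset S with 2-dimensional features and scalar labels.
S = [((0.0, 1.0), 1.0), ((1.0, 0.0), 0.0), ((1.0, 1.0), 1.0)]
predictor = SetToParams(x_dim=2, hidden_dim=8, param_dim=3)
theta_hat = predictor.predict(S)  # 3 numbers standing in for model parameters
```

In a trained version of such a predictor, removing or adding a sample in `S` and calling `predict` again would give an estimate of how the trained parameters change, which is the kind of query that applications like data valuation and data selection rely on.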

Keywords

Neural Network Approximability, Data Valuation, Trustworthy Machine Learning

Persistent link

https://hdl.handle.net/10919/120887

Collections

Masters Theses
