ModelPred: A Framework for Predicting Trained Model from Training Data

Zeng, Yingyan

dc.contributor.author	Zeng, Yingyan	en
dc.contributor.committeechair	Jia, Ruoxi	en
dc.contributor.committeemember	Abbott, A. Lynn	en
dc.contributor.committeemember	Jin, Ran	en
dc.contributor.department	Electrical and Computer Engineering	en
dc.date.accessioned	2024-08-07T18:53:54Z	en
dc.date.available	2024-08-07T18:53:54Z	en
dc.date.issued	2024-06-06	en
dc.description.abstract	In this work, we propose ModelPred, a framework for understanding how changes in the training data affect a trained model. This understanding is critical for building trust at every stage of a machine learning pipeline: cleaning poor-quality samples and identifying important samples to collect during data preparation, calibrating the uncertainty of model predictions, and interpreting why certain model behaviors emerge during deployment. Specifically, ModelPred learns a parameterized function that takes a dataset S as input and predicts the model obtained by training on S. Our work differs from the recent work on Datamodels in that we aim to predict the trained model parameters directly rather than the trained model's behaviors. We demonstrate that a neural network-based set function class is capable of learning the complex relationships between training data and model parameters. We introduce novel global and local regularization techniques to prevent overfitting, and we rigorously characterize the expressive power of neural networks (NNs) in approximating the end-to-end training process. Through extensive empirical investigations, we show that ModelPred enables a variety of applications that improve the interpretability and accountability of machine learning (ML), such as data valuation, data selection, memorization quantification, and model calibration.	en
dc.description.abstractgeneral	With the prevalence of large and complex Artificial Intelligence (AI) models, it is important to build trust at every stage of a machine learning pipeline: cleaning poor-quality samples and identifying important samples to collect during training data preparation, calibrating the uncertainty of model predictions during inference, and interpreting why certain model behaviors emerge during deployment. In this work, we propose ModelPred, a framework for understanding how changes in the training data affect a trained model. To achieve this, ModelPred learns a parameterized function that takes a dataset S as input and predicts the model obtained by training on S, thereby efficiently learning the effect of the data on the model. Our work differs from the recent work on Datamodels [28] in that we aim to predict the trained model parameters directly rather than the trained model's behaviors. We demonstrate that a neural network-based set function class is capable of learning the complex relationships between training data and model parameters. We introduce novel global and local regularization techniques to improve generalization and prevent overfitting. We also rigorously characterize the expressive power of neural networks (NNs) in approximating the end-to-end training process. Through extensive empirical investigations, we show that ModelPred enables a variety of applications that improve the interpretability and accountability of machine learning (ML), such as data valuation, data selection, memorization quantification, and model calibration. These applications greatly enhance the trustworthiness of machine learning models.	en
dc.description.degree	Master of Science	en
dc.description.notes	Also published as Zeng, Y., Wang, J. T., Chen, S., Just, H. A., Jin, R., & Jia, R. (2023, February). ModelPred: A Framework for Predicting Trained Model from Training Data. In 2023 IEEE Conference on Secure and Trustworthy Machine Learning (SaTML) (pp. 432-449). IEEE. https://doi.org/10.1109/SaTML54575.2023.00037	en
dc.description.sponsorship	Amazon-Virginia Tech Initiative in Efficient and Robust Machine Learning	en
dc.format.medium	ETD	en
dc.format.mimetype	application/pdf	en
dc.identifier.uri	https://hdl.handle.net/10919/120887	en
dc.language.iso	en	en
dc.publisher	Virginia Tech	en
dc.rights	In Copyright	en
dc.rights.uri	http://rightsstatements.org/vocab/InC/1.0/	en
dc.subject	Neural Network Approximability	en
dc.subject	Data Valuation	en
dc.subject	Trustworthy Machine Learning	en
dc.title	ModelPred: A Framework for Predicting Trained Model from Training Data	en
dc.type	Thesis	en
thesis.degree.discipline	Computer Engineering	en
thesis.degree.grantor	Virginia Polytechnic Institute and State University	en
thesis.degree.level	masters	en
thesis.degree.name	Master of Science	en
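The abstract describes ModelPred as learning a parameterized set function that maps a training dataset S to the parameters of the model obtained by training on S. The Python/NumPy sketch below illustrates that idea on a toy problem; it is not the thesis's implementation. It rests on several assumptions: the "training process" being predicted is L2-regularized logistic regression; the set function takes a DeepSets-style form rho(mean_i phi(x_i, y_i)), with fixed random ReLU features for phi and a ridge-regression read-out for rho; and all names (train_logreg, SetToParams, sample_subset) are hypothetical. The global and local regularization techniques mentioned in the abstract are not modeled.

```python
import numpy as np


def train_logreg(X, y, lr=0.5, steps=500, l2=1e-2):
    # Stand-in for the end-to-end training process (assumption):
    # L2-regularized logistic regression fit by full-batch gradient descent.
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * (X.T @ (p - y) / len(y) + l2 * w)
    return w


class SetToParams:
    """Hypothetical DeepSets-style set function: theta_hat = rho(mean_i phi(x_i, y_i))."""

    def __init__(self, in_dim, hidden=64, seed=0):
        r = np.random.default_rng(seed)
        # phi: fixed random ReLU features, left untrained to keep the sketch short
        self.W = r.normal(size=(in_dim + 1, hidden))
        self.b = r.normal(size=hidden)
        self.A = None  # rho: linear read-out, learned in fit()

    def embed(self, X, y):
        Z = np.column_stack([X, y])               # one row per sample (x_i, y_i)
        H = np.maximum(Z @ self.W + self.b, 0.0)  # per-sample features phi(x_i, y_i)
        return np.append(H.mean(axis=0), 1.0)    # order-invariant mean pooling + bias

    def fit(self, datasets, params, lam=1e-2):
        E = np.stack([self.embed(X, y) for X, y in datasets])
        P = np.stack(params)
        # rho via closed-form ridge regression from set embeddings to parameters
        self.A = np.linalg.solve(E.T @ E + lam * np.eye(E.shape[1]), E.T @ P)
        return self

    def predict(self, X, y):
        return self.embed(X, y) @ self.A


# Toy data pool; each meta-training example is (subset S, parameters trained on S)
rng = np.random.default_rng(0)
N, d = 200, 3
X_pool = rng.normal(size=(N, d))
y_pool = (X_pool @ np.array([1.5, -2.0, 0.5]) + 0.3 * rng.normal(size=N) > 0).astype(float)


def sample_subset():
    idx = rng.choice(N, size=rng.integers(50, 151), replace=False)
    return X_pool[idx], y_pool[idx]


train_sets = [sample_subset() for _ in range(300)]
model = SetToParams(in_dim=d).fit(train_sets, [train_logreg(X, y) for X, y in train_sets])

X_new, y_new = sample_subset()
theta_hat = model.predict(X_new, y_new)  # predicted without running training
theta = train_logreg(X_new, y_new)       # reference: actually train on S
print("predicted:", np.round(theta_hat, 3))
print("trained:  ", np.round(theta, 3))
```

Because the per-sample features are mean-pooled, the prediction does not depend on the order of samples in S, which is what makes this a set function. A learned phi and a deeper rho, as in a full neural set-function class, would be the natural next steps beyond this sketch.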

Files

Original bundle

Name:: Zeng_Y_T_2024.pdf
Size:: 10.67 MB
Format:: Adobe Portable Document Format

License bundle

Name:: license.txt
Size:: 1.5 KB
Format:: Item-specific license agreed upon to submission

Collections

Masters Theses