Hierarchical Bayesian Dataset Selection

Zhou, Xiaona

Files

XiaonaZhou_Masters_ETD.pdf (961.02 KB)

Date

2024-05

Authors

Zhou, Xiaona

Publisher

Virginia Tech

Abstract

Deep learning has had a profound impact across many domains, yet supervised model training still depends critically on access to large, high-quality datasets, which are often difficult to identify. To address this, we introduce Hierarchical Bayesian Dataset Selection (HBDS), the first dataset selection algorithm to use hierarchical Bayesian modeling, designed for collaborative data-sharing ecosystems. The proposed method efficiently decomposes the contributions of dataset groups and of individual datasets to local model performance, using Bayesian updates computed on small data samples. Experiments on two benchmark datasets show that HBDS is computationally lightweight and, compared with existing data selection methods, more interpretable: its learned posterior distributions reveal how the candidate datasets relate to one another. HBDS outperforms traditional non-hierarchical methods by correctly identifying all relevant datasets and reaching optimal accuracy in fewer computational steps, even when initial model accuracy is low. Specifically, HBDS surpasses its non-hierarchical counterpart on average by 1.8% on DIGIT-FIVE and 0.7% on DOMAINNET. In settings with limited resources, HBDS achieves 6.9% higher accuracy than its non-hierarchical counterpart. These results confirm HBDS's effectiveness at identifying datasets that improve the accuracy and efficiency of deep learning models when collaborative data utilization is essential.
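To make the idea of separating group-level from dataset-level contributions concrete, here is a minimal sketch. It is NOT the HBDS algorithm from the thesis. It assumes a binary reward (did a small training sample from a dataset improve local validation accuracy?), independent Beta-Bernoulli posteriors at the group level and at the dataset level, and two-level Thompson sampling to pick which dataset to try next. The class names, the groups, and the simulated "helped" rates are all hypothetical.

```python
from __future__ import annotations

import random
from dataclasses import dataclass


@dataclass
class BetaPosterior:
    """Beta(alpha, beta) belief about how often a data source helps the local model."""
    alpha: float = 1.0  # uniform Beta(1, 1) prior
    beta: float = 1.0

    def update(self, helped: bool) -> None:
        # Conjugate Beta-Bernoulli update: count a success or a failure.
        if helped:
            self.alpha += 1.0
        else:
            self.beta += 1.0

    def sample(self, rng: random.Random) -> float:
        return rng.betavariate(self.alpha, self.beta)

    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)


class TwoLevelThompsonSelector:
    """Hypothetical selector: Thompson-sample a group, then a dataset within it."""

    def __init__(self, groups: dict[str, list[str]], seed: int = 0) -> None:
        self.rng = random.Random(seed)
        self.groups = groups
        self.group_post = {g: BetaPosterior() for g in groups}
        self.dataset_post = {d: BetaPosterior() for ds in groups.values() for d in ds}

    def select(self) -> tuple[str, str]:
        # Draw one sample per group posterior and keep the best group ...
        group = max(self.groups, key=lambda g: self.group_post[g].sample(self.rng))
        # ... then do the same among that group's datasets.
        dataset = max(self.groups[group], key=lambda d: self.dataset_post[d].sample(self.rng))
        return group, dataset

    def update(self, group: str, dataset: str, helped: bool) -> None:
        # One observation updates both levels, so evidence about a single
        # dataset also shifts the belief about its whole group.
        self.group_post[group].update(helped)
        self.dataset_post[dataset].update(helped)


# Usage with made-up groups and simulated "helped" rates (not thesis data).
groups = {"group_a": ["a1", "a2"], "group_b": ["b1", "b2"]}
true_rate = {"a1": 0.8, "a2": 0.7, "b1": 0.1, "b2": 0.15}
sel = TwoLevelThompsonSelector(groups, seed=0)
env = random.Random(1)
for _ in range(200):
    g, d = sel.select()
    sel.update(g, d, env.random() < true_rate[d])
print({g: round(p.mean(), 2) for g, p in sel.group_post.items()})
```

Because every observation updates both its group and its dataset, a few unhelpful samples from one dataset lower the sampling weight of its entire group. That sharing of evidence across levels is the intuition behind a hierarchical model; the thesis's actual model may couple the two levels differently.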

Keywords

Hierarchical Bayesian, Data-Sharing, Reinforcement Learning, Dataset Selection

Persistent link

https://hdl.handle.net/10919/119391

Collections

Masters Theses