Active Learning Under Limited Interaction with Data Labeler

Chen, Si

dc.contributor.author	Chen, Si	en
dc.contributor.committeechair	Jia, Ruoxi	en
dc.contributor.committeemember	Huang, Jia-Bin	en
dc.contributor.committeemember	Viswanath, Bimal	en
dc.contributor.department	Electrical and Computer Engineering	en
dc.date.accessioned	2021-09-01T20:23:26Z	en
dc.date.available	2021-09-01T20:23:26Z	en
dc.date.issued	2021	en
dc.description.abstract	Active learning (AL) aims to reduce labeling effort by identifying the most valuable unlabeled data points in a large pool. Traditional AL frameworks have two limitations. First, they perform data selection over multiple rounds, which is time-consuming and often impractical. Second, they usually assume that a small number of labeled data points is available in the same domain as the unlabeled pool. In this thesis, we initiate the study of one-round active learning to address the first issue. We propose DULO, a general framework for the one-round setting based on the notion of data utility functions, which map a set of data points to a performance measure of the model trained on that set. We formulate the one-round active learning problem as data utility function maximization. We then propose D²ULO, built on DULO, as a solution that addresses both issues. Specifically, D²ULO leverages domain adaptation (DA) to train a data utility model on labeled source data. The trained utility model can then be used to select high-utility data in the target domain and, at the same time, to estimate the utility of the selected data. Our experiments show that the proposed frameworks achieve better performance than state-of-the-art baselines in the same setting. In particular, D²ULO is applicable to scenarios where the source and target labels are mismatched, which existing works do not support.	en
dc.description.abstractgeneral	Machine learning (ML) has achieved huge success in recent years. ML technologies such as recommendation systems, speech recognition, and image recognition play an important role in daily life. This success builds mainly on the use of large amounts of labeled data. Compared with traditional programming, an ML algorithm does not rely on explicit instructions from humans; instead, it takes data along with labels as input and aims to learn, by itself, a function that correctly maps data to the label space. However, data labeling requires human effort and can be time-consuming and expensive, especially for datasets that require domain-specific knowledge (e.g., disease prediction). Active learning (AL) is one solution for reducing data labeling effort. Specifically, the learning algorithm actively selects data points that provide more information for the model, so a better model can be obtained with less labeled data. While traditional AL strategies do achieve good performance, they require a small amount of labeled data for initialization and perform data selection over multiple rounds. Both requirements pose a great challenge in practice, because no platform provides timely online interaction with data labelers and the interaction is often time-inefficient. To address these limitations, we first propose DULO, in which a new AL setting is studied: data selection may be performed only once. To further broaden the applicability of our method, we propose D²ULO, which builds on DULO and domain adaptation techniques to avoid the need for initial labeled data. Our experiments show that both proposed frameworks achieve better performance than state-of-the-art baselines.	en
dc.description.degree	M.S.	en
dc.format.medium	ETD	en
dc.format.mimetype	application/pdf	en
dc.identifier.uri	http://hdl.handle.net/10919/104894	en
dc.language.iso	en_US	en
dc.publisher	Virginia Tech	en
dc.rights	Attribution-NonCommercial-NoDerivatives 4.0 International	en
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/4.0/	en
dc.subject	Machine learning	en
dc.subject	Active Learning	en
dc.subject	Domain Adaptation	en
dc.subject	Deep Neural Networks	en
dc.title	Active Learning Under Limited Interaction with Data Labeler	en
dc.type	Thesis	en
thesis.degree.discipline	Computer Engineering	en
thesis.degree.grantor	Virginia Polytechnic Institute and State University	en
thesis.degree.level	masters	en
thesis.degree.name	M.S.	en
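
Illustrative note (not part of the original record): the abstract frames one-round active learning as maximizing a data utility function over subsets of the unlabeled pool. The sketch below shows one common way to approximate such a maximization under a labeling budget, namely greedy selection by marginal gain. The record does not state which optimizer the thesis uses, so the greedy procedure, the names `greedy_select`, `coverage_utility`, and `CLUSTER_OF`, and the toy cluster-coverage utility are all assumptions made for illustration. In the thesis the utility would come from a learned model, not from a hand-written function.

```python
from typing import Callable, FrozenSet, List, Sequence


def greedy_select(
    pool: Sequence[int],
    utility: Callable[[FrozenSet[int]], float],
    budget: int,
) -> List[int]:
    """Pick up to `budget` items, each time adding the one with the largest utility gain."""
    selected: List[int] = []
    remaining = list(pool)
    for _ in range(min(budget, len(remaining))):
        current = utility(frozenset(selected))
        # Marginal gain of adding candidate i to the current selection.
        best = max(
            remaining,
            key=lambda i: utility(frozenset(selected + [i])) - current,
        )
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical stand-in for a learned utility model: the number of
# distinct clusters covered by the selected points.
CLUSTER_OF = {0: "a", 1: "a", 2: "b", 3: "c", 4: "c"}


def coverage_utility(subset: FrozenSet[int]) -> float:
    return float(len({CLUSTER_OF[i] for i in subset}))


# All points are chosen in a single round, then sent for labeling together.
chosen = greedy_select(pool=[0, 1, 2, 3, 4], utility=coverage_utility, budget=3)
```

With this toy utility, the selection covers one point from each cluster, which reflects the intuition that a redundant point adds no utility. The whole batch is chosen before any label is requested, which is what distinguishes the one-round setting from multi-round AL.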

Files

Original bundle

Name: Chen_S_thesis_2021.pdf
Size: 3.21 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.5 KB
Format: Item-specific license agreed upon to submission

Collections

Masters Theses