Active Learning Under Limited Interaction with Data Labeler

dc.contributor.authorChen, Sien
dc.contributor.committeechairJia, Ruoxien
dc.contributor.committeememberHuang, Jia-Binen
dc.contributor.committeememberViswanath, Bimalen
dc.contributor.departmentElectrical and Computer Engineeringen
dc.date.accessioned2021-09-01T20:23:26Zen
dc.date.available2021-09-01T20:23:26Zen
dc.date.issued2021en
dc.description.abstractActive learning (AL) aims at reducing labeling effort by identifying the most valuable unlabeled data points from a large pool. Traditional AL frameworks have two limitations: First, they perform data selection in a multi-round manner, which is time-consuming and impractical. Second, they usually assume that there are a small amount of labeled data points available in the same domain as the data in the unlabeled pool. In this thesis, we initiate the study of one-round active learning to solve the first issue. We propose DULO, a general framework for one-round setting based on the notion of data utility functions, which map a set of data points to some performance measure of the model trained on the set. We formulate the one-round active learning problem as data utility function maximization. We then propose D²ULO on the basis of DULO as a solution that solves both issues. Specifically, D²ULO leverages the idea of domain adaptation (DA) to train a data utility model on source labeled data. The trained utility model can then be used to select high-utility data in the target domain and at the same time, provide an estimate for the utility of the selected data. Our experiments show that the proposed frameworks achieves better performance compared with state-of-the-art baselines in the same setting. Particularly, D²ULO is applicable to the scenario where the source and target labels have mismatches, which is not supported by the existing works.en
dc.description.abstractgeneralMachine Learning (ML) has achieved huge success in recent years. Machine Learning technologies such as recommendation system, speech recognition and image recognition play an important role on human daily life. This success mainly build upon the use of large amount of labeled data: Compared with traditional programming, a ML algorithm does not rely on explicit instructions from human; instead, it takes the data along with the label as input, and aims to learn a function that can correctly map data to the label space by itself. However, data labeling requires human effort and could be time-consuming and expensive especially for datasets that contain domain-specific knowledge (e.g., disease prediction etc.) Active Learning (AL) is one of the solution to reduce data labeling effort. Specifically, the learning algorithm actively selects data points that provide more information for the model, hence a better model can be achieved with less labeled data. While traditional AL strategies do achieve good performance, it requires a small amount of labeled data as initialization and performs data selection in multi-round, which pose great challenge to its application, as there is no platform provide timely online interaction with data labeler and the interaction is often time inefficient. To deal with the limitations, we first propose DULO which a new setting of AL is studied: data selection is only allowed to be performed once. To further broaden the application of our method, we propose D²ULO which is built upon DULO and Domain Adaptation techniques to avoid the use of initial labeled data. Our experiments show that both of the proposed two frameworks achieve better performance compared with state-of-the-art baselines.en
dc.description.degreeM.S.en
dc.format.mediumETDen
dc.format.mimetypeapplication/pdfen
dc.identifier.urihttp://hdl.handle.net/10919/104894en
dc.language.isoen_USen
dc.publisherVirginia Techen
dc.rightsAttribution-NonCommercial-NoDerivatives 4.0 Internationalen
dc.rights.urihttp://creativecommons.org/licenses/by-nc-nd/4.0/en
dc.subjectMachine learningen
dc.subjectActive Learningen
dc.subjectDomain Adaptationen
dc.subjectDeep Neural Networks.en
dc.titleActive Learning Under Limited Interaction with Data Labeleren
dc.typeThesisen
thesis.degree.disciplineComputer Engineeringen
thesis.degree.grantorVirginia Polytechnic Institute and State Universityen
thesis.degree.levelmastersen
thesis.degree.nameM.S.en

Files

Original bundle
Now showing 1 - 1 of 1
Loading...
Thumbnail Image
Name:
Chen_S_thesis_2021.pdf
Size:
3.21 MB
Format:
Adobe Portable Document Format
License bundle
Now showing 1 - 1 of 1
Name:
license.txt
Size:
1.5 KB
Format:
Item-specific license agreed upon to submission
Description:

Collections