Learning with Constraint-Based Weak Supervision

dc.contributor.author: Arachie, Chidubem Gibson (en)
dc.contributor.committeechair: Huang, Bert (en)
dc.contributor.committeemember: Ramakrishnan, Narendran (en)
dc.contributor.committeemember: Reddy, Chandan K. (en)
dc.contributor.committeemember: Bach, Stephen H. (en)
dc.contributor.committeemember: Eldardiry, Hoda (en)
dc.contributor.department: Computer Science (en)
dc.date.accessioned: 2022-04-29T08:00:27Z (en)
dc.date.available: 2022-04-29T08:00:27Z (en)
dc.date.issued: 2022-04-28 (en)
dc.description.abstract: The recent adoption of machine learning models across many businesses has underscored the need for quality training data. Typically, training supervised machine learning systems involves using large amounts of human-annotated data. Labeling data is expensive and can be a limiting factor in using machine learning models. To enable the continued integration of machine learning systems in businesses and easier access for users, researchers have proposed several alternatives to supervised learning. Weak supervision is one such alternative. Weak supervision, or weakly supervised learning, involves using noisy labels (weak signals of the data) from multiple sources to train machine learning systems. A weak supervision model aggregates multiple noisy label sources, called weak signals, to produce probabilistic labels for the data. The main allure of weak supervision is that it provides a cheap yet effective substitute for supervised learning without the need for labeled data. The key challenge in training weakly supervised machine learning models is that weak supervision leaves ambiguity about the possible true labelings of the data. In this dissertation, we aim to address this challenge by developing new weak supervision methods. Our work focuses on learning with constraint-based weak supervision algorithms. First, we propose an adversarial labeling approach for weak supervision, in which an adversary chooses labels for the data and a model learns by minimizing the error made by the adversarial model. Second, we propose a simple constraint-based approach that minimizes a quadratic objective function to solve for the labels of the data (a minimal sketch of this style of constraint-based aggregation appears after the metadata record below). Next, we explain the notion of data consistency for weak supervision and propose a data-consistent method for weakly supervised learning; this approach combines weak supervision labels with features of the training data to make the learned labels consistent with the data. Lastly, we use this data-consistent approach to propose a general method for improving the performance of weak supervision models, combining weak supervision with active learning to produce a model that outperforms each individual approach using only a handful of labeled examples. We validate each proposed algorithm with extensive experiments on standard text and image classification datasets, comparing against baseline and state-of-the-art methods and showing that, in most cases, our methods match or outperform them. We report significant gains on both binary and multi-class classification tasks. (en)
dc.description.abstractgeneral: Machine learning models learn to make predictions from data. In supervised learning, a machine learning model is fed data and corresponding labels so that it can learn to predict labels for new, unseen test data. Curating large, fully supervised datasets is expensive and time-consuming, since it requires subject-matter experts to provide labels for each individual data example. The cost of collecting labels has become one of the major roadblocks to training machine learning models. An alternative to supervised training of machine learning models is weak supervision. Weak supervision, or weakly supervised learning, trains with cheap, easy-to-define signals that noisily label the data. We refer to these signals as weak signals. A weak supervision model combines various weak signals to produce training labels for the data. The key challenge in weak supervision is how to combine the different weak signals while navigating misleading correlations in their errors. In this dissertation, we propose several algorithms for weakly supervised learning. We classify our methods as constraint-based weak supervision, since the weak supervision is provided as constraints to our algorithms. Through experiments on different text and image classification datasets, we show that our methods are effective and outperform the competing methods we compare against. Lastly, we propose a general framework for improving the performance of weak supervision models by incorporating a small number of labeled examples. With this method, we close the gap to supervised learning without the need to label every data example. (en)
dc.description.degree: Doctor of Philosophy (en)
dc.format.medium: ETD (en)
dc.identifier.other: vt_gsexam:34296 (en)
dc.identifier.uri: http://hdl.handle.net/10919/109767 (en)
dc.language.iso: en (en)
dc.publisher: Virginia Tech (en)
dc.rights: In Copyright (en)
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/ (en)
dc.subject: Machine Learning (en)
dc.subject: Weak Supervision (en)
dc.subject: Active Learning (en)
dc.subject: Semi-supervised Learning (en)
dc.subject: Adversarial Optimization (en)
dc.title: Learning with Constraint-Based Weak Supervision (en)
dc.type: Dissertation (en)
thesis.degree.discipline: Computer Science and Applications (en)
thesis.degree.grantor: Virginia Polytechnic Institute and State University (en)
thesis.degree.level: doctoral (en)
thesis.degree.name: Doctor of Philosophy (en)
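
The quadratic, constraint-based formulation summarized in the abstract can be illustrated with a short sketch. The Python below is an illustration under stated assumptions, not the dissertation's exact algorithm: each weak signal is assumed to be a vector of soft votes in [0, 1], each signal is assumed to come with a known upper bound on its expected error, and the estimate_labels name, the error_bounds parameter, and the synthetic data in the usage snippet are all hypothetical. It recovers soft labels by projected gradient descent on the squared violations of the error-bound constraints.

    import numpy as np

    def estimate_labels(weak_signals, error_bounds, n_iters=2000, lr=0.1):
        """Estimate soft labels y in [0, 1]^n from m weak signals.

        weak_signals: (m, n) array of soft votes in [0, 1].
        error_bounds: (m,) assumed upper bounds on each signal's expected error.
        """
        _, n = weak_signals.shape
        y = np.full(n, 0.5)  # start from maximum uncertainty
        for _ in range(n_iters):
            # Expected error of signal i against the current labels:
            # err_i = (1/n) * sum_j [w_ij * (1 - y_j) + (1 - w_ij) * y_j]
            err = (weak_signals * (1 - y) + (1 - weak_signals) * y).mean(axis=1)
            # Penalize only overshoot: signals within their bounds exert no pressure.
            violation = np.maximum(err - error_bounds, 0.0)
            # Gradient of sum_i violation_i^2 with respect to y.
            grad = (2.0 / n) * violation @ (1 - 2 * weak_signals)
            y = np.clip(y - lr * grad, 0.0, 1.0)  # project back onto [0, 1]^n
        return y

    # Toy usage on synthetic data (hypothetical numbers).
    rng = np.random.default_rng(0)
    truth = rng.integers(0, 2, size=10)
    signals = np.clip(truth + rng.normal(0.0, 0.3, size=(3, 10)), 0.0, 1.0)
    labels = estimate_labels(signals, error_bounds=np.full(3, 0.3))

Starting from y = 0.5 keeps the estimate maximally uncertain until the constraints pull it toward a labeling, which mirrors the role the weak signals play here: they act as constraints on the feasible labelings rather than as direct training targets.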

Files

Original bundle
Name: Arachie_CG_D_2022.pdf
Size: 1016.55 KB
Format: Adobe Portable Document Format