Label-Efficient Visual Understanding with Consistency Constraints
dc.contributor.author | Zou, Yuliang | en |
dc.contributor.committeechair | Huang, Jia-Bin | en |
dc.contributor.committeemember | Tokekar, Pratap | en |
dc.contributor.committeemember | Abbott, A. Lynn | en |
dc.contributor.committeemember | Dhillon, Harpreet Singh | en |
dc.contributor.committeemember | Huang, Bert | en |
dc.contributor.department | Electrical and Computer Engineering | en |
dc.date.accessioned | 2022-05-25T08:00:21Z | en |
dc.date.available | 2022-05-25T08:00:21Z | en |
dc.date.issued | 2022-05-24 | en |
dc.description.abstract | Modern deep neural networks are proficient at solving various visual recognition and understanding tasks, as long as a sufficiently large labeled dataset is available at training time. However, progress on these visual tasks is limited by the number of manual annotations available. Annotating visual data is usually time-consuming and error-prone, which makes it challenging to scale up human labeling for many visual tasks. Fortunately, it is easy to collect large-scale, diverse unlabeled visual data from the Internet, and we can effortlessly acquire a large amount of annotated synthetic visual data from game engines. In this dissertation, we explore how to utilize unlabeled data and synthetic labeled data for various visual tasks, aiming to replace or reduce direct supervision from manual annotations. The key idea is to encourage deep neural networks to produce consistent predictions across different transformations (e.g., geometric, temporal, photometric). We organize the dissertation as follows. In Part I, we propose to use consistency across different geometric formulations and a cycle consistency over time to tackle low-level scene geometry perception tasks in a self-supervised learning setting. In Part II, we tackle high-level semantic understanding tasks in a semi-supervised learning setting, with the constraint that different augmented views of the same visual input maintain consistent semantic information. In Part III, we tackle the cross-domain image segmentation problem. By encouraging an adaptive segmentation model to output consistent results for a diverse set of strongly augmented synthetic data, the model learns to perform test-time adaptation on unseen target domains with a single forward pass, without model training or optimization at inference time. | en |
dc.description.abstractgeneral | Recently, deep learning has emerged as one of the most powerful tools for solving various visual understanding tasks. However, the development of deep learning methods is significantly limited by the amount of manually labeled data. Annotating visual data is usually time-consuming and error-prone, making the human labeling process hard to scale. Fortunately, it is easy to collect large-scale, diverse raw visual data from the Internet (e.g., search engines, YouTube, Instagram), and we can effortlessly acquire a large amount of annotated synthetic visual data from game engines. In this dissertation, we explore how to utilize raw visual data and synthetic data for various visual tasks, aiming to replace or reduce direct supervision from manual annotations. The key idea is to encourage deep neural networks to produce consistent predictions for the same visual input across different transformations (e.g., geometric, temporal, photometric). We organize the dissertation as follows. In Part I, we propose using consistency across different geometric formulations and a forward-backward cycle consistency over time to tackle low-level scene geometry perception tasks, using unlabeled visual data only. In Part II, we tackle high-level semantic understanding tasks using a small amount of labeled data and a large amount of unlabeled data jointly, with the constraint that different augmented views of the same visual input maintain consistent semantic information. In Part III, we tackle the cross-domain image segmentation problem. By encouraging an adaptive segmentation model to output consistent results for a diverse set of strongly augmented synthetic data, the model learns to perform test-time adaptation on unseen target domains. | en |
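The augmentation-consistency idea summarized in the abstracts above (and reflected in the "Pseudo Labeling" subject keyword) can be illustrated with a minimal sketch. This is not the dissertation's actual method, only a simplified, FixMatch-style pseudo-labeling loss: confident predictions on a weakly augmented view act as targets for the strongly augmented view. The function name and threshold value are illustrative assumptions.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def consistency_loss(logits_weak, logits_strong, threshold=0.95):
    """Simplified consistency loss between two augmented views (sketch).

    Predictions on the weakly augmented view serve as pseudo-labels
    for the strongly augmented view; only samples whose maximum
    predicted probability reaches `threshold` contribute to the loss.
    """
    probs_weak = softmax(logits_weak)
    pseudo = probs_weak.argmax(axis=-1)          # hard pseudo-labels
    mask = probs_weak.max(axis=-1) >= threshold  # keep confident samples only
    # Cross-entropy of strong-view predictions against the pseudo-labels.
    log_probs_strong = np.log(softmax(logits_strong))
    ce = -log_probs_strong[np.arange(len(pseudo)), pseudo]
    return float((ce * mask).sum() / max(mask.sum(), 1))
```

Minimizing this loss pushes the network toward consistent semantic predictions across augmentations, which is the constraint the semi-supervised setting in Part II relies on.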
dc.description.degree | Doctor of Philosophy | en |
dc.format.medium | ETD | en |
dc.identifier.other | vt_gsexam:34390 | en |
dc.identifier.uri | http://hdl.handle.net/10919/110313 | en |
dc.language.iso | en | en |
dc.publisher | Virginia Tech | en |
dc.rights | Creative Commons Attribution-NonCommercial 4.0 International | en |
dc.rights.uri | http://creativecommons.org/licenses/by-nc/4.0/ | en |
dc.subject | Label-Efficient | en |
dc.subject | Consistency Regularization | en |
dc.subject | Visual Understanding | en |
dc.subject | Self-Supervised Learning | en |
dc.subject | Semi-Supervised Learning | en |
dc.subject | Pseudo Labeling | en |
dc.subject | Test-Time Adaptation | en |
dc.subject | BatchNorm Calibration | en |
dc.subject | Cross-Domain Generalization | en |
dc.title | Label-Efficient Visual Understanding with Consistency Constraints | en |
dc.type | Dissertation | en |
thesis.degree.discipline | Computer Engineering | en |
thesis.degree.grantor | Virginia Polytechnic Institute and State University | en |
thesis.degree.level | doctoral | en |
thesis.degree.name | Doctor of Philosophy | en |