Label-Efficient Visual Understanding with Consistency Constraints


TR Number



Journal Title

Journal ISSN

Volume Title


Virginia Tech


Modern deep neural networks are proficient at solving various visual recognition and understanding tasks, as long as a sufficiently large labeled dataset is available during the training time. However, the progress of these visual tasks is limited by the number of manual annotations. On the other hand, it is usually time-consuming and error-prone to annotate visual data, rendering the challenge of scaling up human labeling for many visual tasks. Fortunately, it is easy to collect large-scale, diverse unlabeled visual data from the Internet. And we can acquire a large amount of synthetic visual data with annotations from game engines effortlessly. In this dissertation, we explore how to utilize the unlabeled data and synthetic labeled data for various visual tasks, aiming to replace or reduce the direct supervision from the manual annotations. The key idea is to encourage deep neural networks to produce consistent predictions across different transformations (\eg geometry, temporal, photometric, etc.).

We organize the dissertation as follows. In Part I, we propose to use the consistency over different geometric formulations and a cycle consistency over time to tackle the low-level scene geometry perception tasks in a self-supervised learning setting. In Part II, we tackle the high-level semantic understanding tasks in a semi-supervised learning setting, with the constraint that different augmented views of the same visual input maintain consistent semantic information. In Part III, we tackle the cross-domain image segmentation problem. By encouraging an adaptive segmentation model to output consistent results for a diverse set of strongly-augmented synthetic data, the model learns to perform test-time adaptation on unseen target domains with one single forward pass, without model training or optimization at the inference time.



Label-Efficient, Consistency Regularization, Visual Understanding, Self-Supervised Learning, Semi-Supervised Learning, Pseudo Labeling, Test-Time Adaptation, BatchNorm Calibration, Cross-Domain Generalization