Learning without Expert Labels for Multimodal Data

dc.contributor.author: Maruf, Md Abdullah Al
dc.contributor.committeechair: Karpatne, Anuj
dc.contributor.committeemember: Huang, Lifu
dc.contributor.committeemember: Chao, Wei-Lun
dc.contributor.committeemember: Lourentzou, Ismini
dc.contributor.committeemember: Murali, T. M.
dc.contributor.department: Computer Science & Applications
dc.date.accessioned: 2025-01-10T09:01:33Z
dc.date.available: 2025-01-10T09:01:33Z
dc.date.issued: 2025-01-09
dc.description.abstract: While advancements in deep learning have been largely possible due to the availability of large-scale labeled datasets, obtaining labeled datasets at the required granularity is challenging in many real-world applications, especially in scientific domains, due to the costly and labor-intensive nature of generating annotations. Hence, there is a need to develop new paradigms for learning that do not rely on expert-labeled data and can work even with indirect supervision. Approaches for learning with indirect supervision include unsupervised learning, self-supervised learning, weakly supervised learning, few-shot learning, and knowledge distillation. This thesis addresses these opportunities in the context of multi-modal data through three main contributions. First, this thesis proposes a novel Distance-aware Negative Sampling method for self-supervised Graph Representation Learning (GRL) that learns node representations directly from the graph structure by maximizing separation between distant nodes and maximizing cohesion among nearby nodes. Second, this thesis introduces effective modifications to weakly supervised semantic segmentation (WS3) models, such as stochastic aggregation of saliency maps, that improve the learning of pseudo-ground truths from class-level coarse-grained labels and address the limitations of class activation maps. Finally, this thesis evaluates whether pre-trained Vision-Language Models (VLMs) contain the necessary scientific knowledge to identify and reason about biological traits from scientific images. The zero-shot performance of 12 large VLMs is evaluated on a novel VLM4Bio dataset, and the effects of prompting and reasoning hallucinations are explored.
dc.description.abstractgeneral: While advancements in machine learning (ML), such as deep learning, have been largely possible due to the availability of large-scale labeled datasets, obtaining high-quality and high-resolution labels is challenging in many real-world applications due to the costly and labor-intensive nature of generating annotations. This thesis explores new ways of training ML models without relying heavily on expert-labeled data, using indirect supervision. First, it introduces a novel way of using the structure of graphs for learning representations of graph-based data. Second, it analyzes the effect of weak supervision using coarse labels for image-based data. Third, it evaluates whether current ML models can recognize and reason about scientific images on their own, aiming to make learning more efficient and less dependent on exhaustive labeling.
dc.description.degree: Doctor of Philosophy
dc.format.medium: ETD
dc.identifier.other: vt_gsexam:42306
dc.identifier.uri: https://hdl.handle.net/10919/124087
dc.language.iso: en
dc.publisher: Virginia Tech
dc.rights: In Copyright
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/
dc.subject: Deep Learning
dc.subject: Knowledge-Guided Machine Learning
dc.subject: Weak Supervision
dc.subject: Self-Supervision
dc.subject: Vision-Language Models
dc.title: Learning without Expert Labels for Multimodal Data
dc.type: Dissertation
thesis.degree.discipline: Computer Science & Applications
thesis.degree.grantor: Virginia Polytechnic Institute and State University
thesis.degree.level: doctoral
thesis.degree.name: Doctor of Philosophy

Files

Original bundle
Name: Maruf_M_D_2025.pdf
Size: 23.34 MB
Format: Adobe Portable Document Format