VTechWorks Repository :: Browsing by Author "Huang, Jia-Bin"

3D Deep Learning for Object-Centric Geometric Perception
Li, Xiaolong (Virginia Tech, 2022-06-30)
Object-centric geometric perception aims at extracting the geometric attributes of 3D objects. These attributes include the shape, pose, and motion of the target objects, which enable fine-grained object-level understanding for various tasks in graphics, computer vision, and robotics. With the growth of 3D geometry data and 3D deep learning methods, it is becoming increasingly feasible to accomplish such tasks directly from 3D input data. Among different 3D representations, a 3D point cloud is a simple, common, and memory-efficient representation that can be retrieved directly from multi-view images, depth scans, or LiDAR range images. Several challenges exist in achieving object-centric geometric perception, such as achieving a fine-grained geometric understanding of common articulated objects with multiple rigid parts, learning disentangled shape and pose representations with fewer labels, and handling dynamic and sequential geometric input in an end-to-end fashion. Here we identify and solve these challenges from a 3D deep learning perspective by designing effective and generalizable 3D representations, architectures, and pipelines. We propose the first deep pose estimation method for common articulated objects by designing a novel hierarchical invariant representation. To push the boundary of 6D pose estimation for common rigid objects, a simple yet effective self-supervised framework is designed to handle unlabeled, partial, segmented scans. We further contribute a novel 4D convolutional neural network called PointMotionNet to learn spatio-temporal features for 3D point cloud sequences. All these works advance the domain of object-centric geometric perception from a unique 3D deep learning perspective.
Action Recognition with Knowledge Transfer
Choi, Jin-Woo (Virginia Tech, 2021-01-07)
Recent progress in deep neural networks has yielded remarkable action recognition performance on videos. This performance is often achieved by transfer learning: training a model on a large-scale labeled dataset (source) and then fine-tuning the model on small-scale labeled datasets (targets). However, existing action recognition models do not always generalize well to new tasks or datasets, for two reasons. i) Current action recognition datasets have a spurious correlation between action types and background scene types. Models trained on these datasets are biased towards the scene instead of focusing on the actual action. This scene bias leads to poor generalization performance. ii) Directly testing a model trained on the source data on the target data leads to poor performance because the source and target distributions are different. Fine-tuning the model on the target data can mitigate this issue. However, manually labeling even small-scale target videos is labor-intensive. In this dissertation, I propose solutions to these two problems. For the first problem, I propose to learn scene-invariant action representations to mitigate the scene bias in action recognition models. Specifically, I augment the standard cross-entropy loss for action classification with 1) an adversarial loss for the scene types and 2) a human mask confusion loss for videos where the human actors are invisible. These two losses encourage learning representations unsuitable for predicting 1) the correct scene types and 2) the correct action types when there is no evidence. I validate the efficacy of the proposed method by transfer learning experiments. I transfer the pre-trained model to three different tasks: action classification, temporal action localization, and spatio-temporal action detection. The results show consistent improvement over the baselines for every task and dataset. 
To handle the second problem, I formulate human action recognition as an unsupervised domain adaptation (UDA) problem. In the UDA setting, we have many labeled videos as source data and unlabeled videos as target data. We can use existing labeled video datasets as source data in this setting. The task is to align the source and target feature distributions so that the learned model can generalize well on the target data. I propose 1) aligning the more important temporal part of each video and 2) encouraging the model to focus on the action, not the background scene, to learn domain-invariant action representations. The proposed method is simple and intuitive while achieving state-of-the-art performance without training on many labeled target videos. I then relax the unsupervised target data setting to a sparsely labeled target data setting and explore semi-supervised video action recognition, where we have many labeled videos as source data and sparsely labeled videos as target data. The semi-supervised setting is practical because we can sometimes afford a small cost for labeling target data. I propose multiple video data augmentation methods to inject photometric, geometric, temporal, and scene invariances into the action recognition model in this setting. The resulting method shows favorable performance on the public benchmarks.
Active Learning Under Limited Interaction with Data Labeler
Chen, Si (Virginia Tech, 2021)
Active learning (AL) aims at reducing labeling effort by identifying the most valuable unlabeled data points from a large pool. Traditional AL frameworks have two limitations: First, they perform data selection in a multi-round manner, which is time-consuming and impractical. Second, they usually assume that a small number of labeled data points are available in the same domain as the data in the unlabeled pool. In this thesis, we initiate the study of one-round active learning to solve the first issue. We propose DULO, a general framework for the one-round setting based on the notion of data utility functions, which map a set of data points to some performance measure of the model trained on the set. We formulate the one-round active learning problem as data utility function maximization. We then propose D²ULO on the basis of DULO as a solution that solves both issues. Specifically, D²ULO leverages the idea of domain adaptation (DA) to train a data utility model on source labeled data. The trained utility model can then be used to select high-utility data in the target domain and, at the same time, provide an estimate of the utility of the selected data. Our experiments show that the proposed frameworks achieve better performance than state-of-the-art baselines in the same setting. In particular, D²ULO is applicable to the scenario where the source and target labels have mismatches, which is not supported by existing works.
The Art of Deep Connection - Towards Natural and Pragmatic Conversational Agent Interactions
Ray, Arijit (Virginia Tech, 2017-07-12)
As research in Artificial Intelligence (AI) advances, it is crucial to focus on seamless communication between humans and machines in order to accomplish tasks effectively. Smooth human-machine communication requires the machine to be sensible and human-like while interacting with humans, while simultaneously being capable of extracting the maximum information it needs to accomplish the desired task. Since many of the tasks that machines are required to solve today involve the understanding of images, training machines to have human-like and effective image-grounded conversations with humans is one important step towards achieving this goal. Although we now have agents that can answer questions asked about images, they are prone to failure on confusing input and cannot, in turn, ask clarification questions to extract the desired information from humans. Hence, as a first step, we direct our efforts towards making Visual Question Answering agents human-like by making them resilient to confusing inputs that otherwise do not confuse humans. Not only is it crucial for a machine to answer questions reasonably, it should also know how to ask questions sequentially to extract the information it needs from a human. Hence, we introduce a novel game called the Visual 20 Questions Game, where a machine tries to figure out a secret image a human has picked by having a natural language conversation with the human. Using deep learning techniques like recurrent neural networks and sequence-to-sequence learning, we demonstrate scalable and reasonable performance on both tasks.
Automated Movement Assessment in Stroke Rehabilitation
Ahmed, Tamim; Thopalli, Kowshik; Rikakis, Thanassis; Turaga, Pavan; Kelliher, Aisling; Huang, Jia-Bin; Wolf, Steven L. (2021-08-19)
We are developing a system for long-term Semi-Automated Rehabilitation At the Home (SARAH) that relies on low-cost and unobtrusive video-based sensing. We present a cyber-human methodology used by the SARAH system for automated assessment of upper extremity stroke rehabilitation at the home. We propose a hierarchical model for automatically segmenting stroke survivors' movements and generating training task performance assessment scores during rehabilitation. The hierarchical model fuses expert therapist knowledge-based approaches with data-driven techniques. The expert knowledge is more observable in the higher layers of the hierarchy (task and segment) and therefore more accessible to algorithms incorporating high-level constraints relating to activity structure (i.e., type and order of segments per task). We utilize an HMM and a Decision Tree model to connect these high-level priors to data-driven analysis. The lower layers (RGB images and raw kinematics) need to be addressed primarily through data-driven techniques. We use a transformer-based architecture operating on low-level action features (tracking of individual body joints and objects) and a Multi-Stage Temporal Convolutional Network (MS-TCN) operating on raw RGB images. We develop a sequence combining these complementary algorithms effectively, thus encoding the information from different layers of the movement hierarchy. Through this combination, we produce robust segmentation and task assessment results on noisy, variable, and limited data, which is characteristic of low-cost video capture of rehabilitation at the home. Our proposed approach achieves 85% accuracy in per-frame labeling, 99% accuracy in segment classification, and 93% accuracy in task completion assessment. 
Although the methodology proposed in this paper applies to upper extremity rehabilitation using the SARAH system, it can potentially be used, with minor alterations, to assist automation in many other movement rehabilitation contexts (e.g., lower extremity training for neurological accidents).
Color Invariant Skin Segmentation
Xu, Han (Virginia Tech, 2022-03-25)
This work addresses the problem of automatically detecting human skin in images without reliance on color information. Unlike previous methods, we present a new approach that performs well in the absence of such information. A key aspect of the work is that color-space augmentation is applied strategically during training, with the goal of reducing the influence of features that are based entirely on color and encouraging more semantic understanding. The resulting system exhibits a dramatic improvement in performance for images in which color details are diminished. We have demonstrated the concept using the U-Net architecture, and experimental results show improvements in evaluations for all Fitzpatrick skin tones in the ECU dataset. We further tested the system with the RFW dataset to show that the proposed method is consistent across different ethnicities and reduces bias toward any skin tone. Therefore, this work has strong potential to aid in mitigating bias in automated systems used in many applications, including surveillance and biometrics.
A Comparison of Image Classification with Different Activation Functions in Balanced and Unbalanced Datasets
Zhang, Moqi (Virginia Tech, 2021-06-04)
When the novel coronavirus (COVID-19) outbreak began to ring alarm bells worldwide, rapid, efficient diagnosis was critical to the emergency response. The limited ability of medical systems and the increasing number of daily cases pushed researchers to investigate automated models. The use of deep neural networks to help doctors make the correct diagnosis has dramatically reduced the pressure on the healthcare system. Promoting the improvement of diagnosis networks depends not only on the network structure design but also on the activation function performance. To identify an optimal activation function, this study investigates the correlation between the activation function selection and image classification performance in balanced or imbalanced datasets. Our analysis evaluates various network architectures for both commonly used and novel datasets and presents a comprehensive analysis of ten widely used activation functions. The experimental results show that the swish and softplus functions enhance the classification ability of state-of-the-art networks. Finally, this thesis distinguishes the neural networks using ten activation functions, analyzes their pros and cons, and puts forward detailed suggestions on choosing appropriate activation functions in future work.
A Computer-Aided Framework for Cell Phenotype Identification, Analysis and Classification
Pradeep, Subramanian (Virginia Tech, 2017-09-11)
Cancer is arguably one of the most dangerous diseases and a major cause of death in the modern day. It becomes increasingly hard to treat and cure as it progresses. Detecting cancer at an early stage can help in preventing it from affecting an organism. However, it is very hard to detect at an early stage. The best possible way to tackle this disease is to first study it at a cellular level. This study aims at identifying various phenotypic traits of these cells in a Dielectrophoresis (DEP) based microfluidic device experimental setup and subsequently classifying the cells from the rest. A general framework for automatically labeling, identifying, and classifying the malignant cells from the dead cells is developed in this work. The framework follows a top-down approach starting from static background subtraction, followed by tracking, automatic labeling, feature extraction, and finally classification. The data used in this work are videos of live and dead human prostate cancer (PC-3) cells flowing through the microfluidic device. Previous studies have shown that there are significant differences in morphological attributes between cancerous and non-cancerous cells. We focus mainly on shape, texture, and geometry as the prominent attributes in our work and subsequently use them for classification. In this work we obtain good tracking results through optical flow as compared to previous work. For classification, linear classifiers such as logistic regression and linear Support Vector Machine (SVM) showed decent results. The machine learning algorithms use Histogram of Oriented Gradient (HOG) features plus the elliptical features as a combined feature vector. The elliptical features branch this study out in another direction that is useful for calculating physical properties such as cell elasticity through video processing, and we propose a model for this in the given setup. 
Currently, the elasticity of a single cell is calculated using expensive and time consuming procedures such as the atomic force microscopy (AFM). Using our framework, we can potentially obtain elasticity for a batch of cells in much less time. Also, our cell classification algorithm procedure is suitable for real time applications and can be a proposed futuristic concept for selective killing of cells.
Continual Learning for Deep Dense Prediction
Lokegaonkar, Sanket Avinash (Virginia Tech, 2018-06-11)
Transferring a deep learning model from old tasks to a new one is known to suffer from catastrophic forgetting. Such forgetting is problematic as it does not allow us to accumulate knowledge sequentially and requires retaining and retraining on all the training data. Existing techniques for mitigating the abrupt performance degradation on previously trained tasks are mainly studied in the context of image classification. In this work, we present a simple method to alleviate catastrophic forgetting for pixel-wise dense labeling problems. We build upon the regularization technique using knowledge distillation to minimize the discrepancy between the posterior distributions of pixel class labels for old tasks predicted from 1) the original and 2) the updated networks. This technique, however, might fail in circumstances where the source and target distributions differ significantly. To handle this scenario, we further propose an improvement to the distillation-based approach by adding adaptive l2-regularization that depends on the per-parameter importance to the older tasks. We train our model on FCN8s, but our training can be generalized to stronger models like DeepLab, PSPNet, etc. Through extensive evaluation and comparisons, we show that our technique can incrementally train dense prediction models for novel object classes, different visual domains, and different visual tasks.
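As a rough illustration of the two ingredients named in this abstract, the sketch below combines a per-pixel distillation term with an importance-weighted l2 penalty. It is not the thesis's implementation: the function names, the cross-entropy form of the distillation term, the weighting factor `lam`, and the way importance values are supplied are all assumptions made for this example.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one pixel's class logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(old_logits, new_logits):
    """Mean per-pixel cross-entropy between the original network's posterior
    and the updated network's posterior over old-task classes."""
    loss = 0.0
    for old_px, new_px in zip(old_logits, new_logits):
        p_old = softmax(old_px)
        p_new = softmax(new_px)
        loss -= sum(po * math.log(pn) for po, pn in zip(p_old, p_new))
    return loss / len(old_logits)

def weighted_l2_penalty(old_params, new_params, importance):
    """l2 penalty in which each parameter's drift from its old value is
    scaled by an (assumed, precomputed) importance to the older tasks."""
    return sum(w * (n - o) ** 2
               for o, n, w in zip(old_params, new_params, importance))

def total_regularizer(old_logits, new_logits, old_params, new_params,
                      importance, lam=1.0):
    """Hypothetical combination: distillation term plus lam times the penalty."""
    return (distillation_loss(old_logits, new_logits)
            + lam * weighted_l2_penalty(old_params, new_params, importance))
```

In this sketch, parameters with high importance are held close to their old values, while unimportant ones remain free to adapt to the new task.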
Controllable Visual Synthesis
AlBahar, Badour A. Sh A. (Virginia Tech, 2023-06-08)
Computer graphics has become an integral part of various industries such as entertainment (e.g., films and content creation), fashion (e.g., virtual try-on), and video games. Computer graphics has evolved tremendously over the past years. It has shown remarkable image generation improvement from low-quality, pixelated images with limited details to highly realistic images with fine details that can often be mistaken for real images. However, the traditional pipeline of rendering an image in computer graphics is complex and time-consuming. The whole process of creating the geometry, material, and textures requires not only time but also significant expertise. In this work, we aim to replace this complex traditional computer graphics pipeline with a simple machine learning model. This machine learning model can synthesize realistic images without requiring expertise or significant time and effort. Specifically, we address the problem of controllable image synthesis. We propose several approaches that allow the user to synthesize realistic content and manipulate images to achieve their desired goals with ease and flexibility.
Data-Efficient Learning in Image Synthesis and Instance Segmentation
Robb, Esther Anne (Virginia Tech, 2021-08-18)
Modern deep learning methods have achieved remarkable performance on a variety of computer vision tasks, but frequently require large, well-balanced training datasets to achieve high-quality results. Data-efficient performance is critical for downstream tasks such as automated driving or facial recognition. We propose two methods of data-efficient learning for the tasks of image synthesis and instance segmentation. We first propose a method of high-quality and diverse image generation by fine-tuning on only 5-100 images. Our method factors a pretrained model into a small but highly expressive weight space for fine-tuning, which discourages overfitting on a small training set. We validate our method in a challenging few-shot setting of 5-100 images in the target domain. We show that our method has significant visual quality gains compared with existing GAN adaptation methods. Next, we introduce a simple adaptive instance segmentation loss which achieves state-of-the-art results on the LVIS dataset. We demonstrate that the rare categories are heavily suppressed by correct background predictions, which reduce the probability for all foreground categories with equal weight. Due to the relative infrequency of rare categories, this leads to an imbalance that biases towards predicting more frequent categories. Based on this insight, we develop DropLoss, a novel adaptive loss to compensate for this imbalance without a trade-off between rare and frequent categories.
Deep Learning Neural Network-based Sinogram Interpolation for Sparse-View CT Reconstruction
Vekhande, Swapnil Sudhir (Virginia Tech, 2019-06-14)
Computed Tomography (CT) finds applications across domains like medical diagnosis, security screening, and scientific research. In medical imaging, CT allows physicians to diagnose injuries and disease more quickly and accurately than other imaging techniques. However, CT is one of the most significant contributors of radiation dose to the general population, and the required radiation dose for scanning could lead to cancer. On the other hand, too low a radiation dose could sacrifice image quality, causing misdiagnosis. To reduce the radiation dose, sparse-view CT, which captures a smaller number of projections, is a promising alternative. However, the image reconstructed from linearly interpolated views possesses severe artifacts. Deep learning-based methods are increasingly being used to interpret the missing data by learning the nature of the image formation process. The current methods are promising but operate mostly in the image domain, presumably due to a lack of projection data. Another limitation is the use of simulated data with less sparsity (up to 75%). This research aims to interpolate the missing sparse-view CT data in the sinogram domain using deep learning. To this end, a residual U-Net architecture has been trained with patch-wise projection data to minimize the Euclidean distance between the ground truth and the interpolated sinogram. The model can generate highly sparse missing projection data. The results show improvements in SSIM and RMSE of 14% and 52%, respectively, over linear interpolation-based methods. Thus, experimental sparse-view CT data with 90% sparsity has been successfully interpolated while improving CT image quality.
Degenerate Near-planar Road Surface 3D Reconstruction and Automatic Defects Detection
Hu, Yazhe (Virginia Tech, 2020-06-02)
This dissertation presents an approach to reconstruct a degenerate near-planar road surface in three dimensions (3D) while automatically detecting road defects. Three techniques are developed in this dissertation to establish the proposed approach. The first technique reconstructs the degenerate near-planar road surface in 3D from one camera. Unlike the traditional Structure from Motion (SfM) technique, which has a degeneracy issue for near-planar object 3D reconstruction, the uniqueness of the proposed technique lies in the use of the near-planar characteristics of surfaces in the 3D reconstruction process, which solves the degenerate road surface reconstruction problem using only two images. Following the accuracy-enhanced 3D reconstruction of the road surface, the second technique automatically detects and estimates road surface defects. As the 3D surface is inversely solved from 2D road images, the detection is achieved by jointly identifying irregularities from the 3D road surfaces and the corresponding image information, while clustering road defects and obstacles using a mean-shift algorithm with a flat kernel to estimate the depth, size, and location of the defects. To enhance the reliability of the physics-driven automatic detection, the third technique proposes and incorporates a self-supervised learning structure with data-driven Convolutional Neural Networks (CNN). Unlike supervised learning approaches, which need labeled training images, the road anomaly detection network is trained on road surface images that are automatically labeled based on the reconstructed 3D surface information. In order to collect clear road surface images on public roads, a road surface monitoring system is designed and integrated for road surface image capture and visualization. The proposed approach is evaluated both in a simulated environment and through real-world experiments. 
The parametric study of the proposed approach, conducted in a controlled simulation environment, shows that the 3D road surface reconstruction error remains small under different variables such as image noise, camera orientation, and vertical movement of the camera. The comparison with the traditional SfM technique and the numerical results of the proposed reconstruction on real-world road surface images then indicate that the proposed approach effectively reconstructs a high-quality near-planar road surface while automatically detecting road defects with high precision, accuracy, and recall rates, without the degeneracy issue.
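The abstract mentions clustering defects with a mean-shift algorithm using a flat kernel. The following is a minimal one-dimensional sketch of that general technique, assuming scalar inputs (for instance, depth values) and an arbitrary bandwidth; the dissertation's actual feature space, bandwidth choice, and implementation are not given here, and the names `mean_shift_flat` and `group_modes` are hypothetical.

```python
def mean_shift_flat(points, bandwidth, max_iter=100, tol=1e-6):
    """Shift each point to the mean of all points within `bandwidth`
    (flat kernel: every neighbor gets equal weight) until it stops moving."""
    modes = []
    for start in points:
        x = start
        for _ in range(max_iter):
            neighbors = [p for p in points if abs(p - x) <= bandwidth]
            if not neighbors:  # defensive guard against rounding edge cases
                break
            new_x = sum(neighbors) / len(neighbors)
            if abs(new_x - x) < tol:
                break
            x = new_x
        modes.append(x)
    return modes

def group_modes(modes, merge_dist):
    """Assign a cluster label to each converged mode, merging modes that
    lie within `merge_dist` of an existing cluster center."""
    labels, centers = [], []
    for m in modes:
        for k, c in enumerate(centers):
            if abs(m - c) <= merge_dist:
                labels.append(k)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return labels, centers
```

A flat kernel needs no kernel-shape parameter beyond the bandwidth, and the number of clusters falls out of the data rather than being fixed in advance.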
Distributed, Stable Topology Control of Multi-Robot Systems with Asymmetric Interactions
Mukherjee, Pratik (Virginia Tech, 2021-06-17)
Multi-robot systems have witnessed a swell in interest in the past few years because of their various applications, such as agricultural autonomy, medical robotics, industrial and commercial automation, and search and rescue. In this thesis, we particularly investigate the behavior of multi-robot systems with respect to stable topology control in asymmetric interaction settings. From a theoretical perspective, we first classify stable topologies and identify the conditions under which we can determine whether a topology is stable or not. Then, we design a limited field-of-view (FOV) controller for robots that use sensors like cameras for coordination, which induce asymmetric robot-to-robot interactions. Finally, we conduct a rigorous theoretical analysis to qualitatively determine which interactions are suitable for stable directed topology control of multi-robot systems with asymmetric interactions. In this regard, we solve an optimal topology selection problem to determine the topology with the best interactions based on a suitable metric that represents the quality of interaction. Further, we solve this optimization problem distributively and validate the distributed optimization formulation with extensive simulations. For experimental purposes, we developed a portable multi-robot testbed which enables us to conduct multi-robot topology control experiments in both indoor and outdoor settings and validate our theoretical findings. Therefore, the contribution of this thesis is twofold: i) We provide a rigorous theoretical analysis of stable coordination of multi-robot systems with directed graphs, demonstrating the graph structures that induce stability for a broad class of coordination objectives; ii) We develop a testbed that enables validating multi-robot topology control in both indoor and outdoor settings.
Efficient Community Detection for Large Scale Networks via Sub-sampling
Bellam, Venkata Pavan Kumar (Virginia Tech, 2018-01-18)
Many real-world systems can be represented as network graphs. Some of these networks have an inherent community structure based on interactions. The problem of identifying this grouping structure given a graph is termed the community detection problem, for which several algorithms exist. This thesis contributes specific improvements to community detection algorithms such as spectral clustering and the extreme points algorithm. One of the main contributions is a new sub-sampling method that makes the existing spectral clustering method scalable by reducing its computational complexity. We have also implemented the extreme points algorithm for the general case of detecting multiple communities, along with a sub-sampling-based version to reduce the computational complexity. Finally, we have developed a spectral clustering algorithm for graphs based on the popularity-adjusted block model (PABM) to make the algorithm exact, thus improving its accuracy.
End-To-End Text Detection Using Deep Learning
Ibrahim, Ahmed Sobhy Elnady (Virginia Tech, 2017-12-19)
Text detection in the wild is the problem of locating text in images of everyday scenes. It is a challenging problem due to the complexity of everyday scenes. This problem is of great importance for many trending applications, such as self-driving cars. Previous research in text detection has been dominated by multi-stage sequential approaches, which suffer from many limitations, including error propagation from one stage to the next. Another line of work is the use of deep learning techniques. Some of the deep methods used for text detection are box detection models and fully convolutional models. Box detection models suffer from the nature of the annotations, which may be too coarse to provide detailed supervision. Fully convolutional models learn to generate pixel-wise maps that represent the location of text instances in the input image. These models suffer from the inability to create accurate word-level annotations without heavy post-processing. To overcome these problems, we propose a novel end-to-end system based on a mix of novel deep learning techniques. The proposed system consists of an attention model, based on a new deep architecture proposed in this dissertation, followed by a deep network based on Faster-RCNN. The attention model produces a high-resolution map that indicates likely locations of text instances. A novel aspect of the system is an early fusion step that merges the attention map directly with the input image prior to word-box prediction. This approach suppresses but does not eliminate contextual information from consideration. Progressively larger models were trained in 3 separate phases. The resulting system has demonstrated an ability to detect text under difficult conditions related to illumination, resolution, and legibility. The system has exceeded the state of the art on the ICDAR 2013 and COCO-Text benchmarks with F-measure values of 0.875 and 0.533, respectively.
Enhanced Neural Network Training Using Selective Backpropagation and Forward Propagation
Bendelac, Shiri (Virginia Tech, 2018-06-22)
Neural networks are making headlines every day as the tool of the future, powering artificial intelligence programs and supporting technologies never seen before. However, the training of neural networks can take days or even weeks for bigger networks, and requires the use of supercomputers and GPUs in academia and industry in order to achieve state-of-the-art results. This thesis discusses employing selective measures to determine when to backpropagate and forward propagate in order to reduce training time while maintaining classification performance. This thesis tests these new algorithms on the MNIST and CASIA datasets, and achieves successful results with both algorithms on the two datasets. The selective backpropagation algorithm shows a reduction of up to 93.3% in backpropagations completed, and the selective forward propagation algorithm shows a reduction of up to 72.90% in forward propagations and backpropagations completed, compared to baseline runs of always forward propagating and backpropagating. This work also discusses employing the selective backpropagation algorithm on a modified dataset with disproportionate under-representation of some classes compared to others.
Exploring Accumulated Gradient-Based Quantization and Compression for Deep Neural Networks
Gaopande, Meghana Laxmidhar (Virginia Tech, 2020-05-29)
The growing complexity of neural networks makes their deployment on resource-constrained embedded or mobile devices challenging. With millions of weights and biases, modern deep neural networks can be computationally intensive, with large memory, power and computational requirements. In this thesis, we devise and explore three quantization methods (post-training, in-training and combined quantization) that quantize 32-bit floating-point weights and biases to lower bit width fixed-point parameters while also achieving significant pruning, leading to model compression. We use the total accumulated absolute gradient over the training process as the indicator of importance of a parameter to the network. The most important parameters are quantized by the smallest amount. The post-training quantization method sorts and clusters the accumulated gradients of the full parameter set and subsequently assigns a bit width to each cluster. The in-training quantization method sorts and divides the accumulated gradients into two groups after each training epoch. The larger group consisting of the lowest accumulated gradients is quantized. The combined quantization method performs in-training quantization followed by post-training quantization. We assume storage of the quantized parameters using compressed sparse row format for sparse matrix storage. On LeNet-300-100 (MNIST dataset), LeNet-5 (MNIST dataset), AlexNet (CIFAR-10 dataset) and VGG-16 (CIFAR-10 dataset), post-training quantization achieves 7.62x, 10.87x, 6.39x and 12.43x compression, in-training quantization achieves 22.08x, 21.05x, 7.95x and 12.71x compression and combined quantization achieves 57.22x, 50.19x, 13.15x and 13.53x compression, respectively. Our methods quantize at the cost of accuracy, and we present our work in the light of the accuracy-compression trade-off.
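To make the in-training step described above concrete, here is a small sketch under stated assumptions: parameters are ranked by accumulated absolute gradient, and the group with the lowest values is rounded to a fixed-point grid. The function names, the `fraction` and `frac_bits` parameters, and the rounding scheme are illustrative choices, not details taken from the thesis; saturation to a finite integer range is omitted.

```python
def quantize_fixed_point(value, frac_bits):
    """Round `value` to the nearest multiple of 2**-frac_bits
    (a fixed-point grid; range clipping is left out of this sketch)."""
    scale = 2 ** frac_bits
    return round(value * scale) / scale

def quantize_least_important(weights, accumulated_abs_grads,
                             fraction=0.75, frac_bits=4):
    """Quantize the `fraction` of parameters with the lowest accumulated
    absolute gradient; leave the remaining parameters untouched."""
    n_quantize = int(len(weights) * fraction)
    # Indices sorted from least to most important.
    order = sorted(range(len(weights)),
                   key=lambda i: accumulated_abs_grads[i])
    chosen = set(order[:n_quantize])
    return [quantize_fixed_point(w, frac_bits) if i in chosen else w
            for i, w in enumerate(weights)]
```

With a coarse grid, small low-importance weights can round to exactly zero, which is one way quantization and pruning can coincide in a scheme like this.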
Few-Shot and Zero-Shot Learning for Information Extraction
Gong, Jiaying (Virginia Tech, 2024-05-31)
Information extraction aims to automatically extract structured information from unstructured texts. Supervised information extraction requires large quantities of labeled training data, which is time-consuming and labor-intensive. This dissertation focuses on information extraction, especially relation extraction and attribute-value extraction in e-commerce, with few labeled (few-shot learning) or even no labeled (zero-shot learning) training data. We explore multi-source auxiliary information and novel learning techniques to integrate semantic auxiliary information with the input text to improve few-shot learning and zero-shot learning. For zero-shot and few-shot relation extraction, the first method explores the existing data statistics and leverages auxiliary information including labels, synonyms of labels, keywords, and hypernyms of named entities to enable zero-shot learning for the unlabeled data. We build an automatic hypernym extraction framework to help acquire hypernyms of different entities directly from the web. The second method explores the relations between seen classes and new classes. We propose a prompt-based model with semantic knowledge augmentation to recognize new relation triplets under the zero-shot setting. In this method, we transform the problem of zero-shot learning into supervised learning with the generated augmented data for new relations. We design the prompts for training using the auxiliary information based on an external knowledge graph to integrate semantic knowledge learned from seen relations. The third work utilizes auxiliary information from images to enhance few-shot learning. We propose a multi-modal few-shot relation extraction model that leverages both textual and visual semantic information to learn a multi-modal representation jointly.
To supplement the missing contexts in text, this work integrates both local features (object-level) and global features (pixel-level) from different modalities through image-guided attention, object-guided attention, and hybrid feature attention to solve the problem of sparsity and noise. We then explore the few-shot and zero-shot aspect (attribute-value) extraction in the e-commerce application field. The first work studies the multi-label few-shot learning by leveraging the auxiliary information of anchor (label) and category description based on the prototypical networks, where the hybrid attention helps alleviate ambiguity and capture more informative semantics by calculating both the label-relevant and query-related weights. A dynamic threshold is learned by integrating the semantic information from support and query sets to achieve multi-label inference. The second work explores multi-label zero-shot learning via semi-inductive link prediction of the heterogeneous hypergraph. The heterogeneous hypergraph is built with higher-order relations (generated by the auxiliary information of user behavior data and product inventory data) to capture the complex and interconnected relations between users and the products.
Foundations of Radio Frequency Transfer Learning
Wong, Lauren Joy (Virginia Tech, 2024-02-06)
The introduction of Machine Learning (ML) and Deep Learning (DL) techniques into modern radio communications systems, a field known as Radio Frequency Machine Learning (RFML), has the potential to provide increased performance and flexibility when compared to traditional signal processing techniques and has broad utility in both the commercial and defense sectors. Existing RFML systems predominantly utilize supervised learning solutions in which the training process is performed offline, before deployment, and the learned model remains fixed once deployed. The inflexibility of these systems means that, while they are appropriate for the conditions assumed during offline training, they show limited adaptability to changes in the propagation environment and transmitter/receiver hardware, leading to significant performance degradation. Given the fluidity of modern communication environments, this rigidness has limited the widespread adoption of RFML solutions to date. Transfer Learning (TL) is a means to mitigate such performance degradations by re-using prior knowledge learned from a source domain and task to improve performance on a "similar" target domain and task. However, the benefits of TL have yet to be fully demonstrated and integrated into RFML systems. This dissertation begins by clearly defining the problem space of RF TL through a domain-specific TL taxonomy for RFML that provides common language and terminology with concrete and Radio Frequency (RF)-specific example use cases. Then, the impacts of the RF domain, characterized by the hardware and channel environment(s), and task, characterized by the application(s) being addressed, on performance are studied, and methods and metrics for predicting and quantifying RF TL performance are examined. In total, this work provides the foundational knowledge to more reliably use TL approaches in RF contexts and opens directions for future work that will improve the robustness and increase the deployability of RFML.
