Browsing by Author "Eldardiry, Hoda Mohamed"
- Comparative Analysis of Machine Learning Models for ERCOT Short Term Load Forecasting. Singh, Gurkirat (Virginia Tech, 2025-01-29). This study investigates the efficacy of various machine learning (ML) and deep learning (DL) models for short-term load forecasting (STLF) in the Electric Reliability Council of Texas (ERCOT) grid. A dual comparative approach is employed, evaluating models based on temporal features alone as well as in combination with actual and forecasted weather variables. The research emphasizes region-specific forecasting by capturing heterogeneous load patterns for ERCOT's individual weather zones and aggregating them to predict total load. Model evaluation is conducted using accuracy and bias metrics, with particular attention to high-demand months and peak load hours. The findings reveal that Generalized Additive Models (GAM) consistently outperform other models, most importantly during summer months and peak load hours.
- Few-Shot and Zero-Shot Learning for Information Extraction. Gong, Jiaying (Virginia Tech, 2024-05-31). Information extraction aims to automatically extract structured information from unstructured texts. Supervised information extraction requires large quantities of labeled training data, which is time-consuming and labor-intensive. This dissertation focuses on information extraction, especially relation extraction and attribute-value extraction in e-commerce, with few labeled (few-shot learning) or even no labeled (zero-shot learning) training data. We explore multi-source auxiliary information and novel learning techniques to integrate semantic auxiliary information with the input text to improve few-shot learning and zero-shot learning. For zero-shot and few-shot relation extraction, the first method explores the existing data statistics and leverages auxiliary information including labels, synonyms of labels, keywords, and hypernyms of named entities to enable zero-shot learning for the unlabeled data. We build an automatic hypernym extraction framework to help acquire hypernyms of different entities directly from the web. The second method explores the relations between seen classes and new classes. We propose a prompt-based model with semantic knowledge augmentation to recognize new relation triplets under the zero-shot setting. In this method, we transform the problem of zero-shot learning into supervised learning with the generated augmented data for new relations. We design the prompts for training using the auxiliary information based on an external knowledge graph to integrate semantic knowledge learned from seen relations. The third work utilizes auxiliary information from images to enhance few-shot learning. We propose a multi-modal few-shot relation extraction model that leverages both textual and visual semantic information to learn a multi-modal representation jointly.
To supplement the missing contexts in text, this work integrates both local features (object-level) and global features (pixel-level) from different modalities through image-guided attention, object-guided attention, and hybrid feature attention to solve the problem of sparsity and noise. We then explore the few-shot and zero-shot aspect (attribute-value) extraction in the e-commerce application field. The first work studies the multi-label few-shot learning by leveraging the auxiliary information of anchor (label) and category description based on the prototypical networks, where the hybrid attention helps alleviate ambiguity and capture more informative semantics by calculating both the label-relevant and query-related weights. A dynamic threshold is learned by integrating the semantic information from support and query sets to achieve multi-label inference. The second work explores multi-label zero-shot learning via semi-inductive link prediction of the heterogeneous hypergraph. The heterogeneous hypergraph is built with higher-order relations (generated by the auxiliary information of user behavior data and product inventory data) to capture the complex and interconnected relations between users and the products.
- Graph-based Time-series Forecasting in Deep Learning. Chen, Hongjie (Virginia Tech, 2024-04-02). Time-series forecasting has long been studied and remains an important research task. In scenarios where multiple time series need to be forecast, approaches that exploit the mutual impact between time series result in more accurate forecasts. This has been demonstrated in various applications, including demand forecasting and traffic forecasting, among others. Hence, this dissertation focuses on graph-based models, which leverage the internode relations to forecast more efficiently and effectively by associating time series with nodes. This dissertation begins by introducing the notion of graph time-series models in a comprehensive survey of related models. The main contributions of this survey are: (1) A novel categorization is proposed to thoroughly analyze over 20 representative graph time-series models from various perspectives, including temporal components, propagation procedures, and graph construction methods, among others. (2) Similarities and differences among models are discussed to provide a fundamental understanding of decisive factors in graph time-series models. Model challenges and future directions are also discussed. Following the survey, this dissertation develops graph time-series models that utilize complex time-series interactions to yield context-aware, real-time, and probabilistic forecasting. The first method, Context Integrated Graph Neural Network (CIGNN), targets resource forecasting with contextual data. Previous solutions either neglect contextual data or only leverage static features, which fail to exploit contextual information. Its main contributions include: (1) Integrating multiple contextual graphs; and (2) Introducing and incorporating temporal, spatial, relational, and contextual dependencies. The second method, Evolving Super Graph Neural Network (ESGNN), targets large-scale time-series datasets through training on super graphs.
Most graph time-series models let each node associate with a time series, potentially resulting in a high time cost. Its main contributions include: (1) Generating multiple super graphs to reflect node dynamics at different periods; and (2) Proposing an efficient super graph construction method based on K-Means and LSH. The third method, Probabilistic Hypergraph Recurrent Neural Network (PHRNN), targets datasets under the assumption that nodes interact in a simultaneous broadcasting manner. Previous hypergraph approaches leverage a static weight hypergraph, which fails to capture the interaction dynamics among nodes. Its main contributions include: (1) Learning a probabilistic hypergraph structure from the time series; and (2) Proposing the use of a KNN hypergraph for hypergraph initialization and regularization. The last method, Graph Deep Factors (GraphDF), aims at efficient and effective probabilistic forecasting. Previous probabilistic approaches neglect the interrelations between time series. Its main contributions include: (1) Proposing a framework that consists of a relational global component and a relational local component; (2) Conducting analysis in terms of accuracy, efficiency, scalability, and simulation with opportunistic scheduling; and (3) Designing an algorithm for incremental online learning.
- Improving Text Classification Using Graph-based Methods. Karajeh, Ola Abdel-Raheem Mohammed (Virginia Tech, 2024-06-05). Text classification is a fundamental natural language processing task. However, in real-world applications, class distributions are usually skewed, e.g., due to inherent class imbalance. In addition, the task difficulty changes based on the underlying language. When rich morphological structure and high ambiguity are exhibited, natural language understanding can become challenging. For example, Arabic, ranked the fifth most widely used language, has a rich morphological structure and high ambiguity that result from Arabic orthography. Thus, Arabic natural language processing is challenging. Several studies employ Long Short-Term Memory (LSTM) networks and Convolutional Neural Networks (CNNs), but Graph Convolutional Networks (GCNs) have not yet been investigated for the task. Sequence-based models can successfully capture semantics in local consecutive text sequences. On the other hand, graph-based models can preserve global co-occurrences that capture non-consecutive and long-distance semantics. A text representation approach that combines local and global information can enhance performance in practical class imbalance text classification scenarios. Yet, multi-view graph-based text representations have received limited attention. In this research, first we introduce Multi-view Minority Class Text Graph Convolutional Network (MMCT-GCN), a transductive multi-view text classification model that captures textual graph representations for the minority class alongside sequence-based text representations. Experimental results show that MMCT-GCN obtains consistent improvements over baselines.
Second, we develop an Arabic Bidirectional Encoder Representations from Transformers (BERT) Graph Convolutional Network (AraBERT-GCN), a hybrid model that combines the large-scale pre-trained models that encode the local context and semantics alongside graph-based features that are capable of extracting the global word co-occurrences in non-consecutive extended semantics by only one or two hops. Experimental results show that AraBERT-GCN outperforms the state-of-the-art (SOTA) on our Arabic text datasets. Finally, we propose an Arabic Multidimensional Edge Graph Convolutional Network (AraMEGraph) designed for text classification that encapsulates richer and context-aware representations of word and phrase relationships, thus mitigating the impact of the complexity and ambiguity of the Arabic language.
- Learning-Based Pareto Optimal Control of Large-Scale Systems with Unknown Slow Dynamics. Tajik Hesarkuchak, Saeed (Virginia Tech, 2024-06-10). We develop a data-driven approach to Pareto optimal control of large-scale systems, where decision makers know only their local dynamics. Using reinforcement learning, we design a control strategy that optimally balances multiple objectives. The proposed method achieves near-optimal performance and scales well with the total dimension of the system. Experimental results demonstrate the effectiveness of our approach in managing multi-area power systems.
- REFT: Resource-Efficient Federated Training Framework for Heterogeneous and Resource-Constrained Environments. Desai, Humaid Ahmed Habibullah (Virginia Tech, 2023-11-22). Federated Learning (FL) is a sub-domain of machine learning (ML) that enforces privacy by allowing the user's local data to reside on their device. Instead of having users send their personal data to a server where the model resides, FL flips the paradigm and brings the model to the user's device for training. Existing works share model parameters or use distillation principles to address the challenges of data heterogeneity. However, these methods ignore some of the other fundamental challenges in FL: device heterogeneity and communication efficiency. In practice, client devices in FL differ greatly in their computational power and communication resources. This is exacerbated by unbalanced data distribution, resulting in an overall increase in training times and the consumption of more bandwidth. In this work, we present a novel approach for resource-efficient FL called REFT with variable pruning and knowledge distillation techniques to address the computational and communication challenges faced by resource-constrained devices. Our variable pruning technique is designed to reduce computational overhead and increase resource utilization for clients by adapting the pruning process to their individual computational capabilities. Furthermore, to minimize bandwidth consumption and reduce the number of back-and-forth communications between the clients and the server, we leverage knowledge distillation to create an ensemble of client models and distill their collective knowledge to the server. Our experimental results on image classification tasks demonstrate the effectiveness of our approach in conducting FL in a resource-constrained environment. We achieve this by training Deep Neural Network (DNN) models while optimizing resource utilization at each client.
Additionally, our method allows for minimal bandwidth consumption and a diverse range of client architectures while maintaining performance and data privacy.
- Traffic Signal Phase and Timing Prediction: A Machine Learning and Controller Logic Hybrid Approach. Eteifa, Seifeldeen Omar (Virginia Tech, 2024-03-14). Green light optimal speed advisory (GLOSA) systems require reliable estimates of signal switching times to improve vehicle energy/fuel efficiency. Deployment of successful infrastructure-to-vehicle communication requires Signal Phase and Timing (SPaT) messages to be populated with most likely estimates of switching times and confidence levels in these estimates. Obtaining these estimates is difficult for actuated signals where the length of each green indication changes to accommodate varying traffic conditions and pedestrian requests. This dissertation explores the different ways in which predictions can be made for the most likely switching times. Data are gathered from six intersections along the Gallows Road corridor in Northern Virginia. The application of long short-term memory (LSTM) neural networks for obtaining predictions is explored for one of the intersections. Different loss functions are tried for the purpose of prediction and a new loss function is devised. Mean absolute percentage error is found to be the best loss function in the short-term predictions. Mean squared error is the best for long-term predictions and the proposed loss function balances both well. The amount of historical data needed to make a single accurate prediction is assessed. The assessment concludes that the short-term prediction is accurate with only a 3 to 10 second time window in the past as long as the training dataset is large enough. Long term prediction, however, is better with a larger past time window. The robustness of LSTM models to different demand levels is then assessed utilizing the unique scenario created by the COVID-19 pandemic stay-at-home order.
The study shows that the models are robust to the changing demands and while regularization does not really affect their robustness, L1 and L2 regularization can improve the overall prediction performance. An ensemble approach is used considering the use of transformers for SPaT prediction for the first time across the six intersections. Transformers are shown to outperform other models including LSTM. The ensemble provides a valuable metric to show the certainty level in each of the predictions through the level of consensus of the models. Finally, a hybrid approach integrating deep learning and controller logic is proposed by predicting actuations separately and using a digital twin to replicate SPaT information. The approach is proven to be the best approach with 58% less mean absolute error than other approaches. Overall, this dissertation provides a holistic methodology for predicting SPaT and the certainty level associated with it tailored to the existing technology and communication needs.
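The last abstract above compares mean absolute percentage error (MAPE) and mean squared error (MSE) as loss functions. As a minimal sketch of how the two standard metrics weight errors differently, the snippet below computes both on made-up values. The function names and the sample numbers are hypothetical and are not taken from the dissertation, and the dissertation's proposed loss function is not reproduced here.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Assumes no value in `actual` is zero.
    """
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def mse(actual, predicted):
    """Mean squared error, in squared units of the input."""
    total = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return total / len(actual)


# Hypothetical seconds-until-switch values: one short and one long horizon.
actual = [5.0, 50.0]
predicted = [4.0, 40.0]

# Both predictions are off by 20% of the true value, so each contributes
# equally to MAPE, while MSE is dominated by the 10-second miss on the
# longer horizon.
print(f"MAPE: {mape(actual, predicted):.1f}%")
print(f"MSE:  {mse(actual, predicted):.1f}")
```

Because MAPE scales each error by the true value, a one-second miss on a short horizon counts as much as a ten-second miss on a long one, whereas MSE penalizes the larger absolute miss far more heavily.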