Enhanced Feature Representation in Multi-Modal Learning for Driving Safety Assessment
dc.contributor.author | Shi, Liang | en |
dc.contributor.committeechair | Guo, Feng | en |
dc.contributor.committeemember | Xing, Xin | en |
dc.contributor.committeemember | Deng, Xinwei | en |
dc.contributor.committeemember | Leman, Scott C. | en |
dc.contributor.department | Statistics | en |
dc.date.accessioned | 2024-12-04T09:00:12Z | en |
dc.date.available | 2024-12-04T09:00:12Z | en |
dc.date.issued | 2024-12-03 | en |
dc.description.abstract | This dissertation explores innovative approaches to driving safety through the development of multi-modal learning frameworks that leverage high-frequency, high-resolution driving data and videos to detect safety-critical events (SCEs). The research unfolds across four methodologies, each advancing the field. The introductory chapter sets the stage by outlining the motivations and challenges in driving safety research, highlighting the need for advanced data-driven approaches to improve SCE prediction and detection. The second chapter presents a framework that combines Convolutional Neural Networks (CNN) and Gated Recurrent Units (GRU) with XGBoost. This approach reduces dependency on domain expertise and effectively manages imbalanced crash data, enhancing the accuracy and reliability of SCE detection. In the third chapter, a two-stream network architecture is introduced, integrating optical flow and TimeSformer through a multi-head attention mechanism. This combination achieves exceptional detection accuracy, demonstrating its potential for applications in driving safety. The fourth chapter focuses on the Dual Swin Transformer framework, which enables concurrent analysis of video and time-series data; this methodology proves effective in processing front-view driving videos for improved SCE detection. The fifth chapter explores the integration of labels' semantic meaning into a classification model and introduces ScVLM, a hybrid approach that merges supervised learning with contrastive learning techniques to enhance understanding of driving videos and improve event description rationality for Vision-Language Models (VLMs). This chapter addresses existing model limitations by providing a more comprehensive analysis of driving scenarios. This dissertation addresses the challenges of analyzing multimodal data and paves the way for future advancements in autonomous driving and traffic safety management. It underscores the potential of integrating diverse data sources to enhance driving safety. | en |
dc.description.abstractgeneral | This dissertation explores new approaches to enhance driving safety by using advanced learning frameworks that combine video data with high-frequency, high-resolution driving information, introducing innovative techniques to predict and detect critical driving events. The introductory chapter outlines the current challenges in driving safety and emphasizes the potential of data-driven methods to improve predictions and prevent accidents. The second chapter describes a method that uses machine learning models to analyze crash data, reducing the need for expert input and effectively handling data imbalances. This approach improves the accuracy of predicting safety-critical events. The third chapter introduces a two-stream network that processes both sensor data and video frames, achieving high accuracy in detecting safety-related driving incidents. The fourth chapter presents a framework that simultaneously analyzes video and time-series data, validated using a comprehensive driving study dataset. This technique enhances the detection of complex driving scenarios. The fifth chapter introduces a hybrid learning approach that improves understanding of driving videos and event descriptions. By combining different learning techniques, this method addresses limitations in existing models. This work tackles challenges in analyzing multimodal data and sets the stage for future advancements in autonomous driving and traffic safety management. It highlights the potential of integrating diverse data types to create safer driving environments. | en |
dc.description.degree | Doctor of Philosophy | en |
dc.format.medium | ETD | en |
dc.identifier.other | vt_gsexam:41742 | en |
dc.identifier.uri | https://hdl.handle.net/10919/123730 | en |
dc.language.iso | en | en |
dc.publisher | Virginia Tech | en |
dc.rights | In Copyright | en |
dc.rights.uri | http://rightsstatements.org/vocab/InC/1.0/ | en |
dc.subject | Multi-Modal Learning | en |
dc.subject | Traffic Safety-Critical Event Detection | en |
dc.subject | Deep Learning in Traffic Analysis | en |
dc.subject | Autonomous Driving Safety | en |
dc.subject | Driving Data Analytics | en |
dc.title | Enhanced Feature Representation in Multi-Modal Learning for Driving Safety Assessment | en |
dc.type | Dissertation | en |
thesis.degree.discipline | Statistics | en |
thesis.degree.grantor | Virginia Polytechnic Institute and State University | en |
thesis.degree.level | doctoral | en |
thesis.degree.name | Doctor of Philosophy | en |