Enhanced Feature Representation in Multi-Modal Learning for Driving Safety Assessment

Date

2024-12-03

Publisher

Virginia Tech

Abstract

This dissertation explores innovative approaches to driving safety through the development of multi-modal learning frameworks that leverage high-frequency, high-resolution driving data and videos to detect safety-critical events (SCEs). The research develops four methodologies, each of which advances the field. The introductory chapter sets the stage by outlining the motivations and challenges in driving safety research and highlighting the need for advanced data-driven approaches to improve SCE prediction and detection. The second chapter presents a framework that combines Convolutional Neural Networks (CNN) and Gated Recurrent Units (GRU) with XGBoost. This approach reduces dependency on domain expertise and effectively handles imbalanced crash data, improving the accuracy and reliability of SCE detection. The third chapter introduces a two-stream network architecture that integrates optical flow with TimeSformer through a multi-head attention mechanism. This combination achieves high detection accuracy, demonstrating its potential for driving safety applications. The fourth chapter focuses on the Dual Swin Transformer framework, which enables concurrent analysis of video and time-series data; this methodology proves effective in processing front-facing driving videos for improved SCE detection. The fifth chapter explores integrating the semantic meaning of corporate labels into a classification model and introduces ScVLM, a hybrid approach that merges supervised learning with contrastive learning techniques to deepen the understanding of driving videos and improve the rationality of event descriptions generated by Vision-Language Models (VLMs). This chapter addresses the limitations of existing models by providing a more comprehensive analysis of driving scenarios. Overall, this dissertation addresses the challenges of analyzing multimodal data and paves the way for future advancements in autonomous driving and traffic safety management.
It underscores the potential of integrating diverse data sources to enhance driving safety.
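To make the idea of a hybrid objective that merges supervised learning with contrastive learning over label semantics more concrete, here is a minimal sketch. It is not the ScVLM implementation, and it should not be read as the dissertation's loss. It assumes three inputs: a classification head that produces logits, a video embedding, and one text embedding per event label (for example, from some text encoder that is not specified here). It then adds standard cross-entropy to an InfoNCE-style contrastive term. The weight `lam`, the temperature `temperature`, and every function name are hypothetical choices made for illustration, and plain Python lists stand in for tensors.

```python
import math
from typing import Sequence

# Hypothetical helpers for illustration only; not the ScVLM code.
# Plain lists stand in for tensors; all vectors are assumed to be non-zero.


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Cross-entropy for a single sample, computed via a stable log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def label_contrastive_loss(video_emb, label_embs, target, temperature=0.07):
    """InfoNCE-style term (assumed formulation): the true label's text embedding
    is the positive and every other label embedding is a negative."""
    scores = [cosine(video_emb, e) / temperature for e in label_embs]
    return cross_entropy(scores, target)


def hybrid_loss(logits, video_emb, label_embs, target, lam=0.5, temperature=0.07):
    """Supervised cross-entropy plus a contrastive term weighted by the assumed factor lam."""
    supervised = cross_entropy(logits, target)
    contrastive = label_contrastive_loss(video_emb, label_embs, target, temperature)
    return supervised + lam * contrastive


if __name__ == "__main__":
    # Toy example: three hypothetical event classes with 2-D label embeddings.
    label_embs = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
    logits = [2.0, 0.5, -1.0]
    video_emb = [0.9, 0.2]
    print(hybrid_loss(logits, video_emb, label_embs, target=0))
```

The contrastive term pulls a clip's embedding toward the embedding of its own label's text and pushes it away from the other labels' embeddings. This is one common way to let label semantics shape the representation alongside the plain classification loss. Setting `lam=0.0` recovers ordinary supervised training.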

Keywords

Multi-Modal Learning, Traffic Safety-Critical Event Detection, Deep Learning in Traffic Analysis, Autonomous Driving Safety, Driving Data Analytics
