Pose Estimation and 3D Bounding Box Prediction for Autonomous Vehicles Through Lidar and Monocular Camera Sensor Fusion

dc.contributor.author: Wale, Prajakta Nitin
dc.contributor.committeechair: Huxtable, Scott T.
dc.contributor.committeemember: Taheri, Saied
dc.contributor.committeemember: Ahmadian, Mehdi
dc.contributor.department: Mechanical Engineering
dc.date.accessioned: 2024-08-09T08:00:10Z
dc.date.available: 2024-08-09T08:00:10Z
dc.date.issued: 2024-08-08
dc.description.abstract: This thesis investigates the integration of transfer learning with ResNet-101 and compares its performance with VGG-19 for 3D object detection in autonomous vehicles. ResNet-101 is a deep convolutional neural network with 101 layers, while VGG-19 has 19 layers. The research emphasizes the fusion of camera and lidar outputs to enhance the accuracy of 3D bounding box estimation, which is critical in occluded environments. Selecting an appropriate backbone for feature extraction is pivotal for achieving high detection accuracy. To address this challenge, we propose a method leveraging transfer learning with ResNet-101, pretrained on large-scale image datasets, to improve feature extraction capabilities (a minimal backbone sketch appears after this record). The outputs of the two sensor branches are averaged to obtain the final bounding box. The experimental results demonstrate that the ResNet-101-based model outperforms the VGG-19-based model in terms of accuracy and robustness. This study provides valuable insights into the effectiveness of transfer learning and multi-sensor fusion in advancing 3D object detection for autonomous driving.
dc.description.abstractgeneral: In the realm of computer vision, the quest for more accurate and robust 3D object detection pipelines remains an ongoing pursuit. This thesis investigates advanced techniques to improve 3D object detection by comparing two popular deep learning models, ResNet-101 and VGG-19. The study focuses on enhancing detection accuracy by combining the outputs of two distinct methods: one that uses a monocular camera to estimate 3D bounding boxes, and another that converts lidar bird's-eye-view (BEV) data into image-based 3D bounding boxes. This fusion of outputs is critical in environments where objects may be partially obscured. By leveraging transfer learning, a method in which models pre-trained on larger datasets are fine-tuned for a specific application, the research shows that ResNet-101 significantly outperforms VGG-19 in terms of accuracy and robustness. The approach averages the outputs of both methods to refine the final 3D bounding box estimate (a sketch of this averaging step appears after this record). This work highlights the effectiveness of combining different detection methodologies and using advanced machine learning techniques to advance 3D object detection technology.
dc.description.degree: Master of Science
dc.format.medium: ETD
dc.identifier.other: vt_gsexam:41280
dc.identifier.uri: https://hdl.handle.net/10919/120894
dc.language.iso: en
dc.publisher: Virginia Tech
dc.rights: In Copyright
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/
dc.subject: Computer Vision
dc.subject: Object Detection
dc.subject: Bounding Box
dc.subject: Deep Learning
dc.subject: Feature Extraction
dc.subject: Transfer Learning
dc.subject: Sensor Fusion
dc.title: Pose Estimation and 3D Bounding Box Prediction for Autonomous Vehicles Through Lidar and Monocular Camera Sensor Fusion
dc.type: Thesis
thesis.degree.discipline: Mechanical Engineering
thesis.degree.grantor: Virginia Polytechnic Institute and State University
thesis.degree.level: masters
thesis.degree.name: Master of Science
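
The abstract describes transfer learning with a pretrained ResNet-101 as the feature-extraction backbone. Below is a minimal sketch of that idea, assuming PyTorch and torchvision; the frozen-layer choice and the five-output regression head (box dimensions plus a sin/cos yaw encoding) are illustrative assumptions, not the thesis's exact architecture.

```python
import torch.nn as nn
from torchvision import models

# Minimal sketch: load ResNet-101 pretrained on ImageNet and reuse it as a
# feature extractor, swapping the classification head for a regression head
# over 3D box parameters. Requires torchvision >= 0.13 for the weights API.
backbone = models.resnet101(weights=models.ResNet101_Weights.IMAGENET1K_V1)
in_features = backbone.fc.in_features  # 2048 for ResNet-101

# Freeze the early layers so the pretrained low-level features are kept;
# only the last residual block and the new head are fine-tuned.
for name, param in backbone.named_parameters():
    if not name.startswith(("layer4", "fc")):
        param.requires_grad = False

# Hypothetical regression head: dimensions (h, w, l) plus sin/cos of yaw,
# i.e. 5 outputs. The hidden width of 256 is an arbitrary illustrative choice.
backbone.fc = nn.Sequential(
    nn.Linear(in_features, 256),
    nn.ReLU(),
    nn.Linear(256, 5),
)
```

The general abstract describes averaging the camera-branch and lidar-branch outputs to obtain the final 3D bounding box. The sketch below shows one way such an averaging step could look, assuming both branches emit boxes as 7-vectors (x, y, z, h, w, l, yaw) in a shared frame; the function name and parameterization are hypothetical, and the yaw is averaged on the unit circle so estimates near the +/- pi wraparound do not cancel out.

```python
import numpy as np

def fuse_boxes_by_averaging(camera_box: np.ndarray, lidar_box: np.ndarray) -> np.ndarray:
    """Average two 7-parameter 3D boxes (x, y, z, h, w, l, yaw).

    Assumes both estimates describe the same object in a shared frame.
    Position and size are averaged component-wise; yaw uses a circular mean.
    """
    fused = (camera_box + lidar_box) / 2.0
    yaw_cam, yaw_lidar = camera_box[6], lidar_box[6]
    fused[6] = np.arctan2(
        (np.sin(yaw_cam) + np.sin(yaw_lidar)) / 2.0,
        (np.cos(yaw_cam) + np.cos(yaw_lidar)) / 2.0,
    )
    return fused

# Example: camera and lidar estimates for the same vehicle.
cam = np.array([12.1, 1.6, 30.4, 1.5, 1.8, 4.2, 0.10])
lid = np.array([11.9, 1.5, 30.0, 1.6, 1.7, 4.4, 0.14])
print(fuse_boxes_by_averaging(cam, lid))
```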
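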

Files

Original bundle
Name: Wale_P_T_2024.pdf
Size: 5.22 MB
Format: Adobe Portable Document Format
