Pose Estimation and 3D Bounding Box Prediction for Autonomous Vehicles Through Lidar and Monocular Camera Sensor Fusion

dc.contributor.author: Wale, Prajakta Nitin
dc.contributor.committeechair: Huxtable, Scott T.
dc.contributor.committeemember: Taheri, Saied
dc.contributor.committeemember: Ahmadian, Mehdi
dc.contributor.department: Mechanical Engineering
dc.date.accessioned: 2024-08-09T08:00:10Z
dc.date.available: 2024-08-09T08:00:10Z
dc.date.issued: 2024-08-08
dc.description.abstract: This thesis investigates the integration of transfer learning with ResNet-101 and compares its performance with VGG-19 for 3D object detection in autonomous vehicles. ResNet-101 is a deep convolutional neural network with 101 layers, while VGG-19 has 19 layers. The research emphasizes the fusion of camera and lidar outputs to enhance the accuracy of 3D bounding box estimation, which is critical in occluded environments. Selecting an appropriate backbone for feature extraction is pivotal for achieving high detection accuracy. To address this challenge, we propose a method leveraging transfer learning with ResNet-101, pretrained on large-scale image datasets, to improve feature extraction capabilities (a minimal backbone sketch appears after this record). The outputs of the two sensor branches are averaged to obtain the final bounding box. The experimental results demonstrate that the ResNet-101-based model outperforms the VGG-19-based model in terms of accuracy and robustness. This study provides valuable insights into the effectiveness of transfer learning and multi-sensor fusion in advancing 3D object detection for autonomous driving.
dc.description.abstractgeneral: In the realm of computer vision, the quest for more accurate and robust 3D object detection pipelines remains an ongoing pursuit. This thesis investigates advanced techniques to improve 3D object detection by comparing two popular deep learning models, ResNet-101 and VGG-19. The study focuses on enhancing detection accuracy by combining the outputs of two distinct methods: one that uses a monocular camera to estimate 3D bounding boxes, and another that converts lidar bird's-eye-view (BEV) data into image-based 3D bounding boxes. This fusion of outputs is critical in environments where objects may be partially obscured. By leveraging transfer learning, a method in which models pre-trained on larger datasets are fine-tuned for a specific application, the research shows that ResNet-101 significantly outperforms VGG-19 in terms of accuracy and robustness. The approach averages the outputs of both methods to refine the final 3D bounding box estimate (a sketch of this averaging step appears after this record). This work highlights the effectiveness of combining different detection methodologies and using advanced machine learning techniques to advance 3D object detection technology.
dc.description.degree: Master of Science
dc.format.medium: ETD
dc.identifier.other: vt_gsexam:41280
dc.identifier.uri: https://hdl.handle.net/10919/120894
dc.language.iso: en
dc.publisher: Virginia Tech
dc.rights: In Copyright
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/
dc.subject: Computer Vision
dc.subject: Object Detection
dc.subject: Bounding Box
dc.subject: Deep Learning
dc.subject: Feature Extraction
dc.subject: Transfer Learning
dc.subject: Sensor Fusion
dc.title: Pose Estimation and 3D Bounding Box Prediction for Autonomous Vehicles Through Lidar and Monocular Camera Sensor Fusion
dc.type: Thesis
thesis.degree.discipline: Mechanical Engineering
thesis.degree.grantor: Virginia Polytechnic Institute and State University
thesis.degree.level: masters
thesis.degree.name: Master of Science
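
The abstract describes transfer learning with a pretrained ResNet-101 as the feature-extraction backbone. Below is a minimal sketch of that idea, assuming PyTorch and torchvision; the frozen-layer choice and the five-output regression head (box dimensions plus a sin/cos yaw encoding) are illustrative assumptions, not the thesis's exact architecture.

```python
import torch.nn as nn
from torchvision import models

# Minimal sketch: load ResNet-101 pretrained on ImageNet and reuse it as a
# feature extractor, swapping the classification head for a regression head
# over 3D box parameters. Requires torchvision >= 0.13 for the weights API.
backbone = models.resnet101(weights=models.ResNet101_Weights.IMAGENET1K_V1)
in_features = backbone.fc.in_features  # 2048 for ResNet-101

# Freeze the early layers so the pretrained low-level features are kept;
# only the last residual block and the new head are fine-tuned.
for name, param in backbone.named_parameters():
    if not name.startswith(("layer4", "fc")):
        param.requires_grad = False

# Hypothetical regression head: dimensions (h, w, l) plus sin/cos of yaw,
# i.e. 5 outputs. The hidden width of 256 is an arbitrary illustrative choice.
backbone.fc = nn.Sequential(
    nn.Linear(in_features, 256),
    nn.ReLU(),
    nn.Linear(256, 5),
)
```

The general abstract describes averaging the camera-branch and lidar-branch outputs to obtain the final 3D bounding box. The sketch below shows one way such an averaging step could look, assuming both branches emit boxes as 7-vectors (x, y, z, h, w, l, yaw) in a shared frame; the function name and parameterization are hypothetical, and the yaw is averaged on the unit circle so estimates near the +/- pi wraparound do not cancel out.

```python
import numpy as np

def fuse_boxes_by_averaging(camera_box: np.ndarray, lidar_box: np.ndarray) -> np.ndarray:
    """Average two 7-parameter 3D boxes (x, y, z, h, w, l, yaw).

    Assumes both estimates describe the same object in a shared frame.
    Position and size are averaged component-wise; yaw uses a circular mean.
    """
    fused = (camera_box + lidar_box) / 2.0
    yaw_cam, yaw_lidar = camera_box[6], lidar_box[6]
    fused[6] = np.arctan2(
        (np.sin(yaw_cam) + np.sin(yaw_lidar)) / 2.0,
        (np.cos(yaw_cam) + np.cos(yaw_lidar)) / 2.0,
    )
    return fused

# Example: camera and lidar estimates for the same vehicle.
cam = np.array([12.1, 1.6, 30.4, 1.5, 1.8, 4.2, 0.10])
lid = np.array([11.9, 1.5, 30.0, 1.6, 1.7, 4.4, 0.14])
print(fuse_boxes_by_averaging(cam, lid))
```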
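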

Files

Original bundle
Name: Wale_P_T_2024.pdf
Size: 5.22 MB
Format: Adobe Portable Document Format
