SafeRoad

Abstract

This report describes the development of the SafeRoad software application and serves as a reference for future work. The project was completed as a capstone requirement for CS 4624 (Multimedia, Hypertext, and Information Access) at Virginia Tech, under the guidance of the client.

SafeRoad is a software application designed to analyze the NHTSA vehicle complaint database, determine the most common complaints, and predict recalls based on these complaints. The goal of the application is to make our vehicles and roads safer and prevent the loss of life.

The software has been developed to be used by data analysts for automotive manufacturers and governing agencies such as the NHTSA. The software can be run either preemptively or in response to a complaint or series of complaints regarding an automobile or one of its components. The results of the program can lead to the issuing of a recall before more serious consequences occur.

The project has been developed using Java, database connectivity, and machine learning algorithms. A classifier training set has been created and is included with the source code. The final product predicts recalls with an accuracy significantly higher than the project required.

Description
SafeRoad is developed in Java. The NHTSA complaint and recall databases are imported into a MySQL database. The Java application connects to MySQL through Java Database Connectivity (JDBC), sends SQL queries to the databases, and determines the vehicle makes, models, and components that receive the most complaints. The most common complaints are then compiled into a comma-separated values (CSV) file called "CommonComplaints.csv". As the file is written, every text value is converted to a unique corresponding numeric value for compatibility with the classifiers, and all complaints are initially unclassified. A hash map is used to map text to numeric values and back.

Once the file is compiled, a classifier is instantiated and built on the already classified training set. The classifier used is the Naive Bayes classifier from the Java Machine Learning (JML) Library. During building, the classifier learns which patterns and attributes contribute to a recall. Once built, the classifier classifies each complaint in "CommonComplaints.csv". All complaints classified as a recall are then compiled into "PredictedRecalls.csv", with numeric values converted back to text for ease of reading. CSV format is used for its organization and compatibility.

Also included in this submission is the final report for the SafeRoad project, submitted in both PDF and Microsoft Word format. The final report contains a more detailed summary of the project as well as a user manual and a developer's manual, covering the development process, usage instructions, and development information. PowerPoint and PDF files are provided for the final project presentation.
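The text-to-numeric conversion described above can be illustrated with a minimal sketch. This is not the SafeRoad source code: the class name `LabelEncoder` and its methods are hypothetical, and the sketch assumes codes are assigned in order of first appearance, which the description does not specify. It shows one way a hash map could map text values (such as vehicle makes) to numeric codes and back.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not taken from the SafeRoad source.
class LabelEncoder {
    // Forward lookup: text value -> numeric code.
    private final Map<String, Integer> textToCode = new HashMap<>();
    // Reverse lookup: the list index is the numeric code.
    private final List<String> codeToText = new ArrayList<>();

    // Returns the code for a text value, assigning the next free code
    // the first time the value is seen (an assumption of this sketch).
    int encode(String text) {
        Integer code = textToCode.get(text);
        if (code == null) {
            code = codeToText.size();
            textToCode.put(text, code);
            codeToText.add(text);
        }
        return code;
    }

    // Converts a numeric code back to its original text value.
    String decode(int code) {
        if (code < 0 || code >= codeToText.size()) {
            throw new IllegalArgumentException("Unknown code: " + code);
        }
        return codeToText.get(code);
    }

    // Number of distinct text values seen so far.
    int size() {
        return codeToText.size();
    }

    public static void main(String[] args) {
        LabelEncoder makes = new LabelEncoder();
        int ford = makes.encode("FORD");
        int toyota = makes.encode("TOYOTA");
        // A repeated value reuses its existing code.
        System.out.println(ford + "," + toyota + "," + makes.encode("FORD"));
        // Decoding restores the readable text for the output file.
        System.out.println(makes.decode(toyota));
    }
}
```

In a pipeline like the one described, one encoder per attribute (make, model, component) would presumably be used when writing the numeric file, and the same encoders would be reused to decode the predicted recalls back into text.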
Keywords
Machine learning, Natural Language Processing, Database, NHTSA, vehicle complaint database, automobile recalls, automotive manufacturers, classifier training set, Naive Bayes
Citation