Product Defect Mining

dc.contributor.author: Villaflor, Elizabeth M.
dc.contributor.author: Golden, Grant D.
dc.contributor.author: Hall, Jack W. W.
dc.contributor.author: Nguyen, Thomas
dc.contributor.author: Peng, Tianchen
dc.contributor.author: Zhang, Shuaicheng
dc.date.accessioned: 2017-05-16T21:25:50Z
dc.date.available: 2017-05-16T21:25:50Z
dc.date.issued: 2017-05-01
dc.description.abstract: This project focuses on customer reviews that describe product defects. The goal is to train machine learning algorithms on sets of these reviews so that the different defect entities within an unseen review can be identified easily. Identifying these entities will benefit customers, product manufacturers, and governments by revealing the most common defects for a given product, as well as defects common across a class of products. It will also reveal common resolutions for defect symptoms, including both correct and incorrect resolutions. The project also aims to contribute to the opinion mining research community. These goals will be accomplished in three main parts: data collection, data labeling, and classifier training. In the data collection phase, a web crawler will be created to pull customer reviews from forum sites in order to build new datasets. For data labeling, both pre-existing and newly created datasets will be split into sentences, and each sentence will be assigned a defect entity based on its content. For example, a sentence that describes a product defect will be labeled as a symptom, and so on. Finally, in the classifier training phase, machine learning algorithms will be trained on the labeled datasets to learn which words indicate a given defect entity, so that unlabeled datasets can then be classified. Besides these three main parts, the project requires several smaller phases and categories of work, one of which is designing the database tables used to store the labeled datasets. Over the semester, the team accomplished the following: creating a web crawler, completing five new datasets, labeling five datasets, and obtaining preliminary training results with the linear SVC algorithm.
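The labeling and classifier-training steps described above can be illustrated with a small, dependency-free sketch. This is not the project's code (that ships in ProductDefectMiningFiles.zip): it splits a review into sentences, turns each sentence into binary bag-of-words features, and trains one-vs-rest linear SVMs with Pegasos-style subgradient steps, standing in for a library linear SVC. Only the "symptom" label comes from the abstract; the "resolution" and "other" labels, the stopword list, the regex sentence splitter, the function names, and the example sentences are all hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical stopword list (illustrative only): drops filler words so the
# bag-of-words features are mostly content words.
STOPWORDS = {"a", "an", "the", "i", "it", "my", "this", "for", "is", "to", "and", "of"}

# Hypothetical labeled sentences; only "symptom" is a label named in the abstract.
TRAIN = [
    ("The screen flickers constantly.", "symptom"),
    ("Battery drains overnight.", "symptom"),
    ("Updating the firmware fixed it.", "resolution"),
    ("Replacing the charger solved the problem.", "resolution"),
    ("I bought this laptop last year.", "other"),
    ("Thanks for reading.", "other"),
]


def split_sentences(review: str) -> list[str]:
    """Naively split a review after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", review.strip()) if s]


def features(sentence: str) -> set[str]:
    """Binary bag-of-words: lowercase word tokens minus stopwords."""
    return {t for t in re.findall(r"[a-z']+", sentence.lower()) if t not in STOPWORDS}


def train_ovr_linear_svm(labeled, lam=0.01, epochs=20):
    """Train one-vs-rest linear SVMs with Pegasos-style subgradient steps.

    For each class c we store v[c] = sum of y * x over margin violations,
    so the Pegasos weight vector after t steps is w = v[c] / (lam * t).
    Examples are visited in a fixed order; Pegasos proper samples at random.
    """
    examples = [(features(s), label) for s, label in labeled]
    labels = sorted({label for _, label in examples})
    v = {c: defaultdict(float) for c in labels}
    t = 0
    for _ in range(epochs):
        for x, label in examples:
            t += 1
            for c in labels:
                y = 1.0 if label == c else -1.0
                # Score under w_t = v / (lam * (t - 1)); Pegasos starts from w_1 = 0.
                dot = sum(v[c].get(f, 0.0) for f in x)
                score = 0.0 if t == 1 else dot / (lam * (t - 1))
                if y * score < 1.0:  # hinge loss is active: take the subgradient step
                    for f in x:
                        v[c][f] += y
    return {"labels": labels, "v": v, "lam": lam, "t": t}


def predict(model, sentence: str) -> str:
    """Return the class whose linear score w . x is largest."""
    x = features(sentence)
    scale = model["lam"] * model["t"]
    return max(model["labels"],
               key=lambda c: sum(model["v"][c].get(f, 0.0) for f in x) / scale)


if __name__ == "__main__":
    model = train_ovr_linear_svm(TRAIN)
    review = "My screen flickers after waking from sleep. A firmware update fixed the issue."
    for sentence in split_sentences(review):
        print(predict(model, sentence), "|", sentence)
```

Run as a script, this prints "symptom" for the first sentence and "resolution" for the second. Because all one-vs-rest classifiers share the same step count t, comparing v · x across classes gives the same ranking as comparing the true weight vectors; a real pipeline would more likely use a tuned library implementation and a proper sentence tokenizer.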
Additionally, the new datasets and labeled datasets were uploaded into the client's preexisting database. The new datasets were collected from the Apple Community, Samsung, and Dell forum boards and include defect reports for both hardware and software products. The labeling results, together with quick scans of the collected data, showed that many defect reports contain contextual information that is not directly related to the description of either a product defect or its corresponding solution. Many reports also include no resolution, or describe a resolution that did not actually solve the defect. The linear SVC classifier predicted a sentence's label correctly about 80% of the time when training and testing used similar products (e.g., two different car models), but only about 60% at best when they used completely different products (e.g., cars vs. cellphones). Overall, about 75% of the anticipated work was completed this semester, and it should provide a good foundation for future work.
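The 80% and 60% figures above contrast testing on a similar product with testing on a very different one. A minimal sketch of that cross-product protocol, assuming labeled data is held as (sentence, label) pairs and that sentence-level accuracy is the metric: evaluate_transfer fits a classifier on one product's sentences and scores it on another's. The function names, the majority-class baseline, and the car and phone sentences are hypothetical illustrations, not the project's evaluation code.

```python
from collections import Counter
from typing import Callable, Sequence

# A labeled dataset is a sequence of (sentence, label) pairs.
Labeled = Sequence[tuple[str, str]]
Predictor = Callable[[str], str]

# Hypothetical labeled sentences for two very different products.
CAR_A = [
    ("Engine stalls at idle.", "symptom"),
    ("Dealer replaced the sensor.", "resolution"),
    ("Brakes squeal when cold.", "symptom"),
]
PHONE_B = [
    ("Screen cracked after a drop.", "symptom"),
    ("A factory reset fixed it.", "resolution"),
    ("Bought it in May.", "other"),
    ("Speaker crackles.", "symptom"),
]


def accuracy(gold: Sequence[str], predicted: Sequence[str]) -> float:
    """Fraction of sentences whose predicted label matches the gold label."""
    if not gold or len(gold) != len(predicted):
        raise ValueError("need two non-empty label lists of equal length")
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


def evaluate_transfer(fit: Callable[[Labeled], Predictor],
                      train: Labeled, test: Labeled) -> float:
    """Fit on one product's labeled sentences, then score on another's."""
    predict = fit(train)
    return accuracy([label for _, label in test], [predict(s) for s, _ in test])


def majority_baseline(train: Labeled) -> Predictor:
    """Reference classifier: always predict the most frequent training label."""
    top_label = Counter(label for _, label in train).most_common(1)[0][0]
    return lambda sentence: top_label


if __name__ == "__main__":
    # The baseline predicts "symptom" everywhere and gets 2 of 4 phone sentences right.
    print(evaluate_transfer(majority_baseline, CAR_A, PHONE_B))
```

Running the script prints 0.5. Scoring a real classifier and majority_baseline on the same split is a cheap sanity check, since a skewed label distribution alone can make raw accuracy look better than it is.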
dc.description.notes: Description of Files:
ProductDefectMiningReport (Word document and PDF): the final report, which outlines the project details, timeline, and process.
ProductDefectMiningPresentation (PowerPoint presentation and PDF): the final presentation, which gives a brief overview of the project, project history, deliverables, and lessons learned.
ProductDefectMiningFiles (ZIP archive): all source code, scripts, and datasets used and created for this project.
dc.identifier.uri: http://hdl.handle.net/10919/77675
dc.language.iso: en_US
dc.publisher: Virginia Tech
dc.rights: Creative Commons CC0 1.0 Universal Public Domain Dedication
dc.rights.uri: http://creativecommons.org/publicdomain/zero/1.0/
dc.subject: Product Defect
dc.subject: Data Mining
dc.subject: Opinion Mining
dc.subject: Classifier Training
dc.subject: Machine Learning
dc.subject: Data Analytics
dc.subject: Web Crawler
dc.title: Product Defect Mining
dc.type: Dataset
dc.type: Presentation
dc.type: Report
dc.type: Software

Files

Original bundle
Name:
ProductDefectMiningFiles.zip
Size:
17.7 MB
Format:
Name:
ProductDefectMiningReport.pdf
Size:
595.36 KB
Format:
Adobe Portable Document Format
Name:
ProductDefectMiningReport.docx
Size:
445.89 KB
Format:
Microsoft Word XML
Name:
ProductDefectMiningPresentation.pptx
Size:
114.24 KB
Format:
Microsoft Powerpoint XML
Name:
ProductDefectMiningPresentation.pdf
Size:
46.52 KB
Format:
Adobe Portable Document Format
License bundle
Name:
license.txt
Size:
1.5 KB
Format:
Item-specific license agreed upon to submission
Description: