Automated extraction of product feedback from online reviews: Improving efficiency, value, and total yield

Date
2019-04-25
Publisher
Virginia Tech
Abstract

In recent years, the expansion of online media has presented firms with rich and voluminous new datasets with profound business applications. Among these, online reviews provide nuanced details on consumers' interactions with products. Analysis of these reviews has enormous potential, but the sheer volume of the data and the unstructured nature of the text make mining these insights challenging and time-consuming. This paper presents three studies that examine this problem and suggest techniques for automated extraction of vital insights.

The first study examines the problem of identifying mentions of safety hazards in online reviews. Discussions of hazards may have profound importance for firms and regulators as they seek to protect consumers. However, because most online reviews do not pertain to safety hazards, identifying the small portion that do is a challenging problem. Much of the literature in this domain focuses on selecting "smoke terms," or specific words and phrases closely associated with mentions of safety hazards. We first examine and evaluate prior techniques for identifying these reviews, which incorporate substantial human opinion in curating smoke terms and thus vary in their effectiveness. We propose a new automated method that uses a heuristic to curate smoke terms, and we find that this method is far more efficient than the human-driven techniques. Finally, we incorporate consumers' star ratings in our analysis, further improving prediction of safety hazard-related discussions.
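The idea of scoring reviews against smoke terms and adjusting by star rating can be illustrated with a minimal sketch. It is not the method from the study: the term list, the weights, the `low_star_bonus` parameter, and the function names are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical smoke terms with illustrative weights (not taken from the study).
SMOKE_TERMS = {"caught fire": 3.0, "burned": 2.0, "smoke": 1.5, "shock": 2.0}


def smoke_score(review_text, star_rating, terms=SMOKE_TERMS, low_star_bonus=1.0):
    """Sum the weights of smoke terms found in a review.

    A bonus is added when at least one term matched and the rating is
    two stars or lower (an assumed way of using the star rating).
    """
    text = review_text.lower()
    score = 0.0
    for term, weight in terms.items():
        # Whole-word match, so "smoke" does not fire on "smoked".
        if re.search(r"\b" + re.escape(term) + r"\b", text):
            score += weight
    if score > 0 and star_rating <= 2:
        score += low_star_bonus
    return score


def rank_reviews(reviews):
    """Return reviews ordered from highest to lowest smoke score."""
    return sorted(
        reviews,
        key=lambda r: smoke_score(r["text"], r["stars"]),
        reverse=True,
    )
```

Ranking rather than thresholding reflects the setting described above: since hazard reports are rare, an analyst would likely read the top-scored reviews first.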

The second study examines the identification of consumer-sourced innovation ideas and opportunities from online reviews. We build upon a widely accepted attribute mapping framework from the entrepreneurship literature for evaluating and comparing product attributes. We first adapt this framework for use in the analysis of online reviews. Then, we develop analytical techniques based on smoke terms for automated identification of innovation opportunities mentioned in online reviews. These techniques can be used to profile products by the attributes that affect, or have the potential to affect, their competitive standing. In collaboration with a large countertop appliance manufacturer, we assess and validate the usefulness of these suggestions, tying together the theoretical value of the attribute mapping framework and the practical value of identifying innovation-related discussions in online reviews.

The third study addresses safety hazard monitoring for use cases in which a higher yield of detected safety hazards is desirable. We note a trade-off between the efficiency of the hazard detection techniques described in the first study and their depth: a high proportion of identified records refer to true hazards, but several important hazards may go undetected. We suggest several techniques for handling this trade-off, including alternate objective functions for heuristics and fuzzy term matching, which improve the total yield. We examine the efficacy of each of these techniques and contrast their merits with those of past techniques. Finally, we test the capability of these methods to generalize to online reviews across different product categories.
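Fuzzy term matching, mentioned above as one way to raise yield, can be sketched as follows. This is an assumed approach using Python's standard `difflib` similarity ratio, not the matching scheme from the study; the function name `fuzzy_contains` and the `threshold` default are hypothetical.

```python
import re
from difflib import SequenceMatcher


def fuzzy_contains(review_text, term, threshold=0.8):
    """Return True if any word window in the review approximately matches term.

    The review is split into lowercase tokens, and each window with as many
    tokens as the term is compared to it by similarity ratio.
    """
    tokens = re.findall(r"[a-z']+", review_text.lower())
    n = len(term.split())
    for i in range(len(tokens) - n + 1):
        window = " ".join(tokens[i:i + n])
        if SequenceMatcher(None, window, term).ratio() >= threshold:
            return True
    return False
```

A lower `threshold` catches more misspellings and variants (higher yield) at the cost of more false matches, which mirrors the efficiency versus depth trade-off described above.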

Keywords
text analytics, online reviews, business intelligence, heuristics, classification