Flavor language in expert reviews versus consumer preferences: An application to expensive American whiskeys

Treating natural language flavor descriptions as data that can explain or “predict” consumer or market responses to a product, an approach known as natural language processing or text mining, is increasingly common in food research. Text data vary widely in vocabulary and in which features writers attend to, so large datasets are needed, and these tend to come from unblinded tastings with limited types of supplemental data. In this study, a random forest model trained on 4300 full-text whiskey reviews identified terms commonly used to describe higher- or lower-priced whiskeys. Ten of these terms were selected for a survey of American whiskey consumers. Professional whiskey reviewers commonly describe expensive whiskeys as tasting of “sultanas”, “oak”, “leather”, and “chocolate”, while “corn” and “grassy” are commonly used for inexpensive whiskeys. In contrast, US consumers are more likely to purchase whiskeys with “chocolate” and “caramel” flavors, ranking “corn” near the middle of the 10 terms tested and “tobacco”, “leather”, and “grass” the lowest. This study shows that the flavor terms reviewers use for expensive whiskeys are not necessarily the most important to consumers, possibly because of bias from unblinded tastings or differences between reviewers and consumers. Predictions based on reviews can also overestimate the negative impact of common or expected flavors (like “corn” or “caramel” in whiskeys). Large correlational studies using convenient text corpora can effectively generate hypotheses or identify vocabulary, and follow-up surveys or controlled sensory experiments using the population of interest can then provide additional insights about the product category and the groups of people interacting with it.
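The idea of linking review vocabulary to price can be illustrated with a minimal sketch. This is not the random forest model used in the study: it substitutes a much simpler technique, the difference in document frequency of each term between higher- and lower-priced groups, and it runs on a handful of invented reviews with invented price labels. The names `REVIEWS`, `tokenize`, `document_frequency`, and `price_association` are hypothetical and chosen only for this illustration.

```python
import re
from collections import Counter

# Invented toy reviews with invented price labels (illustration only;
# these are NOT data from the study).
REVIEWS = [
    ("rich oak and leather with dark chocolate", "high"),
    ("sultanas, oak, and old leather", "high"),
    ("chocolate and oak on the finish", "high"),
    ("sweet corn and a grassy note", "low"),
    ("grassy, young, with corn and caramel", "low"),
    ("simple corn sweetness and light oak", "low"),
]


def tokenize(text):
    """Return the set of lowercase alphabetic tokens in one review."""
    return set(re.findall(r"[a-z]+", text.lower()))


def document_frequency(reviews, label):
    """Fraction of reviews with the given label that mention each term."""
    docs = [tokenize(text) for text, lab in reviews if lab == label]
    counts = Counter(term for doc in docs for term in doc)
    return {term: n / len(docs) for term, n in counts.items()}


def price_association(reviews):
    """Score each term as (share of high-priced reviews using it)
    minus (share of low-priced reviews using it)."""
    high = document_frequency(reviews, "high")
    low = document_frequency(reviews, "low")
    terms = set(high) | set(low)
    return {t: high.get(t, 0.0) - low.get(t, 0.0) for t in terms}


scores = price_association(REVIEWS)

# Terms sorted from most "high-price" to most "low-price" in the toy data.
ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
for term, score in ranked:
    print(f"{term:>10s}  {score:+.2f}")
```

Positive scores mark terms used more often in the higher-priced toy reviews and negative scores mark terms used more often in the lower-priced ones, while words that appear equally in both groups score near zero. A tree-based model such as a random forest can additionally capture interactions between terms, which this frequency comparison cannot.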



Random Forest, Natural Language Processing, American Whiskey, Flavor, Preference mapping