Novel Algorithms for Understanding Online Reviews
MetadataShow full item record
This dissertation focuses on the review understanding problem, which has gained attention from both industry and academia, and has found applications in many downstream tasks, such as recommendation, information retrieval and review summarization. In this dissertation, we aim to develop machine learning and natural language processing tools to understand and learn structured knowledge from unstructured reviews, which can be investigated in three research directions, including understanding review corpora, understanding review documents, and understanding review segments. For the corpus-level review understanding, we have focused on discovering knowledge from corpora that consist of short texts. Since they have limited contextual information, automatically learning topics from them remains a challenging problem. We propose a semantics-assisted non-negative matrix factorization model to deal with this problem. It effectively incorporates the word-context semantic correlations into the model, where the semantic relationships between the words and their contexts are learned from the skip-gram view of a corpus. We conduct extensive sets of experiments on several short text corpora to demonstrate the proposed model can discover meaningful and coherent topics. For document-level review understanding, we have focused on building interpretable and reliable models for the document-level multi-aspect sentiment analysis (DMSA) task, which can help us to not only recover missing aspect-level ratings and analyze sentiment of customers, but also detect aspect and opinion terms from reviews. We conduct three studies in this research direction. In the first study, we collect a new DMSA dataset in the healthcare domain and systematically investigate reviews in this dataset, including a comprehensive statistical analysis and topic modeling to discover aspects. We also propose a multi-task learning framework with self-attention networks to predict sentiment and ratings for given aspects. In the second study, we propose corpus-level and concept-based explanation methods to interpret attention-based deep learning models for text classification, including sentiment classification. The proposed corpus-level explanation approach aims to capture causal relationships between keywords and model predictions via learning importance of keywords for predicted labels across a training corpus based on attention weights. We also propose a concept-based explanation method that can automatically learn higher level concepts and their importance to model predictions. We apply these methods to the classification task and show that they are powerful in extracting semantically meaningful keywords and concepts, and explaining model predictions. In the third study, we propose an interpretable and uncertainty aware multi-task learning framework for DMSA, which can achieve competitive performance while also being able to interpret the predictions made. Based on the corpus-level explanation method, we propose an attention-driven keywords ranking method, which can automatically discover aspect terms and aspect-level opinion terms from a review corpus using the attention weights. In addition, we propose a lecture-audience strategy to estimate model uncertainty in the context of multi-task learning. For the segment-level review understanding, we have focused on the unsupervised aspect detection task, which aims to automatically extract interpretable aspects and identify aspect-specific segments from online reviews. The existing deep learning-based topic models suffer from several problems such as extracting noisy aspects and poorly mapping aspects discovered by models to the aspects of interest. To deal with these problems, we propose a self-supervised contrastive learning framework in order to learn better representations for aspects and review segments. We also introduce a high-resolution selective mapping method to efficiently assign aspects discovered by the model to the aspects of interest. In addition, we propose using a knowledge distillation technique to further improve the aspect detection performance.
General Audience Abstract
Nowadays, online reviews are playing an important role in our daily lives. They are also critical to the success of many e-commerce and local businesses because they can help people build trust in brands and businesses, provide insights into products and services, and improve consumers' confidence. As a large number of reviews accumulate every day, a central research problem is to build an artificial intelligence system that can understand and interact with these reviews, and further use them to offer customers better support and services. In order to tackle challenges in these applications, we first have to get an in-depth understanding of online reviews. In this dissertation, we focus on the review understanding problem and develop machine learning and natural language processing tools to understand reviews and learn structured knowledge from unstructured reviews. We have addressed the review understanding problem in three directions, including understanding a collection of reviews, understanding a single review, and understanding a piece of a review segment. In the first direction, we proposed a short-text topic modeling method to extract topics from review corpora that consist of primary complaints of consumers. In the second direction, we focused on building sentiment analysis models to predict the opinions of consumers from their reviews. Our deep learning models can provide good prediction accuracy as well as a human-understandable explanation for the prediction. In the third direction, we develop an aspect detection method to automatically extract sentences that mention certain features consumers are interested in, from reviews, which can help customers efficiently navigate through reviews and help businesses identify the advantages and disadvantages of their products.
- Doctoral Dissertations