Browsing by Author "Shi, Tian"
- Novel Algorithms for Understanding Online Reviews
  Shi, Tian (Virginia Tech, 2021-09-14)
  This dissertation focuses on the review understanding problem, which has gained attention from both industry and academia, and has found applications in many downstream tasks, such as recommendation, information retrieval and review summarization. In this dissertation, we aim to develop machine learning and natural language processing tools to understand and learn structured knowledge from unstructured reviews, which can be investigated in three research directions, including understanding review corpora, understanding review documents, and understanding review segments. For the corpus-level review understanding, we have focused on discovering knowledge from corpora that consist of short texts. Since they have limited contextual information, automatically learning topics from them remains a challenging problem. We propose a semantics-assisted non-negative matrix factorization model to deal with this problem. It effectively incorporates the word-context semantic correlations into the model, where the semantic relationships between the words and their contexts are learned from the skip-gram view of a corpus. We conduct extensive sets of experiments on several short text corpora to demonstrate the proposed model can discover meaningful and coherent topics. For document-level review understanding, we have focused on building interpretable and reliable models for the document-level multi-aspect sentiment analysis (DMSA) task, which can help us to not only recover missing aspect-level ratings and analyze sentiment of customers, but also detect aspect and opinion terms from reviews. We conduct three studies in this research direction. In the first study, we collect a new DMSA dataset in the healthcare domain and systematically investigate reviews in this dataset, including a comprehensive statistical analysis and topic modeling to discover aspects. 
We also propose a multi-task learning framework with self-attention networks to predict sentiment and ratings for given aspects. In the second study, we propose corpus-level and concept-based explanation methods to interpret attention-based deep learning models for text classification, including sentiment classification. The proposed corpus-level explanation approach aims to capture causal relationships between keywords and model predictions via learning importance of keywords for predicted labels across a training corpus based on attention weights. We also propose a concept-based explanation method that can automatically learn higher level concepts and their importance to model predictions. We apply these methods to the classification task and show that they are powerful in extracting semantically meaningful keywords and concepts, and explaining model predictions. In the third study, we propose an interpretable and uncertainty aware multi-task learning framework for DMSA, which can achieve competitive performance while also being able to interpret the predictions made. Based on the corpus-level explanation method, we propose an attention-driven keywords ranking method, which can automatically discover aspect terms and aspect-level opinion terms from a review corpus using the attention weights. In addition, we propose a lecture-audience strategy to estimate model uncertainty in the context of multi-task learning. For the segment-level review understanding, we have focused on the unsupervised aspect detection task, which aims to automatically extract interpretable aspects and identify aspect-specific segments from online reviews. The existing deep learning-based topic models suffer from several problems such as extracting noisy aspects and poorly mapping aspects discovered by models to the aspects of interest. 
To deal with these problems, we propose a self-supervised contrastive learning framework in order to learn better representations for aspects and review segments. We also introduce a high-resolution selective mapping method to efficiently assign aspects discovered by the model to the aspects of interest. In addition, we propose using a knowledge distillation technique to further improve the aspect detection performance.
- Production of films of SiO₂ by chemical vapor deposition
  (United States Patent and Trademark Office, 1997-01-14)
  The chemical vapor deposition of hydridospherosiloxane to generate films of SiO₂ at low temperatures on substrates that cannot withstand high temperatures. The chemical vapor deposition process synthesized compounds with the general formula $(\mathrm{HSiO}_{3/2})_n$, with n being an even number ranging from 8 to a very large number. More particularly, it relates to the vapor deposition of oligomeric hydrogensilsesquioxanes, henceforth referred to as hydridospherosiloxanes. The hydridospherosiloxanes are used directly in a chemical vapor deposition reactor to generate films of SiO₂ at low temperatures on substrates that cannot withstand high temperatures. Hydridospherosiloxanes and soluble hydrogensilsesquioxane resin are produced having the formula $(\mathrm{HSiO}_{3/2})_n$, where n is an even integer greater than 8.
- Tensor-Based Temporal Multi-Task Survival Analysis
  Wang, Ping; Shi, Tian; Reddy, Chandan K. (IEEE, 2021-09-01)
  Survival analysis aims at predicting the time to event of interest along with its probability on longitudinal data. It is commonly used to make predictions for a single specific event of interest at a given time point. However, predicting the occurrence of multiple events of interest simultaneously and dynamically is needed in many real-world applications. An intuitive way to solve this problem is to simply apply the standard survival analysis method independently to each prediction task at each time point. However, it often leads to a sub-optimal solution since the underlying dependencies between these tasks are ignored. This motivates us to analyze these prediction tasks jointly in order to select the common features shared across all the tasks. In this paper, we formulate a temporal (Multiple Time points) Multi-Task learning framework (MTMT) for survival analysis problems using tensor representation. More specifically, given a survival dataset and a sequence of time points, which are considered as the monitored time points for the events of interest, we reformulate the survival analysis problem to jointly handle each task at each time point and optimize them simultaneously. We demonstrate the performance of the proposed MTMT model on important real-world datasets, including employee attrition and medical records. We show the superior performance of the MTMT model compared to several state-of-the-art models using standard metrics. We also provide the list of important features selected by our MTMT model thus demonstrating the interpretability of the proposed model.
- Text-to-ESQ: A Two-Stage Controllable Approach for Efficient Retrieval of Vaccine Adverse Events from NoSQL Database
  Zhang, Wenlong; Zeng, Kangping; Yang, Xinming; Shi, Tian; Wang, Ping (ACM, 2023-09-03)
  The Vaccine Adverse Event Reporting System (VAERS) contains detailed reports of adverse events following vaccine administration. However, efficiently and accurately searching for specific information from VAERS poses significant challenges, especially for medical experts. Natural language querying (NLQ) methods tackle the challenge by translating the input questions into executable queries, allowing for the exploration of complex databases with large amounts of information. Most existing studies focus on the relational database and solve the Text-to-SQL task. However, the capability of full-text for Text-to-SQL is greatly limited by the data structures and functionality of the SQL databases. In addition, the potential of natural language querying has not been comprehensively explored in the healthcare domain. To overcome these limitations, we investigate the potential of NoSQL databases, specifically Elasticsearch, and forge a new research direction for NLQ, which we refer to as Text-to-ESQ generation. This exploration requires us to re-design various aspects of NLQ, such as the target application and the advantages of NoSQL database. In our approach, we develop a two-stage controllable (TSC) framework consisting of a question-to-question (Q2Q) translation module and an ESQ condition extraction (ECE) module. These modules are carefully designed to efficiently retrieve information from the VAERS data stored in a NoSQL database. Additionally, we construct a dedicated question-ESQ pair dataset called VAERSESQ, to support the task in the healthcare domain. Extensive experiments were conducted on the VAERSESQ dataset to evaluate the proposed methods. 
The results, both quantitative and qualitative, demonstrate the accuracy and efficiency of our approach in generating queries for NoSQL databases, thus enabling efficient retrieval of VAERS data.
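
For readers unfamiliar with the target of Text-to-ESQ generation, the following is a minimal sketch of what an Elasticsearch query assembled from extracted conditions might look like. It is not the authors' implementation: the helper `build_esq` and the field names `symptom_text`, `vax_type`, and `age_yrs` are hypothetical and chosen only for illustration. The `bool`, `match`, `term`, and `range` clauses are standard Elasticsearch Query DSL constructs.

```python
import json


def build_esq(full_text, filters=None, ranges=None):
    """Assemble an Elasticsearch bool query from extracted conditions.

    full_text: {field: text} pairs turned into full-text `match` clauses.
    filters:   {field: value} pairs turned into exact `term` filters.
    ranges:    {field: {"gte": ..., "lte": ...}} turned into `range` filters.
    All field names are hypothetical; a real index defines its own mapping.
    """
    must = [{"match": {field: text}} for field, text in full_text.items()]
    filter_clauses = [{"term": {f: v}} for f, v in (filters or {}).items()]
    filter_clauses += [{"range": {f: b}} for f, b in (ranges or {}).items()]
    return {"query": {"bool": {"must": must, "filter": filter_clauses}}}


# Hypothetical question: "Which reports for patients aged 65 or older
# mention a severe headache after a COVID19 vaccine?"
query = build_esq(
    full_text={"symptom_text": "severe headache"},
    filters={"vax_type": "COVID19"},
    ranges={"age_yrs": {"gte": 65}},
)
print(json.dumps(query, indent=2))
```

The `match` clause illustrates the full-text search that the abstract contrasts with SQL, while the `term` and `range` filters play a role similar to a SQL `WHERE` clause.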