Browsing by Author "Wang, Ping"
- Automatic Question Answering and Knowledge Discovery from Electronic Health Records
  Wang, Ping (Virginia Tech, 2021-08-25)
  Electronic Health Records (EHR) data contain comprehensive longitudinal patient information, which is usually stored in databases in the form of either multi-relational structured tables or unstructured texts, e.g., clinical notes. EHRs provide a useful resource to assist doctors' decision making; however, they also present many unique challenges that limit the efficient use of the valuable information, such as large data volume, heterogeneous and dynamic information, medical term abbreviations, and noise caused by misspelled words. This dissertation focuses on the development and evaluation of advanced machine learning algorithms to solve the following research questions: (1) How to seek answers from EHR for clinical activity related questions posed in human language without the assistance of database and natural language processing (NLP) domain experts, (2) How to discover underlying relationships of different events and entities in structured tabular EHRs, and (3) How to predict when a medical event will occur and estimate its probability based on previous medical information of patients. First, to automatically retrieve answers for natural language questions from the structured tables in EHR, we study the question-to-SQL generation task by generating the corresponding SQL query of the input question. We propose a translation-edit model driven by a language generation module and an editing module for the SQL query generation task. This model helps automatically translate clinical activity related questions to SQL queries, so that the doctors only need to provide their questions in natural language to get the answers they need. We also create a large-scale dataset for question answering on tabular EHR to simulate a more realistic setting.
Our performance evaluation shows that the proposed model is effective in handling the unique challenges about clinical terminologies, such as abbreviations and misspelled words. Second, to automatically identify answers for natural language questions from unstructured clinical notes in EHR, we propose to achieve this goal by querying a knowledge base constructed based on fine-grained document-level expert annotations of clinical records for various NLP tasks. We first create a dataset for clinical knowledge base question answering with two sets: clinical knowledge base and question-answer pairs. An attention-based aspect-level reasoning model is developed and evaluated on the new dataset. Our experimental analysis shows that it is effective in identifying answers and also allows us to analyze the impact of different answer aspects in predicting correct answers. Third, we focus on discovering underlying relationships of different entities (e.g., patient, disease, medication, and treatment) in tabular EHR, which can be formulated as a link prediction problem in the graph domain. We develop a self-supervised learning framework for better representation learning of entities across a large corpus and also consider local contextual information for the downstream link prediction task. We demonstrate the effectiveness, interpretability, and scalability of the proposed model on the healthcare network built from tabular EHR. It is also successfully applied to solve link prediction problems in a variety of domains, such as e-commerce, social networks, and academic networks. Finally, to dynamically predict the occurrence of multiple correlated medical events, we formulate the problem as a temporal (multiple time-points) and multi-task learning problem using tensor representation. We propose an algorithm to jointly and dynamically predict several survival problems at each time point and optimize it with the Alternating Direction Method of Multipliers (ADMM) algorithm.
The model allows us to consider both the dependencies between different tasks and the correlations of each task at different time points. We evaluate the proposed model on two real-world applications and demonstrate its effectiveness and interpretability.
- Geology and Tectonic Significance of the Late Precambrian Eastern Blue Ridge Cover Sequence in Central Virginia
  Wang, Ping (Virginia Tech, 1991)
  The Late Precambrian cover sequence in the Blue Ridge of central Virginia includes rocks of the Moneta Formation and the overlying Lynchburg Group. The Moneta Formation comprises amphibolites, felsites and biotite gneisses that unconformably overlie the Grenville basement. The Lynchburg Group in central Virginia is divided into three formations. Lynchburg I is made up of massive to thick bedded coarse-grained feldspathic arenites and conglomerates, which are interpreted as slope-apron deposits. Lynchburg II contains mainly medium to fine grained feldspathic arenites and graphitic schist (black shales) with subordinate conglomeratic rocks. These are believed to be channelized submarine fan turbidites formed in an anoxic environment. Lynchburg III consists of fine to medium grained feldspathic quartz arenites and a minor amount of conglomeratic rocks, which are considered to be channelized submarine turbidites with a more open marine environment and wider shelf. Three metamorphic facies and two deformation events are recognized in the cover sequence of the study area. The current tectonic models tend to view most of the mafic-ultramafic rocks and the host sedimentary rocks of the Lynchburg as ophiolitic melange, thus creating a suture, of Precambrian to Ordovician age. Detailed field mapping shows that the Lynchburg Group does not have the characteristics of melange and the mafic-ultramafic rocks in it do not resemble ophiolite. Rather, the cover sequence is related to the Late Precambrian Iapetan rifting event. Some tectonomagmatic discriminant diagrams have been used to support the current tectonic model and they are considered one of the most important arguments for ophiolites. These diagrams were tested by plotting samples from Jurassic rift basalts-diabases of eastern North America (ENA).
The ENA samples, as well as the post Grenville mafic rocks in the Blue Ridge, tend to plot outside the within-plate field. It is clear that geochemical data alone may give a wrong tectonic classification, and that a knowledge of field relations is of paramount importance for interpretation.
- Studies of lepton and quark interactions
  Wang, Ping (Virginia Polytechnic Institute and State University, 1985)
  Part I: Proposed Experimental Tests of the Right-handed Weak Current. All possible experiments which test the SU(2)L x U(1)R x U(1)B-L model and SU(2)L x SU(2)R x U(1)B-L model using the LEP e⁺e⁻ collider and HERA e⁻p collider are calculated and the most sensitive experiments are examined. Part II: Semi-Phenomenological Theory of (Qq̅) System. The (QQ̅) and (Qq̅) mesons are calculated using a QCD motivated potential model. It is discovered that by including a long distance relativistic correction term derived by Grome, the Coulomb + Linear potential works not only for c and b quarks, but s quark as well. The leptonic decay constants of various (Qq̅) mesons together with their masses are predicted. The topponium states are also discussed.
- Tensor-Based Temporal Multi-Task Survival Analysis
  Wang, Ping; Shi, Tian; Reddy, Chandan K. (IEEE, 2021-09-01)
  Survival analysis aims at predicting the time to event of interest along with its probability on longitudinal data. It is commonly used to make predictions for a single specific event of interest at a given time point. However, predicting the occurrence of multiple events of interest simultaneously and dynamically is needed in many real-world applications. An intuitive way to solve this problem is to simply apply the standard survival analysis method independently to each prediction task at each time point. However, it often leads to a sub-optimal solution since the underlying dependencies between these tasks are ignored. This motivates us to analyze these prediction tasks jointly in order to select the common features shared across all the tasks. In this paper, we formulate a temporal (Multiple Time points) Multi-Task learning framework (MTMT) for survival analysis problems using tensor representation. More specifically, given a survival dataset and a sequence of time points, which are considered as the monitored time points for the events of interest, we reformulate the survival analysis problem to jointly handle each task at each time point and optimize them simultaneously. We demonstrate the performance of the proposed MTMT model on important real-world datasets, including employee attrition and medical records. We show the superior performance of the MTMT model compared to several state-of-the-art models using standard metrics. We also provide the list of important features selected by our MTMT model, thus demonstrating the interpretability of the proposed model.
- Text-to-ESQ: A Two-Stage Controllable Approach for Efficient Retrieval of Vaccine Adverse Events from NoSQL Database
  Zhang, Wenlong; Zeng, Kangping; Yang, Xinming; Shi, Tian; Wang, Ping (ACM, 2023-09-03)
  The Vaccine Adverse Event Reporting System (VAERS) contains detailed reports of adverse events following vaccine administration. However, efficiently and accurately searching for specific information from VAERS poses significant challenges, especially for medical experts. Natural language querying (NLQ) methods tackle the challenge by translating the input questions into executable queries, allowing for the exploration of complex databases with large amounts of information. Most existing studies focus on the relational database and solve the Text-to-SQL task. However, the capability of full-text for Text-to-SQL is greatly limited by the data structures and functionality of the SQL databases. In addition, the potential of natural language querying has not been comprehensively explored in the healthcare domain. To overcome these limitations, we investigate the potential of NoSQL databases, specifically Elasticsearch, and forge a new research direction for NLQ, which we refer to as Text-to-ESQ generation. This exploration requires us to re-design various aspects of NLQ, such as the target application and the advantages of NoSQL database. In our approach, we develop a two-stage controllable (TSC) framework consisting of a question-to-question (Q2Q) translation module and an ESQ condition extraction (ECE) module. These modules are carefully designed to efficiently retrieve information from the VAERS data stored in a NoSQL database. Additionally, we construct a dedicated question-ESQ pair dataset called VAERSESQ, to support the task in the healthcare domain. Extensive experiments were conducted on the VAERSESQ dataset to evaluate the proposed methods.
The results, both quantitative and qualitative, demonstrate the accuracy and efficiency of our approach in generating queries for NoSQL databases, thus enabling efficient retrieval of VAERS data.
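
For readers unfamiliar with the target of Text-to-ESQ generation, the sketch below shows what an Elasticsearch query body might look like once conditions have been extracted from a question such as "reports of fever after injection for COVID19 vaccines in patients aged 65 or older". It is an illustration only and not the authors' TSC framework: the function name `build_esq` and the field names `symptom_text`, `vax_type`, and `age_yrs` are assumptions made for this example and are not taken from the VAERSESQ dataset. Only the `bool`, `must`, `filter`, `match`, `term`, and `range` keys are standard Elasticsearch Query DSL.

```python
import json


def build_esq(symptom_text, vaccine_type=None, min_age=None, max_age=None, size=10):
    """Assemble an Elasticsearch Query DSL body from already-extracted conditions.

    All field names below are hypothetical placeholders for this sketch.
    """
    # Full-text clause: scored relevance match on free-text symptom descriptions.
    must = [{"match": {"symptom_text": symptom_text}}]

    # Structured clauses: exact or range conditions that do not affect scoring.
    filters = []
    if vaccine_type is not None:
        filters.append({"term": {"vax_type": vaccine_type}})

    age_range = {}
    if min_age is not None:
        age_range["gte"] = min_age
    if max_age is not None:
        age_range["lte"] = max_age
    if age_range:
        filters.append({"range": {"age_yrs": age_range}})

    return {"size": size, "query": {"bool": {"must": must, "filter": filters}}}


query = build_esq("fever after injection", vaccine_type="COVID19", min_age=65)
print(json.dumps(query, indent=2))
```

The full-text `match` clause is the part that has no direct counterpart in a plain SQL `WHERE` condition, which is one way to read the abstract's motivation for moving from Text-to-SQL to a NoSQL search engine.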