Text-to-ESQ: A Two-Stage Controllable Approach for Efficient Retrieval of Vaccine Adverse Events from NoSQL Database
The Vaccine Adverse Event Reporting System (VAERS) contains detailed reports of adverse events following vaccine administration. However, efficiently and accurately searching for specific information from VAERS poses significant challenges, especially for medical experts. Natural language querying (NLQ) methods tackle the challenge by translating the input questions into executable queries, allowing for the exploration of complex databases with large amounts of information. Most existing studies focus on the relational database and solve the Text-to-SQL task. However, the capability of full-text for Text-to-SQL is greatly limited by the data structures and functionality of the SQL databases. In addition, the potential of natural language querying has not been comprehensively explored in the healthcare domain. To overcome these limitations, we investigate the potential of NoSQL databases, specifically Elasticsearch, and forge a new research direction for NLQ, which we refer to as Text-to-ESQ generation. This exploration requires us to re-design various aspects of NLQ, such as the target application and the advantages of NoSQL database. In our approach, we develop a two-stage controllable (TSC) framework consisting of a question-to-question (Q2Q) translation module and an ESQ condition extraction (ECE) module. These modules are carefully designed to efficiently retrieve information from the VEARS data stored in a NoSQL database. Additionally, we construct a dedicated question-ESQ pair dataset called VAERSESQ, to support the task in the healthcare domain. Extensive experiments were conducted on the VAERSESQ dataset to evaluate the proposed methods. The results, both quantitative and qualitative, demonstrate the accuracy and efficiency of our approach in generating queries for NoSQL databases, thus enabling efficient retrieval of VEARS data.