Query Expansion Study for Clinical Decision Support
Information retrieval is widely used to find relevant information in many kinds of data, such as text documents, images, audio and video. Since the first medical batch retrieval system was developed in the mid-1960s, significant research effort has focused on applying information retrieval to medical data. However, despite extensive progress in medical information retrieval and its supporting technologies, the promise of this area remains unfulfilled because of the characteristics of medical data and the huge volume of medical literature.
In particular, retrieval on the selected dataset from the TREC Clinical Decision Support (CDS) track achieves low recall and precision. The overriding objective of this thesis is to improve the performance of information retrieval techniques applied to biomedical text documents, with a focus on recall and precision among the top-ranked results. To that end, we first removed redundant words from the TREC CDS topics and then expanded the resulting queries with MeSH terms. We also used other external data sources and domain knowledge for the expansion. In addition, we explored the doc2vec model as a way to improve retrieval. Finally, we applied learning to rank, which orders documents by estimated relevance so that relevant documents are placed ahead of irrelevant ones at the top of the result list. We found that queries expanded with external data sources and domain knowledge perform better than queries built directly from the TREC topic information.
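The two query-side steps described above, removing redundant words and then adding vocabulary terms, can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the `REDUNDANT_WORDS` list, the `TOY_MESH` dictionary and the function names are hypothetical, and a real system would look concepts up in the MeSH vocabulary (and the other external sources) rather than in a hand-written mapping.

```python
import re

# Hypothetical set of "redundant" words; the thesis does not specify its list.
REDUNDANT_WORDS = {"a", "an", "the", "of", "with", "and", "in",
                   "her", "his", "patient", "presents", "year", "old"}

# Hypothetical stand-in for a MeSH lookup: maps a phrase found in a topic
# to extra terms that are appended to the query. Entries are illustrative only.
TOY_MESH = {
    "heart attack": ["myocardial infarction"],
    "high blood pressure": ["hypertension"],
    "fever": ["pyrexia"],
}


def reduce_query(text):
    """Lowercase, tokenize on alphanumerics, and drop redundant words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in REDUNDANT_WORDS]


def expand_query(text, vocab=TOY_MESH):
    """Reduce the topic text, then append tokens of any matching vocabulary terms."""
    lowered = text.lower()
    base = reduce_query(text)
    expansions = []
    for phrase, terms in vocab.items():
        # Naive substring match; a real system would use concept recognition.
        if phrase in lowered:
            for term in terms:
                for tok in term.split():
                    if tok not in base and tok not in expansions:
                        expansions.append(tok)
    return base + expansions


if __name__ == "__main__":
    topic = "A 58-year-old man with chest pain and high blood pressure."
    print(expand_query(topic))
```

For the sample topic, the reduced query keeps the clinical content words, and the matched phrase "high blood pressure" contributes the expansion term `hypertension`. Keeping the original tokens alongside the expansions is one design choice among several; weighting expansion terms lower than original terms is a common alternative.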