Query Expansion Study for Clinical Decision Support

dc.contributor.author | Zhuang, Wenjie | en
dc.contributor.committeechair | Fan, Weiguo Patrick | en
dc.contributor.committeemember | Huang, Bert | en
dc.contributor.committeemember | Cao, Yang | en
dc.contributor.committeemember | Tilevich, Eli | en
dc.contributor.department | Computer Science | en
dc.date.accessioned | 2018-02-13T09:00:17Z | en
dc.date.available | 2018-02-13T09:00:17Z | en
dc.date.issued | 2018-02-12 | en
dc.description.abstract | Information retrieval is widely used to find relevant information in many kinds of data, such as text documents, images, audio, and video. Since the first medical batch retrieval system was developed in the mid-1960s, significant research effort has gone into applying information retrieval to medical data. However, despite vast developments in medical information retrieval and accompanying technologies, the promise of this area remains unfulfilled because of the properties of medical data and the huge volume of medical literature. Specifically, recall and precision on the selected dataset from the TREC Clinical Decision Support (CDS) track are low. The overriding objective of this thesis is to improve the performance of information retrieval techniques applied to biomedical text documents. We have focused on improving recall and precision among the top retrieved results. To that end, we have removed redundant words and then expanded queries by adding MeSH terms in TREC CDS topics. We have also used other external data sources and domain knowledge to implement the expansion. In addition, we have considered using the doc2vec model to optimize retrieval. Finally, we have applied learning to rank, which sorts documents by relevance and places relevant documents ahead of irrelevant ones, so that relevant results are returned at the top. We have found that queries expanded with external data sources and domain knowledge perform better than queries that use the TREC topic information directly. | en
dc.description.abstractgeneral | Information retrieval is widely used to find relevant information in many kinds of data. Since the first medical batch retrieval system was developed in the mid-1960s, significant research effort has gone into applying information retrieval to medical data. However, the promise of this area remains unfulfilled because of certain properties of medical data and the sheer volume of medical literature. The overriding objective of this thesis is to improve the performance of information retrieval techniques applied to biomedical text documents. This thesis presents several ways to implement query expansion in order to make retrieval more efficient. It then discusses approaches for placing the documents relevant to a query at the top of the results. | en
dc.description.degree | Master of Science | en
dc.format.medium | ETD | en
dc.identifier.other | vt_gsexam:14312 | en
dc.identifier.uri | http://hdl.handle.net/10919/82068 | en
dc.publisher | Virginia Tech | en
dc.rights | In Copyright | en
dc.rights.uri | http://rightsstatements.org/vocab/InC/1.0/ | en
dc.subject | Query Expansion | en
dc.subject | Information Retrieval | en
dc.subject | Doc2Vec | en
dc.subject | MeSH Term | en
dc.subject | Learning to Rank | en
dc.title | Query Expansion Study for Clinical Decision Support | en
dc.type | Thesis | en
thesis.degree.discipline | Computer Science and Applications | en
thesis.degree.grantor | Virginia Polytechnic Institute and State University | en
thesis.degree.level | masters | en
thesis.degree.name | Master of Science | en

Files

Original bundle
Name: Zhuang_W_T_2018.pdf
Size: 903.57 KB
Format: Adobe Portable Document Format