dc.contributor.author: Gabbay, Igal
dc.date.accessioned: 2016-06-27T19:03:43Z
dc.date.available: 2016-06-27T19:03:43Z
dc.date.issued: 2004-01
dc.identifier: eprint:287
dc.identifier.uri: http://hdl.handle.net/10919/71562
dc.description.abstract: While an information retrieval system takes a user query as input and returns a list of relevant documents chosen from a large collection, a question answering system attempts to produce an exact answer. Recent research, motivated by the question answering track of the Text REtrieval Conference (TREC), has focused mainly on answering ‘factoid’ questions concerned with names, places, dates, etc. in the news domain. However, questions seeking definitions of terms are common in the logs of search engines. The objective of this project was therefore to investigate methods of retrieving definitions from scientific documents. The subject domain was salmon, and an appropriate test collection of articles was created, pre-processed and indexed. Relevant terms were obtained from salmon researchers and a fish database. A system was built that accepted a term as input, retrieved relevant documents from the collection using a search engine, identified definition phrases within them using a vocabulary of syntactic patterns and associated heuristics, and output phrases explaining the term. Four experiments were carried out, which progressively extended and refined the patterns. The performance of the system, measured using an appropriate form of precision, improved over the experiments from 8.6% to 63.6%. The main findings of the research were: (1) definitions were diverse despite the documents’ homogeneity, and were found not only in the Introduction and Abstract sections but also in the Methods and References; (2) nevertheless, syntactic patterns were a useful starting point for extracting them; (3) three patterns accounted for 90% of candidate phrases; (4) statistically, the ordinal number of the instance of the term in a document was a better indicator of the presence of a definition than either sentence position and length, or the number of sentences in the document.
Next steps include classifying terms, using information-extraction-like templates, resolving basic anaphors, ranking answers, exploiting the structure of scientific papers, and refining the evaluation process. [en_US]
dc.format.mimetype: application/pdf [en_US]
dc.language.iso: en [en_US]
dc.subject: question answering [en_US]
dc.subject: definition questions [en_US]
dc.subject: salmon [en_US]
dc.subject: definitional questions [en_US]
dc.subject: computational linguistics [en_US]
dc.subject: natural language processing [en_US]
dc.subject.lcc: QH301
dc.subject.lcc: QA75
dc.subject.lcc: P1
dc.subject.lcc: Q1
dc.subject.lcc: AI
dc.title: Retrieving Definitions from Scientific Text in the Salmon Fish Domain by Lexical Pattern Matching [en_US]
dc.type: Thesis [en_US]
dc.contributor.department: Technical Communication [en_US]
thesis.degree.level: masters [en_US]
thesis.degree.grantor: University of Limerick [en_US]
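The abstract describes definition extraction as matching a query term against a vocabulary of syntactic patterns. This record does not list the thesis's actual patterns or heuristics, so the following Python sketch only illustrates the general technique under stated assumptions. The pattern set in `PATTERN_TEMPLATES`, the functions `split_sentences` and `find_definitions`, and the example salmon sentences are all hypothetical. The naive sentence splitter also stands in for whatever pre-processing the thesis really used. Each regex template is instantiated with the escaped term, and the phrase captured next to the term is returned as a candidate definition.

```python
import re
from typing import List, Tuple

# Hypothetical pattern vocabulary (an illustrative assumption, NOT the
# thesis's patterns). "{t}" is filled with the regex-escaped query term and
# the named group "defn" captures the candidate definition phrase.
PATTERN_TEMPLATES: List[Tuple[str, str]] = [
    ("copula", r"\b{t}\s+(?:is|are)\s+(?:a|an|the)\s+(?P<defn>[^.;]+)"),
    ("defined_as", r"\b{t}\s+(?:is|are)\s+defined\s+as\s+(?P<defn>[^.;]+)"),
    ("refers_to", r"\b{t}\s+refers?\s+to\s+(?P<defn>[^.;]+)"),
    ("appositive", r"\b{t},\s+(?:a|an|the)\s+(?P<defn>[^,.;]+),"),
    ("known_as", r"(?P<defn>[^,.;]+?),?\s+(?:known|referred to)\s+as\s+{t}\b"),
]


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def find_definitions(term: str, text: str) -> List[Tuple[str, str]]:
    """For each sentence, record at most one (pattern_name, phrase) pair per
    pattern that matches around the term (case-insensitive)."""
    escaped = re.escape(term)
    compiled = [
        (name, re.compile(template.format(t=escaped), re.IGNORECASE))
        for name, template in PATTERN_TEMPLATES
    ]
    results: List[Tuple[str, str]] = []
    for sentence in split_sentences(text):
        for name, regex in compiled:
            match = regex.search(sentence)
            if match:
                results.append((name, match.group("defn").strip()))
    return results


if __name__ == "__main__":
    # Made-up example sentences, not drawn from the thesis collection.
    sample = (
        "A parr is a juvenile salmon with dark vertical bars. "
        "Smolt is defined as the stage at which a juvenile salmon migrates to sea. "
        "The kelt, a salmon that has spawned, may return to the sea."
    )
    for query in ("parr", "smolt", "kelt"):
        print(query, find_definitions(query, sample))
```

For instance, on the sentence "The kelt, a salmon that has spawned, may return to the sea.", `find_definitions("kelt", ...)` returns a list containing `("appositive", "salmon that has spawned")`. Document retrieval, answer ranking and positional heuristics such as the ordinal-instance indicator mentioned in the abstract are deliberately left out of this sketch.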
