Virginia Tech

    Retrieving Definitions from Scientific Text in the Salmon Fish Domain by Lexical Pattern Matching

    Date
    2004-01
    Author
    Gabbay, Igal
    Abstract
    While an information retrieval system takes a user query as input and returns a list of relevant documents chosen from a large collection, a question answering system attempts to produce an exact answer. Recent research, motivated by the question answering track of the Text REtrieval Conference (TREC), has focused mainly on answering ‘factoid’ questions concerned with names, places, dates, etc. in the news domain. However, questions seeking definitions of terms are common in the logs of search engines. The objective of this project was therefore to investigate methods of retrieving definitions from scientific documents. The subject domain was salmon, and an appropriate test collection of articles was created, pre-processed and indexed. Relevant terms were obtained from salmon researchers and a fish database. A system was built which accepted a term as input, retrieved relevant documents from the collection using a search engine, identified definition phrases within them using a vocabulary of syntactic patterns and associated heuristics, and produced as output phrases explaining the term. Four experiments were carried out which progressively extended and refined the patterns. The performance of the system, measured using an appropriate form of precision, improved over the experiments from 8.6% to 63.6%. The main findings of the research were: (1) definitions were diverse despite the documents’ homogeneity and were found not only in the Introduction and Abstract sections but also in the Methods and References; (2) nevertheless, syntactic patterns were a useful starting point in extracting them; (3) three patterns accounted for 90% of candidate phrases; (4) statistically, the ordinal number of the instance of the term in a document was a better indicator of the presence of a definition than either sentence position and length or the number of sentences in the document.
    Next steps include classifying terms, using information extraction-like templates, resolving basic anaphors, ranking answers, exploiting the structure of scientific papers, and refining the evaluation process.
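
To illustrate the general idea of lexical pattern matching described in the abstract, here is a minimal sketch in Python. The abstract does not list the patterns or heuristics the thesis actually used, so the three patterns below, the function names `build_patterns` and `extract_definitions`, and the sample sentences are all hypothetical and chosen only for illustration.

```python
import re


def build_patterns(term):
    """Return illustrative lexical patterns for `term`.

    These patterns are assumptions made for this sketch; they are not
    the pattern vocabulary from the thesis.
    """
    t = re.escape(term)
    return [
        # "<term> is/are a/an/the <definition>"
        re.compile(rf"\b{t}s?\s+(?:is|are)\s+((?:a|an|the)\s+[^.;]+)", re.IGNORECASE),
        # "<term>, defined as <definition>" or "<term> is defined as <definition>"
        re.compile(rf"\b{t}s?,?\s+(?:is\s+|are\s+)?defined\s+as\s+([^.;]+)", re.IGNORECASE),
        # "<term> (<definition>)"
        re.compile(rf"\b{t}s?\s+\(([^)]+)\)", re.IGNORECASE),
    ]


def extract_definitions(term, text):
    """Collect candidate definition phrases for `term` found in `text`."""
    candidates = []
    for pattern in build_patterns(term):
        for match in pattern.finditer(text):
            candidates.append(match.group(1).strip())
    return candidates


# Hypothetical sample text, not taken from the test collection.
sample = (
    "A smolt is a young salmon ready to migrate to the sea. "
    "Redds (nests dug in gravel) are built by females."
)
smolt_candidates = extract_definitions("smolt", sample)
redd_candidates = extract_definitions("redd", sample)
```

A full system along the lines the abstract describes would first retrieve documents for the term with a search engine and then filter the candidates with additional heuristics; this sketch covers only the pattern-matching step.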
    URI
    http://hdl.handle.net/10919/71562
    Collections
    • NDLTD Theses and Dissertations

    If you believe that any material in VTechWorks should be removed, please see our policy and procedure for Requesting that Material be Amended or Removed. All takedown requests will be promptly acknowledged and investigated.

    Virginia Tech | University Libraries | Contact Us
     

     
