Regression analysis of extended vectors to obtain coefficients for use in probabilistic information retrieval systems

dc.contributor.author: Nunn, Gary L. [en]
dc.contributor.department: Computer Science and Applications [en]
dc.date.accessioned: 2019-10-10T19:11:26Z [en]
dc.date.available: 2019-10-10T19:11:26Z [en]
dc.date.issued: 1987 [en]
dc.description.abstract: Previous work by Fox has extended the vector space model of information retrieval and its implementation in the SMART system so different types of information about documents can be separately handled as multiple subvectors, each for a different concept type. We hypothesized that relevance of a document could be best predicted if proper coefficients are obtained to reflect the importance of the query-document similarity for each subvector when computing an overall similarity value. Two different research collections, CACM and ISI, each split into halves, were used to generate data for the regression studies to obtain coefficients. Most of the variance in relevance could be accounted for by only four of the subvectors (authors, Computing Review descriptors, links, and terms) for the CACM1 collection. In the ISI1 collection, two of the vectors (terms and cocitations) accounted for most of the variance. Log transformed data and samples of the records gave the best RSQ's; .6654 was the highest RSQ (binary relevance). The regression runs provided coefficients which were used in subsequent feedback runs in SMART. Having ranked relevance did not improve the regression model over binary relevance. The coefficients in the feedback runs with SMART proved to be of limited usefulness since improvements in precision were in the 1-5% range. Although log data and samples of the records gave the best RSQ's, coefficients from log values of all data improved precision the most. The findings of this study support previous work of Fox, that additional information improves retrieval. Regression coefficients improved precision slightly when used as subvector weights. Log transforming the data values for the concept types modestly helped both the regression analyses and the retrieval in SMART. [en]
dc.description.degree: M.C.S. [en]
dc.format.extent: vii, 45 leaves [en]
dc.format.mimetype: application/pdf [en]
dc.identifier.uri: http://hdl.handle.net/10919/94435 [en]
dc.language.iso: en_US [en]
dc.publisher: Virginia Polytechnic Institute and State University [en]
dc.relation.isformatof: OCLC# 17746024 [en]
dc.rights: In Copyright [en]
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/ [en]
dc.subject.lcc: LD5655.V851 1987.N877 [en]
dc.subject.lcsh: Information storage and retrieval systems -- Research [en]
dc.title: Regression analysis of extended vectors to obtain coefficients for use in probabilistic information retrieval systems [en]
dc.type: Master's project [en]
dc.type.dcmitype: Text [en]
thesis.degree.discipline: Computer Science and Applications [en]
thesis.degree.grantor: Virginia Polytechnic Institute and State University [en]
thesis.degree.level: masters [en]
thesis.degree.name: Master of Science [en]
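
Illustrative sketch (not taken from the thesis): the abstract describes regressing relevance on per-subvector query-document similarities, optionally after a log transform, and then using the coefficients as subvector weights in an overall similarity. The Python below is one minimal way to picture that idea, under loud assumptions: the data are synthetic, the function names are hypothetical, ordinary least squares stands in for whatever regression procedure the study actually ran, and log(1 + x) is an assumed form of the log transform. It does not reproduce SMART or any reported result.

```python
import numpy as np

# Concept types the abstract names for the CACM1 collection; the
# identifiers themselves are hypothetical labels for this sketch.
SUBVECTORS = ["authors", "cr_descriptors", "links", "terms"]


def _prepare(sims, log_transform):
    """Convert to a float array and optionally apply log(1 + x) (assumed transform)."""
    X = np.asarray(sims, dtype=float)
    return np.log1p(X) if log_transform else X


def fit_subvector_weights(sims, relevance, log_transform=True):
    """Ordinary least-squares fit of relevance on per-subvector similarities.

    sims: (n_pairs, n_subvectors) query-document similarities.
    relevance: (n_pairs,) relevance judgments (e.g. 0/1 for binary relevance).
    Returns (intercept, coefficients, r_squared).
    """
    X = _prepare(sims, log_transform)
    y = np.asarray(relevance, dtype=float)
    A = np.column_stack([np.ones(len(X)), X])  # prepend intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ beta
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return beta[0], beta[1:], 1.0 - ss_res / ss_tot


def combined_similarity(sims, coefficients, log_transform=True):
    """Overall similarity as a coefficient-weighted sum of subvector similarities."""
    return _prepare(sims, log_transform) @ np.asarray(coefficients, dtype=float)


# Synthetic stand-in data: 200 query-document pairs, 4 subvector similarities.
rng = np.random.default_rng(0)
sims = rng.random((200, len(SUBVECTORS)))
signal = 0.5 * sims[:, 0] + sims[:, 3]
relevance = (signal + 0.1 * rng.standard_normal(200) > 0.9).astype(float)

intercept, coefs, rsq = fit_subvector_weights(sims, relevance)
scores = combined_similarity(sims, coefs)  # one ranking score per pair
```

In this sketch the fitted coefficients play the role of subvector weights: sorting documents by `scores` for a given query would give a ranking that reflects the estimated importance of each concept type.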

Files

Original bundle
Name: LD5655.V851_1987.N877.pdf
Size: 2.18 MB
Format: Adobe Portable Document Format