Experimental comparison of schemes for interpreting Boolean queries

Date
1988
Publisher
Virginia Polytechnic Institute and State University
Abstract

The standard interpretation of the logical operators in a Boolean retrieval system is in general too strict. A standard Boolean query rarely comes close to retrieving all and only those documents that are relevant to the user. An AND query is often too narrow and an OR query is often too broad. Choosing AND results in retrieval at the left end of a typical average recall-precision graph, while choosing OR results in retrieval at the right end, implying a tradeoff between precision and recall. This study examines several proposed schemes (P-norm, Classical Fuzzy-Set, MMM, Paice and TIRS) that soften the interpretation of the logical operators and thereby aim to attain both high precision and high recall.

Each of the above schemes has shown great improvement over the standard Boolean scheme in terms of retrieval effectiveness. The differences in retrieval effectiveness between P-norm, Paice and MMM are shown to be relatively small. However, the related performance results give evidence of the ranking P-norm, Paice, MMM and then TIRS.

This study employs the INNER PRODUCT function for computing the similarity between a document point and a query point in TIRS. There may be other choices of similarity functions for TIRS, but irrespective of the function used, the TIRS approach, having to deal with associated min-terms rather than the original query, is difficult to realize and involves far greater computational overhead than the other schemes.
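A rough sketch of why working with min-terms is costly: a Boolean query over n index terms can expand into as many as 2^n min-terms, each of which must then be compared with the document. The query, the document weights, and the function names `min_terms` and `inner_product` below are hypothetical. How TIRS turns min-terms into query points and combines the resulting scores is not described in this abstract, so the sketch stops at the per-min-term inner products.

```python
from itertools import product
from typing import Callable, List, Sequence, Tuple


def min_terms(query: Callable[..., bool], n_terms: int) -> List[Tuple[int, ...]]:
    """Return every 0/1 assignment of the index terms that satisfies the query.

    Brute-force truth-table enumeration: 2**n_terms candidate assignments.
    """
    return [bits for bits in product((0, 1), repeat=n_terms) if query(*bits)]


def inner_product(doc: Sequence[float], point: Sequence[float]) -> float:
    """Inner product of a document vector and a query-side vector."""
    return sum(d * q for d, q in zip(doc, point))


# Hypothetical query (t1 AND t2) OR t3 and hypothetical document term weights.
query = lambda t1, t2, t3: bool((t1 and t2) or t3)
doc = [0.9, 0.2, 0.5]

terms = min_terms(query, 3)
scores = [inner_product(doc, m) for m in terms]

for m, s in zip(terms, scores):
    print(m, round(s, 3))
```

Even this three-term query yields five min-terms, whereas the other schemes evaluate the original query expression directly.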

The P-norm scheme, being a distance-based approach, has greater intuitive appeal than the Paice or MMM scheme. However, in terms of the computational overhead each scheme requires, both Paice and MMM are superior to P-norm. The Paice and MMM schemes are essentially variations of the classical fuzzy-set scheme. Both perform much better than the classical fuzzy-set scheme in terms of retrieval effectiveness.
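The softened operators compared above can be sketched as follows, using the commonly cited definitions of each scheme rather than the exact formulation or parameter settings of this report. The function names, the unweighted query terms, and the coefficient values (`c_and`, `c_or`, `r`, `p`) are illustrative assumptions. Each function takes the weights of the query terms in one document and returns a document-query similarity in [0, 1].

```python
from typing import Sequence


def fuzzy_and(weights: Sequence[float]) -> float:
    """Classical fuzzy-set AND: only the weakest term counts."""
    return min(weights)


def fuzzy_or(weights: Sequence[float]) -> float:
    """Classical fuzzy-set OR: only the strongest term counts."""
    return max(weights)


def mmm_and(weights: Sequence[float], c_and: float = 0.7) -> float:
    """MMM AND: mostly min, softened by a share of max (c_and is assumed)."""
    return c_and * min(weights) + (1.0 - c_and) * max(weights)


def mmm_or(weights: Sequence[float], c_or: float = 0.7) -> float:
    """MMM OR: mostly max, softened by a share of min (c_or is assumed)."""
    return c_or * max(weights) + (1.0 - c_or) * min(weights)


def paice(weights: Sequence[float], r: float, is_or: bool) -> float:
    """Paice: geometrically weighted mean over all term weights.

    Weights are sorted descending for OR and ascending for AND, so the
    largest coefficient (r**0) lands on the max or min respectively.
    """
    ordered = sorted(weights, reverse=is_or)
    coeffs = [r ** i for i in range(len(ordered))]
    return sum(c * w for c, w in zip(coeffs, ordered)) / sum(coeffs)


def pnorm_or(weights: Sequence[float], p: float = 2.0) -> float:
    """P-norm OR: normalized distance from the all-zeros point."""
    return (sum(w ** p for w in weights) / len(weights)) ** (1.0 / p)


def pnorm_and(weights: Sequence[float], p: float = 2.0) -> float:
    """P-norm AND: one minus normalized distance from the all-ones point."""
    return 1.0 - (sum((1.0 - w) ** p for w in weights) / len(weights)) ** (1.0 / p)


doc = [0.9, 0.2]  # hypothetical weights of two query terms in one document
print(fuzzy_and(doc), fuzzy_or(doc))
print(mmm_and(doc), mmm_or(doc))
print(paice(doc, r=0.5, is_or=False), paice(doc, r=0.5, is_or=True))
print(pnorm_and(doc), pnorm_or(doc))
```

Under these definitions, MMM and Paice need only min, max, sorting and a few multiplications, while P-norm needs a power and a root per evaluation, which is consistent with the overhead comparison above. Setting `c_and = c_or = 1` in MMM, or `r = 0` in Paice, recovers the classical fuzzy-set operators.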
