Reports, Digital Library Research Laboratory

Permanent URI for this collection

https://hdl.handle.net/10919/18734

The Academy: A Community of Information Retrieval Agents
France, Robert K. (1994-09-06)
We commonly picture text as a sequence of words; or alternatively as a sequence of paragraphs, each of which is composed of a sequence of sentences, each of which is itself a sequence of words. It is also worth noting that text is not so much a sequence of words as a sequence of terms, including most commonly words, but also including names, numbers, code sequences, and a variety of other $#*&)&@^ tokens. Just as we commonly simplify text into a sequence of words, so too it is common in information retrieval to regard documents as single texts. Nothing is less common, though, than a document with only a single part, and that unstructured text. Search and retrieval in such a universe involves new questions: Where does a document begin and end? How can we decide how much to show to a user? When does a query need to be matched by a single node in a hypertext, and when may partial matches in several nodes count?
Building the CODER Lexicon: The Collins English Dictionary and its Adverb Definitions
Fox, Edward A.; Wohlwend, Robert C.; Sheldon, Phyllis R.; Chen, Qi-Fan; France, Robert K. (1986-10-01)
The CODER (COmposite Document Expert/extended/effective Retrieval) project is an investigation of the applicability of artificial intelligence techniques to the information retrieval task of analyzing, storing, and retrieving heterogeneous collections of "composite documents." In order to support some of the processing desired, and to allow experimentation in information retrieval and natural language processing, a lexicon was constructed from the machine readable Collins Dictionary of the English Language. After giving background, motivation, and a survey of related work, the Collins lexicon is discussed. Following is a description of the conversion process, the format of the resulting Prolog database, and characteristics of the dictionary and relations. To illustrate what is present and to explain how it relates to the files produced from Webster's Seventh New Collegiate Dictionary, a number of comparative charts are given. Finally, a grammar for adverb definitions is presented, together with a description of defining formulae that usually indicate the type of the adverb. Ultimately it is hoped that definitions for adverbs and other words will be parsed so that the relational lexicon being constructed will include many additional relationships and other knowledge about words and their usage.
Indexing Large Collections of Small Text Records for Ranked Retrieval
France, Robert K.; Fox, Edward A. (1993)
The MARIAN online public access catalog system at Virginia Tech has been developed to apply advanced information retrieval methods and object-oriented technology to the needs of library patrons. We give a description of our data model, design, processing, data representations, and retrieval operation. By identifying objects of interest during the indexing process, storing them according to our "information graph" model, and applying weighting schemes that seem appropriate for this large collection of small text records, we hope to better serve user needs. Since every text word is important in this domain, we employ opportunistic matching algorithms and a mix of data structures to support searching, that will give good performance for a large campus community, even though MARIAN runs on a distributed collection of small workstations. An initial small experiment indicates that our new ad hoc weighting scheme is more effective than a more standard approach.
Information Interactions: User Interface Objects for CODER, INCARD, and MARIAN, v. 2.5
France, Robert K. (1992-08-24)
Any information system needs a user interface: a program or program module that eases the communication between the system's users and the underlying search and storage software. This document describes (part of) the specifications for the user interface to a family of information systems current at Virginia Tech: the experimental platform CODER, a specialized version of CODER dealing with medical information called INCARD for INformation about CARDiology, and a library catalog system named MARIAN.
MARIAN Design
France, Robert K.; Cline, Ben E.; Fox, Edward A. (1995-02-14)
MARIAN (Multiple Access Retrieval of library Information with ANnotations) is an online library catalog information system. Intended for library end-users rather than catalogers, it provides controlled search by author, subject entry, and imprint; keyword search by title, subject, and other MARC text fields; feedback, locating the closest books to a relevant book or books; and user annotations of books.
Use and Usability in a Digital Library Search System
France, Robert K.; Nowell, Lucy Terry; Fox, Edward A.; Saad, Rani A.; Zhao, Jianxin (Virginia Tech Digital Library Research Laboratory, 1999)
Digital libraries must reach out to users from all walks of life, serving information needs at all levels. To do this, they must attain high standards of usability over an extremely broad audience. This paper details the evolution of one important digital library component as it has grown in functionality and usefulness over several years of use by a live, unrestricted community. Central to its evolution have been user studies, analysis of use patterns, and formative usability evaluation. We extrapolate that all three components are necessary in the production of successful digital library systems.
Weights and Measures: An Axiomatic Model for Similarity Computations
France, Robert K. (1994)
This paper proposes a formal model for similarity functions, first over arbitrary objects, then over sets and the sorts of weighted sets that are found in text retrieval systems. Using a handful of axioms and constraints, we are able to make statements about the behavior of such functions in reference to set overlap and to noise. The model is then used to analyze, and we hope illuminate, several popular text similarity functions.
When Stopping Rules Don't Stop
France, Robert K. (1995)
Performing ranked retrieval on large document collections can be slow. The method of stopping rules has been proposed to make it more efficient. Stopping rules, which terminate search when the highest ranked documents have been determined to some degree of likelihood, are attractive and have proven useful in clustering, but have not worked well in retrieval experiments. This paper presents a statistical analysis of why they have failed and where they can be expected to continue failing.

Browsing Reports, Digital Library Research Laboratory by Author "France, Robert K."