Browsing by Author "France, Robert K."
- The Academy: A Community of Information Retrieval Agents
  France, Robert K. (1994-09-06)
  We commonly picture text as a sequence of words; or alternatively as a sequence of paragraphs, each of which is composed of a sequence of sentences, each of which is itself a sequence of words. It is also worth noting that text is not so much a sequence of words as a sequence of terms, including most commonly words, but also including names, numbers, code sequences, and a variety of other $#*&)&@^ tokens. Just as we commonly simplify text into a sequence of words, so too it is common in information retrieval to regard documents as single texts. Nothing is less common, though, than a document with only a single part, and that unstructured text. Search and retrieval in such a universe involves new questions: Where does a document begin and end? How can we decide how much to show to a user? When does a query need to be matched by a single node in a hypertext, and when may partial matches in several nodes count?
- Architecture of an Object-Oriented Expert System for Composite Document Analysis, Representation, and Retrieval
  Fox, Edward A.; France, Robert K. (Department of Computer Science, Virginia Polytechnic Institute & State University, 1986-04-01)
  The CODER project is a multi-year effort to investigate how best to apply artificial intelligence methods to increase the effectiveness of information retrieval systems when handling collections of composite documents. In order to ensure system adaptability and to allow reconfiguration for controlled experimentation, the project has been designed as an expert system. The use of individually tailored specialist experts coupled with standardized blackboard modules for communication and internal and external knowledge bases for managing effective knowledge allows for quick prototyping, incremental development and flexibility under change. The system as a whole is structured as a set of communicating modules, designed under an object-oriented paradigm and implemented under UNIX™ using pipes and the TCP/IP protocol. Inferential modules are being coded in MU-Prolog; non-inferential modules are being prototyped in MU-Prolog and will be re-implemented as needed in C++.
- An Artificial Intelligence Environment for Information Retrieval Research
  France, Robert K.; Fox, Edward A. (Department of Computer Science, Virginia Polytechnic Institute & State University, 1988)
  The CODER (COmposite Document Expert/Extended/Effective Retrieval) project is a multi-year effort to investigate how best to apply artificial intelligence methods to increase the effectiveness of information retrieval systems. Particular attention is being given to analysis and representation of heterogeneous documents, such as electronic mail digests or messages, which vary widely in style, length, topic, and structure. In order to ensure system adaptability and to allow reconfiguration for controlled experimentation, the project has been designed as a moderated expert system. This thesis covers the design problems involved in providing a unified architecture and knowledge representation scheme for such a system, and the solutions chosen for CODER. An overall object-oriented environment is constructed using a set of message-passing primitives based on a modified Prolog call paradigm. Within this environment is embedded the skeleton of a flexible expert system, where task decomposition is performed in a knowledge-oriented fashion and where subtask managers are implemented as members of a community of experts. A three-level knowledge representation formalism of elementary data types, frames, and relations is provided, and can be used to construct knowledge structures such as terms, meaning structures, and document interpretations. The use of individually tailored specialist experts coupled with standardized blackboard modules for communication and control and external knowledge bases for maintenance of factual world knowledge allows for quick prototyping, incremental development and flexibility under change.
  The system as a whole is structured as a set of communicating modules, defined functionally and implemented under UNIX™ using sockets and the TCP/IP protocol for communication. Inferential modules are being coded in MU-Prolog; non-inferential modules are being prototyped in MU-Prolog and will be re-implemented as needed in C++.
- Building the CODER Lexicon: The Collins English Dictionary and its Adverb Definitions
  Fox, Edward A.; Wohlwend, Robert C.; Sheldon, Phyllis R.; Chen, Qi-Fan; France, Robert K. (1986-10-01)
  The CODER (COmposite Document Expert/extended/effective Retrieval) project is an investigation of the applicability of artificial intelligence techniques to the information retrieval task of analyzing, storing, and retrieving heterogeneous collections of "composite documents." In order to support some of the processing desired, and to allow experimentation in information retrieval and natural language processing, a lexicon was constructed from the machine readable Collins Dictionary of the English Language. After giving background, motivation, and a survey of related work, the Collins lexicon is discussed. Following is a description of the conversion process, the format of the resulting Prolog database, and characteristics of the dictionary and relations. To illustrate what is present and to explain how it relates to the files produced from Webster's Seventh New Collegiate Dictionary, a number of comparative charts are given. Finally, a grammar for adverb definitions is presented, together with a description of defining formulae that usually indicate the type of the adverb. Ultimately it is hoped that definitions for adverbs and other words will be parsed so that the relational lexicon being constructed will include many additional relationships and other knowledge about words and their usage.
- Building the CODER Lexicon: The Collins English Dictionary and Its Adverb Definitions
  Fox, Edward A.; Wohlwend, Robert C.; Sheldon, Phyllis R.; Chen, Qi-Fan; France, Robert K. (Department of Computer Science, Virginia Polytechnic Institute & State University, 1986-10-01)
  The CODER (COmposite Document Expert/extended/effective Retrieval) project is an investigation of the applicability of artificial intelligence techniques to the information retrieval task of analyzing, storing, and retrieving heterogeneous collections of "composite documents." In order to support some of the processing desired, and to allow experimentation in information retrieval and natural language processing, a lexicon was constructed from the machine readable Collins Dictionary of the English Language. After giving background, motivation, and a survey of related work, the Collins lexicon is discussed. Following is a description of the conversion process, the format of the resulting Prolog database, and characteristics of the dictionary and relations. To illustrate what is present and to explain how it relates to the files produced from Webster's Seventh New Collegiate Dictionary, a number of comparative charts are given. Finally, a summary of adverb definitions is presented, together with a description of defining formulae that usually indicate the type of the adverb. Ultimately it is hoped that definitions for adverbs and other words will be parsed so that the relational lexicon being constructed will include many additional relationships and other knowledge about words and their usage.
- Development of a Modern OPAC: From REVTOLC to MARIAN
  Fox, Edward A.; France, Robert K.; Sahle, Eskinder; Daoud, Amjad M.; Cline, Ben E. (Department of Computer Science, Virginia Polytechnic Institute & State University, 1993-02-01)
  In the Retrieval Experiment -- Virginia Tech OnLine Catalog (REVTOLC) study we carried out a large pilot test in 1987 and a larger, controlled investigation in 1990, with 216 users and roughly 500,000 MARC records. Results indicated that a forms-based interface coupled with vector and relevance feedback retrieval methods would be well received. Recent efforts developing the Multiple Access and Retrieval of Information with ANnotations (MARIAN) system have involved use of a specially developed object-oriented DBMS, construction of a client running under NeXTSTEP, programming of a distributed server with a thread assigned to each user session to increase concurrency on a small network of NeXTs, refinement of algorithms to use objects and stopping rules for greater efficiency, usability testing and iterative interface refinement.
- A Frame-Based Language in Information Retrieval
  Weaver, Marybeth T.; France, Robert K.; Chen, Qi-Fan; Fox, Edward A. (Department of Computer Science, Virginia Polytechnic Institute & State University, 1988)
  With the advent of the information society, many researchers are turning to artificial intelligence techniques to provide effective retrieval over large bodies of textual information. Yet any AI system requires a formalism for encoding its knowledge about the objects of its knowledge, the world, and the intelligence that it is designed to manifest. In the CODER system, the mission of which is to provide an environment for experiments in applying AI to information retrieval, that formalism is provided by a single well-defined factual representation language. Designed as a flexible tool for retrieval research, the CODER factual representation language is a hybrid AI language involving a system of strong types for attribute values, a frame system, and a system of Prolog-like relational structures. Inheritance is enforced throughout, and the semantics of type subsumption and object matching formally defined. A collection of type and object managers called the knowledge administration complex implements this common language for storing knowledge and communicating it within the system. Of the three types of knowledge structures in the language, the frame facility has proven most useful in the retrieval domain. The factual representation language is implemented in Prolog as a set of predicates accessible to all system modules. Each level of knowledge representation (elementary primitives, frames, and relations) has a type manager; the frame and relation levels also have object managers. Storage of complete knowledge objects (statements in the factual representation language) is supported by a system of external knowledge bases. This paper discusses the frame construct itself, the implementation of the knowledge administration complex and external knowledge bases, and the use of the construct in retrieval research. The paper closes with a discussion of the utility of the language in experiments.
- Indexing Large Collections of Small Text Records for Ranked Retrieval
  France, Robert K.; Fox, Edward A. (1993)
  The MARIAN online public access catalog system at Virginia Tech has been developed to apply advanced information retrieval methods and object-oriented technology to the needs of library patrons. We give a description of our data model, design, processing, data representations, and retrieval operation. By identifying objects of interest during the indexing process, storing them according to our "information graph" model, and applying weighting schemes that seem appropriate for this large collection of small text records, we hope to better serve user needs. Since every text word is important in this domain, we employ opportunistic matching algorithms and a mix of data structures to support searching, that will give good performance for a large campus community, even though MARIAN runs on a distributed collection of small workstations. An initial small experiment indicates that our new ad hoc weighting scheme is more effective than a more standard approach.
- Information Interactions: User Interface Objects for CODER, INCARD, and MARIAN, v. 2.5
  France, Robert K. (1992-08-24)
  Any information system needs a user interface: a program or program module that eases the communication between the system's users and the underlying search and storage software. This document describes (part of) the specifications for the user interface to a family of information systems current at Virginia Tech: the experimental platform CODER, a specialized version of CODER dealing with medical information called INCARD for INformation about CARDiology, and a library catalog system named MARIAN.
- Integrated Access to a Large Medical Literature Database
  Fox, Edward A.; Koushik, Prabhakar M.; Chen, Qi-Fan; France, Robert K. (Department of Computer Science, Virginia Polytechnic Institute & State University, 1991)
  Project INCARD (INtegrated CARdiology Database) has adapted the CODER (COmposite Document Expert/effective/extended Retrieval) system and LEND (Large External Network object oriented Database) to provide integrated access to a large collection of bibliographic citations, a full text document in cardiology, and a large thesaurus of medical terms. CODER is a distributed expert-based information system that incorporates techniques from artificial intelligence, information retrieval, and human-computer interaction to support effective access to information and knowledge bases. LEND is an object-oriented database which incorporates techniques from information retrieval and database systems to support complex objects, hypertext/hypermedia and semantic network operations efficiently with very large sets of data. LEND stores the CED lexicon, MeSH thesaurus, MEDLARS bibliographic records on cardiology, and the syllabus for the topic Abnormal Human Biology (Cardiology Section) taught at Columbia University. Together, CODER/LEND allow efficient and flexible access to all of this information while supporting rapid "intelligent" searching and hypertext-style browsing by both novice and expert users. This report gives statistics on the collections, illustrations of the system's use, and details on the overall architecture and design for Project INCARD.
- A Knowledge-Based System for Composite Document Analysis and Retrieval: Design Issues in the CODER Project
  Fox, Edward A.; France, Robert K. (Department of Computer Science, Virginia Polytechnic Institute & State University, 1986-03-01)
  The CODER (COmposite Document Expert/Extended/Effective Retrieval) Project aims at applying a variety of methods developed in the realm of artificial intelligence to improve the performance of information retrieval systems. A prototype CODER system is being developed and will serve as a testbed for future research in this area. Initial experimentation will take place on a collection of more than three years of issues of the AIList ARPANET Digest. CODER is being developed in MU-Prolog and C++ as a collection of experts communicating through central blackboards using UNIX™ pipes and the TCP/IP protocol. This distributed system can be divided up across several machines, to best utilize special display devices, storage facilities, and processors. There is a central spine, including document text and document knowledge representations, and a large lexicon being constructed from two machine-readable English dictionaries. An entry/analysis subsystem carries out detailed analysis of composite documents, determining the structure and type of the whole and of each part. An access/retrieval subsystem has models of each user, can accommodate a variety of query languages, and supports browsing, searching, and immediate feedback. Many issues must be dealt with in the design of such a system, including issues of knowledge representation, natural language processing, storage management and support environments. This paper gives background, describes related work, explains the design principles and architecture, and closes with future plans.
- MARIAN Design
  France, Robert K.; Cline, Ben E.; Fox, Edward A. (1995-02-14)
  MARIAN (Multiple Access Retrieval of library Information with ANnotations) is an online library catalog information system. Intended for library end-users rather than catalogers, it provides controlled search by author, subject entry, and imprint; keyword search by title, subject, and other MARC text fields; feedback, locating the closest books to a relevant book or books; and user annotations of books.
- MARIAN: Flexible Interoperability for Federated Digital Libraries
  Goncalves, Marcos A.; France, Robert K.; Fox, Edward A.; Hilf, Eberhard R.; Zimmermann, Kerstin; Severiens, Thomas (2001)
  Federated digital libraries are composed of distributed autonomous (heterogeneous) information services but provide users with a transparent, integrated view of collected information respecting different information sources' autonomy. In this paper we discuss a federated system for the Networked Digital Library of Theses and Dissertations (NDLTD), an international consortium of universities, libraries, and other supporting institutions focused on electronic theses and dissertations (ETDs). The NDLTD has so far allowed its members considerable autonomy, though agreements are developing on metadata standards and on support of the Open Archives initiative that eventually will promote greater homogeneity. At present, federation requires dealing flexibly with differences among systems, ontologies, and data formats. Our solution involves adapting MARIAN, an object oriented digital library retrieval system developed with support by NLM and NSF, to serve as mediation middleware for the federated NDLTD collection. Components of the solution include: 1) the use of several harvesting techniques; 2) an architecture based on object-oriented ontologies of search modules and metadata; 3) diversity within the harvested data joined to a single collection view for the user; and 4) an integrated framework for addressing such questions as data quality, information compression, and flexible search. The system can handle very large dynamic collections. An adaptable relationship between the collection view and harvested data facilitates adding new sites to the federation and adapting to changes in existing sites. MARIAN's modular architecture and powerful and flexible data model work together to build an effective integrated solution within a simple uniform framework.
  We present both the general design of the system and operational details of a preliminary federated collection involving several thousand ETDs in four different formats and two languages from USA and Europe.
- Use and Usability in a Digital Library Search System
  France, Robert K.; Nowell, Lucy Terry; Fox, Edward A.; Saad, Rani A.; Zhao, Jianxin (Virginia Tech Digital Library Research Laboratory, 1999)
  Digital libraries must reach out to users from all walks of life, serving information needs at all levels. To do this, they must attain high standards of usability over an extremely broad audience. This paper details the evolution of one important digital library component as it has grown in functionality and usefulness over several years of use by a live, unrestricted community. Central to its evolution have been user studies, analysis of use patterns, and formative usability evaluation. We extrapolate that all three components are necessary in the production of successful digital library systems.
- Weights and Measures: An Axiomatic Model for Similarity Computations
  France, Robert K. (1994)
  This paper proposes a formal model for similarity functions, first over arbitrary objects, then over sets and the sorts of weighted sets that are found in text retrieval systems. Using a handful of axioms and constraints, we are able to make statements about the behavior of such functions in reference to set overlap and to noise. The model is then used to analyze, and we hope illuminate, several popular text similarity functions.
- When Stopping Rules Don't Stop
  France, Robert K. (1995)
  Performing ranked retrieval on large document collections can be slow. The method of stopping rules has been proposed to make it more efficient. Stopping rules, which terminate search when the highest ranked documents have been determined to some degree of likelihood, are attractive and have proven useful in clustering, but have not worked well in retrieval experiments. This paper presents a statistical analysis of why they have failed and where they can be expected to continue failing.
- The World According to MARIAN: How the Document Universe is Represented and Searched in the MARIAN/Academy Digital Library Search System
  France, Robert K. (1999-09-27)
  This presentation focuses on MARIAN (Multiple Access Retrieval of library Information with ANnotations), an online library catalog information system. MARIAN is intended for library end-users rather than catalogers, and provides controlled search by author, subject entry, and imprint; keyword search by title, subject, and other MARC text fields; feedback, locating the closest books to a relevant book or books; and user annotations of books.