Digital Library Research Laboratory
- Building the CODER Lexicon: The Collins English Dictionary and its Adverb Definitions. Fox, Edward A.; Wohlwend, Robert C.; Sheldon, Phyllis R.; Chen, Qi-Fan; France, Robert K. (1986-10-01). The CODER (COmposite Document Expert/extended/effective Retrieval) project is an investigation of the applicability of artificial intelligence techniques to the information retrieval task of analyzing, storing, and retrieving heterogeneous collections of "composite documents." To support some of the desired processing, and to allow experimentation in information retrieval and natural language processing, a lexicon was constructed from the machine-readable Collins Dictionary of the English Language. After background, motivation, and a survey of related work, the Collins lexicon is discussed. This is followed by a description of the conversion process, the format of the resulting Prolog database, and the characteristics of the dictionary and its relations. To illustrate what is present and to explain how it relates to the files produced from Webster's Seventh New Collegiate Dictionary, a number of comparative charts are given. Finally, a grammar for adverb definitions is presented, together with a description of the defining formulae that usually indicate the type of the adverb. Ultimately, it is hoped that definitions for adverbs and other words will be parsed so that the relational lexicon being constructed will include many additional relationships and other knowledge about words and their usage.
- Information Interactions: User Interface Objects for CODER, INCARD, and MARIAN, v. 2.5. France, Robert K. (1992-08-24). Any information system needs a user interface: a program or program module that eases communication between the system's users and the underlying search and storage software. This document describes part of the specifications for the user interface to a family of information systems current at Virginia Tech: the experimental platform CODER; INCARD (INformation about CARDiology), a specialized version of CODER dealing with medical information; and a library catalog system named MARIAN.
- Indexing Large Collections of Small Text Records for Ranked Retrieval. France, Robert K.; Fox, Edward A. (1993). The MARIAN online public access catalog system at Virginia Tech has been developed to apply advanced information retrieval methods and object-oriented technology to the needs of library patrons. We describe our data model, design, processing, data representations, and retrieval operation. By identifying objects of interest during the indexing process, storing them according to our "information graph" model, and applying weighting schemes that seem appropriate for this large collection of small text records, we hope to better serve user needs. Since every text word is important in this domain, we employ opportunistic matching algorithms and a mix of data structures to support searching that will give good performance for a large campus community, even though MARIAN runs on a distributed collection of small workstations. An initial small experiment indicates that our new ad hoc weighting scheme is more effective than a more standard approach.
- Weights and Measures: An Axiomatic Model for Similarity Computations. France, Robert K. (1994). This paper proposes a formal model for similarity functions, first over arbitrary objects, then over sets and the sorts of weighted sets that are found in text retrieval systems. Using a handful of axioms and constraints, we are able to make statements about the behavior of such functions with respect to set overlap and to noise. The model is then used to analyze, and, we hope, illuminate, several popular text similarity functions.
- The Academy: A Community of Information Retrieval Agents. France, Robert K. (1994-09-06). We commonly picture text as a sequence of words, or alternatively as a sequence of paragraphs, each composed of a sequence of sentences, each of which is itself a sequence of words. It is also worth noting that text is not so much a sequence of words as a sequence of terms: most commonly words, but also names, numbers, code sequences, and a variety of other $#*&)&@^ tokens. Just as we commonly simplify text into a sequence of words, so too it is common in information retrieval to regard documents as single texts. Nothing is less common, though, than a document with only a single part, and that part unstructured text. Search and retrieval in such a universe involves new questions: Where does a document begin and end? How can we decide how much to show to a user? When does a query need to be matched by a single node in a hypertext, and when may partial matches in several nodes count?
- When Stopping Rules Don't Stop. France, Robert K. (1995). Performing ranked retrieval on large document collections can be slow, and stopping rules have been proposed to make it more efficient. Stopping rules, which terminate search once the highest-ranked documents have been determined to some degree of likelihood, are attractive and have proven useful in clustering, but have not worked well in retrieval experiments. This paper presents a statistical analysis of why they have failed and where they can be expected to continue failing.
- MARIAN Design. France, Robert K.; Cline, Ben E.; Fox, Edward A. (1995-02-14). MARIAN (Multiple Access Retrieval of library Information with ANotations) is an online library catalog information system. Intended for library end-users rather than catalogers, it provides controlled search by author, subject entry, and imprint; keyword search by title, subject, and other MARC text fields; feedback, locating the closest books to a relevant book or books; and user annotations of books.
- Use and Usability in a Digital Library Search System. France, Robert K.; Nowell, Lucy Terry; Fox, Edward A.; Saad, Rani A.; Zhao, Jianxin (Virginia Tech Digital Library Research Laboratory, 1999). Digital libraries must reach out to users from all walks of life, serving information needs at all levels. To do this, they must attain high standards of usability over an extremely broad audience. This paper details the evolution of one important digital library component as it has grown in functionality and usefulness over several years of use by a live, unrestricted community. Central to its evolution have been user studies, analysis of use patterns, and formative usability evaluation. We extrapolate that all three components are necessary in the production of successful digital library systems.
- The World According to MARIAN: How the Document Universe is Represented and Searched in the MARIAN/Academy Digital Library Search System. France, Robert K. (1999-09-27). This presentation focuses on MARIAN (Multiple Access Retrieval of library Information with ANotations), an online library catalog information system. MARIAN is intended for library end-users rather than catalogers and provides controlled search by author, subject entry, and imprint; keyword search by title, subject, and other MARC text fields; feedback, locating the closest books to a relevant book or books; and user annotations of books.
- MARIAN Searchers. Virginia Tech. Digital Library Research Laboratory (1999-10-13). A searcher in MARIAN is a class manager that can map abstract descriptions into weighted sets of matching class instances. This presentation focuses on link-based, class-based, node-based, and context-based searchers.
- Open Archives Work at Virginia Tech. Suleman, Hussein (2000-06-01). This presentation describes the Open Archives Initiative; projects such as the Computer Science Teaching Center, the Networked Digital Library of Theses and Dissertations (NDLTD), and the MARIAN catalog; and the organizational and technical issues associated with open archiving.
- Extending Interoperability of Digital Libraries: Building on the Open Archives Initiative. ACM Conference on Digital Libraries (2000-06-03). This document describes the proceedings of the 2000 ACM DL OAI Workshop. Workshop topics include technical issues, Virginia Tech's report on open archives work, and highlights of the Santa Fe Convention.
- MARIAN: Flexible Interoperability for Federated Digital Libraries. Goncalves, Marcos A.; France, Robert K.; Fox, Edward A.; Hilf, Eberhard R.; Zimmermann, Kerstin; Severiens, Thomas (2001). Federated digital libraries are composed of distributed, autonomous (heterogeneous) information services but provide users with a transparent, integrated view of collected information while respecting the autonomy of the different information sources. In this paper we discuss a federated system for the Networked Digital Library of Theses and Dissertations (NDLTD), an international consortium of universities, libraries, and other supporting institutions focused on electronic theses and dissertations (ETDs). The NDLTD has so far allowed its members considerable autonomy, though agreements are developing on metadata standards and on support of the Open Archives Initiative that eventually will promote greater homogeneity. At present, federation requires dealing flexibly with differences among systems, ontologies, and data formats. Our solution involves adapting MARIAN, an object-oriented digital library retrieval system developed with support from NLM and NSF, to serve as mediation middleware for the federated NDLTD collection. Components of the solution include: 1) the use of several harvesting techniques; 2) an architecture based on object-oriented ontologies of search modules and metadata; 3) diversity within the harvested data joined to a single collection view for the user; and 4) an integrated framework for addressing such questions as data quality, information compression, and flexible search. The system can handle very large dynamic collections. An adaptable relationship between the collection view and the harvested data facilitates adding new sites to the federation and adapting to changes in existing sites. MARIAN's modular architecture and powerful, flexible data model work together to build an effective integrated solution within a simple uniform framework. We present both the general design of the system and operational details of a preliminary federated collection involving several thousand ETDs in four different formats and two languages from the USA and Europe.
- Building Interoperable Digital Libraries: A Practical Guide to Creating Open Archives. Suleman, Hussein (2001). This presentation discusses the development of the Open Archives Initiative (OAI), metadata harvesting, digital library interoperability, the Networked Digital Library of Theses and Dissertations (NDLTD), and more.
- Open Archives: Distributed Services for Physicists and Graduate Students (OAD). Fox, Edward A.; Stamerjohanns, Heinrich; Hilf, Eberhard R.; Mittler, Elmar; Zia, Royce K. P. (2001). This 2001-2002 report evaluates the research done to improve distributed digital library services for two user communities: physicists and graduate students.
- Open Archiving @ Virginia Tech. Suleman, Hussein (2001-01-23). This presentation discusses why Virginia Tech is involved in the Open Archives Initiative (OAI), as well as OAI compliance, OAI technical requirements, and more.
- The Open Archives Initiative: Realizing Simple and Effective Digital Library Interoperability. Suleman, Hussein; Fox, Edward A. (2001-03-01). The Open Archives Initiative (OAI) is dedicated to solving problems of digital library interoperability. Its focus has been on defining simple protocols, most recently for the exchange of metadata from archives. The OAI evolved out of a need to increase access to scholarly publications by supporting the creation of interoperable digital libraries. As a first step towards such interoperability, a metadata harvesting protocol was developed to support the streaming of metadata from one repository to another, ultimately to a provider of user services such as browsing, searching, or annotation. This article provides an overview of the mission, philosophy, and technical framework of the OAI.
- Using the Repository Explorer to Achieve OAI Protocol Compliance. Suleman, Hussein (2001-06-24). The Open Archives Initiative (OAI) is dedicated to solving problems of digital library interoperability by defining simple protocols, most recently the Open Archives Initiative Protocol for Metadata Harvesting [2], which was unveiled in January 2001. To support the adoption of this new interoperability technology, we have developed the Repository Explorer [1], a web-based tool that enforces compliance with a single, shared interpretation of the protocol across different server implementations. This demonstration will show how the Repository Explorer can be used to perform either user-driven browsing or automatic testing of an implementation of the protocol.
- Enforcing Interoperability with the Open Archives Initiative Repository Explorer. Suleman, Hussein (2001-06-24). This presentation gives an overview of the Open Archives Initiative (OAI), protocol validation procedures, interactive browsing, and the OAI repository explorer tool.
- Introduction to the OAI Metadata Harvesting Protocol. Suleman, Hussein (2001-09-13). This presentation describes the OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting), federated library services and searching, and other digital library tools and protocols.
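Several entries in this list (the OAI overview, the Repository Explorer demonstration, and the introduction to the metadata harvesting protocol) describe metadata being streamed from one repository to another. As a rough illustration only, the following sketch assumes version 2.0 of the OAI Protocol for Metadata Harvesting (released in 2002, after these works) and shows the `ListRecords` request and `resumptionToken` paging loop that a harvester performs. The endpoint `http://example.org/oai`, the `fetch` callback, and the sample responses are hypothetical, and the sketch does not reproduce any specific harvester described above.

```python
"""Minimal OAI-PMH harvesting sketch (assumes protocol version 2.0)."""
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL.

    In OAI-PMH 2.0, resumptionToken is an exclusive argument: once it is
    present, metadataPrefix is not sent again.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def parse_page(xml_text):
    """Return (record identifiers, resumption token or None) for one response page."""
    root = ET.fromstring(xml_text)
    ids = [el.text for el in root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    # An empty <resumptionToken/> element marks the last page of the list.
    token = token_el.text if token_el is not None and token_el.text else None
    return ids, token


def harvest(base_url, fetch):
    """Follow resumption tokens until the repository reports no more pages.

    `fetch` is a hypothetical caller-supplied function that takes a URL and
    returns the response body as a string (e.g. a wrapper around urllib).
    """
    identifiers, token = [], None
    while True:
        page_ids, token = parse_page(fetch(list_records_url(base_url, resumption_token=token)))
        identifiers.extend(page_ids)
        if token is None:
            return identifiers


# Hypothetical two-page repository, served from a dict instead of the network.
BASE = "http://example.org/oai"
PAGE_1 = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:1</identifier></header></record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""
PAGE_2 = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:2</identifier></header></record>
    <resumptionToken/>
  </ListRecords>
</OAI-PMH>"""
responses = {
    list_records_url(BASE): PAGE_1,
    list_records_url(BASE, resumption_token="page2"): PAGE_2,
}

if __name__ == "__main__":
    print(harvest(BASE, responses.__getitem__))
    # ['oai:example.org:1', 'oai:example.org:2']
```

Passing `fetch` in as a parameter keeps the paging logic testable without a live repository; a real harvester would also need error handling (for example `noRecordsMatch`), `from`/`until` date ranges for incremental harvesting, and parsing of the record metadata itself.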