Reports, Digital Library Research Laboratory
Browsing Reports, Digital Library Research Laboratory by Issue Date
Now showing 1 - 20 of 28
- Building the CODER Lexicon: The Collins English Dictionary and its Adverb Definitions. Fox, Edward A.; Wohlwend, Robert C.; Sheldon, Phyllis R.; Chen, Qi-Fan; France, Robert K. (1986-10-01). The CODER (COmposite Document Expert/extended/effective Retrieval) project is an investigation of the applicability of artificial intelligence techniques to the information retrieval task of analyzing, storing, and retrieving heterogeneous collections of "composite documents." In order to support some of the processing desired, and to allow experimentation in information retrieval and natural language processing, a lexicon was constructed from the machine-readable Collins Dictionary of the English Language. After background, motivation, and a survey of related work are given, the Collins lexicon is discussed. A description follows of the conversion process, the format of the resulting Prolog database, and the characteristics of the dictionary and its relations. To illustrate what is present and to explain how it relates to the files produced from Webster's Seventh New Collegiate Dictionary, a number of comparative charts are given. Finally, a grammar for adverb definitions is presented, together with a description of the defining formulae that usually indicate the type of the adverb. Ultimately it is hoped that definitions for adverbs and other words will be parsed, so that the relational lexicon being constructed will include many additional relationships and other knowledge about words and their usage.
- Information Interactions: User Interface Objects for CODER, INCARD, and MARIAN, v. 2.5. France, Robert K. (1992-08-24). Any information system needs a user interface: a program or program module that eases communication between the system's users and the underlying search and storage software. This document describes (part of) the specifications for the user interface to a family of information systems in use at Virginia Tech: the experimental platform CODER; INCARD (INformation about CARDiology), a specialized version of CODER dealing with medical information; and MARIAN, a library catalog system.
- Indexing Large Collections of Small Text Records for Ranked Retrieval. France, Robert K.; Fox, Edward A. (1993). The MARIAN online public access catalog system at Virginia Tech has been developed to apply advanced information retrieval methods and object-oriented technology to the needs of library patrons. We describe our data model, design, processing, data representations, and retrieval operation. By identifying objects of interest during the indexing process, storing them according to our "information graph" model, and applying weighting schemes that seem appropriate for this large collection of small text records, we hope to better serve user needs. Since every text word is important in this domain, we employ opportunistic matching algorithms and a mix of data structures to support searching that gives good performance for a large campus community, even though MARIAN runs on a distributed collection of small workstations. An initial small experiment indicates that our new ad hoc weighting scheme is more effective than a more standard approach.
- Weights and Measures: An Axiomatic Model for Similarity Computations. France, Robert K. (1994). This paper proposes a formal model for similarity functions, first over arbitrary objects, then over sets and the sorts of weighted sets that are found in text retrieval systems. Using a handful of axioms and constraints, we are able to make statements about the behavior of such functions with reference to set overlap and to noise. The model is then used to analyze, and we hope illuminate, several popular text similarity functions.
- The Academy: A Community of Information Retrieval Agents. France, Robert K. (1994-09-06). We commonly picture text as a sequence of words, or alternatively as a sequence of paragraphs, each of which is composed of a sequence of sentences, each of which is itself a sequence of words. It is also worth noting that text is not so much a sequence of words as a sequence of terms, most commonly words, but also names, numbers, code sequences, and a variety of other $#*&)&@^ tokens. Just as we commonly simplify text into a sequence of words, so too it is common in information retrieval to regard documents as single texts. Nothing is less common, though, than a document with only a single part, and that part unstructured text. Search and retrieval in such a universe involves new questions: Where does a document begin and end? How can we decide how much to show to a user? When does a query need to be matched by a single node in a hypertext, and when may partial matches in several nodes count?
- When Stopping Rules Don't Stop. France, Robert K. (1995). Performing ranked retrieval on large document collections can be slow. The method of stopping rules has been proposed to make it more efficient. Stopping rules, which terminate search when the highest-ranked documents have been determined to some degree of likelihood, are attractive and have proven useful in clustering, but have not worked well in retrieval experiments. This paper presents a statistical analysis of why they have failed and where they can be expected to continue failing.
- MARIAN Design. France, Robert K.; Cline, Ben E.; Fox, Edward A. (1995-02-14). MARIAN (Multiple Access Retrieval of library Information with ANotations) is an online library catalog information system. Intended for library end users rather than catalogers, it provides controlled search by author, subject entry, and imprint; keyword search by title, subject, and other MARC text fields; feedback, which locates the books closest to one or more relevant books; and user annotations of books.
- Use and Usability in a Digital Library Search System. France, Robert K.; Nowell, Lucy Terry; Fox, Edward A.; Saad, Rani A.; Zhao, Jianxin (Virginia Tech Digital Library Research Laboratory, 1999). Digital libraries must reach out to users from all walks of life, serving information needs at all levels. To do this, they must attain high standards of usability over an extremely broad audience. This paper details the evolution of one important digital library component as it has grown in functionality and usefulness over several years of use by a live, unrestricted community. Central to its evolution have been user studies, analysis of use patterns, and formative usability evaluation. We extrapolate that all three components are necessary in the production of successful digital library systems.
- Open Archives: Distributed Services for Physicists and Graduate Students (OAD). Fox, Edward A.; Stamerjohanns, Heinrich; Hilf, Eberhard R.; Mittler, Elmar; Zia, Royce K. P. (2001). This 2001-2002 report evaluates the research done to improve distributed digital library services for two user communities: physicists and graduate students.
- Set Orthogonality. Suleman, Hussein; Zubair, Mohammad (2001-10-19). In the OAI protocol, there is no way to determine all the sets that an identifier belongs to. This is typically referred to as set orthogonality, because the protocol allows a harvester to find out which identifiers belong to a particular set but not vice versa. This is not much of a problem for a flat space of archives, but organizations like NDLTD and NCSTRL have already started to create hierarchical catalogs based on OAI, and existing set information is lost at the very first level. Also, the Internet2 Distributed Storage Initiative wants to work on replication of open archives (OAs); this will mean harvesting every set and dealing with duplicates. Can we do this more efficiently without adding to the complexity?
- Multiple Metadata / Best Metadata Return. Suleman, Hussein; Nelson, Michael (2001-10-19). The OAI protocol currently supports a simple mapping of metadata names to metadata formats, whereby metadata can be requested for exactly one record in exactly one format in a single GetRecord request. In the case of ListRecords, all records within a set and/or date range may be requested, but there is still the restriction of a single metadata format. This is usually sufficient for simple harvesting, where the intention is to transfer a stream of metadata records from the source archive to a service provider. However, in some cases it may be desirable to obtain the most complete metadata format, or a set of metadata formats, for an identifier. Accomplishing this currently requires submitting multiple requests with different parameters, which is not the most efficient approach.
- Open Archives: Distributed Services for Physicists and Graduate Students (OAD). Fox, Edward A.; Stamerjohanns, Heinrich; Hilf, Eberhard R.; Mittler, Elmar; Zia, Royce K. P. (2002). The objective of this project, "Open Archives: Distributed Services for Physicists and Graduate Students (OAD)," is to improve the quality of resources and distributed digital library services aimed at two communities: physicists and graduate students. The approach is to apply Open Archives Initiative (OAI) ideas and concepts to the physics community and to the Networked Digital Library of Theses and Dissertations (NDLTD).
- Open Archives: Distributed Services for Physicists and Graduate Students (OAD). Fox, Edward A.; Stamerjohanns, Heinrich; Hilf, Eberhard R.; Mittler, Elmar; Zia, Royce K. P. (2003). This 2003 report evaluates the research done to improve distributed digital library services for two user communities: physicists and graduate students.
- Extending Retrieval with Stepping Stones and Pathways. Fox, Edward A. (2003-08-01). This project researches an alternative interpretation of user queries and presentation of the results. Instead of returning a ranked list of documents, the result of a query is a connected network of chains of evidence. Each chain is made of a sequence of additional concepts (stepping stones). Each concept in the sequence is logically connected to the next and previous ones, and the chains provide a rationale (a pathway) for the connection between the two concepts given in the original query. To increase the user's understanding of a chain, it is desirable that the stepping stones be justified by concrete documents, along with the connections (relationships) among those documents.
- High Performance Interoperable Digital Libraries in the Open Archives Initiative. Fox, Edward A.; Sanchez, J. Alfredo; Garza-Salazar, David (2004-01-31). The scope of this project is high-performance mechanisms for interoperable distributed digital repositories. We apply Open Archives Initiative ideas and concepts to the storage and retrieval of electronic theses and dissertations (ETDs), and work to make these more available to students by means of visualization tools.
- CTRnet: Project Proposal to NSF. Fox, Edward A.; Shoemaker, Donald J.; Kavanaugh, Andrea L.; Ramakrishnan, Naren (2009). Crises and tragedies are, regrettably, part of life; a recent sample, showing the small number of collections preserved at the Internet Archive, is shown in Table 1. While always difficult, recovery from tragic events may be increasingly facilitated and supported by information and communication technology (ICT). Individuals, groups, and communities are using ICT in innovative ways to learn from these events and recover more quickly and more effectively. During and after a crisis, individuals and communities face a confusing plethora of data and information, and strive to make sense by way of that data [114]. They seek to carry out their usual activities, but want to be informed by new insights. They work to help others, or to receive help, but the context and technologies involved in communication today (e.g., Internet, WWW, online communities, mobile devices) make it exceedingly difficult to integrate content, community, and services. Accordingly, individuals and communities respond by attempting to meet their needs with the tools they have, e.g., creating a Facebook group to quickly let members know who is OK, and other groups to share pictures, comments, and additional contributions.
- NSF Year 1 Report for CTRnet: Integrated Digital Library Support for Crisis, Tragedy, and Recovery. Fox, Edward A.; Shoemaker, Donald J.; Sheetz, Steven D.; Kavanaugh, Andrea L.; Ramakrishnan, Naren (2010-07-08). The Crisis, Tragedy and Recovery network (CTRnet) is a human and digital library network for providing a range of services related to different kinds of tragic events. Through this digital library, we will collect and archive different types of CTR-related information and apply advanced information analysis methods to this domain. It is hoped that services provided through CTRnet can help communities as they heal and recover from tragic events. We have taken several major steps toward our goal of building a digital library for CTR events. Different strategies for collecting comprehensive information about various CTR events have been explored, using school shooting events as a testbed. Several gigabytes of data related to school shootings have been collected using the web crawling tools and methodologies we developed. Several methods for removing non-relevant pages (noise) from the crawled data have been explored. A focused crawler is being developed with the aim of letting users build high-quality collections for CTR events focused on their interests. The use of social media for CTRnet-related research is being explored. Software to integrate the popular social networking site Facebook with the CTRnet digital library has been prototyped and is being developed further. Integration of the popular microblogging site Twitter with the CTRnet digital library is being explored.
- Why Students Use Social Networking Sites After Crisis Situations. Sheetz, Steven D.; Fox, Edward A.; Fitzgerald, Andrew; Palmer, Sean; Shoemaker, Donald J.; Kavanaugh, Andrea L. (2011). Communities respond to tragedy by making virtuous use of social networking sites for a variety of purposes. We asked students to describe why they used a social networking site after the tragic shootings at Virginia Tech, then evaluated their responses using content analysis. Students went predominantly to Facebook (99%). Most (59%) of the 426 students who responded went there because their friends were already there, and 28% went there to find out whether their friends were OK (and to let their friends know they themselves were OK). Ideas related to relationships occurred more frequently in the responses than ideas related to the website's features. However, the website's ease of use was mentioned often (22%). The results suggest this emergent phenomenon will recur.
- Social Media for Cities, Counties and Communities. Kavanaugh, Andrea L.; Fox, Edward A.; Sheetz, Steven D.; Yang, Seungwon; Li, Lin Tzy; Whalen, Travis; Shoemaker, Donald J.; Natsev, Apostol; Xie, Lexing (Department of Computer Science, Virginia Polytechnic Institute & State University, 2011). Social media (e.g., Twitter, Facebook, Flickr, YouTube) and other tools and services with user-generated content have made a staggering amount of information (and misinformation) available. Some government officials seek to leverage these resources to improve services and communication with citizens, especially during crises and emergencies. Yet the sheer volume of social data streams generates substantial noise that must be filtered. There is potential to rapidly identify issues of concern for emergency management by detecting meaningful patterns or trends in the stream of messages and information flow. Similarly, monitoring these patterns and themes over time could provide officials with insights into the perceptions and mood of the community that cannot be collected through traditional methods (e.g., phone or mail surveys) because of their substantial costs, especially in light of the reduced and shrinking budgets of governments at all levels. We conducted a pilot study in 2010 with government officials in Arlington, Virginia (and, to a lesser extent, representatives of groups from Alexandria and Fairfax, Virginia) with a view to contributing to a general understanding of the use of social media by government officials, community organizations, businesses, and the public. We were especially interested in gaining greater insight into social media use in crisis situations, whether severe or fairly routine (such as traffic or weather disruptions).
- Microblogging in Crisis Situations: Mass Protests in Iran, Tunisia, Egypt. Kavanaugh, Andrea L.; Yang, Seungwon; Li, Lin Tzy; Sheetz, Steven D.; Fox, Edward A. (2011-05-01). In this paper we briefly examine the use of Twitter in Iran, Tunisia, and Egypt during the mass political demonstrations and protests of June 2009, December 2010, and January 2011, respectively. We compare this usage with methods and findings from other studies on the use of Twitter in emergency situations, such as natural and man-made disasters. We draw on one author's own experiences and participant observations as an eyewitness in Iran, and on Twitter data from Tunisia and Egypt. In these three cases, Twitter at least partially filled a unique technology and communication gap. We summarize suggested directions for future research, with a view to placing this work in the larger context of social media use in conditions of crisis or social convergence.