Reports, Digital Library Research Laboratory
Browsing Reports, Digital Library Research Laboratory by Author "Fox, Edward A."
- Building the CODER Lexicon: The Collins English Dictionary and its Adverb Definitions
  Fox, Edward A.; Wohlwend, Robert C.; Sheldon, Phyllis R.; Chen, Qi-Fan; France, Robert K. (1986-10-01)
  The CODER (COmposite Document Expert/extended/effective Retrieval) project is an investigation of the applicability of artificial intelligence techniques to the information retrieval task of analyzing, storing, and retrieving heterogeneous collections of "composite documents." In order to support some of the processing desired, and to allow experimentation in information retrieval and natural language processing, a lexicon was constructed from the machine readable Collins Dictionary of the English Language. After giving background, motivation, and a survey of related work, the Collins lexicon is discussed. Following is a description of the conversion process, the format of the resulting Prolog database, and characteristics of the dictionary and relations. To illustrate what is present and to explain how it relates to the files produced from Webster's Seventh New Collegiate Dictionary, a number of comparative charts are given. Finally, a grammar for adverb definitions is presented, together with a description of defining formulae that usually indicate the type of the adverb. Ultimately it is hoped that definitions for adverbs and other words will be parsed so that the relational lexicon being constructed will include many additional relationships and other knowledge about words and their usage.
- CTRnet Final Report
  Fox, Edward A.; Shoemaker, Donald J.; Sheetz, Steven D.; Kavanaugh, Andrea L.; Ramakrishnan, Naren (2013-08-26)
  The CTRnet project team has been developing a digital library including many webpage archives and tweet archives related to disasters, in collaboration with the Internet Archive. The goals of the CTRnet project are to provide such archived data sets for analysis, including by researchers who are seeking deep insights about those events, and to support a range of services and infrastructure regarding those tragic events for the various stakeholders and the general public, allowing them to study and learn.
- CTRnet: Project Proposal to NSF
  Fox, Edward A.; Shoemaker, Donald J.; Kavanaugh, Andrea L.; Ramakrishnan, Naren (2009)
  Crises and tragedies are, regrettably, part of life; a recent sample, showing the small number of collections preserved at the Internet Archive, is shown in Table 1. While always difficult, recovery from tragic events may be increasingly facilitated and supported by information and communication technology (ICT). Individuals, groups, and communities are using ICT in innovative ways to learn from these events and recover more quickly and more effectively. During and after a crisis, individuals and communities face a confusing plethora of data and information, and strive to make sense by way of that data [114]. They seek to carry out their usual activities, but want to be informed by new insights. They work to help others, or to receive help, but the context and technologies involved in communication today (e.g., Internet, WWW, online communities, mobile devices) make it exceedingly difficult to integrate content, community, and services. Accordingly, individuals and communities respond by attempting to meet their needs with the tools they have, e.g., creating a Facebook group to quickly inform members who is OK, and other groups to share pictures, comments, and additional contributions.
- Extending Retrieval with Stepping Stones and Pathways
  Fox, Edward A. (2003-08-01)
  This project researches an alternative interpretation of user queries and presentation of the results. Instead of returning a ranked list of documents, the result of a query is a connected network of chains of evidence. Each chain is made of a sequence of additional concepts (stepping stones). Each concept in the sequence is logically connected to the next and previous one, and the chains provide a rationale (a pathway) for the connection between the two original concepts. To increase the user's understanding of the chain, it is desirable that the stepping stones be justified by concrete documents, along with the connections (relationships) among those documents.
- High Performance Interoperable Digital Libraries in the Open Archives Initiative
  Fox, Edward A.; Sanchez, J. Alfredo; Garza-Salazar, David (2004-01-31)
  The scope of this project is high performance mechanisms for interoperable distributed digital repositories. We apply Open Archives Initiative ideas and concepts to the storage and retrieval of electronic theses and dissertations (ETDs), and work to make these more available to students by means of visualization tools.
- Indexing Large Collections of Small Text Records for Ranked Retrieval
  France, Robert K.; Fox, Edward A. (1993)
  The MARIAN online public access catalog system at Virginia Tech has been developed to apply advanced information retrieval methods and object-oriented technology to the needs of library patrons. We give a description of our data model, design, processing, data representations, and retrieval operation. By identifying objects of interest during the indexing process, storing them according to our "information graph" model, and applying weighting schemes that seem appropriate for this large collection of small text records, we hope to better serve user needs. Since every text word is important in this domain, we employ opportunistic matching algorithms and a mix of data structures to support searching, that will give good performance for a large campus community, even though MARIAN runs on a distributed collection of small workstations. An initial small experiment indicates that our new ad hoc weighting scheme is more effective than a more standard approach.
- Integrated Digital Event Archiving and Library (IDEAL): Preview of Award 1319578 - Annual Project Report
  Fox, Edward A.; Hanna, Kristine; Kavanaugh, Andrea L.; Sheetz, Steven D.; Shoemaker, Donald J. (2014-07-09)
  The goals of this project are to ingest tweets and Web-based content from social media and the general Web, including news and governmental information. In addition to archiving materials found, the project team will build an information system that includes related metadata and knowledge bases, consistent with the 5S (Societies, Scenarios, Spaces, Structures, Streams) framework, along with results from our intelligent focused crawler, to support comprehensive access to event related content. With the support of key partners, the IDEAL team will undertake important research, education, and dissemination efforts, to achieve three complementary objectives:
  1. Collecting: The project team will spot, identify, and make sense of interesting events. We also will accept specific or general requests about types of events. Given resource and sampling constraints, we will integrate methods to identify appropriate URLs as seeds, and specify when to start crawling and when to stop, with regard to each event or sub-event. We will integrate focused crawling and filtering approaches in order to ingest content and generate new collections, with high precision and recall.
  2. Archiving & Accessing: Permanent archiving, and access to those archives, will be ensured by our partner, Internet Archive (IA). Immediate access to ingested content will be facilitated through big data software built on top of our new Hadoop cluster.
  3. Analyzing & Visualizing: We will provide a wide range of integrated services beyond the usual (faceted) browsing and searching, including: classification, clustering, summarization, text mining, theme and topic identification, and visualization.
- MARIAN Design
  France, Robert K.; Cline, Ben E.; Fox, Edward A. (1995-02-14)
  MARIAN (Multiple Access Retrieval of library Information with ANnotations) is an online library catalog information system. Intended for library end-users rather than catalogers, it provides controlled search by author, subject entry, and imprint; keyword search by title, subject, and other MARC text fields; feedback, locating the closest books to a relevant book or books; and user annotations of books.
- Microblogging in Crisis Situations: Mass Protests in Iran, Tunisia, Egypt
  Kavanaugh, Andrea L.; Yang, Seungwon; Li, Lin Tzy; Sheetz, Steven D.; Fox, Edward A. (2011-05-01)
  In this paper we briefly examine the use of Twitter in Iran, Tunisia and Egypt during the mass political demonstrations and protests in June 2009, December 2010 and January 2011 respectively. We compare this usage with methods and findings from other studies on the use of Twitter in emergency situations, such as natural and man-made disasters. We draw on our own experiences and participant-observations as an eyewitness in Iran, and on Twitter data from Tunisia and Egypt. In these three cases, Twitter filled a unique technology and communication gap at least partially. We summarize suggested directions for future research with a view of placing this work in the larger context of social media use in conditions of crisis or social convergence.
- NSF 2nd Year Report: CTRnet: Integrated Digital Library Support for Crisis, Tragedy, and Recovery
  Fox, Edward A.; Shoemaker, Donald J.; Sheetz, Steven D.; Kavanaugh, Andrea L.; Ramakrishnan, Naren (2011-07-01)
  One of the important parts of this project is to collect and archive as much information as possible about various events that are related to crises, tragedies, and recovery (CTR). In order to do long-term archiving of information, we have worked with the Internet Archive (IA), a non-profit organization, whose goal is to archive the Internet. IA provides access to web crawlers that can be used to selectively crawl and archive webpages. In disaster situations, it is well known that people use micro-blogging sites such as Twitter to reach their family and friends, especially when their cell phones are not working due to a high volume of traffic on the cell phone network. For this reason, tweet posts sometimes report CTR events faster than the mainstream news media. Those tweets often contain more detailed information, too, reported by the affected people on the site. We have been archiving tweets (i.e., posts from Twitter.com) for both man-made and natural disaster events. Collected tweets can be exported in various formats, including XSL, JSON, and HTML, to be analyzed later using software tools.
- NSF 3rd Year Report: CTRnet: Integrated Digital Library Support for Crisis, Tragedy, and Recovery
  Fox, Edward A.; Shoemaker, Donald J.; Sheetz, Steven D.; Kavanaugh, Andrea L.; Ramakrishnan, Naren (2012-07-01)
  The Crisis, Tragedy and Recovery (CTR) network, or CTRnet, is a human and digital library network for providing a range of services relating to different kinds of tragic events, including broad collaborative studies related to Egypt, Tunisia, Mexico, and Arlington, Virginia. Through this digital library, we collect and archive different types of CTR related information, and apply advanced information analysis methods to this domain. It is hoped that services provided through CTRnet can help communities, as they heal and recover from tragic events. We have taken several major steps towards our goal of building a digital library for CTR events. Different strategies for collecting comprehensive information surrounding various CTR events have been explored, initially using school shooting events as a testbed. Many GBs worth of related data has been collected using the web crawling tools and methodologies we developed. Several different methods for removing non-relevant pages (noise) from the crawled data have been explored. A focused crawler is being developed with the aim of providing users the ability to build high quality collections for CTR events focused on their interests. Use of social media for CTRnet related research is being explored. Software to integrate the popular social networking site Facebook with the CTRnet digital library has been prototyped, and is being developed further. Integration of the popular micro-blogging site Twitter with the CTRnet digital library has proceeded well, and is being further automated, becoming a key part of our methodology.
- NSF Year 1 Report for CTRnet: Integrated Digital Library Support for Crisis, Tragedy, and Recovery
  Fox, Edward A.; Shoemaker, Donald J.; Sheetz, Steven D.; Kavanaugh, Andrea L.; Ramakrishnan, Naren (2010-07-08)
  The Crisis, Tragedy and Recovery network, or CTRnet, is a human and digital library network for providing a range of services relating to different kinds of tragic events. Through this digital library, we will collect and archive different types of CTR related information, and apply advanced information analysis methods to this domain. It is hoped that services provided through CTRnet can help communities, as they heal and recover from tragic events. We have taken several major steps towards our goal of building a digital library for CTR events. Different strategies for collecting comprehensive information surrounding various CTR events have been explored, using school shooting events as a testbed. Several GBs worth of school shootings related data has been collected using the web crawling tools and methodologies we developed. Several different methods for removing non-relevant pages (noise) from the crawled data have been explored. A focused crawler is being developed with the aim of providing users the ability to build high quality collections for CTR events focused on their interests. Use of social media for CTRnet related research is being explored. Software to integrate the popular social networking site Facebook with the CTRnet digital library has been prototyped, and is being developed further. Integration of the popular micro-blogging site Twitter with the CTRnet digital library is being explored.
- Open Archives: Distributed Services for Physicists and Graduate Students (OAD)
  Fox, Edward A.; Stamerjohanns, Heinrich; Hilf, Eberhard R.; Mittler, Elmar; Zia, Royce K. P. (2001)
  This 2001-2002 report evaluates the research done to improve distributed digital library services for two user communities: physicists and graduate students.
- Open Archives: Distributed Services for Physicists and Graduate Students (OAD)
  Fox, Edward A.; Stamerjohanns, Heinrich; Hilf, Eberhard R.; Mittler, Elmar; Zia, Royce K. P. (2002)
  The objective of this project, "Open Archives: Distributed Services for Physicists and Graduate Students (OAD)", is to improve the quality of resources and distributed digital library services, aimed at two communities: physicists and graduate students. The approach is to apply Open Archives Initiative (OAI) ideas and concepts to the physics community and the Networked Digital Library of Theses and Dissertations (NDLTD).
- Open Archives: Distributed Services for Physicists and Graduate Students (OAD)
  Fox, Edward A.; Stamerjohanns, Heinrich; Hilf, Eberhard R.; Mittler, Elmar; Zia, Royce K. P. (2003)
  This 2003 report evaluates the research done to improve distributed digital library services for two user communities: physicists and graduate students.
- Social Media for Cities, Counties and Communities
  Kavanaugh, Andrea L.; Fox, Edward A.; Sheetz, Steven D.; Yang, Seungwon; Li, Lin Tzy; Whalen, Travis; Shoemaker, Donald J.; Natsev, Apostol; Xie, Lexing (Department of Computer Science, Virginia Polytechnic Institute & State University, 2011)
  Social media (i.e., Twitter, Facebook, Flickr, YouTube) and other tools and services with user-generated content have made a staggering amount of information (and misinformation) available. Some government officials seek to leverage these resources to improve services and communication with citizens, especially during crises and emergencies. Yet, the sheer volume of social data streams generates substantial noise that must be filtered. Potential exists to rapidly identify issues of concern for emergency management by detecting meaningful patterns or trends in the stream of messages and information flow. Similarly, monitoring these patterns and themes over time could provide officials with insights into the perceptions and mood of the community that cannot be collected through traditional methods (e.g., phone or mail surveys) due to their substantive costs, especially in light of reduced and shrinking budgets of governments at all levels. We conducted a pilot study in 2010 with government officials in Arlington, Virginia (and to a lesser extent representatives of groups from Alexandria and Fairfax, Virginia) with a view to contributing to a general understanding of the use of social media by government officials as well as community organizations, businesses and the public. We were especially interested in gaining greater insight into social media use in crisis situations (whether severe or fairly routine crises, such as traffic or weather disruptions).
- Use and Usability in a Digital Library Search System
  France, Robert K.; Nowell, Lucy Terry; Fox, Edward A.; Saad, Rani A.; Zhao, Jianxin (Virginia Tech Digital Library Research Laboratory, 1999)
  Digital libraries must reach out to users from all walks of life, serving information needs at all levels. To do this, they must attain high standards of usability over an extremely broad audience. This paper details the evolution of one important digital library component as it has grown in functionality and usefulness over several years of use by a live, unrestricted community. Central to its evolution have been user studies, analysis of use patterns, and formative usability evaluation. We extrapolate that all three components are necessary in the production of successful digital library systems.
- Why Students Use Social Networking Sites After Crisis Situations
  Sheetz, Steven D.; Fox, Edward A.; Fitzgerald, Andrew; Palmer, Sean; Shoemaker, Donald J.; Kavanaugh, Andrea L. (2011)
  Communities respond to tragedy by making virtuous use of social networking sites for a variety of purposes. We asked students to describe why they used a social networking site after the tragic shootings at Virginia Tech, then evaluated their responses using content analysis. Students went predominantly to Facebook (99%). Most (59%) of the 426 students who responded went there because their friends were already there, and to find out if their friends were OK (28%) (and to let them know they were OK). Ideas related to relationships occurred more frequently in the responses than ideas related to the website's features. However, the ease of use of the website was mentioned often (22%). The results suggest this emergent phenomenon will recur.