  • A Hybrid Model for Role-related User Classification on Twitter
    Li, Liuqing; Song, Ziqian; Zhang, Xuan; Fox, Edward A. (Virginia Tech, 2018-11-15)
    To aid a variety of research studies, we propose TWIROLE, a hybrid model for role-related user classification on Twitter, which detects male-related, female-related, and brand-related (i.e., organization or institution) users. TWIROLE leverages features from tweet contents, user profiles, and profile images, and then applies our hybrid model to identify a user’s role. To evaluate it, we used two existing large datasets about Twitter users, and conducted both intra- and inter-comparison experiments. TWIROLE outperforms existing methods and obtains more balanced results over the several roles. We also confirm that user names and profile images are good indicators for this task. Our research extends prior work that does not consider brand-related users, and is an aid to future evaluation efforts relative to investigations that rely upon self-labeled datasets.
  • Results of a digital library curriculum field test
    Oh, Sanghee; Yang, Seungwon; Pomerantz, Jeffrey P.; Wildemuth, Barbara M.; Fox, Edward A. (Springer, 2015-05-20)
    The DL Curriculum Development project was launched in 2006, responding to an urgent need for consensus on DL curriculum across the fields of computer science and information and library science. Over the course of several years, 13 modules of a digital libraries (DL) curriculum were developed and were ready for field testing. The modules were evaluated in DL courses in real classroom environments in 37 classes by 15 instructors and their students. Interviews with instructors and questionnaires completed by their students were used to collect evaluative feedback. Findings indicate that the modules have been well designed to educate students on important topics and issues in DLs, in general. Suggestions to improve the modules based on the interviews and questionnaires were discussed as well. After the field test, module development has been continued, not only for the DL community but also others associated with DLs, such as information retrieval, big data, and multimedia. Currently, 56 modules are readily available for use through the project website or the Wikiversity site.
  • Interdisciplinary Curriculum Development for Digital Library Education
    Yang, Seungwon; Fox, Edward A.; Wildemuth, Barbara M.; Pomerantz, Jeffrey P.; Oh, Sanghee (2006-11-01)
    The Virginia Tech (VT) Department of Computer Science (CS) and the University of North Carolina at Chapel Hill (UNC-CH) School of Information and Library Science (LIS) are developing curricular materials for digital library (DL) education, appropriate for the CS and LIS communities. Educational modules will be designed, based on input from the project advisory board, Computing Curriculum 2001, the 5S framework, and workshop discussions. These modules will be evaluated, first through expert inspection and, second, through field testing. We are identifying and refining module definitions and scopes, collecting related resources, developing a module template, and creating example modules. These will be presented at the conference. The developed curriculum should contribute to producing well-balanced digital librarians who will graduate from CS or LIS programs.
  • Ensemble PDP-8: Eight Principles for Distributed Portals
    Fox, Edward A.; Chen, Yinlin; Akbar, Monika; Shaffer, Clifford A.; Edwards, Stephen H.; Brusilovsky, Peter; Garcia, Daniel D.; Delcambre, Lois M. L.; Decker, Felicia; Archer, David W.; Furuta, Richard; Shipman, Frank M., III; Carpenter, B. Stephen, II; Cassel, Lillian N. (2010)
    Ensemble, the National Science Digital Library (NSDL) Pathways project for Computing, builds upon a diverse group of prior NSDL, DL-I, and other projects. Ensemble has shaped its activities according to principles related to design, development, implementation, and operation of distributed portals. Here we articulate 8 key principles for distributed portals (PDPs). While our focus is on education and pedagogy, we expect that our experiences will generalize to other digital library application domains. These principles inform, facilitate, and enhance the Ensemble R&D and production activities. They allow us to provide a broad range of services, from personalization to coordination across communities. The eight PDPs can be briefly summarized as: (1) Articulation across communities using ontologies. (2) Browsing tailored to collections. (3) Integration across interfaces and virtual environments. (4) Metadata interoperability and integration. (5) Social graph construction using logging and metrics. (6) Superimposed information and annotation integrated across distributed systems. (7) Streamlined user access with IDs. (8) Web 2.0 multiple social network system interconnection.
  • Improving Education and Understanding of NDLTD
    Yang, Seungwon; Oh, Sanghee; Pomerantz, Jeffrey P.; Wildemuth, Barbara M.; Fox, Edward A. (2007-06-01)
    To understand ETDs, what NDLTD is, how it works, and the benefits of NDLTD, it is necessary to educate those involved, such as students who will create and submit their ETDs, as well as the library staff members who will be participating in NDLTD and administering their local system. To help educators prepare digital library (DL) courses supportive of their goals, our DL curriculum group has been developing educational modules and conducting field analyses since January 2006. This paper is a follow-up to our previous study of the subject distribution of ACM DL papers, JCDL papers, and D-Lib Magazine articles. In this paper, we focus on the selected DL modules that might help scholars conduct their research and share their knowledge.
  • Two Approaches to Enhance the Education for ETDs: Developing Educational Modules and Migrating the ETD Guide into a Community Wiki
    Yang, Seungwon; Levy, Jean; Miller, Kevin; Pomerantz, Jeffrey P.; Oh, Sanghee; Wildemuth, Barbara M.; Fox, Edward A. (2008-05-19)
    Two efforts have been made by the Digital Library (DL) Curriculum Development Project Group ( to help the ETD community. Our first activity is the preparation of multiple educational modules, which may be combined to create DL courses. In a paper presented at ETD 2007, the group identified the modules that might be most useful for scholars' research endeavors (i.e., for ETD authors). Since then, five modules from the selected module list have been developed and a formal review by subject experts has been completed for two draft modules. In this paper, the project team will present the details of the five modules. They are: 3-b: Digitization; 4-b: Metadata; 6-b: Online information seeking behaviors and search strategies; 7-e: Web publishing (e.g., wiki, RSS, blogs); and 9-e: Intellectual property. The second portion of this paper describes the recent migration activity of the ETD Guide (, which was written by several authors, with support by UNESCO, into a local wiki server. The ETD Guide has been supporting scholars, who would like to know more about ETDs, and/or utilize NDLTD systems effectively. However, there were problems such as outdated information in some sections, and the lack of easy means to update the information in the Guide. To address those problems, a wiki-based version of ETD Guide has been created with updated information ( Our plan is to move it into so that it could be exposed to an even larger community. It will allow the ETD community to update information on the Guide as new technologies and approaches arise related to ETDs. It is our hope that the efforts described will help with the understanding of digital libraries and of ETDs, and will promote the use of NDLTD-related systems and services.
  • Supporting Document Triage via Annotation-Based Multi-Application Visualizations
    Bae, Soonil; Kim, DoHyoung; Meintanis, Konstantinos; Moore, Michael; Zacchi, Anna; Shipman, Frank M., III; Hsieh, Haowei; Marshall, Cathy (2010)
    For open-ended information tasks, users must sift through many potentially relevant documents, a practice we refer to as document triage. Normally, people perform triage using multiple applications in concert: a search engine interface presents lists of potentially relevant documents; a document reader displays their contents; and a third tool - a text editor or personal information management application - is used to record notes and assessments. To support document triage, we have developed an extensible multi-application architecture that initially includes an information workspace and a document reader. An Interest Profile Manager infers users' interests from their interactions with the triage applications, coupled with the characteristics of the documents they are interacting with. The resulting interest profile is used to generate visualizations that direct users' attention to documents or parts of documents that match their inferred interests. The novelty of our approach lies in the aggregation of activity records across applications to generate fine-grained models of user interest.
  • A Framework for Building Open Digital Libraries
    Suleman, Hussein; Fox, Edward A. (Corporation for National Research Initiatives, 2001-12-01)
    Digital libraries (DLs) have traditionally been positioned at the intersection of library science, computer science, and networked information systems. The different underlying philosophies of these three fields have had an unsettling influence on the development of DLs. While library science is fairly mature, networked information systems are constantly evolving to keep pace with Internet innovation. DLs are thus expected to demonstrate the careful management of libraries while supporting standards that evolve at an astonishing pace. This architectural moving target is a predicament that all DLs face sooner or later in their lifecycle, and one that few manage to deal with effectively. To exacerbate this problem, there has been a general desire for systems to be interoperable at the levels of data exchange and service collaboration. Such interoperability requirements necessitated the development of standards such as the Dublin Core Metadata Element Set and the Open Archives Initiative's Protocol for Metadata Harvesting (OAI-PMH). These standards have achieved a degree of success in the DL community largely because of their generality and simplicity. Informed by those lessons, this project is an attempt to consistently extend known interoperability standards to form the basis of a framework of components for building extensible DLs.
  • Extending the 5S Digital Library Framework: From a Minimal DL Towards a DL Reference Model
    Murthy, Uma; Gorton, Douglas; Torres, Ricardo da Silva; Goncalves, Marcos A.; Fox, Edward A.; Delcambre, Lois M. L. (2007-06-23)
    In this paper, we describe ongoing research in three DL projects that build upon a common foundation: the 5S DL framework. In each project, we extend the 5S framework to provide specifications for a particular type of DL service and/or system - finally, moving towards a DL reference model. In the first project, we are working on formalizing content-based image retrieval services in a DL. In the second project, we are developing specifications for a superimposed information-supported DL (combining annotation, hypertext, and knowledge management technologies). In the third effort, we have used the 5S framework to generate a practical DL system based on the DSpace software.
  • Social Media Use by Government: From the Routine to the Critical
    Kavanaugh, Andrea L.; Fox, Edward A.; Sheetz, Steven D.; Yang, Seungwon; Li, Lin Tzy; Whalen, Travis; Shoemaker, Donald J.; Natsev, Paul; Xie, Lexing (2011-06-01)
    Social media (e.g., Twitter, Facebook, Flickr, YouTube) and other services with user-generated content have made a staggering amount of information (and misinformation) available. Government officials seek to leverage these resources to improve services and communication with citizens. Yet, the sheer volume of social data streams generates substantial noise that must be filtered. Nonetheless, potential exists to identify issues in real time, such that emergency management can monitor and respond to issues concerning public safety. By detecting meaningful patterns and trends in the stream of messages and information flow, events can be identified as spikes in activity, while meaning can be deciphered through changes in content. This paper presents findings from a pilot study we conducted between June and December 2010 with government officials in Arlington, Virginia (and the greater National Capital Region around Washington, DC) with a view to understanding the use of social media by government officials as well as community organizations, businesses and the public. We are especially interested in understanding social media use in crisis situations (whether severe or fairly common, such as traffic or weather crises).
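    The idea of identifying events as spikes in activity can be illustrated with a minimal sketch. The abstract does not say how spikes were detected, so the rolling z-score used here, the function name `find_spikes`, and the `window` and `threshold` parameters are all assumptions made for illustration, not the method of the study.

```python
from statistics import mean, stdev


def find_spikes(counts, window=5, threshold=3.0):
    """Return indices whose count is far above the preceding window.

    Hypothetical helper: a rolling z-score is one simple way to flag
    bursts of messages; it is not taken from the paper.
    """
    spikes = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        mu = mean(history)
        # Floor the deviation at 1.0 so a flat history cannot divide by zero.
        sigma = max(stdev(history), 1.0)
        if (counts[i] - mu) / sigma > threshold:
            spikes.append(i)
    return spikes


# Messages per hour (made-up numbers); hour 6 holds an obvious burst.
hourly_counts = [10, 12, 11, 9, 10, 11, 60, 12, 10]
spike_hours = find_spikes(hourly_counts)
```

    With these made-up counts only the burst at index 6 is flagged. A real stream would also need the content analysis the abstract mentions in order to decide what a spike means.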
  • 5SL: A Language for Declarative Specification and Generation of Digital Libraries
    Goncalves, Marcos A.; Fox, Edward A. (2002-07-01)
    Digital Libraries (DLs) are among the most complex kinds of information systems, due in part to their intrinsic multi-disciplinary nature. Nowadays DLs are built within monolithic, tightly integrated, and generally inflexible systems, or by assembling disparate components together in an ad-hoc way, with resulting problems in interoperability and adaptability. More importantly, conceptual modeling, requirements analysis, and software engineering approaches are rarely supported, making it extremely difficult to tailor DL content and behavior to the interests, needs, and preferences of particular communities. In this paper, we address these problems. In particular, we present 5SL, a declarative language for specifying and generating domain-specific digital libraries. 5SL is based on the 5S formal theory for digital libraries and enables high-level specification of DLs in five complementary dimensions, including: the kinds of multimedia information the DL supports (Stream Model); how that information is structured and organized (Structural Model); different logical and presentational properties and operations of DL components (Spatial Model); the behavior of the DL (Scenario Model); and the different societies of actors and managers of services that act together to carry out the DL behavior (Societal Model). The practical feasibility of the approach is demonstrated by the presentation of a 5SL digital library generator for the MARIAN digital library system.
  • Building Digital Libraries Made Easy: Toward Open Digital Libraries
    Fox, Edward A.; Suleman, Hussein; Luo, Ming (2002)
    Digital libraries (DLs) promote a sharing culture among those who contribute and those who use resources. This same approach works when building Open Digital Libraries (ODLs). Leveraging the intellectual and practical investment made in the Open Archives Initiative through an eXtended Protocol for Metadata Harvesting (XPMH), one can build lightweight protocols to tie together key components that together make up the core of a DL. DL developers in various settings have learned how to apply this framework in a few hours. The ODL approach has been effective with the Computer Science Teaching Center (, the Networked Digital Library of Theses and Dissertations (, and Hence, to support our Computing and Information Technology Interactive Digital Educational Library ( and to provide a generic capability for other parts of the US National Science, Technology, Engineering, and Mathematics Education Digital Library (, we are developing a "DL-in-a-box" toolkit. When lightweight protocols, pools of components, and open standard reference models are combined carefully, as suggested in the OCKHAM discussions, both the DL user and developer communities can benefit from the principle of sharing.
  • MARIAN: Flexible Interoperability for Federated Digital Libraries
    Goncalves, Marcos A.; France, Robert K.; Fox, Edward A.; Hilf, Eberhard R.; Zimmermann, Kerstin; Severiens, Thomas (2001)
    Federated digital libraries are composed of distributed autonomous (heterogeneous) information services but provide users with a transparent, integrated view of collected information respecting different information sources' autonomy. In this paper we discuss a federated system for the Networked Digital Library of Theses and Dissertations (NDLTD), an international consortium of universities, libraries, and other supporting institutions focused on electronic theses and dissertations (ETDs). The NDLTD has so far allowed its members considerable autonomy, though agreements are developing on metadata standards and on support of the Open Archives Initiative that eventually will promote greater homogeneity. At present, federation requires dealing flexibly with differences among systems, ontologies, and data formats. Our solution involves adapting MARIAN, an object-oriented digital library retrieval system developed with support by NLM and NSF, to serve as mediation middleware for the federated NDLTD collection. Components of the solution include: 1) the use of several harvesting techniques; 2) an architecture based on object-oriented ontologies of search modules and metadata; 3) diversity within the harvested data joined to a single collection view for the user; and 4) an integrated framework for addressing such questions as data quality, information compression, and flexible search. The system can handle very large dynamic collections. An adaptable relationship between the collection view and harvested data facilitates adding new sites to the federation and adapting to changes in existing sites. MARIAN's modular architecture and powerful and flexible data model work together to build an effective integrated solution within a simple uniform framework. We present both the general design of the system and operational details of a preliminary federated collection involving several thousand ETDs in four different formats and two languages from USA and Europe.
  • An XML Log Standard and Tool for Digital Library Logging Analysis
    Goncalves, Marcos A.; Luo, Ming; Shen, Rao; Ali, Mir Farooq; Fox, Edward A. (2002-09-01)
    Log analysis can be a primary source of knowledge about how digital library patrons actually use DL systems and services and how systems behave while trying to support user information seeking activities. Log recording and analysis allow evaluation and assessment, and open opportunities for improvements and enhanced new services. In this paper, we propose an XML-based digital library log format standard that captures a rich, detailed set of system and user behaviors supported by current digital library services. The format is implemented in a generic log component tool, which can be plugged into any digital library system. The focus of the work is on interoperability, reusability, and completeness. Specifications, implementation details, and examples of use within the MARIAN digital library system are described.
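    To give a feel for what an XML-based log record and a simple analysis over it might look like, here is a minimal sketch. The element and attribute names (`LogEntry`, `Event`, `sessionId`, `timestamp`, `type`) and both helper functions are hypothetical; the abstract does not reproduce the actual schema of the proposed standard.

```python
import xml.etree.ElementTree as ET


def make_log_entry(session_id, timestamp, event_type, detail):
    """Build one log record as an XML element (hypothetical schema)."""
    entry = ET.Element("LogEntry",
                       {"sessionId": session_id, "timestamp": timestamp})
    event = ET.SubElement(entry, "Event", {"type": event_type})
    event.text = detail
    return entry


def count_events(entries, event_type):
    """Count events of a given type across a list of log records."""
    return sum(
        1
        for entry in entries
        for event in entry.iter("Event")
        if event.get("type") == event_type
    )


# A few made-up records covering two user sessions.
entries = [
    make_log_entry("s1", "2002-09-01T10:00:00", "search", "digital libraries"),
    make_log_entry("s1", "2002-09-01T10:01:30", "browse", "collection A"),
    make_log_entry("s2", "2002-09-01T10:05:00", "search", "metadata"),
]
search_count = count_events(entries, "search")
serialized = ET.tostring(entries[0], encoding="unicode")
```

    Keeping each record as a self-describing XML element is what would let a generic component write logs for any DL system while separate tools analyze them, which matches the interoperability and reusability goals stated above.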
  • The Open Archives Initiative: Realizing Simple and Effective Digital Library Interoperability
    Suleman, Hussein; Fox, Edward A. (2001-03-01)
    The Open Archives Initiative (OAI) is dedicated to solving problems of digital library interoperability. Its focus has been on defining simple protocols, most recently for the exchange of metadata from archives. The OAI evolved out of a need to increase access to scholarly publications by supporting the creation of interoperable digital libraries. As a first step towards such interoperability, a metadata harvesting protocol was developed to support the streaming of metadata from one repository to another, ultimately to a provider of user services such as browsing, searching, or annotation. This article provides an overview of the mission, philosophy, and technical framework of the OAI.
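    The harvesting step described above can be sketched as follows. This is an illustration under stated assumptions: the `ListRecords` verb, the `metadataPrefix=oai_dc` parameter, and the XML namespaces follow the later OAI-PMH 2.0 specification, which may differ from the protocol version the 2001 article covers. The base URL, the sample response, and the helper names are made up, and no network request is made.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Dublin Core element namespace used inside oai_dc records.
DC_NS = "http://purl.org/dc/elements/1.1/"


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build a ListRecords request URL for a repository (hypothetical helper)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def extract_titles(response_xml):
    """Pull every Dublin Core title out of a harvested response."""
    root = ET.fromstring(response_xml)
    return [el.text for el in root.iter("{%s}title" % DC_NS)]


# A hand-written response fragment standing in for a real repository reply.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A Sample Thesis</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

request_url = list_records_url("http://example.org/oai")
titles = extract_titles(SAMPLE_RESPONSE)
```

    A service provider would issue such requests against many repositories and index the returned metadata to offer browsing or searching, which is the division of roles the abstract describes.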
  • The Core: Digital Library Education in Library and Information Science Programs
    Pomerantz, Jeffrey P.; Oh, Sanghee; Yang, Seungwon; Fox, Edward A.; Wildemuth, Barbara M. (Corporation for National Research Initiatives, 2006-11-01)
    This paper identifies the "state of the art" in digital library education in Library and Information Science programs, by identifying the readings that are assigned in digital library courses and the topics of these readings. The most frequently-assigned readings are identified at multiple units of analysis, as are the topics on which readings are most frequently assigned. While no core set of readings emerged, there was significant consensus on the authors to be included in digital library course reading assignments, as well as the topics to be covered. Implications for the range of assigned readings and topics for digital library education in library science education are discussed.
  • In Brief: Digital Libraries Curriculum Development
    Pomerantz, Jeffrey P.; Wildemuth, Barbara M.; Oh, Sanghee; Fox, Edward A.; Yang, Seungwon (Corporation for National Research Initiatives, 2006)
    Hundreds of millions of dollars have been invested in digital library (DL) research. Much of this research has investigated how DLs can aid education, but there has been no parallel investment in supporting teaching and learning about DL development and management. The Digital Libraries Curriculum Development project ( is an effort to overcome this shortcoming in DL education.
  • Digital Library Education in Computer Science Programs
    Pomerantz, Jeffrey P.; Oh, Sanghee; Wildemuth, Barbara M.; Yang, Seungwon; Fox, Edward A. (Department of Computer Science, Virginia Polytechnic Institute & State University, 2007)
    In an effort to identify the “state of the art” in digital library education in computer science (CS) programs, we analyzed CS courses on digital libraries and digital library-related topics. Fifteen courses that mention digital libraries in the title or short description were identified; of these, five are concerned with digital libraries as the primary topic of the course. The readings from these five courses were analyzed further, in terms of their authors and the journals in which they were published.