Browsing by Author "Goncalves, Marcos A."
- 5SL: A Language for Declarative Specification and Generation of Digital LibrariesGoncalves, Marcos A.; Fox, Edward A. (2002-07-01)Digital Libraries (DLs) are among the most complex kinds of information systems, due in part to their intrinsic multi-disciplinary nature. Nowadays DLs are built within monolithic, tightly integrated, and generally inflexible systems, or by assembling disparate components together in an ad-hoc way, with resulting problems in interoperability and adaptability. More importantly, conceptual modeling, requirements analysis, and software engineering approaches are rarely supported, making it extremely difficult to tailor DL content and behavior to the interests, needs, and preferences of particular communities. In this paper, we address these problems. In particular, we present 5SL, a declarative language for specifying and generating domain-specific digital libraries. 5SL is based on the 5S formal theory for digital libraries and enables high-level specification of DLs in five complementary dimensions, including: the kinds of multimedia information the DL supports (Stream Model); how that information is structured and organized (Structural Model); different logical and presentational properties and operations of DL components (Spatial Model); the behavior of the DL (Scenario Model); and the different societies of actors and managers of services that act together to carry out the DL behavior (Societal Model). The practical feasibility of the approach is demonstrated by the presentation of a 5SL digital library generator for the MARIAN digital library system.
- A Digital Library Framework for Biodiversity Information SystemsTorres, Ricardo da Silva; Medeiros, Claudia; Goncalves, Marcos A.; Fox, Edward A. (Department of Computer Science, Virginia Polytechnic Institute & State University, 2004)Biodiversity information systems (BISs) involve all kinds of heterogeneous data, which include ecological and geographical features. However, available information systems offer very limited support for managing such data in an integrated fashion. Furthermore, such systems do not fully support image content management (e.g., photos of landscapes or living organisms), a requirement of many BIS end-users. In order to meet their needs, these users - e.g., biologists, environmental experts - often have to alternate between distinct biodiversity and image information systems to combine information extracted from them. This cumbersome operational procedure is forced on users by lack of interoperability among these systems. This hampers the addition of new data sources, as well as cooperation among scientists. The approach provided in this paper to meet these issues is based on taking advantage of advances in Digital Library (DL) innovations to integrate networked collections of heterogeneous data. It focuses on creating the basis for a biodiversity information system under the digital library perspective, combining new techniques of content-based image retrieval and database query processing mechanisms. This approach solves the problem of system switching, and provides users with a flexible architecture from which to tailor a BIS to their needs. To illustrate the use of this architecture, it has been instantiated to support the creation of a BIS for fish species in a real application. The goal is to help researchers in ichthyology to identify fish specimens by using search retrieval techniques. Experimental results suggest that this new approach improves the effectiveness of the fish identification process, compared with the traditional key-based method.
- ETANA-DL: A Digital Library for Integrated Handling of Heterogeneous Archaeological DataRavindranathan, Unni; Shen, Rao; Goncalves, Marcos A.; Fan, Weiguo; Fox, Edward A.; Flanagan, James (Department of Computer Science, Virginia Polytechnic Institute & State University, 2004)Archaeologists have to deal with vast quantities of information, generated both in the field and laboratory. That information is heterogeneous in nature, and different projects have their own systems to store and use it. This adds to the challenges regarding collaborative research between such projects as well as information retrieval for other more general purposes. This paper describes our approach towards creating ETANA-DL, a digital library (DL) to help manage these vast quantities of information and to provide various kinds of services. The 5S framework for modeling a DL gives us an edge in understanding this vast and complex information space, as well as in designing and prototyping a DL to satisfy information needs of archaeologists and other user communities.
- ETANA-DL: Managing Complex Information Applications - an Archaeology Digital LibraryRavindranathan, Unni; Shen, Rao; Goncalves, Marcos A.; Fan, Weiguo; Fox, Edward A.; Flanagan, James (Department of Computer Science, Virginia Polytechnic Institute & State University, 2004)Archaeological research results in the generation of large quantities of heterogeneous information managed by different projects using custom information systems. We will demonstrate a prototype Digital Library (DL) for integrating and managing archaeological data and providing services useful to various user communities. ETANA-DL is a model-based, componentized, extensible, archaeological DL that manages complex information sources using the client-server paradigm of the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH).
- Extending the 5S Digital Library Framework: From a Minimal DL Towards a DL Reference ModelMurthy, Uma; Gorton, Douglas; Torres, Ricardo da Silva; Goncalves, Marcos A.; Fox, Edward A.; Delcambre, Lois M. L. (2007-06-23)In this paper, we describe ongoing research in three DL projects that build upon a common foundation: the 5S DL framework. In each project, we extend the 5S framework to provide specifications for a particular type of DL service and/or system - finally, moving towards a DL reference model. In the first project, we are working on formalizing content-based image retrieval services in a DL. In the second project, we are developing specifications for a superimposed information-supported DL (combining annotation, hypertext, and knowledge management technologies). In the third effort, we have used the 5S framework to generate a practical DL system based on the DSpace software.
- Extending the 5S Framework of Digital Libraries to support Complex Objects, Superimposed Information, and Content-Based Image Retrieval ServicesMurthy, Uma; Kozievitch, Nadia; Leidig, Jonathan; Torres, Ricardo da Silva; Yang, Seungwon; Goncalves, Marcos A.; Delcambre, Lois M. L.; Archer, David W.; Fox, Edward A. (Department of Computer Science, Virginia Polytechnic Institute & State University, 2010)Advanced services in digital libraries (DLs) have been developed and widely used to address the required capabilities of an assortment of systems as DLs expand into diverse application domains. These systems may require support for images (e.g., Content-Based Image Retrieval), Complex (information) Objects, and use of content at fine grain (e.g., Superimposed Information). Due to the lack of consensus on precise theoretical definitions for those services, implementation efforts often involve ad hoc development, leading to duplication and interoperability problems. This article presents a methodology to address those problems by extending a precisely specified minimal digital library (in the 5S framework) with formal definitions of aforementioned services. The theoretical extensions of digital library functionality presented here are reinforced with practical case studies as well as scenarios for the individual and integrative use of services to balance theory and practice. This methodology has implications that other advanced services can be continuously integrated into our current extended framework whenever they are identified. The theoretical definitions and case study we present may impact future development efforts and a wide range of digital library researchers, designers, and developers.
- Incremental, Semi-automatic, Mapping-Based Integration of Heterogeneous Collections into Archaeological Digital Libraries: Megiddo Case StudyRaghavan, Ananth; Vemuri, Naga Srinivas; Shen, Rao; Goncalves, Marcos A.; Fan, Weiguo; Fox, Edward A. (Department of Computer Science, Virginia Polytechnic Institute & State University, 2005)Automation is an important issue when integrating heterogeneous collections into archaeological digital libraries. We propose an incremental approach through intermediary- and mapping-based techniques. A visual schema mapping tool within the 5S framework allows semi-automatic mapping and incremental global schema enrichment. 5S also helped speed up development of a new multi-dimensional browsing service. Our approach helps integrate the Megiddo excavation data into a growing union archaeological DL, ETANA-DL.
- Integration of Heterogeneous Digital Libraries with Semi-automatic Mapping and Browsing: From Formalization to Specification to VisualizationShen, Rao; Vemuri, Naga Srinivas; Raghavan, Ananth; Goncalves, Marcos A.; Rangarajan, Divya; Fan, Weiguo; Fox, Edward A. (Department of Computer Science, Virginia Polytechnic Institute & State University, 2005)In this paper, we formalize the digital library (DL) integration problem and propose an overall approach based on the 5S framework. We apply 5S to domain-specific (archaeological) DLs, illustrating our solutions for key problems in DL integration. We use ETANA-DL as a case study to describe the process of semi-automatically generating a union catalog and a unified browsing service in an archaeological DL. A visual schema mapping tool is developed for union catalog creation. A pilot user study aids tool evaluation. Our approach is further validated through application of a general browsing component to two integrated DLs.
- Intelligent Fusion of Structural and Citation-Based Evidence for Text ClassificationZhang, Baoping; Goncalves, Marcos A.; Fan, Weiguo; Chen, Yuxin; Fox, Edward A.; Calado, Pavel; Cristo, Marco (Department of Computer Science, Virginia Polytechnic Institute & State University, 2004)This paper investigates how citation-based information and structural content (e.g., title, abstract) can be combined to improve classification of text documents into predefined categories. We evaluate different measures of similarity, five derived from the citation structure of the collection, and three measures derived from the structural content, and determine how they can be fused to improve classification effectiveness. To discover the best fusion framework, we apply Genetic Programming (GP) techniques. Our empirical experiments using documents from the ACM digital library and the ACM classification scheme show that we can discover similarity functions that work better than any evidence in isolation and whose combined performance through a simple majority voting is comparable to that of Support Vector Machine classifiers.
- MARIAN: Flexible Interoperability for Federated Digital LibrariesGoncalves, Marcos A.; France, Robert K.; Fox, Edward A.; Hilf, Eberhard R.; Zimmermann, Kerstin; Severiens, Thomas (2001)Federated digital libraries are composed of distributed autonomous (heterogeneous) information services but provide users with a transparent, integrated view of collected information respecting different information sources' autonomy. In this paper we discuss a federated system for the Networked Digital Library of Theses and Dissertations (NDLTD), an international consortium of universities, libraries, and other supporting institutions focused on electronic theses and dissertations (ETDs). The NDLTD has so far allowed its members considerable autonomy, though agreements are developing on metadata standards and on support of the Open Archives initiative that eventually will promote greater homogeneity. At present, federation requires dealing flexibly with differences among systems, ontologies, and data formats. Our solution involves adapting MARIAN, an object oriented digital library retrieval system developed with support by NLM and NSF, to serve as mediation middleware for the federated NDLTD collection. Components of the solution include: 1) the use of several harvesting techniques; 2) an architecture based on object-oriented ontologies of search modules and metadata; 3) diversity within the harvested data joined to a single collection view for the user; and 4) an integrated framework for addressing such questions as data quality, information compression, and flexible search. The system can handle very large dynamic collections. An adaptable relationship between the collection view and harvested data facilitates adding new sites to the federation and adapting to changes in existing sites. MARIAN's modular architecture and powerful and flexible data model work together to build an effective integrated solution within a simple uniform framework. 
We present both the general design of the system and operational details of a preliminary federated collection involving several thousand ETDs in four different formats and two languages from USA and Europe.
- An OAI-based Digital Library Framework for Biodiversity Information SystemsTorres, Ricardo da Silva; Medeiros, Claudia; Goncalves, Marcos A.; Fox, Edward A. (Department of Computer Science, Virginia Polytechnic Institute & State University, 2004)Biodiversity information systems (BISs) involve all kinds of heterogeneous data, which include ecological and geographical features. However, available information systems offer very limited support for managing such data in an integrated fashion, and integration is often based on geographic coordinates alone. Furthermore, such systems do not fully support image content management (e.g., photos of landscapes or living organisms), a requirement of many BIS end-users. In order to meet their needs, these users - e.g., biologists, environmental experts - often have to alternate between distinct biodiversity and image information systems to combine information extracted from them. This cumbersome operational procedure is forced on users by lack of interoperability among these systems. This hampers the addition of new data sources, as well as cooperation among scientists. The approach provided in this paper to meet these issues is based on taking advantage of advances in Digital Library (DL) innovations to integrate networked collections of heterogeneous data. It focuses on creating the basis for a biodiversity information system under the digital library perspective, combining new techniques of content-based image retrieval and database query processing mechanisms. This approach solves the problem of system switching, and provides users with a flexible platform from which to tailor a BIS to their needs.
- Rapid Modeling, Prototyping, and Generation of Digital Libraries- a Theory-Based ApproachGoncalves, Marcos A.; Zhu, Qinwei; Kelapure, Rohit; Fox, Edward A. (Department of Computer Science, Virginia Polytechnic Institute & State University, 2003)Despite some development in the area of DL architectures and systems, there is still little support for the complete life cycle of DL development, including requirements gathering, conceptual modeling, rapid prototyping, and code generation and reuse. Even when partially supported, those activities are uncorrelated within the current systems, which can lead to inconsistencies and incompleteness. Moreover, the current few existing approaches are not supported by comprehensive and formal foundations and theories, which brings problems of interoperability and makes it extremely difficult to adapt and tailor systems to specific societal preferences and needs of the target community. In this paper, having the 5S formal theoretical framework as support, we present an architecture and a family of tools that allow rapid modeling, prototyping, and generation of digital libraries. 5S stands for Streams, Structures, Spaces, Scenarios, and Societies and is our formal theory for DLs. 5SL is a domain-specific, declarative language for DL conceptual modeling. 5SGraph is a visual modeling tool that helps designers to model a digital library without knowing the theoretical foundations and the syntactical details of 5SL. Furthermore, 5SGraph maintains semantic constraints specified by a 5S metamodel and enforces these constraints over the instance model to ensure semantic consistency and correctness. 5SGraph also enables component reuse to reduce the time and efforts of designers. 5SLGen is a DL generation tool that takes specifications in 5SL and a set of component pools and generates portions of a running DL system. 
The outputs of 5SLGen include user interface prototypes, in a generic UI markup language, for validation of services behavior and workflow representations of the running system, generated to support the desired scenarios.
- Requirements Gathering and Modeling of Domain-Specific Digital Libraries with the 5S Framework: An Archaeological Case Study with ETANAShen, Rao; Goncalves, Marcos A.; Fan, Weiguo; Fox, Edward A. (Department of Computer Science, Virginia Polytechnic Institute & State University, 2005)Requirements gathering and conceptual modeling are essential for the customization of digital libraries (DLs), to help attend to the needs of target communities. In this paper, we show how to apply the 5S (Streams, Structures, Spaces, Scenarios, and Societies) formal framework to support both tasks. The intuitive nature of the framework allows for easy and systematic requirements analysis, while its formal nature ensures the precision and correctness required for semi-automatic DL generation. More specifically, we show how 5S can help us define a domain-specific DL metamodel in the field of archaeology. Finally, an archaeological DL case study (from the ETANA project) yields informal and formal descriptions of two DL models (instances of the metamodel).
- Schema Mapper: A Visualization Tool for DL IntegrationRaghavan, Ananth; Rangarajan, Divya; Shen, Rao; Goncalves, Marcos A.; Vemuri, Naga Srinivas; Fan, Weiguo; Fox, Edward A. (Department of Computer Science, Virginia Polytechnic Institute & State University, 2005)Schema mapping is a challenging problem. It has come to the fore in recent years; there are important applications like database schema integration and, more recently, digital library merging of heterogeneous data. Previous studies have approached the schema mapping process either from algorithmic or visualization perspectives, with few integrating both. With Schema Mapper we demonstrate a semi-automatic tool for schema integration that combines a novel visual interface with an algorithm-based recommendation engine. Schemas are visualized as hyperbolic trees (see Fig. 1), thus allowing more schema nodes to be displayed at one time. Matches to selections are recommended to the user, which makes the mapping operation easier and faster.
- Streams, Structures, Spaces, Scenarios, Societies (5S): A Formal Model for Digital LibrariesGoncalves, Marcos A.; Fox, Edward A.; Watson, Layne T.; Kipp, Neill A. (Department of Computer Science, Virginia Polytechnic Institute & State University, 2001)Digital libraries (DLs) are complex information systems and therefore demand formal foundations lest development efforts diverge and interoperability suffers. In this paper, we propose the fundamental abstractions of Streams, Structures, Spaces, Scenarios, and Societies (5S), which contribute to define digital libraries rigorously and usefully. Streams are sequences of abstract items used to describe static and dynamic content. Structures can be defined as labeled directed graphs, which impose organization. Spaces are sets of abstract items and operations on those sets that obey certain rules. Scenarios consist of sequences of events or actions that modify states of a computation in order to accomplish a functional requirement. Societies comprehend entities and the relationships between and among them. Together these abstractions relate and unify concepts, among others, of digital objects, metadata, collections, and services required to formalize and elucidate “digital libraries”. The applicability, versatility and unifying power of the theory are demonstrated through its use in three distinct applications: building and interpretation of a DL taxonomy, analysis of case studies of digital libraries, and utilization as a formal basis for a DL description language.
- Streams, Structures, Spaces, Scenarios, Societies (5S): A Formal Model for Digital LibrariesGoncalves, Marcos A.; Fox, Edward A.; Watson, Layne T.; Kipp, Neill A. (Department of Computer Science, Virginia Polytechnic Institute & State University, 2003)Digital libraries (DLs) are complex information systems and therefore demand formal foundations lest development efforts diverge and interoperability suffers. In this paper, we propose the fundamental abstractions of Streams, Structures, Spaces, Scenarios, and Societies (5S), which allow us to define digital libraries rigorously and usefully. Streams are sequences of arbitrary items used to describe both static and dynamic (e.g., video) content. Structures can be viewed as labeled directed graphs, which impose organization. Spaces are sets with operations on those sets that obey certain constraints. Scenarios consist of sequences of events or actions that modify states of a computation in order to accomplish a functional requirement. Societies are sets of entities and activities and the relationships between and among them. Together these abstractions provide a formal foundation to define, relate, and unify concepts - among others, of digital objects, metadata, collections, and services - required to formalize and elucidate "digital libraries". The applicability, versatility and unifying power of the 5S model are demonstrated through its use in three distinct applications: building and interpretation of a DL taxonomy, informal and formal analysis of case studies of digital libraries (NDLTD and OAI), and utilization as a formal basis for a DL description language.
- An XML Log Standard and Tool for Digital Library Logging AnalysisGoncalves, Marcos A.; Luo, Ming; Shen, Rao; Ali, Mir Farooq; Fox, Edward A. (2002-09-01)Log analysis can be a primary source of knowledge about how digital library patrons actually use DL systems and services and how systems behave while trying to support user information seeking activities. Log recording and analysis allow evaluation and assessment, and open opportunities for improvements and enhanced new services. In this paper, we propose an XML-based digital library log format standard that captures a rich, detailed set of system and user behaviors supported by current digital library services. The format is implemented in a generic log component tool, which can be plugged into any digital library system. The focus of the work is on interoperability, reusability, and completeness. Specifications, implementation details, and examples of use within the MARIAN digital library system are described.