Browsing by Author "Torres, Ricardo da Silva"
Now showing 1 - 10 of 10
- Design and Evaluation of Techniques to Utilize Implicit Rating Data in Complex Information Systems. Kim, Seonho; Fox, Edward A.; Fan, Weiguo; North, Christopher L.; Tatar, Deborah Gail; Torres, Ricardo da Silva (Department of Computer Science, Virginia Polytechnic Institute & State University, 2007-05-01) Research in personalization, including recommender systems, focuses on applications such as online shopping malls and simple information systems. These systems consider user profile and item information obtained from data explicitly entered by users - where it is possible to classify items involved and to personalize based on a direct mapping from user or user group to item or item group. However, in complex, dynamic, and professional information systems, such as Digital Libraries, additional capabilities are needed to achieve personalization to support their distinctive features: large numbers of digital objects, dynamic updates, sparse rating data, biased rating data on specific items, and challenges in getting explicit rating data from users. In this report, we present techniques for collecting, storing, processing, and utilizing implicit rating data of Digital Libraries for analysis and decision support. We present our pilot study to find virtual user groups using implicit rating data. We demonstrate the effectiveness of implicit rating data for characterizing users and finding virtual user communities, through statistical hypothesis testing. Further, we describe a visual data mining tool named VUDM (Visual User model Data Mining tool) that utilizes implicit rating data. We provide the results of formative evaluation of VUDM and discuss the problems raised and plans for further studies.
- Digital Libraries with Superimposed Information: Supporting Scholarly Tasks that Involve Fine Grain Information. Murthy, Uma (Virginia Tech, 2011-01-28) Many scholarly tasks involve working with contextualized fine-grain information, such as a music professor creating a multimedia lecture on a musical style, while bringing together several snippets of compositions of that style. We refer to such contextualized parts of a larger unit of information (or whole documents) as subdocuments. Current approaches to working with subdocuments involve a mix of paper-based and digital techniques. With the increase in the volume and in the heterogeneity of information sources, the management, organization, access, retrieval, as well as reuse of subdocuments become challenging, leading to inefficient and ineffective task execution. A digital library (DL) facilitates management, access, retrieval, and use of collections of data and metadata through services. However, most DLs do not provide infrastructure or services to support working with subdocuments. Superimposed information (SI) refers to new information that is created to reference subdocuments in existing information resources. We combine this idea of SI with traditional DL services, to define and develop a DL with SI (an SI-DL). Our research questions are centered around one main question: how can we extend the notion of a DL to include SI, in order to support scholarly tasks that involve working with subdocuments? We pursued this question from a theoretical as well as a practical/user perspective. From a theoretical perspective, we developed a formal metamodel that precisely defines the components of an SI-DL, building upon related work in DLs, SI, annotations, and hypertext. From the practical/user perspective, we developed prototype superimposed applications and conducted user studies to explore the use of SI in scholarly tasks.
We developed SuperIDR, a prototype SI-DL, which enables users to mark up subimages, annotate them, and retrieve information in multiple ways, including browsing, and text- and content-based image retrieval. We explored the use of subimages and evaluated the use of SuperIDR in fish species identification, a scholarly task that involves working with subimages. Findings from the user studies and other work in our research lead to theory- and experiment-based enhancements that can guide design of digital libraries with superimposed information.
- A Digital Library Framework for Biodiversity Information Systems. Torres, Ricardo da Silva; Medeiros, Claudia; Goncalves, Marcos A.; Fox, Edward A. (Department of Computer Science, Virginia Polytechnic Institute & State University, 2004) Biodiversity information systems (BISs) involve all kinds of heterogeneous data, which include ecological and geographical features. However, available information systems offer very limited support for managing such data in an integrated fashion. Furthermore, such systems do not fully support image content management (e.g., photos of landscapes or living organisms), a requirement of many BIS end-users. In order to meet their needs, these users - e.g., biologists, environmental experts - often have to alternate between distinct biodiversity and image information systems to combine information extracted from them. This cumbersome operational procedure is forced on users by lack of interoperability among these systems. This hampers the addition of new data sources, as well as cooperation among scientists. The approach provided in this paper to meet these issues is based on taking advantage of advances in Digital Library (DL) innovations to integrate networked collections of heterogeneous data. It focuses on creating the basis for a biodiversity information system under the digital library perspective, combining new techniques of content-based image retrieval and database query processing mechanisms. This approach solves the problem of system switching, and provides users with a flexible architecture from which to tailor a BIS to their needs. To illustrate the use of this architecture, it has been instantiated to support the creation of a BIS for fish species in a real application. The goal is to help researchers in ichthyology identify fish specimens by using search retrieval techniques.
Experimental results suggest that this new approach improves the effectiveness of the fish identification process, compared with the traditional key-based method.
- Extending the 5S Digital Library Framework: From a Minimal DL Towards a DL Reference Model. Murthy, Uma; Gorton, Douglas; Torres, Ricardo da Silva; Goncalves, Marcos A.; Fox, Edward A.; Delcambre, Lois M. L. (2007-06-23) In this paper, we describe ongoing research in three DL projects that build upon a common foundation: the 5S DL framework. In each project, we extend the 5S framework to provide specifications for a particular type of DL service and/or system - finally, moving towards a DL reference model. In the first project, we are working on formalizing content-based image retrieval services in a DL. In the second project, we are developing specifications for a superimposed information-supported DL (combining annotation, hypertext, and knowledge management technologies). In the third effort, we have used the 5S framework to generate a practical DL system based on the DSpace software.
- Extending the 5S Framework of Digital Libraries to support Complex Objects, Superimposed Information, and Content-Based Image Retrieval Services. Murthy, Uma; Kozievitch, Nadia; Leidig, Jonathan; Torres, Ricardo da Silva; Yang, Seungwon; Goncalves, Marcos A.; Delcambre, Lois M. L.; Archer, David W.; Fox, Edward A. (Department of Computer Science, Virginia Polytechnic Institute & State University, 2010) Advanced services in digital libraries (DLs) have been developed and widely used to address the required capabilities of an assortment of systems as DLs expand into diverse application domains. These systems may require support for images (e.g., Content-Based Image Retrieval), Complex (information) Objects, and use of content at fine grain (e.g., Superimposed Information). Due to the lack of consensus on precise theoretical definitions for those services, implementation efforts often involve ad hoc development, leading to duplication and interoperability problems. This article presents a methodology to address those problems by extending a precisely specified minimal digital library (in the 5S framework) with formal definitions of the aforementioned services. The theoretical extensions of digital library functionality presented here are reinforced with practical case studies as well as scenarios for the individual and integrative use of services to balance theory and practice. This methodology implies that other advanced services can be continuously integrated into our current extended framework whenever they are identified. The theoretical definitions and case study we present may impact future development efforts and a wide range of digital library researchers, designers, and developers.
- A High-quality Digital Library Supporting Computing Education: The Ensemble Approach. Chen, Yinlin (Virginia Tech, 2017-08-28) Educational Digital Libraries (DLs) are complex information systems which are designed to support individuals' information needs and information seeking behavior. To have a broad impact on the communities in education and to serve for a long period, DLs need to structure and organize the resources in a way that facilitates the dissemination and the reuse of resources. Such a digital library should meet defined quality dimensions in the 5S (Societies, Scenarios, Spaces, Structures, Streams) framework - including completeness, consistency, efficiency, extensibility, and reliability - to ensure that a good quality DL is built. In this research, we addressed both external and internal quality aspects of DLs. For internal qualities, we focused on completeness and consistency of the collection, catalog, and repository. We developed an application pipeline to acquire user-generated computing-related resources from YouTube and SlideShare for an educational DL. We applied machine learning techniques to transfer what we learned from the ACM Digital Library dataset. We built classifiers to catalog resources according to the ACM Computing Classification System from the two new domains that were evaluated using Amazon Mechanical Turk. For external qualities, we focused on efficiency, scalability, and reliability in DL services. We proposed cloud-based designs and applications to ensure and improve these qualities in DL services using cloud computing. The experimental results show that our proposed methods are promising for enhancing and enriching an educational digital library. This work received support from ACM, as well as the National Science Foundation under Grant Numbers DUE-0836940, DUE-0937863, and DUE-0840719, and IMLS LG-71-16-0037-16.
- A Novel Hybrid Focused Crawling Algorithm to Build Domain-Specific Collections. Chen, Yuxin (Virginia Tech, 2007-02-05) The Web, containing a large amount of useful information and resources, is expanding rapidly. Collecting domain-specific documents/information from the Web is one of the most important methods to build digital libraries for the scientific community. Focused Crawlers can selectively retrieve Web documents relevant to a specific domain to build collections for domain-specific search engines or digital libraries. Traditional focused crawlers, normally adopting the simple Vector Space Model and local Web search algorithms, typically find relevant Web pages only with low precision. Recall also is often low, since they explore a limited sub-graph of the Web that surrounds the starting URL set, and will ignore relevant pages outside this sub-graph. In this work, we investigated how to apply an inductive machine learning algorithm and a meta-search technique to the traditional focused crawling process, to overcome the above-mentioned problems and to improve performance. We proposed a novel hybrid focused crawling framework based on Genetic Programming (GP) and meta-search. We showed that our novel hybrid framework can be applied to traditional focused crawlers to accurately find more relevant Web documents for the use of digital libraries and domain-specific search engines. The framework is validated through experiments performed on test documents from the Open Directory Project. Our studies have shown that improvement can be achieved relative to the traditional focused crawler if genetic programming and meta-search methods are introduced into the focused crawling process.
- An OAI-based Digital Library Framework for Biodiversity Information Systems. Torres, Ricardo da Silva; Medeiros, Claudia; Goncalves, Marcos A.; Fox, Edward A. (Department of Computer Science, Virginia Polytechnic Institute & State University, 2004) Biodiversity information systems (BISs) involve all kinds of heterogeneous data, which include ecological and geographical features. However, available information systems offer very limited support for managing such data in an integrated fashion, and integration is often based on geographic coordinates alone. Furthermore, such systems do not fully support image content management (e.g., photos of landscapes or living organisms), a requirement of many BIS end-users. In order to meet their needs, these users - e.g., biologists, environmental experts - often have to alternate between distinct biodiversity and image information systems to combine information extracted from them. This cumbersome operational procedure is forced on users by lack of interoperability among these systems. This hampers the addition of new data sources, as well as cooperation among scientists. The approach provided in this paper to meet these issues is based on taking advantage of advances in Digital Library (DL) innovations to integrate networked collections of heterogeneous data. It focuses on creating the basis for a biodiversity information system under the digital library perspective, combining new techniques of content-based image retrieval and database query processing mechanisms. This approach solves the problem of system switching, and provides users with a flexible platform from which to tailor a BIS to their needs.
- Use of Subimages in Fish Species Identification: A Qualitative Study. Murthy, Uma; Li, Lin Tzy; Hallerman, Eric M.; Fox, Edward A.; Pérez-Quiñones, Manuel A.; Delcambre, Lois M. L.; Torres, Ricardo da Silva (Department of Computer Science, Virginia Polytechnic Institute & State University, 2011-03-01) Many scholarly tasks involve working with subdocuments, or contextualized fine-grain information, i.e., with information that is part of some larger unit. A digital library (DL) facilitates management, access, retrieval, and use of collections of data and metadata through services. However, most DLs do not provide infrastructure or services to support working with subdocuments. Superimposed information (SI) refers to new information that is created to reference subdocuments in existing information resources. We combine this idea of SI with traditional DL services, to define and develop a DL with SI (SI-DL). We explored the use of subimages and evaluated the use of a prototype SI-DL (SuperIDR) in fish species identification, a scholarly task that involves working with subimages. The contexts and strategies of working with subimages in SuperIDR suggest new and enhanced support (SI-DL services) for scholarly tasks that involve working with subimages, including new ways of querying and searching for subimages and associated information. The main contributions of our work are the insights gained from these findings on the use of subimages and of SuperIDR (a prototype SI-DL), which lead to recommendations for the design of digital libraries with superimposed information.
- Visualizing Users, User Communities, and Usage Trends in Complex Information Systems Using Implicit Rating Data. Kim, Seonho (Virginia Tech, 2008-04-14) Research on personalization, including recommender systems, focuses on applications such as online shopping malls and simple information systems. These systems consider user profile and item information obtained from data explicitly entered by users. There it is possible to classify items involved and to personalize based on a direct mapping from user or user group to item or item group. However, in complex, dynamic, and professional information systems, such as digital libraries, additional capabilities are needed to achieve personalization to support their distinctive features: large numbers of digital objects, dynamic updates, sparse rating data, biased rating data on specific items, and challenges in getting explicit rating data from users. For this reason, more research on implicit rating data is recommended, because it is easy to obtain, suffers less from terminology issues, is more informative, and contains more user-centered information. In previous reports on my doctoral work, I discussed collecting, storing, processing, and utilizing implicit rating data of digital libraries for analysis and decision support. This dissertation presents a visualization tool, VUDM (Visual User-model Data Mining tool), utilizing implicit rating data, to demonstrate the effectiveness of implicit rating data in characterizing users, user communities, and usage trends of digital libraries. The results of user studies, performed both with typical end-users and with library experts to test the usefulness of VUDM, indicate that implicit rating data is useful and can be utilized for digital library analysis software, so that both end users and experts can benefit.