Computer Science Technical Reports
The Department of Computer Science collection of technical reports began in 1973. Please use the subject headings listed below for all submissions.
Subject Headings:
- Algorithms
- Big Data
- Bioinformatics
- Computational Biology
- Computational Science and Engineering
- Computer Graphics/Animation
- Computer Science Education
- Computer Systems
- Cyberarts
- Cybersecurity
- Data and Text Mining
- Digital Education
- Digital Libraries
- Discrete Event Simulation
- High Performance Computing
- Human Computer Interaction
- Information Retrieval
- Machine Learning
- Mathematical Programming
- Mathematical Software
- Modeling and Simulation
- Networking
- Numerical Analysis
- Parallel and Distributed Computing
- Problem Solving Environments
- Software Engineering
- Theoretical Computer Science
- Virtual/Augmented Reality
- Visualization
Browse
Browsing Computer Science Technical Reports by Subject "Artificial intelligence"
Now showing 1 - 7 of 7
- Algorithms for Storytelling
  Kumar, Deept; Ramakrishnan, Naren; Helm, Richard F.; Potts, Malcolm (Department of Computer Science, Virginia Polytechnic Institute & State University, 2006)
  We formulate a new data mining problem called "storytelling" as a generalization of redescription mining. In traditional redescription mining, we are given a set of objects and a collection of subsets defined over these objects. The goal is to view the set system as a vocabulary and identify two expressions in this vocabulary that induce the same set of objects. Storytelling, on the other hand, aims to explicitly relate object sets that are disjoint (and hence, maximally dissimilar) by finding a chain of (approximate) redescriptions between the sets. This problem finds applications in bioinformatics, for instance, where the biologist is trying to relate a set of genes expressed in one experiment to another set, implicated in a different pathway. We outline an efficient storytelling implementation that embeds the CARTwheels redescription mining algorithm in an A* search procedure, using the former to supply next-move operators on search branches to the latter. This approach is practical and effective for mining large datasets and, at the same time, exploits the structure of partitions imposed by the given vocabulary. Three application case studies are presented: a study of word overlaps in large English dictionaries, exploring connections between genesets in a bioinformatics dataset, and relating publications in the PubMed index of abstracts.
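  To make the chaining idea concrete, here is a minimal sketch (ours, not the paper's implementation): Jaccard overlap stands in for CARTwheels-style approximate redescriptions, and plain A* searches over a vocabulary of sets. The function name `storytell`, the threshold `theta`, and the one-step heuristic are illustrative assumptions.

```python
import heapq
from itertools import count

def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def storytell(start, target, vocabulary, theta=0.5):
    """Find a chain of sets linking `start` to `target`, where each
    consecutive pair overlaps by at least `theta` (a crude stand-in
    for an approximate redescription). Plain A* over the vocabulary."""
    tie = count()  # unique tiebreaker so the heap never compares sets
    h = lambda s: 0 if jaccard(s, target) >= theta else 1
    frontier = [(h(start), next(tie), 0, [start])]
    seen = set()
    while frontier:
        _, _, g, path = heapq.heappop(frontier)
        current = path[-1]
        if jaccard(current, target) >= theta:  # one hop from the target
            return path + ([target] if current != target else [])
        key = frozenset(current)
        if key in seen:
            continue
        seen.add(key)
        for cand in vocabulary:
            if frozenset(cand) not in seen and jaccard(current, cand) >= theta:
                heapq.heappush(frontier,
                               (g + 1 + h(cand), next(tie), g + 1, path + [cand]))
    return None  # no story connects the two sets at this threshold

# Toy run: relate two disjoint sets through overlapping intermediates.
vocab = [{1, 2, 3}, {2, 3, 4}, {3, 4, 5}, {4, 5, 6}]
print(storytell({1, 2, 3}, {4, 5, 6}, vocab, theta=0.5))
# -> [{1, 2, 3}, {2, 3, 4}, {3, 4, 5}, {4, 5, 6}]
```

  The heuristic never overestimates (it is 0 only when the target is reachable in one step), so A* returns a shortest chain for the given overlap threshold.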
- Clustering constrained by dependencies
  Tadepalli, Satish; Ramakrishnan, Naren; Watson, Layne T. (Department of Computer Science, Virginia Polytechnic Institute & State University, 2009)
  Clustering is the unsupervised method of grouping data samples to form a partition of a given dataset. Such grouping is typically done based on homogeneity assumptions of clusters over an attribute space and hence the precise definition of the similarity metric affects the clusters inferred. In recent years, new formulations of clustering have emerged that posit indirect constraints on clustering, typically in terms of preserving dependencies between data samples and auxiliary variables. These formulations find applications in bioinformatics, web mining, social network analysis, and many other domains. The purpose of this survey is to provide a gentle introduction to these formulations, their mathematical assumptions, and the contexts under which they are applicable.
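  As a schematic of what such an "indirect constraint" looks like (the notation is ours, not the survey's): a cluster assignment $C$ is chosen to balance homogeneity over the attribute space against dependence, here measured by mutual information, with an auxiliary variable $Y$:

$$
\min_{C} \; \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2 \;-\; \lambda\, I(C;\, Y),
$$

  where $\mu_k$ is the prototype of cluster $C_k$, $I(C;Y)$ is the mutual information between cluster labels and the auxiliary variable, and $\lambda$ sets the tradeoff between the two terms. Specific formulations surveyed may use other distortion and dependency measures.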
- Nonreciprocating Sharing Methods in Cooperative Q-Learning Environments
  Cunningham, Bryan; Cao, Yong (Department of Computer Science, Virginia Polytechnic Institute & State University, 2012-06-01)
  Past research on multiagent simulation with cooperative reinforcement learning (RL) focuses on developing sharing strategies that are adopted and used by all agents in the environment. In this paper, we target situations where this assumption of a single sharing strategy that is employed by all agents is not valid. We seek to address how agents with no predetermined sharing partners can exploit groups of cooperatively learning agents to improve learning performance when compared to independent learning. Specifically, we propose three intra-agent methods that do not assume a reciprocating sharing relationship and leverage the pre-existing agent interface associated with Q-Learning to expedite learning.
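  The sketch below shows only the general shape of such one-way sharing, not any of the paper's three methods: a standard tabular Q-learner plus a `pull_from_group` step in which an outside agent blends in a group's average Q-values without contributing its own table back. The blending weight and the averaging rule are our assumptions.

```python
import random
from collections import defaultdict

class QAgent:
    """Tabular Q-learner with the standard Q(s, a) update."""
    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.Q = defaultdict(float)
        self.actions, self.alpha = actions, alpha
        self.gamma, self.epsilon = gamma, epsilon

    def act(self, state):
        if random.random() < self.epsilon:          # explore
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.Q[(state, a)])

    def learn(self, s, a, r, s2):
        best_next = max(self.Q[(s2, a2)] for a2 in self.actions)
        self.Q[(s, a)] += self.alpha * (r + self.gamma * best_next - self.Q[(s, a)])

def pull_from_group(agent, group, weight=0.5):
    """Nonreciprocating sharing: the outside agent blends the group's
    average Q-values into its own table; the group receives nothing."""
    keys = set().union(*(g.Q.keys() for g in group))
    for k in keys:
        group_avg = sum(g.Q[k] for g in group) / len(group)
        agent.Q[k] = (1 - weight) * agent.Q[k] + weight * group_avg

# Usage: an outsider periodically pulls from a cooperating pair.
group = [QAgent(actions=[0, 1]) for _ in range(2)]
outsider = QAgent(actions=[0, 1])
pull_from_group(outsider, group, weight=0.5)
```

  Note the asymmetry that gives the problem its name: the group's tables are read but never written, so the outsider needs no predetermined sharing partners, only access to the existing Q-Learning interface.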
- Personalizing the GAMS Cross-Index
  Perugini, Saverio; Lakshminarayanan, Priya; Ramakrishnan, Naren (Department of Computer Science, Virginia Polytechnic Institute & State University, 2000-03-01)
  The NIST Guide to Available Mathematical Software (GAMS) system at http://gams.nist.gov serves as the gateway to thousands of scientific codes and modules for numerical computation. We describe the PIPE personalization facility for GAMS, whereby content from the cross-index is specialized for a user desiring software recommendations for a specific problem instance. The key idea is to (i) mine structure, and (ii) exploit it in a programmatic manner to generate personalized web pages. Our approach supports both content-based and collaborative personalization and enables information integration from multiple (and complementary) web resources. We present case studies for the domain of linear, second-order, elliptic partial differential equations that indicate strong empirical evidence for the usefulness of our semi-automatic approach.
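  As a rough illustration of "mining structure and exploiting it programmatically" (the schema, facet labels, and module names below are made up, not GAMS data or the PIPE implementation): a hierarchical cross-index can be specialized against the facets a user has already fixed, pruning unreachable branches and leaving only the remaining choices and matching modules.

```python
def specialize(index, bindings):
    """Specialize a hierarchical cross-index: a branch whose facet is
    bound by the user collapses to the matching subtree; unbound facets
    are kept for later interaction. The schema is illustrative."""
    if not isinstance(index, dict) or "facet" not in index:
        return index                      # leaf: a list of software modules
    facet, children = index["facet"], index["children"]
    if facet in bindings:                 # user fixed this facet: prune siblings
        return specialize(children[bindings[facet]], bindings)
    return {"facet": facet,
            "children": {v: specialize(sub, bindings)
                         for v, sub in children.items()}}

# Toy cross-index for PDE solvers (structure and labels are invented):
gams = {"facet": "problem type",
        "children": {"elliptic": {"facet": "order",
                                  "children": {"second": ["module-a", "module-b"],
                                               "fourth": ["module-c"]}},
                     "parabolic": ["module-d"]}}
print(specialize(gams, {"problem type": "elliptic", "order": "second"}))
# -> ['module-a', 'module-b']
```

  Rendering the specialized tree as HTML would then yield a personalized page containing only the recommendations consistent with the user's problem instance.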
- Probability-one Homotopy Maps for Constrained Clustering Problems
  Easterling, David R.; Watson, Layne T.; Ramakrishnan, Naren; Hossain, M. Shahriar (Department of Computer Science, Virginia Polytechnic Institute & State University, 2013-12-31)
  Many algorithms for constrained clustering have been developed in the literature that aim to balance vector quantization requirements of cluster prototypes against the discrete satisfaction requirements of constraint (must-link or cannot-link) sets. A significant amount of research has been devoted to designing new algorithms for constrained clustering and understanding when constraints help clustering. However, no method exists to systematically characterize solution sets as constraints are gently introduced, or to assist practitioners in choosing a sweet spot between vector quantization and constraint satisfaction. A homotopy method is presented that can smoothly track solutions from unconstrained to constrained formulations of clustering. Beginning the homotopy zero curve tracking where the solution is (fairly) well understood, the curve can then be tracked into regions where there is only a qualitative understanding of the solution set, finding multiple local solutions along the way. Experiments demonstrate how the new homotopy method helps identify better tradeoffs and reveals insight into the structure of solution sets not obtainable using pointwise exploration of parameters.
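  The "tracking from unconstrained to constrained formulations" can be written schematically as a convex homotopy (our notation; the paper's actual probability-one map may differ):

$$
J_\lambda(c) \;=\; (1-\lambda)\, J_{\mathrm{vq}}(c) \;+\; \lambda\, J_{\mathrm{cons}}(c),
\qquad
H(c, \lambda) \;=\; \nabla_c J_\lambda(c) \;=\; 0,
$$

  where $J_{\mathrm{vq}}$ is the unconstrained vector-quantization objective, $J_{\mathrm{cons}}$ additionally penalizes violated must-link and cannot-link constraints, and the zero curve of $H$ is tracked from $\lambda = 0$ (the well-understood unconstrained solution) toward $\lambda = 1$ (the fully constrained problem), passing through the intermediate tradeoffs and any local solutions encountered along the way.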
- Programming Environments for Multidisciplinary Grid Communities
  Ramakrishnan, Naren; Watson, Layne T.; Kafura, Dennis G.; Ribbens, Calvin J.; Shaffer, Clifford A. (Department of Computer Science, Virginia Polytechnic Institute & State University, 2001-07-01)
  Rapid advances in technological infrastructure as well as the emphasis on application support systems have signaled the maturity of grid computing. Today’s grid computing environments (GCEs) extend the notion of a programming environment beyond the compile-schedule-execute paradigm to include functionality such as networked access, information services, data management, and collaborative application composition. In this article, we present GCEs in the context of supporting multidisciplinary communities of scientists and engineers. We present a high-level design framework for building GCEs and a space of characteristics that help identify requirements for GCEs for multidisciplinary communities. By describing integrated systems for five different multidisciplinary communities, we outline the unique responsibility (and opportunity) for GCEs to exploit the larger context of the scientific or engineering application, defined by the ongoing activities of the pertinent community. Finally, we describe several core systems support technologies that we have developed to support multidisciplinary GCE applications.
- Science of Digital Libraries (SciDL)
  Fox, Edward A.; Carroll, John M.; Fan, Patrick; Cassel, Lillian N.; Zubair, Mohammad; Maly, Kurt; McMillan, Gail; Ramakrishnan, Naren; Halbert, Martin (Department of Computer Science, Virginia Polytechnic Institute & State University, 2003)
  Our purpose is to ensure that people and institutions better manage information through digital libraries (DLs). Thus we address a fundamental human and social need, which is particularly urgent in the modern Information (and Knowledge) Age. Our goal is to significantly advance both the theory and state-of-the-art of DLs (and other advanced information systems), thoroughly validating our approach using highly visible testbeds. Our research objective is to leverage our formal, theory-based approach to the problems of defining, understanding, modeling, building, personalizing, and evaluating DLs. We will construct models and tools based on that theory so organizations and individuals can easily create and maintain fully functional DLs, whose components can interoperate with corresponding components of related DLs. This research should be highly meritorious intellectually. We bring together a team of senior researchers with expertise in information retrieval, human-computer interaction, scenario-based design, personalization, and componentized system development, and expect to make important contributions in each of those areas. Of crucial import, however, is that we will integrate our prior research and experience to achieve breakthrough advances in the field of DLs, regarding theory, methodology, systems, and evaluation. We will extend the 5S theory, which has identified five key dimensions or constructs underlying effective DLs: Streams, Structures, Spaces, Scenarios, and Societies. We will use that theory to describe and develop metamodels, models, and systems, which can be tailored to disciplines and/or groups, as well as personalized. We will disseminate our findings as well as provide toolkits as open source software, encouraging wide use. We will validate our work using testbeds, ensuring broad impact. We will put powerful tools into the hands of digital librarians so they may easily plan and configure tailored systems, to support an extensible set of services, including publishing, discovery, searching, browsing, recommending, and access control, handling diverse types of collections, and varied genres and classes of digital objects. With these tools, end-users will be able to design personal DLs. Testbeds are crucial to validate scientific theories and will be thoroughly integrated into SciDL research and evaluation. We will focus on two application domains, which together should allow comprehensive validation and increase the significance of SciDL's impact on scholarly communities. One is education (through CITIDEL); the other is libraries (through DLA and OCKHAM). CITIDEL deals with content from publishers (e.g., ACM Digital Library), corporate research efforts (e.g., CiteSeer), volunteer initiatives (e.g., DBLP, based on the database and logic programming literature), CS departments (e.g., NCSTRL, mostly technical reports), educational initiatives (e.g., Computer Science Teaching Center), and universities (e.g., theses and dissertations). DLA is a unit of the Virginia Tech library that virtually publishes scholarly communication such as faculty-edited journals and rare and unique resources, including image collections and finding aids from Special Collections. The OCKHAM initiative, calling for simplicity in the library world, emphasizes a three-part solution: lightweight protocols, component-based development, and open reference models. It provides a framework to research the deployment of the SciDL approach in libraries. Thus our choice of testbeds also will ensure that our research will have additional benefit to and impact on the fields of computing and library and information science, supporting transformations in how we learn and deal with information.