VTechWorks Repository :: Browsing by Author "Shaffer, Clifford A."

Browsing by Author "Shaffer, Clifford A."

Now showing 1 - 20 of 138

Algorithm Visualization: The State of the Field
Cooper, Matthew Lenell (Virginia Tech, 2007-04-19)
We report on the state of the field of algorithm visualization, both quantitatively and qualitatively. Computer science educators seem to find algorithm and data structure visualizations attractive for their classrooms. Educational research shows that some are effective while many are not. Clearly, then, visualizations are difficult to create and use right. There is little in the way of a supporting community, and many visualizations are downright poor. Topic distribution is heavily skewed towards simple concepts with advanced topics receiving little to no attention. We have cataloged nearly 400 visualizations available on the Internet. We have a wiki-based catalog which includes availability, platform, strengths and weaknesses, responsible personnel and institutions, and other data about each visualization. We have developed extraction and analysis tools to gather statistics about the corpus of visualizations. Based on analysis of this collection, we point out areas where improvements may be realized and suggest techniques for implementing such improvements. We pay particular attention to the free and open source software movement as a model which the visualization community may do well to emulate, from both a software engineering perspective and a community-building standpoint.
Algorithms and Orders for Finding Noncummutative Gröbner Bases
Keller, Benjamin J. (Virginia Tech, 1997-03-17)
The problem of choosing efficient algorithms and good admissible orders for computing Gröbner bases in noncommutative algebras is considered. Gröbner bases are an important tool that make many problems in polynomial algebra computationally tractable. However, the computation of Gröbner bases is expensive, and in noncommutative algebras is not guaranteed to terminate. The algorithm, together with the order used to determine the leading term of each polynomial, are known to affect the cost of the computation, and are the focus of this thesis. A Gröbner basis is a set of polynomials computed, using Buchberger's algorithm, from another set of polynomials. The noncommutative form of Buchberger's algorithm repeatedly constructs a new polynomial from a triple, which is a pair of polynomials whose leading terms overlap and form a nontrivial common multiple. The algorithm leaves a number of details underspecified, and can be altered to improve its behavior. A significant improvement is the development of a dynamic dictionary matching approach that efficiently solves the pattern matching problems of noncommutative Gröbner basis computations. Three algorithmic alternatives are considered: the strategy for selecting triples (selection), the strategy for removing triples from consideration (triple elimination), and the approach to keeping the set interreduced (set reduction). Experiments show that the selection strategy is generally more significant than the other techniques, with the best strategy being the one that chooses the triple with the shortest common multiple. The best triple elimination strategy ignoring resource constraints is the Gebauer-Müller strategy. However, another strategy is defined that can perform as well as the Gebauer-Müller strategy in less space. The experiments also show that the admissible order used to determine the leading term of a polynomial is more significant than the algorithm. Experiments indicate that the choice of order is dependent on the input set of polynomials, but also suggest that the length lexicographic order is a good choice for many problems. A more practical approach to chosing an order may be to develop heuristics that attempt to find an order that minimizes the number of overlaps considered during the computation.
The AlgoViz Project: Building an Algorithm Visualization Web Community
Alon, Alexander Joel Dacara (Virginia Tech, 2010-07-15)
Algorithm visualizations (AVs) have become a popular teaching aid in classes on algorithms and data structures. The AlgoViz Project attempts to provide an online venue for educators, students, developers,researchers, and other AV users. The Project is comprised of two websites. The first, the AlgoViz Portal, provides two major informational resources: an AV catalog that provides both descriptive and evaluative metadata of indexed visualizations, and an annotated bibliography of research literature. Both resources have over 500 entries and are actively updated by the AV community. The Portal also provides field reports, discussion forums, and other community-building mechanisms. The second website, OpenAlgoViz, is a SourceForge site intended to showcase exemplary AVs, as well as provide logistical and hosting support to AV developers.
Analysis and Reduction of Moire Patterns in Scanned Halftone Pictures
Liu, Xiangdong (Virginia Tech, 1996-05-01)
In this dissertation we provide a comprehensive theory for the formation of a moire pattern in a sampled halftone image. We explore techniques for restoring a sampled halftone image with a moire pattern and techniques for preventing a moire pattern when a halftone picture is scanned. Specifically, we study the frequency, phase, and spatial geometry of a moire pattern. We observe and explain the half period phase reversal phenomenon that a moire pattern may exhibit. As a case study, we examine the moire patterns generated by a commercial scanner. We propose three restoration methods, including a notch filtering method, a simulation method, and a relaxation method. We also describe a moire prevention method, the partial inverse Fourier transform method. Finally, we propose a research agenda for further investigation.
Analysis of the Worst Case Space Complexity of a PR Quadtree
Pemmaraju, Sriram V.; Shaffer, Clifford A. (Department of Computer Science, Virginia Polytechnic Institute & State University, 1992)
We demonstrate that a resolution-r PR quadtree containing n points has, in the worst case, at most nodes. This captures the fact that as n tends towards 4r, the number of nodes in a PR quadtree quickly approaches O(n). This is a more precise estimation of the worst case space requirement of a PR quadtree than has been attempted before.
Analyzing Student Session Data in an eTextbook
Heo, Samnyeong (Virginia Tech, 2022-07-18)
As more students interact with online learning platforms and eTextbooks, they generate massive amounts of data. For example, the OpenDSA eTextbook system collects clickstream data as users interact with prose, visualizations, and interactive auto-graded exercises. Ideally, instructors and system developers can harness this information to create better instructional experiences. But in its raw event-level form, it is difficult for developers or instructors to understand student behaviors, or to make testable hypotheses about relationships between behavior and performance. In this study, we describe our efforts to break raw event-level data first into sessions (a continuous series of work by a student) and then to meaningfully abstract the events into higher-level descriptions of that session. The goal of this abstraction is to help instructors and researchers gain insights into the students' learning behaviors. For example, we can distinguish when students read material and then attempt the associated exercise, versus going straight to the exercise and then hunting for the answers in the associated material. We first bundle events into related activities, such as the events associated with stepping through a given visualization, or with working a given exercise. Each such group of events defines a state. A state is a basic unit that characterizes the interaction log data, and there are multiple state types including reading prose, interacting with visual contents, and solving exercises. We harnessed the abstracted data to analyze studying behavior and compared it with course performance based on GPA. We analyzed data from the Fall 2020 and Spring 2021 sections of a senior-level Formal Languages course, and also from the Fall 2020 and Spring 2021 sections of a data structures course.
Applying Curricular Alignment to Improve the Effectiveness of CS Education
Elsherbiny, Noha Ibrahim Mohamed (Virginia Tech, 2020-07-13)
According to Fossati and Guzdail, many CS instructors rely on their intuition, feedback from students and anecdotal advice from other instructors to make course improvements. Guzdail noted that: "Computing educators' practice would dramatically improve if we drew on evidence, rather than intuition". This means that Computer Science instructors may benefit from processes and tools that help them make informed changes to their curriculum. An evidence-based approach to course improvement is curriculum alignment, which evaluates the degree which the learning objectives, course content, and assessment methods are in agreement with each other. This provides instructors with a detailed view of their course and areas that need improvement. Current alignment processes are impractical for a course instructor to apply, requiring a panel of experts and training on the process. In this research, I developed a computer-assisted alignment process (CAAP) that uses the concept of traceability from software engineering, to define a process that is applicable by a single course instructor limiting the need for a panel of experts. In an initial application CAAP took 75 hours to apply, consequently a prototype alignment tool (AlignET) was designed to automates the new alignment process providing instructors with results they can use to make course improvements. I evaluated the practicality of AlignET by conducting collective case studies with four participants. Observations and interviews were used to collect data. AlignET reduced the time to complete CAAP to less than 11 hours and the participants identified course improvements, gaps in their instructional methods, and learning objectives they emphasized more than others. The findings from the case study presented key improvements to AlignET.
Applying Information Visualization Techniques to Visual Debugging
Costigan, John A. (Virginia Tech, 2003-04-24)
In the arena of software development, implementing a software design (no matter how perfect the design) is rarely done right the first time. Consequently, debugging one's own (or someone else's) software is inevitable, and tools that assist in this often-arduous task become very important with respect to reducing the cost of debugging as well as the cost of the software life cycle as a whole. Many tools exist with this aim, but all are lacking in a key area: information visualization. Applying information visualization techniques such as zooming, focus and context, or graphical representation of numeric data may enhance the visual debugging experience. To this end, drawing data structures as graphs is potentially a step in the right direction, but more must be done to maximize the value of time spent debugging and to minimize the actual amount of time spent debugging. This thesis will address some information visualization techniques that may be helpful in debugging (specifically with respect to visual debugging) and will present the results of a small pilot study intended to illustrate the potential value of such techniques.
Approaches to map anamorphosis
Klipsch, Colin (Virginia Tech, 1993-12-15)
Map anamorphosis is the distortion of a map to show graphically the variation of some quantity from region to region. The process of anamorphosis modifies the original map regions, keeping the inter-region topology, to produce new regions whose areas are proportional to their respective values of the relevant quantity. A typical example would be a distorted map of the United States, where each state's area is proportional to its population, yet the states still fit together in the correct way. Such maps, called "cartograms", can provide a good visual sense of where a quantity such as population, is distributed. In this paper we look at five separate attempts to design a computer algorithm for generating cartograms, all of which use triangulation as a basis, and all of which, unfortunately, are unsuccessful. We also examine a working algorithm, in the literature that uses similar ideas in its initial approach. While this algorithm produces aesthetically displeasing results, it may indicate a way to solve the map anamorphosis problem robustly using triangulation.
Arabic News Text Classification and Summarization: A Case of the Electronic Library Institute SeerQ (ELISQ)
Kan'an, Tarek Ghaze (Virginia Tech, 2015-07-21)
Arabic news articles in heterogeneous electronic collections are difficult for users to work with. Two problems are: that they are not categorized in a way that would aid browsing, and that there are no summaries or detailed metadata records that could be easier to work with than full articles. To address the first problem, schema mapping techniques were adapted to construct a simple taxonomy for Arabic news stories that is compatible with the subject codes of the International Press Telecommunications Council. So that each article would be labeled with the proper taxonomy category, automatic classification methods were researched, to identify the most appropriate. Experiments showed that the best features to use in classification resulted from a new tailored stemming approach (i.e., a new Arabic light stemmer called P-Stemmer). When coupled with binary classification using SVM, the newly developed approach proved to be superior to state-of-the-art techniques. To address the second problem, i.e., summarization, preliminary work was done with English corpora. This was in the context of a new Problem Based Learning (PBL) course wherein students produced template summaries of big text collections. The techniques used in the course were extended to work with Arabic news. Due to the lack of high quality tools for Named Entity Recognition (NER) and topic identification for Arabic, two new tools were constructed: RenA for Arabic NER, and ALDA for Arabic topic extraction tool (using the Latent Dirichlet Algorithm). Controlled experiments with each of RenA and ALDA, involving Arabic speakers and a randomly selected corpus of 1000 Qatari news articles, showed the tools produced very good results (i.e., names, organizations, locations, and topics). Then the categorization, NER, topic identification, and additional information extraction techniques were combined to produce approximately 120,000 summaries for Qatari news articles, which are searchable, along with the articles, using LucidWorks Fusion, which builds upon Solr software. Evaluation of the summaries showed high ratings based on the 1000-article test corpus. Contributions of this research with Arabic news articles thus include a new: test corpus, taxonomy, light stemmer, classification approach, NER tool, topic identification tool, and template-based summarizer – all shown through experimentation to be highly effective.
Automated Detection of Surface Defects on Barked Hardwood Logs and Stems Using 3-D Laser Scanned Data
Thomas, Liya (Virginia Tech, 2006-09-08)
This dissertation presents an automated detection algorithm that identifies severe external defects on the surfaces of barked hardwood logs and stems. The defects detected are at least 0.5 inch in height and at least 3 inches in diameter, which are severe, medium to large in size, and have external surface rises. Hundreds of real log defect samples were measured, photographed, and categorized to summarize the main defect features and to build a defect knowledge base. Three-dimensional laser-scanned range data capture the external log shapes and portray bark pattern, defective knobs, and depressions. The log data are extremely noisy, have missing data, and include severe outliers induced by loose bark that dangles from the log trunk. Because the circle model is nonlinear and presents both additive and non-additive errors, a new robust generalized M-estimator has been developed that is different from the ones proposed in the statistical literature for linear regression. Circle fitting is performed by standardizing the residuals via scale estimates calculated by means of projection statistics and incorporated in the Huber objective function to bound the influence of the outliers in the estimates. The projection statistics are based on 2-D radial-vector coordinates instead of the row vectors of the Jacobian matrix as proposed in the statistical literature dealing with linear regression. This approach proves effective in that it makes the GM-estimator to be influence bounded and thereby, robust against outliers. Severe defects are identified through the analysis of 3-D log data using decision rules obtained from analyzing the knowledge base. Contour curves are generated from radial distances, which are determined by robust 2-D circle fitting to the log-data cross sections. The algorithm detected 63 from a total of 68 severe defects. There were 10 non-defective regions falsely identified as defects. When these were calculated as areas, the algorithm locates 97.6% of the defect area, and falsely identifies 1.5% of the total clear area as defective.
Bridging the Gap between Deterministic and Stochastic Modeling with Automatic Scaling and Conversion
Wang, Pengyuan (Virginia Tech, 2008-05-07)
During the past decade, many successful deterministic models of macromolecular regulatory networks have been built. Deterministic simulations of these models can show only average dynamics of the systems. However, stochastic simulations of macromolecular regulatory models can account for behaviors that are introduced by the noisy nature of the systems but not revealed by deterministic simulations. Thus, converting an existing model of value from the most common deterministic formulation to one suitable for stochastic simulation enables further investigation of the regulatory network. Although many different stochastic models can be developed and evolved from deterministic models, a direct conversion is the first step in practice. This conversion process is tedious and error-prone, especially for complex models. Thus, we seek to automate as much of the conversion process as possible. However, deterministic models often omit key information necessary for a stochastic formulation. Specifically, values in the model have to be scaled before a complete conversion, and the scaling factors are typically not given in the deterministic model. Several functionalities helping model scaling and converting are introduced and implemented in the JigCell modeling environment. Our tool makes it easier for the modeler to include complete details as well as to convert the model. Stochastic simulations are known for being computationally intensive, and thus require high performance computing facilities to be practical. With parallel computation on Virginia Tech's System X supercomputer, we are able to obtain the first stochastic simulation results for realistic cell cycle models. Stochastic simulation results for several mutants, which are thought to be biologically significant, are presented. Successful deployment of the enhanced modeling environment demonstrates the power of our techniques.
BSML: A Binding Schema Markup Language for Data Interchange in Problem Solving Environments
Verstak, Alex; Ramakrishnan, Naren; Watson, Layne T.; He, Jian; Shaffer, Clifford A.; Bae, Kyung Kyoon; Jiang, Jing; Tranter, William H.; Rappaport, Theodore S. (Hindawi, 2003-01-01)
We describe a binding schema markup language (BSML) for describing data interchange between scientific codes. Such a facility is an important constituent of scientific problem solving environments (PSEs). BSML is designed to integrate with a PSE or application composition system that views model specification and execution as a problem of managing semistructured data. The data interchange problem is addressed by three techniques for processing semistructured data: validation, binding, and conversion. We present BSML and describe its application to a PSE for wireless communications system design.
Building a Trustworthy Question Answering System for Covid-19 Tracking
Liu, Yiqing (Virginia Tech, 2021-09-02)
During the unprecedented global pandemic of Covid-19, the general public is suffering from inaccurate Covid-19 related information including outdated information and fake news. The most used media: TV, social media, newspaper, and radio are incompetent in providing certitude and flash updates that people are seeking. In order to cope with this challenge, several public data resources that are dedicated to providing Covid-19 information were born. They rallied with experts from different fields to provide authoritative and up-to-date pandemic updates. However, the general public cannot still make complete use of such resources since the learning curve is too steep, especially for the aged and under-aged users. To address this problem, in this Thesis, we propose a question answering system that can be interacted with using simple natural language-based sentences. While building this system, we investigate qualified public data resources and from the data content they are providing, and we collect a set of frequently asked questions for Covid-19 tracking. We further build a dedicated dataset named CovidQA for evaluating the performance of the question answering system with different models. Based on the new dataset, we assess multiple machine learning-based models that are built for retrieving relevant information from databases, and then propose two empirical models which utilize the pre-defined templates to generate SQL queries. In our experiments, we demonstrate both quantitative and qualitative results and provide a comprehensive comparison between different types of methods. The results show that the proposed template-based methods are simple but effective in building question answering systems for specific domain problems.
Building and Evaluating a Learning Environment for Data Structures and Algorithms Courses
Fouh Mbindi, Eric Noel (Virginia Tech, 2015-04-29)
Learning technologies in computer science education have been most closely associated with teaching of programming, including automatic assessment of programming exercises. However, when it comes to teaching computer science content and concepts, learning technologies have not been heavily used. Perhaps the best known application today is Algorithm Visualization (AV), of which there are hundreds of examples. AVs tend to focus on presenting the procedural aspects of how a given algorithm works, rather than more conceptual content. There are also new electronic textbooks (eTextbooks) that incorporate the ability to edit and execute program examples. For many traditional courses, a longstanding problem is lack of sufficient practice exercises with feedback to the student. Automated assessment provides a way to increase the number of exercises on which students can receive feedback. Interactive eTextbooks have the potential to make it easy for instructors to introduce both visualizations and practice exercises into their courses. OpenDSA is an interactive eTextbook for data structures and algorithms (DSA) courses. It integrates tutorial content with AVs and automatically assessed interactive exercises. Since Spring 2013, OpenDSA has been regularly used to teach a fundamental data structures and algorithms course (CS2), and also a more advanced data structures, algorithms, and analysis course (CS3) at various institutions of higher education. In this thesis, I report on findings from early adoption of the OpenDSA system. I describe how OpenDSA's design addresses obstacles in the use of AV systems. I identify a wide variety of use for OpenDSA in the classroom. I found that instructors used OpenDSA exercises as graded assignments in all the courses where it was used. Some instructors assigned an OpenDSA assignment before lectures and started spending more time teaching higher-level concepts. OpenDSA also supported implementing a ``flipped classroom'' by some instructors. I found that students are enthusiastic about OpenDSA and voluntarily used the AVs embedded within OpenDSA. Students found OpenDSA beneficial and expressed a preference for a class format that included using OpenDSA as part of the assigned graded work. The relationship between OpenDSA and students' performance was inconclusive, but I found that students with higher grades tend to complete more exercises.
Cell Cycle Modeling for Budding Yeast with Stochastic Simulation Algorithms
Ahn, Tae-Hyuk; Watson, Layne T.; Cao, Yang; Shaffer, Clifford A.; Baumann, William T. (Department of Computer Science, Virginia Polytechnic Institute & State University, 2008-11-01)
For biochemical systems, where some chemical species are represented by small numbers of molecules, discrete and stochastic approaches are more appropriate than continuous and deterministic approaches. The continuous deterministic approach using ordinary differential equations is adequate for understanding the average behavior of cells, while the discrete stochastic approach accurately captures noisy events in the growth-division cycle. Since the emergence of the stochastic simulation algorithm (SSA) by Gillespie, alternative algorithms have been developed whose goal is to improve the computational efficiency of the SSA. This paper explains and empirically compares the performance of some of these SSA alternatives on a realistic model. The budding yeast cell cycle provides an excellent example of the need for modeling stochastic effects in mathematical modeling of biochemical reactions. This paper presents a stochastic approximation of the cell cycle for budding yeast using Gillespie’s stochastic simulation algorithm. To compare the stochastic results with the average behavior, the simulation must be run thousands of times. Many of the proposed techniques to accelerate the SSA are not effective on the budding yeast problem, because of the scale of the problem or because underlying assumptions are not satisfied. A load balancing algorithm improved overall performance on a parallel supercomputer.
CodeWorkout: Design and Implementation of an Online Drill-and-Practice System for Introductory Programming
Panamalai Murali, Krishnan (Virginia Tech, 2016-06-14)
The massive rise in Computer Science enrollments in both traditional classroom courses and in Massively Open Online Courses (MOOCs) shows the enormous opportunities in engaging students to learn programming. While the number of students in CS courses continues to increase, there has been no concomitant increase in the number of instructors for such courses. This leads to a completely lopsided learning environment where the already-stretched instructor is pressed to spend more time on ancillary tasks like grading and course bookkeeping. CodeWorkout is an online drill-and-practice system with course management features that aims to address these issues. CodeWorkout hosts an online repository of programming questions that instructors can incorporate into their courses. It also provides instructors with a facility to create their own programming questions so that exercises can be tailored according to the needs of the class. CodeWorkout has an open gradual engagement model that allows students who are not enrolled in a course to use it. CodeWorkout also creates an open environment for instructors to collaborate by sharing exercises that they create. CodeWorkout has been used in four courses at Virginia Tech. It has been shown to significantly improve the student's skills in introductory programming through providing a number of online practice questions.
Collaboration Transparency in Java through Event Broadcasting
Begole, James M.A.; Struble, Craig A.; Shaffer, Clifford A. (Department of Computer Science, Virginia Polytechnic Institute & State University, 1997-02-01)
Widespread use of the Internet for education and research is yielding many opportunities for network-based synchronous collaboration. For example, the Java runtime environment provides a platform independent vehicle for new collaborative applications. While many toolkits are becoming available that support development of collaborative applications, they do not enable collaborative use of existing single-user Java applets, a process called collaboration transparency. This paper discusses two approaches to collaboration transparency: display broadcasting and event broadcasting. We then consider the suitability of each within the context of the Java runtime environment. Unfortunately, simple modifications to the standard Java class libraries as they are currently implemented are not sufficient to support collaboration transparency. We describe the problems and suggest solutions.
Comparison of Computational Notebook Platforms for Interactive Visual Analytics: Case Study of Andromeda Implementations
Liu, Han (Virginia Tech, 2022-09-22)
Existing notebook platforms have different capabilities for supporting visual analytics use. It is not clear which platform to choose for implementing visual analytics notebooks. In this work, we investigated the problem using Andromeda, an interactive dimension reduction algorithm, and implemented it using three different notebook platforms: 1) Python-based Jupyter Notebook, 2) JavaScript-based Observable Notebook, and 3) Jupyter Notebook embedding both Python (data science use) and JavaScript (visual analytics use). We also made comparisons for all the notebook platforms via a case study based on metrics such as programming difficulty, notebook organization, interactive performance, and UI design choice. Furthermore, guidelines are provided for data scientists to choose one notebook platform for implementing their visual analytics notebooks in various situations. Laying the groundwork for future developers, advice is also given on architecting better notebook platforms.
A Computational Framework for Assessing and Optimizing the Performance of Observational Networks in 4D-Var Data Assimilation
Cioaca, Alexandru (Virginia Tech, 2013-09-04)
A deep scientific understanding of complex physical systems, such as the atmosphere, can be achieved neither by direct measurements nor by numerical simulations alone. Data assimilation is a rigorous procedure to fuse information from a priori knowledge of the system state, the physical laws governing the evolution of the system, and real measurements, all with associated error statistics. Data assimilation produces best (a posteriori) estimates of model states and parameter values, and results in considerably improved computer simulations. The acquisition and use of observations in data assimilation raises several important scientific questions related to optimal sensor network design, quantification of data impact, pruning redundant data, and identifying the most beneficial additional observations. These questions originate in operational data assimilation practice, and have started to attract considerable interest in the recent past. This dissertation advances the state of knowledge in four dimensional variational (4D-Var) - data assimilation by developing, implementing, and validating a novel computational framework for estimating observation impact and for optimizing sensor networks. The framework builds on the powerful methodologies of second-order adjoint modeling and the 4D-Var sensitivity equations. Efficient computational approaches for quantifying the observation impact include matrix free linear algebra algorithms and low-rank approximations of the sensitivities to observations. The sensor network configuration problem is formulated as a meta-optimization problem. Best values for parameters such as sensor location are obtained by optimizing a performance criterion, subject to the constraint posed by the 4D-Var optimization. Tractable computational solutions to this "optimization-constrained" optimization problem are provided. The results of this work can be directly applied to the deployment of intelligent sensors and adaptive observations, as well as to reducing the operating costs of measuring networks, while preserving their ability to capture the essential features of the system under consideration.

Browsing by Author "Shaffer, Clifford A."

Results Per Page

Sort Options