Browsing by Author "Shaffer, Clifford A."
Now showing 1 - 20 of 141
- Algorithm Visualization: The State of the Field. Cooper, Matthew Lenell (Virginia Tech, 2007-04-19). We report on the state of the field of algorithm visualization, both quantitatively and qualitatively. Computer science educators seem to find algorithm and data structure visualizations attractive for their classrooms. Educational research shows that some are effective while many are not. Clearly, then, visualizations are difficult to create and use right. There is little in the way of a supporting community, and many visualizations are downright poor. Topic distribution is heavily skewed towards simple concepts with advanced topics receiving little to no attention. We have cataloged nearly 400 visualizations available on the Internet. We have a wiki-based catalog which includes availability, platform, strengths and weaknesses, responsible personnel and institutions, and other data about each visualization. We have developed extraction and analysis tools to gather statistics about the corpus of visualizations. Based on analysis of this collection, we point out areas where improvements may be realized and suggest techniques for implementing such improvements. We pay particular attention to the free and open source software movement as a model which the visualization community may do well to emulate, from both a software engineering perspective and a community-building standpoint.
- Algorithms and Orders for Finding Noncommutative Gröbner Bases. Keller, Benjamin J. (Virginia Tech, 1997-03-17). The problem of choosing efficient algorithms and good admissible orders for computing Gröbner bases in noncommutative algebras is considered. Gröbner bases are an important tool that make many problems in polynomial algebra computationally tractable. However, the computation of Gröbner bases is expensive, and in noncommutative algebras is not guaranteed to terminate. The algorithm and the order used to determine the leading term of each polynomial are known to affect the cost of the computation, and both are the focus of this thesis. A Gröbner basis is a set of polynomials computed, using Buchberger's algorithm, from another set of polynomials. The noncommutative form of Buchberger's algorithm repeatedly constructs a new polynomial from a triple, which is a pair of polynomials whose leading terms overlap and form a nontrivial common multiple. The algorithm leaves a number of details underspecified, and can be altered to improve its behavior. A significant improvement is the development of a dynamic dictionary matching approach that efficiently solves the pattern matching problems of noncommutative Gröbner basis computations. Three algorithmic alternatives are considered: the strategy for selecting triples (selection), the strategy for removing triples from consideration (triple elimination), and the approach to keeping the set interreduced (set reduction). Experiments show that the selection strategy is generally more significant than the other techniques, with the best strategy being the one that chooses the triple with the shortest common multiple. The best triple elimination strategy, ignoring resource constraints, is the Gebauer-Müller strategy. However, another strategy is defined that can perform as well as the Gebauer-Müller strategy in less space. The experiments also show that the admissible order used to determine the leading term of a polynomial is more significant than the algorithm. Experiments indicate that the choice of order is dependent on the input set of polynomials, but also suggest that the length lexicographic order is a good choice for many problems. A more practical approach to choosing an order may be to develop heuristics that attempt to find an order that minimizes the number of overlaps considered during the computation.
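The order-dependence this thesis studies can be seen even in the much simpler commutative setting. The sketch below, which assumes SymPy's `groebner` routine, computes a basis for one small polynomial system under two admissible orders; it only illustrates how the choice of order changes the leading terms and the resulting basis, and is not the thesis' noncommutative algorithms (graded lexicographic order is used here as a rough commutative stand-in for the length lexicographic order mentioned above).

```python
# Commutative illustration only: the admissible order handed to the Groebner
# routine changes the leading terms and, often, the cost of the computation.
from sympy import symbols, groebner

x, y, z = symbols('x y z')
polys = [x**2 + y*z, x*y - z**2, y**2 - x*z]

g_lex = groebner(polys, x, y, z, order='lex')      # pure lexicographic
g_grlex = groebner(polys, x, y, z, order='grlex')  # graded (degree) lexicographic

print("lex basis:   ", list(g_lex))
print("grlex basis: ", list(g_grlex))
```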
- The AlgoViz Project: Building an Algorithm Visualization Web Community. Alon, Alexander Joel Dacara (Virginia Tech, 2010-07-15). Algorithm visualizations (AVs) have become a popular teaching aid in classes on algorithms and data structures. The AlgoViz Project attempts to provide an online venue for educators, students, developers, researchers, and other AV users. The Project is comprised of two websites. The first, the AlgoViz Portal, provides two major informational resources: an AV catalog that provides both descriptive and evaluative metadata of indexed visualizations, and an annotated bibliography of research literature. Both resources have over 500 entries and are actively updated by the AV community. The Portal also provides field reports, discussion forums, and other community-building mechanisms. The second website, OpenAlgoViz, is a SourceForge site intended to showcase exemplary AVs, as well as provide logistical and hosting support to AV developers.
- Analysis and Reduction of Moire Patterns in Scanned Halftone Pictures. Liu, Xiangdong (Virginia Tech, 1996-05-01). In this dissertation we provide a comprehensive theory for the formation of a moire pattern in a sampled halftone image. We explore techniques for restoring a sampled halftone image with a moire pattern and techniques for preventing a moire pattern when a halftone picture is scanned. Specifically, we study the frequency, phase, and spatial geometry of a moire pattern. We observe and explain the half-period phase reversal phenomenon that a moire pattern may exhibit. As a case study, we examine the moire patterns generated by a commercial scanner. We propose three restoration methods: a notch filtering method, a simulation method, and a relaxation method. We also describe a moire prevention method, the partial inverse Fourier transform method. Finally, we propose a research agenda for further investigation.
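As a rough illustration of the first restoration idea named above, the sketch below applies a notch filter in the 2-D Fourier domain using NumPy. The image, notch centers, and radius are invented for the example; they are not the dissertation's data or its actual method.

```python
# Illustrative sketch (not the dissertation's implementation): suppress a
# narrow band of spatial frequencies, such as a moire peak, with a notch
# filter applied in the 2-D Fourier domain.
import numpy as np

def notch_filter(image, centers, radius):
    """Zero out small disks around the given (u, v) frequency centers."""
    F = np.fft.fftshift(np.fft.fft2(image))
    rows, cols = image.shape
    u = np.arange(rows) - rows // 2
    v = np.arange(cols) - cols // 2
    V, U = np.meshgrid(v, u)
    for (uc, vc) in centers:
        for sign in (+1, -1):          # notch pairs are symmetric about DC
            mask = (U - sign * uc) ** 2 + (V - sign * vc) ** 2 <= radius ** 2
            F[mask] = 0.0
    return np.real(np.fft.ifft2(np.fft.ifftshift(F)))

# Hypothetical usage: remove a moire peak near (25, 40) cycles.
img = np.random.rand(256, 256)          # stand-in for a scanned halftone
clean = notch_filter(img, centers=[(25, 40)], radius=3)
```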
- Analysis of the Worst Case Space Complexity of a PR Quadtree. Pemmaraju, Sriram V.; Shaffer, Clifford A. (Department of Computer Science, Virginia Polytechnic Institute & State University, 1992). We demonstrate that a resolution-r PR quadtree containing n points has, in the worst case, at most nodes. This captures the fact that as n tends towards 4^r, the number of nodes in a PR quadtree quickly approaches O(n). This is a more precise estimation of the worst case space requirement of a PR quadtree than has been attempted before.
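For readers unfamiliar with the structure being analyzed, here is a minimal PR (point-region) quadtree insertion sketch; it makes the node-count discussion concrete but enforces no resolution limit, and it is not the report's own analysis or code.

```python
# Minimal PR quadtree sketch: each internal node splits its square into four
# equal quadrants; points live only in leaves, at most one point per leaf.
class PRQuadtree:
    def __init__(self, x, y, size):
        self.x, self.y, self.size = x, y, size   # lower-left corner, side length
        self.point = None                        # occupied leaf holds one point
        self.children = None                     # internal node holds four subtrees

    def insert(self, px, py):
        if self.children is not None:            # internal node: recurse
            self._child_for(px, py).insert(px, py)
        elif self.point is None:                 # empty leaf: store the point
            self.point = (px, py)
        elif self.point != (px, py):             # occupied leaf: split and reinsert
            old = self.point
            self.point, self.children = None, self._split()
            self._child_for(*old).insert(*old)
            self._child_for(px, py).insert(px, py)

    def _split(self):
        h = self.size / 2
        return [PRQuadtree(self.x + dx, self.y + dy, h)
                for dx in (0, h) for dy in (0, h)]

    def _child_for(self, px, py):
        h = self.size / 2
        i = (2 if px >= self.x + h else 0) + (1 if py >= self.y + h else 0)
        return self.children[i]

    def node_count(self):
        if self.children is None:
            return 1
        return 1 + sum(c.node_count() for c in self.children)

qt = PRQuadtree(0.0, 0.0, 16.0)                  # a 16x16 world
for p in [(1, 1), (2, 3), (9, 9), (9, 1)]:
    qt.insert(*p)
print(qt.node_count())
```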
- Analyzing Student Session Data in an eTextbook. Heo, Samnyeong (Virginia Tech, 2022-07-18). As more students interact with online learning platforms and eTextbooks, they generate massive amounts of data. For example, the OpenDSA eTextbook system collects clickstream data as users interact with prose, visualizations, and interactive auto-graded exercises. Ideally, instructors and system developers can harness this information to create better instructional experiences. But in its raw event-level form, it is difficult for developers or instructors to understand student behaviors, or to make testable hypotheses about relationships between behavior and performance. In this study, we describe our efforts to break raw event-level data first into sessions (a continuous series of work by a student) and then to meaningfully abstract the events into higher-level descriptions of that session. The goal of this abstraction is to help instructors and researchers gain insights into the students' learning behaviors. For example, we can distinguish when students read material and then attempt the associated exercise, versus going straight to the exercise and then hunting for the answers in the associated material. We first bundle events into related activities, such as the events associated with stepping through a given visualization, or with working a given exercise. Each such group of events defines a state. A state is a basic unit that characterizes the interaction log data, and there are multiple state types, including reading prose, interacting with visual content, and solving exercises. We harnessed the abstracted data to analyze studying behavior and compared it with course performance based on GPA. We analyzed data from the Fall 2020 and Spring 2021 sections of a senior-level Formal Languages course, and also from the Fall 2020 and Spring 2021 sections of a data structures course.
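A minimal sketch of the two abstraction steps described above: splitting an event stream into sessions by an inactivity gap, then collapsing runs of like events into states. The field names and the 30-minute threshold are assumptions made for the example, not OpenDSA's actual definitions.

```python
# Illustrative sketch: sessionize a student's clickstream, then abstract each
# session into (state_type, event_count) pairs.
from itertools import groupby

SESSION_GAP_SECONDS = 30 * 60          # assumed inactivity threshold

def split_sessions(events):
    """events: time-ordered dicts with 'timestamp' (seconds) and 'event_type'."""
    sessions, current = [], []
    for ev in events:
        if current and ev['timestamp'] - current[-1]['timestamp'] > SESSION_GAP_SECONDS:
            sessions.append(current)
            current = []
        current.append(ev)
    if current:
        sessions.append(current)
    return sessions

def abstract_states(session):
    """Collapse runs of like events (reading, visualization, exercise) into states."""
    return [(kind, len(list(run)))
            for kind, run in groupby(session, key=lambda ev: ev['event_type'])]

events = [
    {'timestamp': 0,    'event_type': 'read_prose'},
    {'timestamp': 40,   'event_type': 'read_prose'},
    {'timestamp': 90,   'event_type': 'exercise'},
    {'timestamp': 4000, 'event_type': 'exercise'},   # > 30 min gap: new session
]
for s in split_sessions(events):
    print(abstract_states(s))
```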
- Applying Curricular Alignment to Improve the Effectiveness of CS Education. Elsherbiny, Noha Ibrahim Mohamed (Virginia Tech, 2020-07-13). According to Fossati and Guzdial, many CS instructors rely on their intuition, feedback from students, and anecdotal advice from other instructors to make course improvements. Guzdial noted that "Computing educators' practice would dramatically improve if we drew on evidence, rather than intuition." This means that computer science instructors may benefit from processes and tools that help them make informed changes to their curriculum. An evidence-based approach to course improvement is curriculum alignment, which evaluates the degree to which the learning objectives, course content, and assessment methods are in agreement with each other. This provides instructors with a detailed view of their course and the areas that need improvement. Current alignment processes are impractical for a course instructor to apply, requiring a panel of experts and training on the process. In this research, I developed a computer-assisted alignment process (CAAP) that uses the concept of traceability from software engineering to define a process that can be applied by a single course instructor, limiting the need for a panel of experts. In an initial application, CAAP took 75 hours to apply; consequently, a prototype alignment tool (AlignET) was designed to automate the new alignment process, providing instructors with results they can use to make course improvements. I evaluated the practicality of AlignET by conducting collective case studies with four participants. Observations and interviews were used to collect data. AlignET reduced the time to complete CAAP to less than 11 hours, and the participants identified course improvements, gaps in their instructional methods, and learning objectives they emphasized more than others. The findings from the case studies identified key improvements to AlignET.
- Applying Information Visualization Techniques to Visual Debugging. Costigan, John A. (Virginia Tech, 2003-04-24). In the arena of software development, implementing a software design (no matter how perfect the design) is rarely done right the first time. Consequently, debugging one's own (or someone else's) software is inevitable, and tools that assist in this often-arduous task become very important with respect to reducing the cost of debugging as well as the cost of the software life cycle as a whole. Many tools exist with this aim, but all are lacking in a key area: information visualization. Applying information visualization techniques such as zooming, focus and context, or graphical representation of numeric data may enhance the visual debugging experience. To this end, drawing data structures as graphs is potentially a step in the right direction, but more must be done to maximize the value of time spent debugging and to minimize the actual amount of time spent debugging. This thesis will address some information visualization techniques that may be helpful in debugging (specifically with respect to visual debugging) and will present the results of a small pilot study intended to illustrate the potential value of such techniques.
- Approaches to map anamorphosis. Klipsch, Colin (Virginia Tech, 1993-12-15). Map anamorphosis is the distortion of a map to show graphically the variation of some quantity from region to region. The process of anamorphosis modifies the original map regions, keeping the inter-region topology, to produce new regions whose areas are proportional to their respective values of the relevant quantity. A typical example would be a distorted map of the United States where each state's area is proportional to its population, yet the states still fit together in the correct way. Such maps, called "cartograms", can provide a good visual sense of where a quantity such as population is distributed. In this paper we look at five separate attempts to design a computer algorithm for generating cartograms, all of which use triangulation as a basis, and all of which, unfortunately, are unsuccessful. We also examine a working algorithm in the literature that uses similar ideas in its initial approach. While this algorithm produces aesthetically displeasing results, it may indicate a way to solve the map anamorphosis problem robustly using triangulation.
- Arabic News Text Classification and Summarization: A Case of the Electronic Library Institute SeerQ (ELISQ). Kan'an, Tarek Ghaze (Virginia Tech, 2015-07-21). Arabic news articles in heterogeneous electronic collections are difficult for users to work with. Two problems are: that they are not categorized in a way that would aid browsing, and that there are no summaries or detailed metadata records that could be easier to work with than full articles. To address the first problem, schema mapping techniques were adapted to construct a simple taxonomy for Arabic news stories that is compatible with the subject codes of the International Press Telecommunications Council. So that each article would be labeled with the proper taxonomy category, automatic classification methods were researched to identify the most appropriate. Experiments showed that the best features to use in classification resulted from a new tailored stemming approach (i.e., a new Arabic light stemmer called P-Stemmer). When coupled with binary classification using SVM, the newly developed approach proved to be superior to state-of-the-art techniques. To address the second problem, i.e., summarization, preliminary work was done with English corpora. This was in the context of a new Problem Based Learning (PBL) course wherein students produced template summaries of big text collections. The techniques used in the course were extended to work with Arabic news. Due to the lack of high-quality tools for Named Entity Recognition (NER) and topic identification for Arabic, two new tools were constructed: RenA, an Arabic NER tool, and ALDA, an Arabic topic extraction tool (using Latent Dirichlet Allocation). Controlled experiments with each of RenA and ALDA, involving Arabic speakers and a randomly selected corpus of 1000 Qatari news articles, showed the tools produced very good results (i.e., names, organizations, locations, and topics). Then the categorization, NER, topic identification, and additional information extraction techniques were combined to produce approximately 120,000 summaries for Qatari news articles, which are searchable, along with the articles, using LucidWorks Fusion, which builds upon Solr software. Evaluation of the summaries showed high ratings based on the 1000-article test corpus. Contributions of this research with Arabic news articles thus include a new: test corpus, taxonomy, light stemmer, classification approach, NER tool, topic identification tool, and template-based summarizer, all shown through experimentation to be highly effective.
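The classification step can be pictured with a small scikit-learn pipeline. The stemmer below is only a placeholder standing in for the thesis' P-Stemmer, and the four toy documents are invented; the sketch shows the general TF-IDF plus SVM pattern, not the actual Arabic pipeline.

```python
# Illustrative sketch of stem -> vectorize -> SVM classification.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

def light_stem(text):
    # Placeholder for a real Arabic light stemmer such as P-Stemmer.
    return text.lower()

docs = ["economy market oil prices", "football match final score",
        "stock exchange trading", "league cup goal"]
labels = ["economy", "sport", "economy", "sport"]

model = make_pipeline(TfidfVectorizer(preprocessor=light_stem), LinearSVC())
model.fit(docs, labels)
print(model.predict(["oil trading prices"]))   # expected: ['economy']
```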
- Automated Detection of Surface Defects on Barked Hardwood Logs and Stems Using 3-D Laser Scanned Data. Thomas, Liya (Virginia Tech, 2006-09-08). This dissertation presents an automated detection algorithm that identifies severe external defects on the surfaces of barked hardwood logs and stems. The defects detected are at least 0.5 inch in height and at least 3 inches in diameter, which are severe, medium to large in size, and have external surface rises. Hundreds of real log defect samples were measured, photographed, and categorized to summarize the main defect features and to build a defect knowledge base. Three-dimensional laser-scanned range data capture the external log shapes and portray bark pattern, defective knobs, and depressions. The log data are extremely noisy, have missing data, and include severe outliers induced by loose bark that dangles from the log trunk. Because the circle model is nonlinear and presents both additive and non-additive errors, a new robust generalized M-estimator has been developed that is different from the ones proposed in the statistical literature for linear regression. Circle fitting is performed by standardizing the residuals via scale estimates calculated by means of projection statistics and incorporated in the Huber objective function to bound the influence of the outliers in the estimates. The projection statistics are based on 2-D radial-vector coordinates instead of the row vectors of the Jacobian matrix as proposed in the statistical literature dealing with linear regression. This approach proves effective in that it makes the GM-estimator influence bounded and thereby robust against outliers. Severe defects are identified through the analysis of 3-D log data using decision rules obtained from analyzing the knowledge base. Contour curves are generated from radial distances, which are determined by robust 2-D circle fitting to the log-data cross sections. The algorithm detected 63 of a total of 68 severe defects. There were 10 non-defective regions falsely identified as defects. When computed by area, the algorithm locates 97.6% of the defect area and falsely identifies 1.5% of the total clear area as defective.
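To give a feel for robust circle fitting, the sketch below runs an iteratively reweighted algebraic (Kåsa) fit with Huber-type weights and a MAD scale estimate. This is a simplification: the dissertation's GM-estimator bounds influence via projection statistics on 2-D radial-vector coordinates, which this toy version does not implement, and the synthetic "cross section" below is invented.

```python
# Sketch of the general idea only: iteratively reweighted algebraic circle
# fitting with Huber-type weights to damp outliers such as loose bark.
import numpy as np

def fit_circle_robust(x, y, iters=20, k=1.345):
    w = np.ones_like(x)
    for _ in range(iters):
        A = np.column_stack([x, y, np.ones_like(x)])
        b = x**2 + y**2
        sw = np.sqrt(w)
        sol, *_ = np.linalg.lstsq(A * sw[:, None], b * sw, rcond=None)
        cx, cy = sol[0] / 2, sol[1] / 2
        r = np.sqrt(sol[2] + cx**2 + cy**2)
        resid = np.sqrt((x - cx)**2 + (y - cy)**2) - r
        scale = 1.4826 * np.median(np.abs(resid)) + 1e-12    # robust (MAD) scale
        u = np.abs(resid) / scale
        w = np.minimum(1.0, k / np.maximum(u, 1e-12))        # Huber weights
    return (cx, cy), r

# Hypothetical cross section: radius-10 circle plus a few loose-bark outliers.
theta = np.linspace(0, 2 * np.pi, 80)
x = 10 * np.cos(theta) + np.random.normal(0, 0.1, 80)
y = 10 * np.sin(theta) + np.random.normal(0, 0.1, 80)
x[:5] += 6.0                                                 # outliers
print(fit_circle_robust(x, y))
```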
- Beyond Curation: A Validation and Classification Infrastructure for an Educational Content Catalog. Aina, Adeyemi Babatunde (Virginia Tech, 2025-01-21). To address the challenge of discovering computer science learning resources, the Smart Learning Content (SLC) catalog is designed to simplify access to the growing body of educational content. As part of the Standards, Protocols, and Learning Infrastructure for Computing Education (SPLICE) research community's efforts, the catalog functions as a centralized platform supporting SPLICE's objectives of improving interoperability, enabling comprehensive data collection, and facilitating data analysis in computer science education. The SLC catalog stands out from previous catalogs with an approach that applies ontology-based content organization and validation services. Additionally, it serves as a platform where educators can contribute, access, and share a wide range of resources, including slideshows, interactive exercises, programming tasks, and Learning Tools Interoperability (LTI)-integrated content from various learning tools. While the primary goal of the catalog is to disseminate high-quality learning materials, its extensive and varied content requires robust organization and validation mechanisms to ensure educators can efficiently locate and utilize resources. The catalog is designed to further support diverse content types, including both standalone resources and content bundles. For one of our key contributors, OpenDSA, an e-textbook system, we have adopted the latest LTI 1.3 standard. This implementation enables the catalog to disseminate content in both the LTI 1.1 and LTI 1.3 standards, ensuring compatibility. One key improvement in LTI 1.3 is its security model, which incorporates robust authentication methods to ensure stronger protection of sensitive student information. This updated standard enables learning tools to meet the evolving demands of digital education, providing educators and learners with more secure, flexible, and effective resources.
- Bridging the Gap between Deterministic and Stochastic Modeling with Automatic Scaling and Conversion. Wang, Pengyuan (Virginia Tech, 2008-05-07). During the past decade, many successful deterministic models of macromolecular regulatory networks have been built. Deterministic simulations of these models can show only the average dynamics of the systems. However, stochastic simulations of macromolecular regulatory models can account for behaviors that are introduced by the noisy nature of the systems but not revealed by deterministic simulations. Thus, converting an existing model of value from the most common deterministic formulation to one suitable for stochastic simulation enables further investigation of the regulatory network. Although many different stochastic models can be developed and evolved from deterministic models, a direct conversion is the first step in practice. This conversion process is tedious and error-prone, especially for complex models. Thus, we seek to automate as much of the conversion process as possible. However, deterministic models often omit key information necessary for a stochastic formulation. Specifically, values in the model have to be scaled before a complete conversion, and the scaling factors are typically not given in the deterministic model. Several functionalities helping model scaling and conversion are introduced and implemented in the JigCell modeling environment. Our tool makes it easier for the modeler to include complete details as well as to convert the model. Stochastic simulations are known for being computationally intensive, and thus require high performance computing facilities to be practical. With parallel computation on Virginia Tech's System X supercomputer, we are able to obtain the first stochastic simulation results for realistic cell cycle models. Stochastic simulation results for several mutants, which are thought to be biologically significant, are presented. Successful deployment of the enhanced modeling environment demonstrates the power of our techniques.
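The core of the scaling problem can be shown in a few lines: concentrations must become molecule counts, and bimolecular rate constants must be divided by volume times Avogadro's number before a stochastic simulation makes sense. The cell volume and values below are illustrative assumptions, not numbers from JigCell or the thesis.

```python
# Illustrative sketch of deterministic-to-stochastic scaling for mass-action kinetics.
AVOGADRO = 6.02214076e23
volume_l = 3.0e-14                 # assumed cell volume in litres

def conc_to_count(conc_molar):
    """Molar concentration -> integer molecule count."""
    return int(round(conc_molar * volume_l * AVOGADRO))

def bimolecular_rate(k_det):
    """Deterministic rate constant (1/(M*s)) -> stochastic propensity constant (1/s)."""
    return k_det / (volume_l * AVOGADRO)

print(conc_to_count(1.0e-7))       # roughly 1,800 molecules at 0.1 uM
print(bimolecular_rate(1.0e6))     # per-pair reaction propensity
```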
- BSML: A Binding Schema Markup Language for Data Interchange in Problem Solving Environments. Verstak, Alex; Ramakrishnan, Naren; Watson, Layne T.; He, Jian; Shaffer, Clifford A.; Bae, Kyung Kyoon; Jiang, Jing; Tranter, William H.; Rappaport, Theodore S. (Hindawi, 2003-01-01). We describe a binding schema markup language (BSML) for describing data interchange between scientific codes. Such a facility is an important constituent of scientific problem solving environments (PSEs). BSML is designed to integrate with a PSE or application composition system that views model specification and execution as a problem of managing semistructured data. The data interchange problem is addressed by three techniques for processing semistructured data: validation, binding, and conversion. We present BSML and describe its application to a PSE for wireless communications system design.
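A toy version of the validate-then-bind pattern for semistructured interchange data, using only the Python standard library; the element names and the antenna record are invented for the example and are not BSML's actual vocabulary or schema.

```python
# Sketch: validate a small XML document against expected elements, then bind
# it to a typed Python object.
import xml.etree.ElementTree as ET
from dataclasses import dataclass

@dataclass
class Antenna:
    name: str
    gain_dbi: float

DOC = "<antenna><name>dipole</name><gain>2.15</gain></antenna>"

def bind_antenna(xml_text):
    root = ET.fromstring(xml_text)
    if root.tag != "antenna" or root.find("name") is None or root.find("gain") is None:
        raise ValueError("document does not validate against the expected elements")
    return Antenna(name=root.findtext("name"),
                   gain_dbi=float(root.findtext("gain")))

print(bind_antenna(DOC))   # Antenna(name='dipole', gain_dbi=2.15)
```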
- Building a Trustworthy Question Answering System for Covid-19 Tracking. Liu, Yiqing (Virginia Tech, 2021-09-02). During the unprecedented global pandemic of Covid-19, the general public is suffering from inaccurate Covid-19 related information, including outdated information and fake news. The most widely used media (TV, social media, newspapers, and radio) have been unable to provide the certainty and rapid updates that people are seeking. In order to cope with this challenge, several public data resources dedicated to providing Covid-19 information were born. They brought together experts from different fields to provide authoritative and up-to-date pandemic updates. However, the general public still cannot make complete use of such resources, since the learning curve is too steep, especially for the aged and under-aged users. To address this problem, in this thesis we propose a question answering system that can be interacted with using simple natural language sentences. While building this system, we investigate qualified public data resources and, from the data content they provide, collect a set of frequently asked questions for Covid-19 tracking. We further build a dedicated dataset named CovidQA for evaluating the performance of the question answering system with different models. Based on the new dataset, we assess multiple machine learning-based models that are built for retrieving relevant information from databases, and then propose two empirical models which utilize pre-defined templates to generate SQL queries. In our experiments, we demonstrate both quantitative and qualitative results and provide a comprehensive comparison between different types of methods. The results show that the proposed template-based methods are simple but effective in building question answering systems for specific domain problems.
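The template-based approach can be sketched as pattern matching plus parameterized SQL. The question patterns, table, and columns below are invented placeholders for illustration, not the thesis' CovidQA schema or its actual templates.

```python
# Sketch: match a question against simple patterns, fill a parameterized SQL query.
import re
import sqlite3

TEMPLATES = [
    (re.compile(r"how many cases in (?P<place>[\w ]+)", re.I),
     "SELECT SUM(cases) FROM covid_stats WHERE location = ?"),
    (re.compile(r"how many deaths in (?P<place>[\w ]+)", re.I),
     "SELECT SUM(deaths) FROM covid_stats WHERE location = ?"),
]

def answer(question, conn):
    for pattern, sql in TEMPLATES:
        m = pattern.search(question)
        if m:
            return conn.execute(sql, (m.group("place").strip(),)).fetchone()[0]
    return None

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE covid_stats (location TEXT, cases INT, deaths INT)")
conn.execute("INSERT INTO covid_stats VALUES ('Virginia', 1200, 10)")
print(answer("How many cases in Virginia?", conn))   # -> 1200
```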
- Building and Evaluating a Learning Environment for Data Structures and Algorithms Courses. Fouh Mbindi, Eric Noel (Virginia Tech, 2015-04-29). Learning technologies in computer science education have been most closely associated with the teaching of programming, including automatic assessment of programming exercises. However, when it comes to teaching computer science content and concepts, learning technologies have not been heavily used. Perhaps the best known application today is Algorithm Visualization (AV), of which there are hundreds of examples. AVs tend to focus on presenting the procedural aspects of how a given algorithm works, rather than more conceptual content. There are also new electronic textbooks (eTextbooks) that incorporate the ability to edit and execute program examples. For many traditional courses, a longstanding problem is the lack of sufficient practice exercises with feedback to the student. Automated assessment provides a way to increase the number of exercises on which students can receive feedback. Interactive eTextbooks have the potential to make it easy for instructors to introduce both visualizations and practice exercises into their courses. OpenDSA is an interactive eTextbook for data structures and algorithms (DSA) courses. It integrates tutorial content with AVs and automatically assessed interactive exercises. Since Spring 2013, OpenDSA has been regularly used to teach a fundamental data structures and algorithms course (CS2), and also a more advanced data structures, algorithms, and analysis course (CS3) at various institutions of higher education. In this thesis, I report on findings from early adoption of the OpenDSA system. I describe how OpenDSA's design addresses obstacles in the use of AV systems. I identify a wide variety of uses for OpenDSA in the classroom. I found that instructors used OpenDSA exercises as graded assignments in all the courses where it was used. Some instructors assigned an OpenDSA assignment before lectures and started spending more time teaching higher-level concepts. Some instructors also used OpenDSA to implement a "flipped classroom". I found that students are enthusiastic about OpenDSA and voluntarily used the AVs embedded within OpenDSA. Students found OpenDSA beneficial and expressed a preference for a class format that included using OpenDSA as part of the assigned graded work. The relationship between OpenDSA and students' performance was inconclusive, but I found that students with higher grades tend to complete more exercises.
- Cell Cycle Modeling for Budding Yeast with Stochastic Simulation Algorithms. Ahn, Tae-Hyuk; Watson, Layne T.; Cao, Yang; Shaffer, Clifford A.; Baumann, William T. (Department of Computer Science, Virginia Polytechnic Institute & State University, 2008-11-01). For biochemical systems, where some chemical species are represented by small numbers of molecules, discrete and stochastic approaches are more appropriate than continuous and deterministic approaches. The continuous deterministic approach using ordinary differential equations is adequate for understanding the average behavior of cells, while the discrete stochastic approach accurately captures noisy events in the growth-division cycle. Since the emergence of the stochastic simulation algorithm (SSA) by Gillespie, alternative algorithms have been developed whose goal is to improve the computational efficiency of the SSA. This paper explains and empirically compares the performance of some of these SSA alternatives on a realistic model. The budding yeast cell cycle provides an excellent example of the need for modeling stochastic effects in mathematical modeling of biochemical reactions. This paper presents a stochastic approximation of the cell cycle for budding yeast using Gillespie's stochastic simulation algorithm. To compare the stochastic results with the average behavior, the simulation must be run thousands of times. Many of the proposed techniques to accelerate the SSA are not effective on the budding yeast problem, because of the scale of the problem or because underlying assumptions are not satisfied. A load balancing algorithm improved overall performance on a parallel supercomputer.
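For orientation, a textbook direct-method SSA on a two-reaction toy system is sketched below. The budding yeast cell cycle model the paper simulates is vastly larger, and the accelerated SSA variants the paper compares are not shown; the species, rates, and end time here are invented.

```python
# Textbook direct-method SSA (Gillespie) on a toy system: A -> B (k1), B -> 0 (k2).
import math
import random

def ssa(x, rates, t_end):
    """x: dict of species counts; returns the final time and state."""
    t, k1, k2 = 0.0, rates['k1'], rates['k2']
    while t < t_end:
        a = [k1 * x['A'], k2 * x['B']]              # reaction propensities
        a0 = sum(a)
        if a0 == 0:
            break
        t += -math.log(1.0 - random.random()) / a0  # exponential time to next event
        if random.random() * a0 < a[0]:             # choose which reaction fires
            x['A'] -= 1; x['B'] += 1
        else:
            x['B'] -= 1
    return t, x

print(ssa({'A': 100, 'B': 0}, {'k1': 1.0, 'k2': 0.5}, t_end=10.0))
```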
- CodeWorkout: Design and Implementation of an Online Drill-and-Practice System for Introductory Programming. Panamalai Murali, Krishnan (Virginia Tech, 2016-06-14). The massive rise in Computer Science enrollments in both traditional classroom courses and Massively Open Online Courses (MOOCs) shows the enormous opportunities in engaging students to learn programming. While the number of students in CS courses continues to increase, there has been no concomitant increase in the number of instructors for such courses. This leads to a completely lopsided learning environment where the already-stretched instructor is pressed to spend more time on ancillary tasks like grading and course bookkeeping. CodeWorkout is an online drill-and-practice system with course management features that aims to address these issues. CodeWorkout hosts an online repository of programming questions that instructors can incorporate into their courses. It also provides instructors with a facility to create their own programming questions so that exercises can be tailored to the needs of the class. CodeWorkout has an open gradual engagement model that allows students who are not enrolled in a course to use it. CodeWorkout also creates an open environment for instructors to collaborate by sharing exercises that they create. CodeWorkout has been used in four courses at Virginia Tech. It has been shown to significantly improve students' skills in introductory programming by providing a large number of online practice questions.
- Collaboration Transparency in Java through Event Broadcasting. Begole, James M.A.; Struble, Craig A.; Shaffer, Clifford A. (Department of Computer Science, Virginia Polytechnic Institute & State University, 1997-02-01). Widespread use of the Internet for education and research is yielding many opportunities for network-based synchronous collaboration. For example, the Java runtime environment provides a platform independent vehicle for new collaborative applications. While many toolkits are becoming available that support development of collaborative applications, they do not enable collaborative use of existing single-user Java applets, a process called collaboration transparency. This paper discusses two approaches to collaboration transparency: display broadcasting and event broadcasting. We then consider the suitability of each within the context of the Java runtime environment. Unfortunately, simple modifications to the standard Java class libraries as they are currently implemented are not sufficient to support collaboration transparency. We describe the problems and suggest solutions.
- Comparison of Computational Notebook Platforms for Interactive Visual Analytics: Case Study of Andromeda Implementations. Liu, Han (Virginia Tech, 2022-09-22). Existing notebook platforms have different capabilities for supporting visual analytics, and it is not clear which platform to choose for implementing visual analytics notebooks. In this work, we investigated the problem using Andromeda, an interactive dimension reduction algorithm, and implemented it on three different notebook platforms: 1) a Python-based Jupyter Notebook, 2) a JavaScript-based Observable Notebook, and 3) a Jupyter Notebook embedding both Python (for data science use) and JavaScript (for visual analytics use). We compared all three notebook platforms in a case study based on metrics such as programming difficulty, notebook organization, interactive performance, and UI design choices. Furthermore, guidelines are provided to help data scientists choose a notebook platform for implementing their visual analytics notebooks in various situations. To lay the groundwork for future developers, we also give advice on architecting better notebook platforms.