Computer Science Technical Reports

Permanent URI for this collection

The Department of Computer Science collection of technical reports began in 1973. Please use the subject headings listed below for all submissions.
Subject Headings:
  • Algorithms
  • Big Data
  • Bioinformatics
  • Computational Biology
  • Computational Science and Engineering
  • Computer Graphics/Animation
  • Computer Science Education
  • Computer Systems
  • Cyberarts
  • Cybersecurity
  • Data and Text Mining
  • Digital Education
  • Digital Libraries
  • Discrete Event Simulation
  • High Performance Computing
  • Human Computer Interaction
  • Information Retrieval
  • Machine Learning
  • Mathematical Programming
  • Mathematical Software
  • Modeling and Simulation
  • Networking
  • Numerical Analysis
  • Parallel and Distributed Computing
  • Problem Solving Environments
  • Software Engineering
  • Theoretical Computer Science
  • Virtual/Augmented Reality
  • Visualization

Browse

Recent Submissions

Now showing 1 - 20 of 1036
  • Visualize This: Lessons from the Front-lines of High Performance Visualization
    Mohammed, Ayat; Polys, Nicholas F.; Farrah, Duncan (Department of Computer Science, Virginia Polytechnic Institute & State University, 2020-04-02)
    This paper presents a comprehensive workflow to address two major factors in multivariate multidimensional (MVMD) scientific visualization: the scalability of rendering and the scalability of representation (for perception). Our workflow integrates the metrics of scientific computing and visualization across di fferent STEM domains to deliver perceivable visualizations that meet scientists’ expectations. Our approach attempts to balance the performance of MVMD visualizations using techniques such as sub-sampling, domain decomposition, and parallel rendering. When mapping data to visual form we considered: the nature of the data (dimensionality, type, and distribution), the computing power (serial or parallel), and the rendering power (rendering mechanism, format, and display spectrum). We used HPC clusters to perform remote parallel processing and visualization of large-scale data sets such as 3D point clouds, galaxy catalogs, and airflow simulations. Our workflow brings these considerations into a structured form to guide the decisions of visualization designers who deal with large heterogeneous data sets.
  • Compiler-Directed Failure Atomicity for Nonvolatile Memory
    Liu, Qingrui; Izraelevitz, Joseph; Lee, Se Kwon; Scott, Michael L.; Noh, Sam H.; Jung, Changhee (Department of Computer Science, Virginia Polytechnic Institute & State University, 2019-07-15)
    This paper presents iDO, a compiler-directed approach to failure atomicity with nonvolatile memory. Unlike most prior work, which instruments each store of persistent data for redo or undo logging, the iDO compiler identifies idempotent instruction sequences, whose re-execution is guaranteed to be side effect-free, thereby eliminating the need to log every persistent store. Using an extension of prior work on JUSTDO logging, the compiler then arranges, during recovery from failure, to back up each thread to the beginning of the current idempotent region and re-execute to the end of the current failure-atomic section. This extension transforms JUSTDO logging from a technique of value only on hypothetical future machines with nonvolatile caches into a technique that also significantly outperforms state-of-the art lock-based persistence mechanisms on current hardware during normal execution, while preserving very fast recovery times.
  • ELASTIN: Achieving Stagnation-Free Intermittent Computation with Boundary-Free Adaptive Execution
    Choi, Jongouk; Joe, Hyunwoo; Kim, Yongjoo; Jung, Changhee (Department of Computer Science, Virginia Polytechnic Institute & State University, 2019-07-15)
    This paper presents ELASTIN, a stagnation-free intermittent computing system for energy-harvesting devices that ensures forward progress in the presence of frequent power outages without partitioning program into recoverable regions or tasks. ELASTIN leverages both timer-based checkpointing of volatile registers and copy-on-write mappings of nonvolatile memory pages to restore them in the wake of power failure. During each checkpoint interval, ELASTIN tracks memory writes on a per-page basis and backs up the original page using custom software-controlled memory protection without MMU or TLB. When a new interval starts at each timer expiration, ELASTIN clears the write permission of all the pages written during the previous interval and checkpoints all registers including a program counter as a recovery point. In particular, ELASTIN dynamically reconfigures both the checkpoint interval and the page size to achieve stagnation-free intermittent computation and maximize forward progress across power outages. The experiments on TI’s MSP430 board with energy harvesting traces show that ELASTIN outperforms the state-of-the-art scheme by 3.5X on average (up to orders of magnitude speedup) and guarantees forward progress.
  • BenchPrime: Accurate Benchmark Subsetting with Optimized Clustering Algorithm Selection
    Liu, Qingrui; Wu, Xiaolong; Kittinger, Larry; Levy, Markus; Jung, Changhee (Department of Computer Science, Virginia Polytechnic Institute & State University, 2018-08-24)
    This paper presents BenchPrime, an automated benchmark analysis toolset that is systematic and extensible to analyze the similarity and diversity of benchmark suites. BenchPrime takes multiple benchmark suites and their evaluation metrics as inputs and generates a hybrid benchmark suite comprising only essential applications. Unlike prior work, BenchPrime uses linear discriminant analysis rather than principal component analysis, as well as selects the best clustering algorithm and the optimized number of clusters in an automated and metric-tailored way, thereby achieving high accuracy. In addition, BenchPrime ranks the benchmark suites in terms of their application set diversity and estimates how unique each benchmark suite is compared to other suites. As a case study, this work for the first time compares the DenBench with the MediaBench and MiBench using four different metrics to provide a multi-dimensional understanding of the benchmark suites. For each metric, BenchPrime measures to what degree DenBench applications are irreplaceable with those in MediaBench and MiBench. This provides means for identifying an essential subset from the three benchmark suites without compromising the application balance of the full set. The experimental results show that the necessity of including DenBench applications varies across the target metrics and that significant redundancy exists among the three benchmark suites.
  • A Composable Workflow for Productive FPGA Computing via Whole-Program Analysis and Transformation (with Code Excerpts)
    Sathre, Paul; Helal, Ahmed E.; Feng, Wu-chun (Department of Computer Science, Virginia Polytechnic Institute & State University, 2018-07-24)
    We present a composable workflow to enable highly-productive heterogeneous computing on FPGAs. The workflow consists of a trio of static analysis and transformation tools: (1) a whole-program, source-to-source translator to transform existing parallel code to OpenCL, (2) a set of OpenCL kernel linters, which target FPGAs to detect possible semantic errors and performance traps, and (3) a whole-program OpenCL linter to validate the host-to-device interface of OpenCL programs. The workflow promotes rapid realization of heterogeneous parallel code across a multitude of heterogeneous computing environments, particularly FPGAs, by providing complementary tools for automatic CUDA-to-OpenCL translation and compile-time OpenCL validation in advance of very expensive compilation, placement, and routing on FPGAs. The proposed tools perform whole-program analysis and transformation to tackle realworld, large-scale parallel applications. The efficacy of the workflow tools is demonstrated via a representative translation and analysis of a sizable CUDA finite automata processing engine as well as the analysis and validation of an additional 96 OpenCL benchmarks.
  • MOANA: Modeling and Analyzing I/O Variability in Parallel System Experimental Design
    Cameron, Kirk W.; Anwar, Ali; Cheng, Yue; Xu, Li; Li, Bo; Ananth, Uday; Lux, Thomas; Hong, Yili; Watson, Layne T.; Butt, Ali R. (Department of Computer Science, Virginia Polytechnic Institute & State University, 2018-04-19)
    Exponential increases in complexity and scale make variability a growing threat to sustaining HPC performance at exascale. Performance variability in HPC I/O is common, acute, and formidable. We take the first step towards comprehensively studying linear and nonlinear approaches to modeling HPC I/O system variability. We create a modeling and analysis approach (MOANA) that predicts HPC I/O variability for thousands of software and hardware configurations on highly parallel shared-memory systems. Our findings indicate nonlinear approaches to I/O variability prediction are an order of magnitude more accurate than linear regression techniques. We demonstrate the use of MOANA to accurately predict the confidence intervals of unmeasured I/O system configurations for a given number of repeat runs – enabling users to quantitatively balance experiment duration with statistical confidence.
  • Modeling Influence using Weak Supervision: A joint Link and Post-level Analysis
    Chen, Liangzhe; Prakash, B. Aditya (Department of Computer Science, Virginia Polytechnic Institute & State University, 2018-04-09)
    Microblogging websites, like Twitter and Weibo, are used by billions of people to create and spread information. This activity depends on various factors such as the friendship links between users, their topic interests and social influence between them. Making sense of these behaviors is very important for fully understanding and utilizing these platforms. Most prior work on modeling social-media either ignores the effect of social influence, or considers its effect only on link formation or post generation. In contrast, in this paper we propose POLIM, which jointly models the effect of influence on both link and post generation, leveraging weak supervision. We also give POLIM-FIT, an efficient parallel inference algorithm for POLIM which scales to large datasets. In our experiments on a large tweets corpus, we detect meaningful topical communities, celebrities, as well as the influence strengths patterns among them. Further, we find that there are significant portions of posts and links that are caused by influence, and this portion increases when the data focuses on a specific event. We also show that differentiating and identifying these influenced content benefits other quantitative downstream tasks as well, like predicting future tweets and link formation.
  • Segmentations with Explanations for Outage Analysis
    Chen, Liangzhe; Muralidhar, Nikhil; Chinthavali, Supriya; Ramakrishnan, Naren; Prakash, B. Aditya (Department of Computer Science, Virginia Polytechnic Institute & State University, 2018-04-09)
    Recent hurricane events have caused unprecedented amounts of damage and severely threatened our public safety and economy. The most observable (and severe) impact of these hurricanes is the loss of electric power in many regions, which causes the breakdown of many public services. Understanding the power outages and how they evolve during a hurricane provide insights on how to reduce outages in the future, and how to improve the robustness of the underlying critical infrastructure systems. In this paper, we propose a novel segmentation with explanations framework to help experts understand such datasets. Our method, CUT-n-REVEAL, first finds a segmentation of the outage sequences to capture pattern changes in the sequences. We then propose a novel explanation optimization problem to find an intuitive explanation of the segmentation, that highlights the culprit of the change. Via extensive experiments, we show that our method performs consistently in multiple datasets with ground truth. We further study real county-level power outage data from several recent hurricanes (Matthew, Harvey, Irma) and show that CUT-n-REVEAL recovers important, nontrivial and actionable patterns for domain experts.
  • GPU Power Prediction via Ensemble Machine Learning for DVFS Space Exploration
    Dutta, Bishwajit; Adhinarayanan, Vignesh; Feng, Wu-chun (Department of Computer Science, Virginia Polytechnic Institute & State University, 2018-02-02)
    A software-based approach to achieve high performance within a power budget often involves dynamic voltage and frequency scaling (DVFS). Consequently, accurately predicting the power consumption of an application at different DVFS levels (or more generally, different processor configurations) is paramount for the energy-efficient functioning of a high-performance computing (HPC) system. The increasing prevalence of graphics processing units (GPUs) in HPC systems presents new multi-dimensional challenges in power management, and machine learning presents an unique opportunity to improve the software-based power management of these HPC systems. As such, we explore the problem of predicting the power consumption of a GPU at different DVFS states via machine learning. Specifically, we perform statistically rigorous experiments to quantify eight machine-learning techniques (i.e., ZeroR, simple linear regression (SLR), KNN, bagging, random forest, sequential minimal optimization regression (SMOreg), decision tree, and neural networks) to predict GPU power consumption at different frequencies. Based on these results, we propose a hybrid ensemble technique that incorporates SMOreg, SLR, and decision tree, which, in turn, reduces the mean absolute error (MAE) to 3.5%.
  • ETH: A Framework for the Design-Space Exploration of Extreme-Scale Visualization
    Abrams, Gregory; Adhinarayanan, Vignesh; Feng, Wu-chun; Rogers, David; Ahrens, Jams; Wilson, Luke (Department of Computer Science, Virginia Polytechnic Institute & State University, 2017-09-29)
    As high-performance computing (HPC) moves towards the exascale era, large-scale scientific simulations are generating enormous datasets. A variety of techniques (e.g., in-situ methods, data sampling, and compression) have been proposed to help visualize these large datasets under various constraints such as storage, power, and energy. However, evaluating these techniques and understanding the various trade-offs (e.g., performance, efficiency, quality) remains a challenging task. To enable the investigation and optimization across such tradeoffs, we propose a toolkit for the early-stage exploration of visualization and rendering approaches, job layout, and visualization pipelines. Our framework covers a broader parameter space than existing visualization applications such as ParaView and VisIt. It also promotes the study of simulation-visualization coupling strategies through a data-centric approach, rather than requiring the code itself. Furthermore, with experimentation on an extensively instrumented supercomputer, we study more metrics of interest than was previously possible. Overall, our framework will help to answer important what-if scenarios and trade-off questions in early stages of pipeline development, helping scientists to make informed choices about how to best couple a simulation code with visualization at extreme scale.
  • CommAnalyzer: Automated Estimation of Communication Cost on HPC Clusters Using Sequential Code
    Helal, Ahmed E.; Jung, Changhee; Feng, Wu-chun; Hanafy, Yasser Y. (Department of Computer Science, Virginia Polytechnic Institute & State University, 2017-08-14)
    MPI+X is the de facto standard for programming applications on HPC clusters. The performance and scalability on such systems is limited by the communication cost on different number of processes and compute nodes. Therefore, the current communication analysis tools play a critical role in the design and development of HPC applications. However, these tools require the availability of the MPI implementation, which might not exist in the early stage of the development process due to the parallel programming effort and time. This paper presents CommAnalyzer, an automated tool for communication model generation from a sequential code. CommAnalyzer uses novel compiler analysis techniques and graph algorithms to capture the inherent communication characteristics of sequential applications, and to estimate their communication cost on HPC systems. The experiments with real-world, regular and irregular scientific applications demonstrate the utility of CommAnalyzer in estimating the communication cost on HPC clusters with more than 95% accuracy on average.
  • Personal Reflections on 50 Years of Scientific Computing: 1967–2017
    Watson, Layne T. (Department of Computer Science, Virginia Polytechnic Institute & State University, 2017-08-10)
    Computer hardware, software, numerical algorithms, and science and engineering applications are traced for a half century from the author's perspective.
  • Understanding Recurring Software Quality Problems of Novice Programmers
    Techapalokul, Peeratham; Tilevich, Eli (Department of Computer Science, Virginia Polytechnic Institute & State University, 2017-07-12)
    It remains unclear when is the right time to introduce software quality into the computing curriculum. Introductory students often cannot afford to also worry about software quality, while advanced students may have been groomed into undisciplined development practices already. To be able to answer these questions, educators need strong quantitative evidence about the persistence of software quality problems in programs written by novice programmers. This technical report presents a comprehensive study of software quality in programs written by novice programmers. By leveraging the patterns of recurring quality problems, known as code smells, we analyze a longitudinal dataset of more than 100 novice Scratch programmers and close to 3,000 of their programs. Even after gaining proficiency, students continue to introduce certain quality problems into their programs, suggesting the need for educational interventions. Given the importance of software quality for modern society, computing educators should teach quality-promoting practices alongside the core computing concepts.
  • DroidCat: Unified Dynamic Detection of Android Malware
    Cai, Haipeng; Meng, Na; Ryder, Barbara G.; Yao, Danfeng (Daphne) (Department of Computer Science, Virginia Polytechnic Institute & State University, 2016)
    Various dynamic approaches have been developed to detect or categorize Android malware. These approaches execute software, collect call traces, and then detect abnormal system calls or sensitive API usage. Consequently, attackers can evade these approaches by intentionally obfuscating those calls under focus. Additionally, existing approaches treat detection and categorization of malware as separate tasks, although intuitively both tasks are relevant and could be performed simultaneously. This paper presents DroidCat, the first unified dynamic malware detection approach, which not only detects malware, but also pinpoints the malware family. DroidCat leverages supervised machine learning to train a multi-class classifier using diverse behavioral profiles of benign apps and different kinds of malware. Compared with prior heuristics-based machine learning-based approaches, the feature set used in DroidCat is decided purely based on a systematic dynamic characterization study of benign and malicious apps. All differentiating features that show behavioral differences between benign and malicious apps are included. In this way, DroidCat is robust to existing evasion attacks. We evaluated DroidCat using leave-one-out cross validation with 136 benign apps and 135 malicious apps. The evaluation shows that DroidCat provided an effective and scalable unified malware detection solution with 81% precision, 82% recall, and 92% accuracy.
  • A Numerical Investigation of Matrix-Free Implicit Time-Stepping Methods for Large CFD Simulations
    Sarshar, Arash; Tranquilli, Paul; Pickering, Brent P.; McCall, Andrew; Sandu, Adrian; Roy, Christopher J. (2016)
    This paper is concerned with the development and testing of advanced time-stepping methods suited for the integration of time-accurate, real-world applications of computational fluid dynamics (CFD). The performance of several time discretization methods is studied numerically with regards to computational efficiency, order of accuracy, and stability, as well as the ability to treat effectively stiff problems. We consider matrix-free implementations, a popular approach for time-stepping methods applied to large CFD applications due to its adherence to scalable matrix-vector operations and a small memory footprint. We compare explicit methods with matrix-free implementations of implicit, linearly-implicit, as well as Rosenbrock-Krylov methods. We show that Rosenbrock-Krylov methods are competitive with existing techniques excelling for a number of prob- lem types and settings.
  • LIRK-W: Linearly-implicit Runge-Kutta methods with approximate matrix factorization
    Tranquilli, Paul; Sandu, Adrian; Zhang, H. (2016-11-22)
    This paper develops a new class of linearly implicit time integration schemes called Linearly-Implicit Runge-Kutta-W (LIRK-W) methods. These schemes are based on an implicit-explicit approach which does not require a splitting of the right hand side and allow for arbitrary, time dependent, and stage varying approximations of the linear systems appearing in the method. Several formulations of LIRK-W schemes, each designed for specific approximation types, and their associated order condition theories are presented.
  • Multivariate predictions of local reduced-order-model errors and dimensions
    Moosavi, Azam; Stefanescu, Razvan; Sandu, Adrian (2017-01-16)
    This paper introduces multivariate input-output models to predict the errors and bases dimensions of local parametric Proper Orthogonal Decomposition reduced-order models. We refer to these multivariate mappings as the MP-LROM models. We employ Gaussian Processes and Artificial Neural Networks to construct approximations of these multivariate mappings. Numerical results with a viscous Burgers model illustrate the performance and potential of the machine learning based regression MP-LROM models to approximate the characteristics of parametric local reduced-order models. The predicted reduced-order models errors are compared against the multi-fidelity correction and reduced order model error surrogates methods predictions, whereas the predicted reduced-order dimensions are tested against the standard method based on the spectrum of snapshots matrix. Since the MP-LROM models incorporate more features and elements to construct the probabilistic mappings they achieve more accurate results. However, for high-dimensional parametric spaces, the MP-LROM models might suffer from the curse of dimensionality. Scalability challenges of MP-LROM models and the feasible ways of addressing them are also discussed in this study.
  • AutoMatch: Automated Matching of Compute Kernels to Heterogeneous HPC Architectures
    Helal, Ahmed E.; Feng, Wu-chun; Jung, Changhee; Hanafy, Yasser Y. (Department of Computer Science, Virginia Polytechnic Institute & State University, 2016-12-13)
    HPC systems contain a wide variety of heterogeneous computing resources, ranging from general-purpose CPUs to specialized accelerators. Porting sequential applications to such systems for achieving high performance requires significant software and hardware expertise as well as extensive manual analysis of both the target architectures and applications to decide the best performing architecture and implementation technique for each application. To streamline this tedious process, this paper presents AutoMatch, a tool for automated matching of compute kernels to heterogeneous HPC architectures. AutoMatch analyzes the sequential application code and automatically predicts the performance of the best parallel implementation of its compute kernels on different hardware architectures. AutoMatch leverages such prediction results to identify the best device for each kernel from a set of devices including multi-core CPUs and many-core GPUs. In addition, it estimates the relative execution cost between the different architectures to drive a workload distribution scheme, which enables end users to efficiently exploit the available compute resources across multiple heterogeneous architectures. We demonstrate the efficacy of AutoMatch, using a set of open-source HPC applications and benchmarks with different parallelism profiles and memory-access patterns. The empirical evaluation shows that AutoMatch is highly accurate across five different heterogeneous architectures, identifying the best architecture for each workload in 96% of the test cases, and its workload distribution scheme has a comparable performance to a profiling-driven oracle.
  • Understanding Application Behaviours for Android Security: A Systematic Characterization
    Cai, Haipeng; Ryder, Barbara G. (Department of Computer Science, Virginia Polytechnic Institute & State University, 2016)
    In contrast to most existing research on Android focusing on specific security issues, there is little broad understanding of Android application run-time characteristics and their security implications. To mitigate this gap, we present the first dynamic characterization study of Android applications that targets such a broad understanding for Android security. Through lightweight method-level profiling, we have collected 33GB traces of method calls and inter-component communication (ICC) from 114 popular Android applications on Google Play and 61 communicating pairs among them that enabled an extensive empirical investigation of the run-time behaviours of Android applications. Our study revealed that (1) the Android framework was the target of 88.3% of all calls during application executions, (2) callbacks accounted for merely 3% of the total method calls, (3) 75% of ICCs did not carry any data payloads with those doing so preferring bundles over URIs, (4) 85% of sensitive data sources and sinks targeted one or two top categories of information or operations which were also most likely to constitute data leaks. We discuss the security implications of our findings to secure development and effective security defense of modern Android applications.
  • Accelerating Workloads on FPGAs via OpenCL: A Case Study with OpenDwarfs
    Verma, Anshuman; Helal, Ahmed E.; Krommydas, Konstantinos; Feng, Wu-chun (Department of Computer Science, Virginia Polytechnic Institute & State University, 2016-05-13)
    For decades, the streaming architecture of FPGAs has delivered accelerated performance across many application domains, such as option pricing solvers in finance, computational fluid dynamics in oil and gas, and packet processing in network routers and firewalls. However, this performance comes at the expense of programmability. FPGA developers use hardware design languages (HDLs) to implement the application data and control path and to design hardware modules for computational pipelines, memory management, synchronization, and communication. This process requires extensive knowledge of logic design, design automation tools, and low-level details of FPGA architecture, this consumes significant development time and effort. To address this lack of programmability of FPGAs, OpenCL provides an easy-to-use and portable programming model for CPUs, GPUs, APUs, and now, FPGAs. Although this significantly improved programmability yet an optimized GPU implementation of kernel may lack performance portability for FPGA. To improve the performance of OpenCL kernels on FPGAs we identify general techniques to optimize OpenCL kernels for FPGAs under device-specific hardware constraints. We then apply these optimizations techniques to the OpenDwarfs benchmark suite, which has diverse parallelism profiles and memory access patterns, in order to evaluate the effectiveness of the optimizations in terms of performance and resource utilization. Finally, we present the performance of structured grids and N-body dwarf-based benchmarks in the context of various optimization along with their potential re-factoring. We find that careful design of kernels for FPGA can result in a highly efficient pipeline achieving 91% of theoretical throughput for the structured grids dwarf. Index Terms—OpenDwarfs; FPGA; OpenCL; GPU; MIC; Accelerators; Performance Portability