Browsing by Author "Butt, Ali R."
- Accelerated Storage Systems. Khasymski, Aleksandr Sergeev (Virginia Tech, 2015-03-11). Today's large-scale, high-performance, data-intensive applications put a tremendous stress on data centers to store, index, and retrieve large amounts of data. Exemplified by technologies such as social media, photo and video sharing, and e-commerce, the rise of the real-time web demands that data stores support minimal latencies, always-on availability, and ever-growing capacity. These requirements have fostered the development of a large number of high-performance storage systems, arguably the most important of which are Key-Value (KV) stores. An emerging trend for achieving low latency and high throughput in this space is to utilize both DRAM and flash by storing an efficient index for the data in memory and minimizing accesses to flash, where both keys and values are stored. Many proposals have examined how to improve KV store performance in this area. However, these systems have shortcomings, including expensive sorting and excessive read and write amplification, which shortens the life of the flash. Another trend in recent years equips large-scale deployments with energy-efficient, high-performance co-processors, such as Graphics Processing Units (GPUs). Recent work has explored using GPUs to accelerate compute-intensive I/O workloads, including RAID parity generation, encryption, and compression. While this research has proven the viability of GPUs for accelerating these workloads, we argue that there are significant benefits to be had by developing methods and data structures for deep integration of GPUs inside the storage stack, in order to achieve better performance, scalability, and reliability. In this dissertation, we propose comprehensive frameworks that leverage emerging technologies, such as GPUs and flash-based SSDs, to accelerate modern storage systems. For our accelerator-based solution, we focus on developing a system that features deep integration of the GPU in a distributed parallel file system. We utilize a framework that builds on the resources available in the file system and coordinates the workload in such a way that minimizes data movement across the PCIe bus, while exposing data parallelism to maximize the potential for acceleration on the GPU. Our research aims to improve the overall reliability of a PFS by developing a distributed per-file parity generation scheme that provides end-to-end data integrity and unprecedented flexibility. Finally, we design a high-performance KV store utilizing a novel data structure tailored to specific flash requirements; it arranges data on flash in such a way as to minimize write amplification, which is detrimental to the flash cells. The system delivers very low read amplification through the use of a trie index and a false-positive filter.
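The closing sentence names the two structures that keep flash reads low. A minimal sketch of that idea follows, assuming invented names and a toy append-only log in place of real flash; it illustrates the technique, not the dissertation's implementation.

```python
# A toy illustration, not the dissertation's data structures: an in-memory
# trie index plus a Bloom-style false-positive filter in front of an
# append-only "flash" log, so absent-key lookups rarely touch flash and
# writes stay sequential (low write amplification). Names are invented.
import hashlib

class BloomFilter:
    def __init__(self, size_bits=1 << 20, hashes=4):
        self.size, self.hashes = size_bits, hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, key):
        for i in range(self.hashes):
            h = int.from_bytes(hashlib.sha256(key + bytes([i])).digest()[:8], "big")
            yield h % self.size

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def may_contain(self, key):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))

class FlashKV:
    """Toy store; assumes no key is a proper prefix of another key."""
    def __init__(self):
        self.index = {}            # in-memory trie: key bytes -> log offset
        self.filter = BloomFilter()
        self.log = []              # stand-in for the append-only flash log

    def put(self, key, value):
        self.log.append(value)     # sequential append only
        node = self.index
        for b in key[:-1]:
            node = node.setdefault(b, {})
        node[key[-1]] = len(self.log) - 1
        self.filter.add(key)

    def get(self, key):
        if not self.filter.may_contain(key):
            return None            # filtered out: no flash access at all
        node = self.index
        for b in key[:-1]:
            node = node.get(b)
            if node is None:
                return None
        off = node.get(key[-1])
        return None if off is None else self.log[off]

kv = FlashKV()
kv.put(b"user:42", b"alice")
print(kv.get(b"user:42"), kv.get(b"user:99"))   # b'alice' None
```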
- An Adaptive Framework for Managing Heterogeneous Many-Core Clusters. Rafique, Muhammad Mustafa (Virginia Tech, 2011-09-22). The computing needs and the input and result datasets of modern scientific and enterprise applications are growing exponentially. To support such applications, High-Performance Computing (HPC) systems need to employ thousands of cores and innovative data management. At the same time, an emerging trend in designing HPC systems is to leverage specialized asymmetric multicores, such as the IBM Cell and AMD Fusion APUs, and commodity computational accelerators, such as programmable GPUs, which exhibit an excellent price-to-performance ratio as well as much-needed high energy efficiency. While such accelerators have been studied in detail as stand-alone computational engines, integrating them into large-scale distributed systems with heterogeneous computing resources for data-intensive computing presents unique challenges and trade-offs. Traditional programming and resource management techniques cannot be directly applied to many-core accelerators in heterogeneous distributed settings, given the complex and custom instruction set architectures, memory hierarchies, and I/O characteristics of different accelerators. In this dissertation, we explore the design space of using commodity accelerators, specifically the IBM Cell and programmable GPUs, in distributed settings for data-intensive computing, and we propose an adaptive framework for programming and managing heterogeneous clusters. The proposed framework provides a MapReduce-based extended programming model for heterogeneous clusters, which distributes tasks between asymmetric compute nodes by considering workload characteristics and the capabilities of individual compute nodes. The framework provides efficient data prefetching techniques that leverage general-purpose cores to stage the input data in the private memories of the specialized cores. We also explore the use of an advanced layered-architecture-based software engineering approach and provide mixin-layers-based reusable software components to enable easy and quick deployment of heterogeneous clusters. The framework also provides multiple resource management and scheduling policies under different constraints, e.g., energy-aware and QoS-aware, to support executing concurrent applications on multi-tenant heterogeneous clusters. When applied to representative applications and benchmarks, our framework yields significantly improved performance in terms of programming efficiency and optimal resource management compared to conventional, hand-tuned approaches to programming and managing accelerator-based heterogeneous clusters.
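As an illustration of the capability-aware task distribution the framework performs, here is a hedged sketch that splits a MapReduce input across asymmetric nodes in proportion to a per-node capability score; node names and scores are invented, and the real framework also weighs workload characteristics.

```python
# Hedged sketch: split input records across asymmetric nodes in proportion
# to an assumed capability score. Node names and scores are invented.
def partition_by_capability(num_records, nodes):
    """nodes: list of (name, capability_score) -> list of (name, records)."""
    total = sum(score for _, score in nodes)
    shares = [(name, int(num_records * score // total)) for name, score in nodes]
    remainder = num_records - sum(n for _, n in shares)
    best = max(range(len(shares)), key=lambda i: nodes[i][1])
    shares[best] = (shares[best][0], shares[best][1] + remainder)  # absorb rounding
    return shares

print(partition_by_capability(
    1_000_000, [("gpu-node", 8.0), ("cell-node", 5.0), ("cpu-node", 1.0)]))
```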
- Algorithms and Frameworks for Accelerating Security Applications on HPC Platforms. Yu, Xiaodong (Virginia Tech, 2019-09-09). Typical cybersecurity solutions emphasize achieving defense functionalities. However, execution efficiency and scalability are equally important, especially for real-world deployment. Straightforward mappings of cybersecurity applications onto HPC platforms may significantly underutilize the HPC devices' capacities. On the other hand, sophisticated implementations are quite difficult: they require in-depth understanding of both cybersecurity domain-specific characteristics and the HPC architecture and system model. In our work, we investigate three sub-areas of cybersecurity: mobile software security, network security, and system security. They have the following performance issues, respectively: 1) Flow- and context-sensitive static analysis for large and complex Android APKs is incredibly time-consuming. Existing CPU-only frameworks/tools have to set a timeout threshold that ceases the program analysis, trading precision for performance. 2) Network intrusion detection systems (NIDS) use automata processing as their search core and require line-speed processing. However, achieving high-speed automata processing is exceptionally difficult at both the algorithm and implementation levels. 3) It is unclear how cache configurations impact the performance of time-driven cache side-channel attacks. This question remains open because it is difficult to conduct the comparative measurements needed to study the impacts. In this dissertation, we demonstrate how application-specific characteristics can be leveraged to optimize implementations on various types of HPC hardware for faster and more scalable cybersecurity executions. For example, we present a new GPU-assisted framework and a collection of optimization strategies for fast Android static data-flow analysis that achieve up to 128X speedups over the plain GPU implementation. For network intrusion detection systems, we design and implement an algorithm capable of eliminating the state explosion in out-of-order packet situations, which reduces memory overhead by up to 400X. We also present tools for improving the usability of Micron's Automata Processor. To study the impact of cache configurations on the performance of time-driven cache side-channel attacks, we design an approach for conducting comparative measurements. We propose a quantifiable success-rate metric to measure the performance of time-driven cache attacks and utilize the GEM5 platform to emulate the configurable cache.
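For context on the automata processing at the core of NIDS matching, a toy single-pattern DFA scanner (a textbook KMP-style construction, not the dissertation's algorithm) shows the single-pass, per-byte state transitions that must run at line speed.

```python
# Toy illustration of automata processing as an IDS matching core: one
# pattern compiled into a sparse DFA transition table, input scanned one
# byte at a time in a single pass. Textbook technique, invented example.
def build_dfa(pattern):
    """KMP-style DFA over bytes for a single non-empty pattern."""
    m = len(pattern)
    dfa = [dict() for _ in range(m)]
    dfa[0][pattern[0]] = 1
    x = 0                                # state the DFA falls back to
    for j in range(1, m):
        for c, s in dfa[x].items():
            dfa[j].setdefault(c, s)      # mismatch: behave like state x
        dfa[j][pattern[j]] = j + 1       # match: advance
        x = dfa[x].get(pattern[j], 0)
    return dfa

def scan(data, pattern):
    dfa, state, m = build_dfa(pattern), 0, len(pattern)
    for i, c in enumerate(data):
        state = dfa[state].get(c, 0)
        if state == m:
            return i - m + 1             # offset of the first match
    return -1

print(scan(b"...GET /etc/passwd...", b"/etc/passwd"))   # -> 7
```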
- An Analysis of Conventional & Heterogenous Workloads on Production Supercomputing Resources. Berkhahn, Jonathan Allen (Virginia Tech, 2013-06-06). Cloud computing setups are a huge investment of resources and personnel to maintain. As the workload on a system is a major contributing factor to both the performance of the system and a representation of the needs of system users, a clear understanding of the workload is critical to organizations that support supercomputing systems. In this paper, we analyze traces from two production-level supercomputers to infer the characteristics of their workloads, and we make observations as to the needs of supercomputer users based on them. We particularly focus on the usage of graphical processing units by domain scientists. Based on this analysis, we generate a synthetic workload that can be used for testing future systems, and we make observations as to efficient resource provisioning.
- AnalyzeThis: An Analysis Workflow-Aware Storage System. Sim, Hyogi (Virginia Tech, 2014-12-17). Supercomputing application simulations on hundreds of thousands of cores produce vast amounts of data that need to be analyzed on smaller-scale clusters to glean insights. The process is referred to as an end-to-end workflow. Extant workflow systems are stymied by the storage wall, which results both from the disk-based parallel file system (PFS) failing to keep pace with the compute and memory subsystems and from inefficiencies in end-to-end workflow processing. In the post-petaflop era, supercomputers are provisioned with flash devices as an intermediary between compute nodes and the PFS, enabling novel paradigms not just for expediting I/O, but also for the in-situ analysis of simulation output data on the flash device. An array of such active flash elements allows us to fundamentally rethink the way data analysis workflows interact with storage systems. By blending the flash storage array and data analysis together in a seamless fashion, we create an analysis workflow-aware storage system, AnalyzeThis. Our guiding principle is that analysis-awareness be deeply ingrained in each and every layer of the storage system—active flash fabric, analysis object abstraction layer, scheduling layer within the storage, and an easy-to-use file system interface—thereby elevating data analyses to first-class citizens. Together, these concepts transform AnalyzeThis into a potent analytics-aware appliance.
- An Application-Attuned Framework for Optimizing HPC Storage Systems. Paul, Arnab Kumar (Virginia Tech, 2020-08-19). High-performance computing (HPC) is routinely employed in diverse domains, such as life sciences and geology, to simulate and understand the behavior of complex phenomena. Big-data-driven scientific simulations are resource intensive and require both computing and I/O capabilities at scale. There is a crucial need to revisit the HPC I/O subsystem to better optimize for, and manage, the increased pressure on the underlying storage systems from big-data processing. Extant HPC storage systems are designed and tuned for a specific set of applications targeting a range of workload characteristics, but they lack the flexibility to adapt to ever-changing application behaviors. The complex nature of modern HPC storage systems, along with ever-changing application behaviors, presents unique opportunities and engineering challenges. In this dissertation, we design and develop a framework for optimizing HPC storage systems by making them application-attuned. We select three different kinds of HPC storage systems: in-memory data analytics frameworks, parallel file systems, and object storage. We first analyze HPC application I/O behavior by studying real-world I/O traces. Next, we optimize parallelism for applications running in memory, then design data management techniques for HPC storage systems, and finally focus on low-level I/O load balancing for improving the efficiency of modern HPC storage systems.
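A minimal sketch of the kind of low-level I/O load balancing the last sentence refers to: steering each new file to the least-loaded storage target. The target names, the load metric, and its weights are assumptions for illustration, not the dissertation's design.

```python
# Hedged sketch: steer each new file to the least-loaded storage target.
# Target names, the load metric, and its weights are illustrative only.
def pick_target(targets):
    """targets: dict name -> (pending_io_requests, used_bytes, capacity_bytes)."""
    def load(name):
        pending, used, cap = targets[name]
        return 0.7 * pending + 0.3 * (used / cap)  # blend queue depth and fill level
    return min(targets, key=load)

targets = {
    "ost0": (12, 4e12, 8e12),   # long queue
    "ost1": (3,  6e12, 8e12),   # short queue, fairly full
    "ost2": (5,  1e12, 8e12),   # modest queue, mostly empty
}
print(pick_target(targets))     # -> "ost1": the short queue outweighs its fill level
```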
- Automatic Internet of Things Device Category Identification using Traffic Rates. Hsu, Alexander Sirui (Virginia Tech, 2019-03-12). Due to the ever-increasing supply of new Internet of Things (IoT) devices being added to a network, it is vital to secure the devices from incoming cyber threats. The ease of creating and developing a new IoT device allows many new companies to come out with their own devices. These devices also increase network risk, because many IoT devices are created without proper security implementation. Utilizing traffic patterns as a method of device-type detection allows behavior identification using only Internet Protocol (IP) header information. The network traffic captured from 20 IoT devices belonging to 4 distinct types (IP camera, on/off switch, motion sensor, and temperature sensor) is generalized and used to identify new devices previously unseen on the network. Our results indicate that some categories have patterns that are easier to generalize, while other categories are harder, though we are still able to recognize some unique characteristics. We also deploy this in a test production network, adapting previous methods to handle streaming traffic and adding a noise categorization capable of identifying non-IoT devices. The performance of our model varies between classes, signifying that much future work has to be done to increase the classification score and overall usefulness.
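A hedged sketch of the classification idea, reduced to a nearest-centroid rule over per-device traffic-rate features derived from IP headers; the four categories match the abstract, but all numbers, features, and the scaling are invented (the thesis's actual model differs).

```python
# Hedged sketch: classify an IoT device type from IP-header traffic rates
# alone via nearest centroid. Centroid values and scaling are invented.
import math

centroids = {                      # (mean_pkts_per_s, mean_bytes_per_pkt)
    "ip_camera":   (120.0, 900.0),
    "switch":      (0.2,   80.0),
    "motion":      (1.5,   100.0),
    "temperature": (0.5,   90.0),
}

def classify(pkts_per_s, bytes_per_pkt):
    def dist(category):
        cp, cb = centroids[category]
        # scale features so rate and packet size contribute comparably
        return math.hypot((pkts_per_s - cp) / 10.0, (bytes_per_pkt - cb) / 100.0)
    return min(centroids, key=dist)

print(classify(110.0, 850.0))      # -> "ip_camera"
```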
- Bug Finding Methods for Multithreaded Student Programming Projects. Naciri, William Malik (Virginia Tech, 2017-08-04). The fork-join framework project is one of the more challenging programming assignments in the computer science curriculum at Virginia Tech. Students in Computer Systems must manage a pool of threads to facilitate the shared execution of dynamically created tasks. This project is difficult because students must overcome the challenges of concurrent programming and conform to the project's specific semantic requirements. When working on the project, many students received inconsistent test results and were left confused when debugging. The suggested debugging tool, Helgrind, is a general-purpose thread error detector. It is limited in its ability to help fix bugs because it lacks knowledge of the specific semantic requirements of the fork-join framework. Thus, there is a need for a special-purpose tool tailored for this project. We implemented Willgrind, a debugging tool that checks the behavior of fork-join frameworks implemented by students through dynamic program analysis. Using the Valgrind framework for instrumentation, checking statements are inserted into the code to detect deadlock, ordering violations, and semantic violations at run-time. Additionally, we extended Willgrind with happens-before-based checking in WillgrindPlus. This tool checks for ordering violations that do not manifest themselves in a given execution but could in others. In a user study, we provided the tools to 85 students in the Spring 2017 semester and collected over 2,000 submissions. The results indicate that the tools are effective at identifying bugs and useful for fixing bugs. This research makes multithreaded programming easier for students and demonstrates that special-purpose debugging tools can be beneficial in computer science education.
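WillgrindPlus's happens-before-based checking builds on the textbook vector-clock technique, sketched below; this is the generic mechanism, not the tool's code, and the event names are invented.

```python
# Textbook vector-clock happens-before machinery, as a sketch of the kind
# of ordering check WillgrindPlus performs. Events and names are invented.
def new_clock(n):                      # n = number of threads
    return [0] * n

def tick(clock, tid):                  # local event on thread tid
    clock[tid] += 1
    return list(clock)                 # snapshot for this event

def merge(receiver, sender_snapshot, tid):   # synchronization edge
    for i in range(len(receiver)):
        receiver[i] = max(receiver[i], sender_snapshot[i])
    receiver[tid] += 1
    return list(receiver)

def happens_before(a, b):              # a, b: snapshots of two events
    return all(x <= y for x, y in zip(a, b)) and a != b

# Thread 0 submits a task; thread 1 must observe the submit before running it.
c0, c1 = new_clock(2), new_clock(2)
submit = tick(c0, 0)                   # event: task submitted
run = merge(c1, submit, 1)             # event: task picked up after a sync edge
print(happens_before(submit, run))     # True  -> properly ordered
print(happens_before(run, submit))     # False -> would flag a violation
```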
- Coexistence of Vehicular Communication Technologies and Wi-Fi in the 5 and 6 GHz bands. Naik, Gaurang Ramesh (Virginia Tech, 2020-11-20). The unlicensed wireless spectrum offers exciting opportunities for developing innovative wireless applications. This has been true ever since the 2.4 GHz band and parts of the 5 GHz bands were first opened for unlicensed access worldwide. In recent years, the 5 GHz unlicensed bands have been among the most coveted for launching new wireless services and applications, due to their relatively superior propagation characteristics and the abundance of spectrum therein. However, the appetite for unlicensed spectrum seems to remain unsatiated; the demand for additional unlicensed bands has been never-ending. To meet this demand, regulators in the US and Europe have been considering unlicensed operations in the 5.9 GHz bands and in large parts of the 6 GHz bands. In the last two years alone, the Federal Communications Commission in the US has added more than 1.2 GHz of spectrum to the pool of unlicensed bands. Wi-Fi networks are likely to be the biggest beneficiaries of this spectrum. Such an abundance of spectrum would allow massive improvements in peak throughput and potentially a considerable reduction in latency, thereby enabling support for emerging wireless applications such as augmented and virtual reality and mobile gaming over unlicensed bands. However, access to these bands comes with its own challenges. Across the globe, a wide range of incumbent wireless technologies operate in the 5 GHz and 6 GHz bands. These include weather and military radars and vehicular communication systems in the 5 GHz bands, and fixed-service systems, satellite systems, and television pick-up stations in the 6 GHz bands. Furthermore, due to the development of several cellular-based unlicensed technologies (such as Licensed Assisted Access and New Radio Unlicensed, NR-U), competition for channel access among unlicensed devices has also been increasing. Thus, coexistence across wireless technologies in the 5 GHz and 6 GHz bands has emerged as an extremely challenging and interesting research problem. In this dissertation, we first take a comprehensive look at the various coexistence scenarios that emerge in the 5 GHz and 6 GHz bands as a consequence of new regulatory decisions. These scenarios include coexistence between Wi-Fi and incumbent users (both in the 5 GHz and 6 GHz bands), coexistence of Wi-Fi and vehicular communication systems, coexistence across different vehicular communication technologies, and coexistence across different unlicensed systems. Since the vast majority of these technologies are fundamentally different from each other and serve diverse use cases, each coexistence problem is unique. Insights derived from an in-depth study of one coexistence problem do not help much when the coexisting technologies change. Thus, we study each scenario separately and in detail. In this process, we highlight the need for the design of novel coexistence mechanisms in several cases and outline potential research directions. Next, we shift our attention to coexistence between Wi-Fi and the vehicular communication technologies designed to operate in the 5.9 GHz intelligent transportation systems (ITS) bands. Until the development of Cellular V2X (C-V2X), dedicated short range communications (DSRC) was the only major wireless technology designed for communication in high-speed and potentially dense vehicular settings.
Since DSRC uses the IEEE 802.11p standard for its physical (PHY) and medium access control (MAC) layers, the manner in which DSRC and Wi-Fi devices try to gain access to the channel is fundamentally similar. Consequently, we show that spectrum sharing between these two technologies in the 5.9 GHz bands can be achieved by simple modifications to the Wi-Fi MAC layer. Since the design of C-V2X in 2017, however, the vehicular communication landscape has been evolving fast. Because DSRC systems were not widely deployed, automakers and regulators had an opportunity to look at the two technologies, consider their benefits and drawbacks, and take a fresh look at the spectrum-sharing scenario. Since Wi-Fi can now potentially share the spectrum with C-V2X, at least in certain regions, we take an in-depth look at various Wi-Fi and C-V2X configurations and study whether C-V2X and Wi-Fi can harmoniously coexist with each other. We determine that because C-V2X is built atop cellular LTE, Wi-Fi and C-V2X systems are fundamentally incompatible with each other. If C-V2X and Wi-Fi devices are to share the spectrum, considerable modifications to the Wi-Fi MAC protocol would be required. Another equally interesting scenario arises in the 6 GHz bands, where 5G NR-U and Wi-Fi devices are likely to operate on a secondary shared basis. Since the 6 GHz bands were only recently considered for unlicensed access, these bands are free from Wi-Fi and NR-U devices. As a result, the greenfield 6 GHz bands provide a unique and rare opportunity to freshly evaluate the coexistence between Wi-Fi and cellular-based unlicensed wireless technologies. We study this coexistence problem by developing a stochastic geometry-based analytical model. We see that by disabling the legacy listen-before-talk contention mechanism, which Wi-Fi devices have used ever since their conception, the performance of both Wi-Fi and NR-U systems can improve. This has important implications in the 6 GHz bands, where such legacy transmissions can indeed be disabled, because Wi-Fi devices, for the first time since the design of IEEE 802.11a, can operate in the 6 GHz bands without any backward-compatibility issues. In the course of studying the aforementioned coexistence problems, we identified several gaps in the literature on the performance analysis of C-V2X and IEEE 802.11ax, the upcoming Wi-Fi standard. We address three such gaps in this dissertation. First, we study the performance of C-V2X sidelink mode 4, which is the communication mode in C-V2X that allows direct vehicular communication (i.e., without assistance from the cellular infrastructure). Using our in-house, standards-compliant network simulator-3 (ns-3) simulator, we perform simulations to evaluate the performance of C-V2X sidelink mode 4 in highway environments. In doing so, we identify that packet re-transmissions, a feature introduced in C-V2X to provide frequency and time diversity and thereby improve system performance, can have the opposite effect as vehicular density increases. In fact, packet re-transmissions are beneficial to C-V2X system performance only at low vehicular densities. Thus, if vehicles are statically configured to always use or always disable re-transmissions, the maximum potential of this feature is not realized.
Therefore, we propose a simple and effective distributed re-transmission control mechanism named Channel Congestion Based Re-transmission Control (C2RC), which leverages locally available channel-sensing results to allow vehicles to autonomously decide when to switch re-transmissions on and when to switch them off. Second, we present a detailed analysis of the performance of Multi User Orthogonal Frequency Division Multiple Access (MU OFDMA), a feature newly introduced in IEEE 802.11ax, in a wide range of deployment scenarios. We consider the performance of 802.11ax networks when the network comprises only 802.11ax stations as well as a combination of 802.11ax and legacy stations. The latter is a practical scenario, especially during the initial phases of 802.11ax deployment. Simulation results, obtained from our ns-3 based simulator, give encouraging signs for 802.11ax performance in many real-world scenarios. That said, there are some scenarios where naive usage of MU OFDMA by an 802.11ax-capable Wi-Fi AP can be detrimental to the overall system performance. Our results indicate that careful consideration of network dynamics is critical to extracting the best performance, especially in a heterogeneous Wi-Fi network. Finally, we perform a comprehensive simulation study to characterize the performance of Multi Link Aggregation (MLA) in IEEE 802.11be. MLA is a novel feature that is likely to be introduced in next-generation Wi-Fi (i.e., Wi-Fi 7) devices and is aimed at reducing the worst-case latency experienced by Wi-Fi devices in dense traffic environments. We study the impact of different traffic densities on the 90th-percentile latency of Wi-Fi packets and identify that the addition of a single link is sufficient to substantially bring down the 90th-percentile latency in many practical scenarios. Furthermore, we show that while the addition of subsequent links is beneficial, the largest latency gain in most scenarios comes when the second (i.e., one additional) link is added. Finally, we show that even in extremely dense traffic conditions, if a sufficient number of links are available at the MLA-capable transmitter and receiver, MLA can help Wi-Fi devices meet the latency requirements of most real-time applications.
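Much of the coexistence analysis above turns on listen-before-talk contention. A toy Monte Carlo sketch of that mechanism, with invented parameters, shows how collision probability emerges from random backoff among contenders; it is an illustration of the concept, not the dissertation's analytical model.

```python
# Toy listen-before-talk sketch: each device draws a random backoff from a
# contention window; the lone smallest backoff wins the channel, a tie is a
# collision. The contender count and window size are invented parameters.
import random

def lbt_round(n_devices, cw=16):
    """One contention round; returns the winner's index, or None on collision."""
    backoffs = [random.randrange(cw) for _ in range(n_devices)]
    soonest = min(backoffs)
    winners = [i for i, b in enumerate(backoffs) if b == soonest]
    return winners[0] if len(winners) == 1 else None

random.seed(1)
rounds = 10_000
collisions = sum(lbt_round(8) is None for _ in range(rounds))
print(f"collision rate with 8 contenders: {collisions / rounds:.1%}")
```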
- Computational Cost Analysis of Large-Scale Agent-Based Epidemic Simulations. Kamal, Tariq (Virginia Tech, 2016-09-21). Agent-based epidemic simulation (ABES) is a powerful and realistic approach for studying the impacts of disease dynamics and complex interventions on the spread of an infection in a population. Among many ABES systems, EpiSimdemics comes closest to the popular agent-based epidemic simulation systems developed by Eubank, Longini, Ferguson, and Parker. EpiSimdemics is a general framework that can model many reaction-diffusion processes besides the Susceptible-Exposed-Infectious-Recovered (SEIR) models. This model allows the study of complex systems as they interact, thus enabling researchers to model and observe socio-technical trends and forces. Pandemic planning at the world level requires simulation of over 6 billion agents, where each agent has a unique set of demographics, daily activities, and behaviors. Moreover, the stochastic nature of epidemic models, the uncertainty in the initial conditions, and the variability of reactions require computing several replicates of a simulation for a meaningful study. Given the hard timelines to respond, running many replicates (15-25) of several configurations (10-100) of these compute-heavy simulations is only possible on high-performance computing (HPC) clusters. These agent-based epidemic simulations are irregular and show poor execution performance on high-performance clusters due to the evolutionary nature of their workload, large irregular communication, and load imbalance. For increased utilization of HPC clusters, the simulation needs to be scalable. Many challenges arise when improving the performance of agent-based epidemic simulations on high-performance clusters. Firstly, large-scale graph-structured computation is central to the processing of these simulations, where the star-motif quality nodes (natural graphs) create large computational imbalances and communication hotspots. Secondly, the computation is performed by classes of tasks that are separated by global synchronization. The non-overlapping computations cause idle times, which introduce load-balancing and cost-estimation challenges. Thirdly, the computation is overlapped with communication, which is difficult to measure using simple methods, thus making cost estimation very challenging. Finally, the simulations are iterative, and the workload (computation and communication) may change across iterations, introducing load imbalances. This dissertation focuses on developing a cost-estimation model and load-balancing schemes to increase the runtime efficiency of agent-based epidemic simulations on high-performance clusters. While developing the cost model and load-balancing schemes, we perform static and dynamic load analyses of such simulations. We also statically quantified the computational and communication workloads in EpiSimdemics. We designed, developed, and evaluated a cost model for estimating the execution cost of large-scale parallel agent-based epidemic simulations (and more generally of all constrained producer-consumer parallel algorithms). This cost model uses computational imbalances and communication latencies, and it enables cost estimation for those applications where the computation is performed by classes of tasks separated by synchronization. It enables the performance analysis of parallel applications by computing their execution times on a number of partitions.
Our evaluations show that the model is helpful in performance prediction, resource allocation, and the evaluation of load-balancing schemes. As part of the load-balancing algorithms, we adopted the Metis library for partitioning bipartite graphs. We also developed lower-overhead custom schemes called Colocation and MetColoc. We performed an evaluation of Metis, Colocation, and MetColoc. Our analysis showed that the MetColoc scheme gives performance similar to Metis, but with half the partitioning overhead (runtime and memory). The Colocation scheme, on the other hand, achieves performance similar to Metis on a larger number of partitions, but at far lower partitioning overhead. Moreover, the memory requirements of the Colocation scheme do not increase as we create more partitions. We have also performed a dynamic load analysis of agent-based epidemic simulations. For this, we studied the individual and joint effects of three disease parameters (transmissibility, infection period, and incubation period). We quantified the effects using an analytical equation with separate constants for the SIS, SIR, and SI disease models. The metric that we have developed in this work is useful for cost estimation of constrained producer-consumer algorithms; however, it has some limitations. The applicability of the metric is application-, machine-, and data-specific. In the future, we plan to extend the metric to increase its applicability to a larger set of machine architectures, applications, and datasets.
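A hedged sketch of the cost-model idea for computations performed by task classes separated by synchronization: per-iteration cost is the slowest partition's compute time plus a communication term, summed over classes. The constants are invented, and this is a simplification of the dissertation's model.

```python
# Simplified cost-model sketch: with global synchronization between task
# classes, each iteration costs the slowest partition plus communication,
# summed over classes. All numbers below are invented for illustration.
def estimate_cost(iterations, classes):
    """classes: list of (per_partition_compute_times_s, comm_latency_s)."""
    per_iter = sum(max(times) + comm for times, comm in classes)
    return iterations * per_iter

# Two task classes, four partitions each; the imbalanced class dominates.
cost = estimate_cost(
    iterations=120,                          # e.g., simulated days
    classes=[([2.0, 2.1, 3.9, 2.2], 0.4),    # imbalanced class
             ([1.0, 1.1, 0.9, 1.0], 0.2)])   # well-balanced class
print(f"estimated runtime: {cost:.0f} s")    # (3.9+0.4 + 1.1+0.2) * 120
```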
- Data-Intensive Biocomputing in the Cloud. Meeramohideen Mohamed, Nabeel (Virginia Tech, 2013-09-25). Next-generation sequencing (NGS) technologies have made it possible to rapidly sequence the human genome, heralding a new era of health-care innovations based on personalized genetic information. However, these NGS technologies generate data at a rate that far outstrips Moore's Law. As a consequence, analyzing this exponentially increasing data deluge requires enormous computational and storage resources, resources that many life-science institutions do not have access to. As such, cloud computing has emerged as an obvious, but still nascent, solution. This thesis investigates and designs an efficient framework for running and managing large-scale data-intensive scientific applications in the cloud. Based on what we learned from our parallel implementation of a genome analysis pipeline in the cloud, we aim to provide a framework for users to run such data-intensive scientific workflows using a hybrid setup of client and cloud resources. We first present SeqInCloud, our highly scalable parallel implementation of a popular genetic variant pipeline called the genome analysis toolkit (GATK), on the Windows Azure HDInsight cloud platform. Together with a parallel implementation of GATK on Hadoop, we evaluate the potential of using cloud computing for large-scale DNA analysis and present a detailed study on efficiently utilizing cloud resources for running data-intensive, life-science applications. Based on our experience running SeqInCloud on Azure, we present CloudFlow, a feature-rich workflow manager for running MapReduce-based bioinformatics pipelines utilizing both client and cloud resources. CloudFlow, built on top of an existing MapReduce-based workflow manager called Cloudgene, provides unique features that are not offered by existing MapReduce-based workflow managers, such as enabling simultaneous use of client and cloud resources, automatic data-dependency handling between client and cloud resources, and the flexibility of implementing user-defined plugins for data transformations. In general, we believe that our work helps increase the adoption of cloud resources for running data-intensive scientific workloads.
- A Defense-In-Depth Security Architecture for Software Defined Radio Systems. Hitefield, Seth D. (Virginia Tech, 2020-01-27). Modern wireless communications systems are constantly evolving and growing more complex. Recently, there has been a shift towards software defined radios due to the flexibility software implementations provide. This enables an easier development process, longer product lifetimes, and better adaptability for congested environments than conventional hardware systems. However, this shift introduces new attack surfaces where vulnerable implementations can be exploited to disrupt communications or gain unauthorized access to a system. Previous research concerning wireless security mainly focuses on vulnerabilities within protocols rather than in the radios themselves. This dissertation specifically addresses this new threat against software radios and introduces a new security model intended to mitigate it. We also demonstrate example exploits of waveforms that can result in either a denial of service or a compromise of the system from a wireless attack vector. These example exploits target vulnerabilities such as overflows, unsanitized control inputs, and unexpected state changes. We present a defense-in-depth security architecture for software radios that protects the system by isolating components within a waveform into different security zones. Exploits against vulnerabilities within blocks are contained by isolation zones, which protects the rest of the system from compromise. This architecture is inspired by the concept of a microkernel and provides a minimal trusted computing base for developing secure radio systems. Unlike previous security models, our model protects from exploits within the radio protocol stack itself and not just the higher-layer application. Different isolation mechanisms, such as containers or virtual machines, can be used depending on the security risk imposed by a component and any security requirements. However, adding these isolation environments incurs a performance overhead for applications. We perform an analysis of multiple example waveforms to characterize the impact of isolation environments on the overall performance of an application and demonstrate that the overhead generated by the added isolation can be minimal. Because of this, our defense-in-depth architecture can be applied to real-world, production systems. We finally present an example integration of the model within the GNU Radio framework that can be used to develop any waveform using the defense-in-depth security architecture.
- Design and Implementation of the VirtuOS Operating System. Nikolaev, Ruslan (Virginia Tech, 2014-01-21). Most operating systems provide protection and isolation to user processes, but not to critical system components such as device drivers or other systems code. Consequently, failures in these components often lead to system failures. VirtuOS is an operating system that exploits a new method of decomposition to protect against such failures. VirtuOS exploits virtualization to isolate and protect vertical slices of existing OS kernels in separate service domains. Each service domain represents a partition of an existing kernel, which implements a subset of that kernel's functionality. Service domains directly service system calls from user processes. VirtuOS exploits an exceptionless model, avoiding the cost of a system call trap in many cases. We illustrate how to apply exceptionless system calls across virtualized domains. To demonstrate the viability of VirtuOS's approach, we implemented a prototype based on the Linux kernel and the Xen hypervisor. We created and evaluated a network and a storage service domain. Our prototype retains compatibility with existing applications and can survive the failure of individual service domains, while outperforming alternative approaches such as isolated driver domains and even exceeding the performance of native Linux for some multithreaded workloads. The evaluation of VirtuOS revealed costs due to decomposition, memory management, and communication, which necessitated a fine-grained analysis to understand their impact on the system's performance. The interaction of virtual machines with multiple underlying software and hardware layers in virtualized environments makes this task difficult. Moreover, performance analysis tools commonly used in native environments were not available in virtualized environments. Our work addresses this problem to enable an in-depth performance analysis of VirtuOS. Our Perfctr-Xen framework provides capabilities for per-thread analysis with both accumulative event counts and interrupt-driven event sampling. Perfctr-Xen is a flexible and generic tool that supports different modes of virtualization and can be used for many applications beyond VirtuOS.
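A schematic sketch of the exceptionless idea: rather than trapping into the kernel on every call, a user thread posts requests to a shared queue that a service domain drains asynchronously. VirtuOS does this across Xen domains over shared memory; the Python threads and queue below are purely illustrative stand-ins.

```python
# Schematic only: a "service domain" worker drains a shared request queue,
# so the caller never traps per request. VirtuOS uses shared memory across
# Xen domains; this Python thread/queue pairing is an invented analogue.
import queue, threading

requests, responses = queue.Queue(), {}
done = threading.Event()

def service_domain():                    # stands in for, e.g., a network domain
    while not done.is_set() or not requests.empty():
        try:
            req_id, op, arg = requests.get(timeout=0.1)
        except queue.Empty:
            continue
        responses[req_id] = f"{op}({arg}) -> ok"   # "handle" the system call
        requests.task_done()

worker = threading.Thread(target=service_domain)
worker.start()
for i, (op, arg) in enumerate([("read", "fd3"), ("send", "sock7")]):
    requests.put((i, op, arg))           # no trap: enqueue and keep running
requests.join()                          # wait for completion notifications
done.set()
worker.join()
print(responses)
```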
- DeviceGuard: External Device-Assisted System and Data Security. Deng, Yipan (Virginia Tech, 2011-05-02). This thesis addresses the threat that personal computers face from malware when connected to the Internet. Traditional host-based security approaches, such as anti-virus scanning, protect the host from viruses, worms, Trojans, and other malware. One issue with host-based security approaches is that when the operating system is compromised by malware, the antivirus software also becomes vulnerable. In this thesis, we present a novel approach that uses an external device to enhance host security by offloading the security solution from the host to the external device. We describe the design of the DeviceGuard framework, which separates the security solution from the host and offloads it to the external device, a Trusted Device. The architecture of DeviceGuard consists of two components: the DeviceGuard application on the Trusted Device and a DeviceGuard daemon on the host. Our prototype, based on an Android Development Phone (ADP), shows the feasibility and efficiency of our approach in providing security features including system file and user data integrity monitoring, secure signing, and secure decryption. We use Bluetooth as the communication protocol between the host and the Trusted Device. Our experimental results indicate that a practical Bluetooth throughput of about 2 MB per second is sufficient for short-range communication between the host and the Trusted Device; message digests with SHA-512, digital signing with 1024-bit signatures, and secure decryption with 256-bit AES on the Trusted Device take on the order of 10? ms and 10? ms for 1 KB and 1 MB respectively, which also shows the feasibility and efficiency of the DeviceGuard solution. We also investigated the use of an embedded system as the Trusted Device. Our solution takes advantage of the proliferation of devices, such as smartphones, for stronger system and data security.
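A small sketch of one operation the evaluation times: a SHA-512 digest over 1 KB and 1 MB payloads, as used for integrity monitoring. This reproduces the shape of the measurement, not the thesis's numbers.

```python
# Sketch of the integrity-monitoring primitive DeviceGuard offloads: time a
# SHA-512 digest over 1 KB and 1 MB payloads. Payloads are random stand-ins;
# the measured times here are this machine's, not the thesis's results.
import hashlib, os, time

def digest(data):
    return hashlib.sha512(data).hexdigest()

for size in (1 << 10, 1 << 20):          # 1 KB and 1 MB, as in the evaluation
    payload = os.urandom(size)
    t0 = time.perf_counter()
    d = digest(payload)
    ms = (time.perf_counter() - t0) * 1000
    print(f"SHA-512 over {size:>8} bytes: {ms:.3f} ms ({d[:16]}...)")
```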
- Efficient In-Depth I/O Tracing and its Application for Optimizing Systems. Mantri, Sushil Govindnarayan (Virginia Tech, 2014-08-13). Understanding user and system behavior is vital for designing efficient systems. Most systems are designed with certain user workloads in mind. However, such workloads evolve over time, or the underlying hardware assumptions change. Further, most modern systems are not built or deployed in isolation; they interact with other systems whose behavior might not be exactly understood. Thus, in order to understand the performance of a system, it must be inspected closely while user workloads are running. Such close inspection must be done with minimum disturbance to the user workload. Tracing, the collection of all user- and system-generated events, thus becomes an important approach to gaining comprehensive insight into user behavior. This work makes three major contributions. We designed and implemented an in-depth, block-level I/O tracer that collects block-level information, such as the sector number, the size of the I/O, and the actual contents of the I/O, along with certain file system information, such as the filename and the offset within the file, for every I/O request. Next, to minimize the impact of tracing on the running workload, we introduce and implement a sampling mechanism that traces fewer I/O requests, and we validate that this sampling preserves certain I/O access patterns. Finally, as one application of our tracer, we use it as a crucial component of a system designed to place VMs according to user workload.
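A hedged sketch of the sampling idea: record lightweight metadata for every request but capture the heavyweight payload for only a deterministic 1-in-N subset, so the same blocks are sampled consistently across runs. The field names, sampling rate, and hash constant are illustrative assumptions, not the tracer's design.

```python
# Sketch: metadata for every I/O request, payload for a deterministic 1-in-N
# sample keyed on sector. Fields, rate, and hash constant are invented.
SAMPLE_EVERY = 16

def should_sample(sector):
    # deterministic: the sampling decision repeats for a given block
    return ((sector * 2654435761) & 0xFFFFFFFF) % SAMPLE_EVERY == 0

def on_io_request(sector, size, is_write, payload):
    event = {"sector": sector, "size": size, "write": is_write}
    if should_sample(sector):
        event["payload"] = payload       # heavyweight part, sampled
    return event

trace = [on_io_request(s, 4096, False, b"\x00" * 4096) for s in range(64)]
sampled = sum("payload" in e for e in trace)
print(f"{sampled}/64 requests carry full payloads")
```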
- Empirical Evaluation of Edge Computing for Smart Building Streaming IoT Applications. Ghaffar, Talha (Virginia Tech, 2019-03-13). Smart buildings are one of the most important emerging applications of the Internet of Things (IoT). The astronomical growth in IoT devices, the data generated by these devices, and ubiquitous connectivity have given rise to a new computing paradigm, referred to as "Edge computing", which argues for data analysis to be performed at the "edge" of the IoT infrastructure, near the data source. The development of efficient Edge computing systems must be based on an advanced understanding of the performance benefits that Edge computing can offer. The goal of this work is to develop this understanding by examining the end-to-end latency and throughput characteristics of Smart building streaming IoT applications when deployed at the resource-constrained infrastructure Edge, and to compare this against the performance that can be achieved by utilizing the Cloud's data-center resources. This work also presents a real-time streaming application that detects and localizes the footstep impacts generated by a building's occupant while walking. We characterize this application's performance for Edge and Cloud computing and utilize a hybrid scheme that (1) offers up to around 60% and 65% lower latency than Edge and Cloud, respectively, for similar throughput performance, and (2) enables processing of higher ingestion rates by eliminating the network bottleneck.
- An End-to-End High-performance Deduplication Scheme for Docker Registries and Docker Container Storage Systems. Zhao, Nannan; Lin, Muhui; Albahar, Hadeel; Paul, Arnab K.; Huan, Zhijie; Abraham, Subil; Chen, Keren; Tarasov, Vasily; Skourtis, Dimitrios; Anwar, Ali; Butt, Ali R. (ACM, 2024). The wide adoption of Docker containers for supporting agile and elastic enterprise applications has led to a broad proliferation of container images. The associated storage performance and capacity requirements place high pressure on the infrastructure of container registries, which store and distribute images, and on container storage systems on the Docker client side, which manage image layers and store ephemeral data generated at container runtime. The storage demand is worsened by the large amount of duplicate data in images. Moreover, container storage systems that use Copy-on-Write (CoW) file systems as storage drivers exacerbate the redundancy. Exploiting the high file redundancy in real-world images is a promising approach to drastically reduce the growing storage requirements of container registries and improve the space efficiency of container storage systems. However, existing deduplication techniques significantly degrade the performance of both registries and container storage systems because of data reconstruction overhead as well as the deduplication cost. We propose DupHunter, an end-to-end deduplication scheme that deduplicates layers for both Docker registries and container storage systems while maintaining a high image distribution speed and container I/O performance. DupHunter is divided into three tiers: a Docker registry tier, a middle tier, and a client tier. Specifically, we first build a high-performance deduplication engine at the Docker registry tier that not only natively deduplicates layers for space savings but also reduces layer-restore overhead. Then, we use deduplication offloading at the middle tier, which utilizes the deduplication engine to eliminate redundant files from the client tier and avoids introducing deduplication overhead on the Docker client side. To further reduce the data duplication caused by CoW and improve container I/O performance, we use a container-aware backing file system at the client tier that preallocates space for each container and ensures that files in a container and its modifications are placed and redirected closer together on the disk to maintain locality. Under real workloads, DupHunter reduces storage space by up to 6.9× and reduces GET layer latency by up to 2.8× compared to the state of the art. Moreover, DupHunter can improve container I/O performance by up to 93% for reads and 64% for writes.
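A minimal sketch of the file-level deduplication DupHunter applies to image layers: each unique file is stored once, keyed by content hash, and layers reference files by digest. This is a toy in-memory store, not the system's registry-tier engine.

```python
# Toy file-level dedup store: unique file contents keyed by SHA-256 digest,
# layers hold digest references. Layer names and contents are invented.
import hashlib

class DedupStore:
    def __init__(self):
        self.blobs = {}                 # digest -> content (stored once)
        self.layers = {}                # layer id -> list of digests

    def add_layer(self, layer_id, files):
        refs = []
        for content in files:
            d = hashlib.sha256(content).hexdigest()
            self.blobs.setdefault(d, content)   # duplicate files are skipped
            refs.append(d)
        self.layers[layer_id] = refs

    def restore_layer(self, layer_id):
        return [self.blobs[d] for d in self.layers[layer_id]]

store = DedupStore()
store.add_layer("layerA", [b"libc.so", b"app.bin"])
store.add_layer("layerB", [b"libc.so", b"other.bin"])   # libc stored once
print(len(store.blobs), "unique blobs back",
      sum(len(refs) for refs in store.layers.values()), "file references")
```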
- Energy-aware Thread and Data Management in Heterogeneous Multi-Core, Multi-Memory Systems. Su, Chun-Yi (Virginia Tech, 2015-02-03). By 2004, microprocessor design focused on multicore scaling, increasing the number of cores per die in each generation, as the primary strategy for improving performance. These multicore processors are typically equipped with multiple memory subsystems to improve data throughput. In addition, these systems employ heterogeneous processors such as GPUs and heterogeneous memories like non-volatile memory to improve performance, capacity, and energy efficiency. With the increasing volume of hardware resources and the system complexity caused by heterogeneity, future systems will require intelligent ways to manage hardware resources. Early research on improving performance and energy efficiency in heterogeneous, multi-core, multi-memory systems focused on tuning a single primitive, or at best a few primitives, in the system. The key limitation of past efforts is their lack of a holistic approach to resource management that balances the tradeoff between performance and energy consumption. In addition, the shift from simple, homogeneous systems to these heterogeneous, multicore, multi-memory systems requires an in-depth understanding of efficient resource management for scalable execution, including new models that capture the interchange between performance and energy, smarter resource management strategies, and novel low-level performance/energy tuning primitives and runtime systems. Tuning an application to control available resources efficiently has become a daunting challenge; managing resources automatically is still a dark art, since the tradeoffs among programming, energy, and performance remain insufficiently understood. In this dissertation, I have developed theories, models, and resource management techniques to enable energy-efficient execution of parallel applications through thread and data management in these heterogeneous multi-core, multi-memory systems. I study the effect of dynamic concurrency throttling on the performance and energy of multi-core, non-uniform memory access (NUMA) systems. I use critical path analysis to quantify memory contention in the NUMA memory system and determine thread mappings. In addition, I implement a runtime system that combines concurrency throttling and a novel thread-mapping algorithm to manage thread resources and improve energy-efficient execution in multi-core, NUMA systems. I also propose an analytical model, based on queuing methods, that captures important factors in multi-core, multi-memory systems to quantify the tradeoff between performance and energy. The model considers the effect of these factors in a holistic fashion and provides a general view of performance and energy consumption in contemporary systems. Finally, I focus on resource management for future heterogeneous memory systems, which may combine two heterogeneous memories to scale out memory capacity while maintaining reasonable power use. I present a new memory controller design that combines the best aspects of two baseline heterogeneous page-management policies to migrate data between two heterogeneous memories so as to optimize performance and energy.
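A hedged sketch of the concurrency-throttling decision at the heart of such a runtime system: given measured (threads, runtime, power) samples for a parallel region, pick the concurrency that minimizes the energy-delay product. All numbers are invented, and the dissertation's runtime uses critical-path analysis rather than this exhaustive search.

```python
# Sketch: choose a concurrency level by minimizing the energy-delay product
# over measured samples. Sample values are invented for illustration.
def pick_concurrency(samples):
    def edp(sample):
        threads, runtime_s, power_w = sample
        energy_j = power_w * runtime_s
        return energy_j * runtime_s        # energy-delay product
    return min(samples, key=edp)[0]

samples = [                                # (threads, runtime_s, avg_power_w)
    (4,  20.0,  90.0),
    (8,  11.0, 150.0),
    (16,  9.5, 260.0),                     # e.g., NUMA contention: poor scaling
]
print("best concurrency:", pick_concurrency(samples))   # -> 8
```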
- Epidemiology Experimentation and Simulation Management through Scientific Digital Libraries. Leidig, Jonathan Paul (Virginia Tech, 2012-07-20). Advances in scientific data management, discovery, dissemination, and sharing are changing the manner in which scientific studies are conducted and repurposed. Data-intensive scientific practices increasingly require data-management services not available in existing digital libraries. Complicating the issue are the diversity of functional requirements and content in scientific domains, as well as scientists' lack of expertise in information and library sciences. Researchers who utilize simulation and experimentation systems need digital libraries to maintain datasets, input configurations, results, analyses, and related documents. A digital library may be integrated with simulation infrastructures to provide automated support for research components, e.g., simulation interfaces to models, data warehouses, simulation applications, computational resources, and storage systems. Managing and provisioning simulation content allows streamlined experimentation, collaboration, discovery, and content reuse within a simulation community. Formal definitions of this class of digital libraries provide a foundation for producing a software toolkit and the semi-automated generation of digital library instances. We present a generic, component-based SIMulation-supporting Digital Library (SimDL) framework. The framework is formally described and provides a deployable set of domain-free services, schema-based domain knowledge representations, and extensible lower- and higher-level service abstractions. Services in SimDL are specialized for semi-structured simulation content and large-scale data-producing infrastructures, as exemplified in the data storage, indexing, and retrieval service implementations. Contributions to the scientific community include previously unavailable simulation-specific services, e.g., incentivizing public contributions, semi-automated content curation, and memoizing simulation-generated data products. The practicality of SimDL is demonstrated through several case studies in computational epidemiology and network science, as well as performance evaluations.
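One concrete service the abstract names is memoizing simulation-generated data products. A hedged sketch of that idea, assuming a JSON-serializable configuration and an invented cache, shows how identical experiments can be answered from storage instead of re-run.

```python
# Sketch of memoizing simulation outputs: hash the full input configuration
# and reuse a stored result when an identical experiment was already run.
# The cache, config fields, and simulate callback are invented.
import hashlib, json

_cache = {}

def run_simulation(config, simulate):
    key = hashlib.sha256(
        json.dumps(config, sort_keys=True).encode()).hexdigest()
    if key in _cache:
        return _cache[key]              # identical experiment: reuse result
    result = simulate(config)
    _cache[key] = result
    return result

cfg = {"model": "SEIR", "population": 10_000, "r0": 2.5, "seed": 42}
out1 = run_simulation(cfg, lambda c: {"attack_rate": 0.61})
out2 = run_simulation(cfg, lambda c: {"attack_rate": 0.61})  # cache hit
print(out1 is out2)                     # True: second call was memoized
```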
- Evaluating MapReduce System Performance: A Simulation Approach. Wang, Guanying (Virginia Tech, 2012-08-27). The scale of data generated and processed is exploding in the Big Data era. The MapReduce model, popularized by the open-source Hadoop project, is a powerful tool for this exploding-data problem and is widely employed in many areas involving large amounts of data. In many circumstances, hypothetical MapReduce systems must be evaluated, e.g., to provision a new MapReduce system to meet a certain performance goal, to upgrade a currently running system to meet increasing business demands, or to evaluate novel network topologies, new scheduling algorithms, or resource arrangement schemes. The traditional trial-and-error solution involves a time-consuming and costly process in which a real cluster is first built and then benchmarked. In this dissertation, we propose to simulate MapReduce systems and to evaluate hypothetical MapReduce systems using simulation. This simulation approach offers significantly lower turnaround time and lower cost than experiments. Simulation cannot entirely replace experiments, but it can be used as a preliminary step to reveal potential flaws and gain critical insights. We studied MapReduce systems in detail and developed a comprehensive performance model for MapReduce, including sub-task-phase-level performance models for both map and reduce tasks and a model for resource contention between multiple processes running concurrently. Based on the performance model, we developed a comprehensive simulator for MapReduce, MRPerf. MRPerf is the first full-featured MapReduce simulator. It supports both workload simulation and resource contention, and it still offers the most complete feature set among all MapReduce simulators to date. Using MRPerf, we conducted two case studies to evaluate scheduling algorithms in MapReduce and shared storage in MapReduce, without building real clusters. Furthermore, in order to further integrate simulation and performance prediction into MapReduce systems and leverage predictions to improve system performance, we developed an online prediction framework for MapReduce, which periodically runs simulations within a live Hadoop MapReduce system. The framework can predict task execution within a window in the near future. These predictions can be used by other components in MapReduce systems to improve performance. Our results show that the framework achieves high prediction accuracy and incurs negligible overhead. We present two potential use cases: prefetching and a dynamically adapting scheduler.
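A toy sketch in the spirit of MRPerf-style simulation: estimate a map phase's makespan from per-task phase costs and a fixed number of slots, without running a cluster. The constants are invented, and the model is far simpler than MRPerf's.

```python
# Toy sketch: makespan of a map phase with n_tasks over n_slots, each task
# costing read + compute + spill seconds. All constants are invented.
import heapq

def simulate_map_phase(n_tasks, n_slots, task_time_s):
    slots = [0.0] * n_slots            # next-free time of each map slot
    heapq.heapify(slots)
    for _ in range(n_tasks):
        start = heapq.heappop(slots)   # earliest-free slot takes the next task
        heapq.heappush(slots, start + task_time_s)
    return max(slots)                  # finish time of the last task

per_task = 1.2 + 3.5 + 0.8             # read + compute + spill, in seconds
print(f"simulated map phase: {simulate_map_phase(200, 16, per_task):.1f} s")
```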