Browsing by Author "Tilevich, Eli"
- An Adaptive Framework for Managing Heterogeneous Many-Core Clusters. Rafique, Muhammad Mustafa (Virginia Tech, 2011-09-22). The computing needs and the input and result datasets of modern scientific and enterprise applications are growing exponentially. To support such applications, High-Performance Computing (HPC) systems need to employ thousands of cores and innovative data management. At the same time, an emerging trend in designing HPC systems is to leverage specialized asymmetric multicores, such as IBM Cell and AMD Fusion APUs, and commodity computational accelerators, such as programmable GPUs, which exhibit an excellent price-to-performance ratio as well as the much-needed high energy efficiency. While such accelerators have been studied in detail as stand-alone computational engines, integrating them into large-scale distributed systems with heterogeneous computing resources for data-intensive computing presents unique challenges and trade-offs. Traditional programming and resource management techniques cannot be directly applied to many-core accelerators in heterogeneous distributed settings, given the complex and custom instruction set architectures, memory hierarchies, and I/O characteristics of different accelerators. In this dissertation, we explore the design space of using commodity accelerators, specifically the IBM Cell and programmable GPUs, in distributed settings for data-intensive computing and propose an adaptive framework for programming and managing heterogeneous clusters. The proposed framework provides a MapReduce-based extended programming model for heterogeneous clusters, which distributes tasks between asymmetric compute nodes by considering workload characteristics and the capabilities of individual compute nodes. The framework provides efficient data-prefetching techniques that leverage general-purpose cores to stage the input data in the private memories of the specialized cores. We also explore an advanced layered-architecture-based software engineering approach and provide mixin-layer-based reusable software components to enable easy and quick deployment of heterogeneous clusters. The framework also provides multiple resource management and scheduling policies under different constraints, e.g., energy-aware and QoS-aware, to support executing concurrent applications on multi-tenant heterogeneous clusters. When applied to representative applications and benchmarks, our framework yields significantly improved performance in terms of programming efficiency and resource management as compared to conventional, hand-tuned approaches to programming and managing accelerator-based heterogeneous clusters.
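To make the capability-aware task distribution described above concrete, here is a minimal Java sketch of assigning MapReduce-style work to asymmetric nodes in proportion to their measured speeds. All names (CapabilityAwareScheduler, NodeProfile, Task) are hypothetical illustrations invented for this sketch, not the dissertation's actual API.

```java
import java.util.*;
import java.util.stream.*;

// Hedged sketch: proportional assignment of input chunks to heterogeneous
// nodes, in the spirit of the capability-aware distribution described above.
public class CapabilityAwareScheduler {
    record NodeProfile(String id, double relativeSpeed) {}
    record Task(String inputChunk, long sizeBytes) {}

    // Give each node a share of tasks proportional to its measured relative
    // speed (e.g., a Cell/GPU node gets more). Rounding leftovers are
    // ignored for brevity.
    static Map<NodeProfile, List<Task>> assign(List<Task> tasks, List<NodeProfile> nodes) {
        double totalSpeed = nodes.stream().mapToDouble(NodeProfile::relativeSpeed).sum();
        List<Task> bySize = tasks.stream()
                .sorted(Comparator.comparingLong(Task::sizeBytes).reversed())
                .collect(Collectors.toList());
        Map<NodeProfile, List<Task>> plan = new HashMap<>();
        int i = 0;
        for (NodeProfile n : nodes) {
            int share = (int) Math.round(bySize.size() * n.relativeSpeed() / totalSpeed);
            int end = Math.min(i + share, bySize.size());
            plan.put(n, bySize.subList(i, end));
            i = end;
        }
        return plan;
    }

    public static void main(String[] args) {
        var nodes = List.of(new NodeProfile("cell-0", 3.0), new NodeProfile("cpu-0", 1.0));
        var tasks = IntStream.range(0, 8)
                .mapToObj(k -> new Task("chunk-" + k, 64L << k)).toList();
        assign(tasks, nodes).forEach((n, ts) -> System.out.println(n.id() + " -> " + ts.size() + " tasks"));
    }
}
```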
- Advancing the Development and Utilization of Data Infrastructure for Smart Homes. Anik, Sheik Murad Hassan (Virginia Tech, 2024-09-12). The smart home era is steadily becoming part of our everyday life. However, the scarcity of publicly available data remains a major hurdle in the domain, limiting people's capability to perform data analysis and their effectiveness in creating smart home automations. To mitigate this hurdle and its influence, our research explored three directions: (1) create a better infrastructure that effectively collects and visualizes indoor-environment sensing data, (2) create a machine learning-based approach that demonstrates a novel way of analyzing indoor-environment data to facilitate human-centered building design, and (3) conduct an empirical study to explore the challenges and opportunities in existing smart home development. Specifically, we conducted three research projects. First, we created Building Data Lite (BDL), an open-source, IoT-based, cost-effective, distributed, scalable, and portable indoor environmental data collection system. We deployed this research prototype in 12 households, a deployment that has so far collected more than 2 million records, all available to the general public. Second, because the building occupant persona is a very important component in human-centered smart home design, we investigated applying state-of-the-art machine-learning models to data collected by an existing infrastructure, to enable the automatic creation of building occupant personas while minimizing human effort. Third, Home Assistant (HA) is an open-source, off-the-shelf smart home platform that users frequently use to transform their residences into smart homes; however, many users struggle with the configuration scripts of home automations. We conducted an empirical study by (1) crawling posts on the HA forum, (2) manually analyzing those posts to understand users' common technical concerns as well as frequently recommended resolutions, and (3) applying existing tools to assess their usefulness in alleviating users' pain points. Together, these projects shed light on future directions in smart home design and development.
- Algorithm Visualization: The State of the Field. Cooper, Matthew Lenell (Virginia Tech, 2007-04-19). We report on the state of the field of algorithm visualization, both quantitatively and qualitatively. Computer science educators seem to find algorithm and data structure visualizations attractive for their classrooms. Educational research shows that some are effective while many are not. Clearly, then, visualizations are difficult to create and use well. There is little in the way of a supporting community, and many visualizations are downright poor. Topic distribution is heavily skewed towards simple concepts, with advanced topics receiving little to no attention. We have cataloged nearly 400 visualizations available on the Internet in a wiki-based catalog that includes availability, platform, strengths and weaknesses, responsible personnel and institutions, and other data about each visualization. We have developed extraction and analysis tools to gather statistics about this corpus of visualizations. Based on analysis of the collection, we point out areas where improvements may be realized and suggest techniques for implementing such improvements. We pay particular attention to the free and open source software movement as a model that the visualization community may do well to emulate, from both a software engineering perspective and a community-building standpoint.
- Analysis and Abstraction of Parallel Sequence Search. Goddard, Christopher Joseph (Virginia Tech, 2007-09-05). The ability to compare two biological sequences is extremely valuable, as matches can suggest evolutionary origins of genes or the purposes of particular amino acids. Results of such comparisons can be used in the creation of drugs, can help combat newly discovered viruses, or can assist in treating diseases. Unfortunately, the rate of sequence acquisition is outpacing our ability to compute on these data. Further, traditional dynamic programming algorithms are too slow to meet the needs of biologists, who wish to compare millions of sequences daily. While heuristic algorithms improve upon the performance of these dated applications, they still cannot keep up with the steadily expanding search space. Parallel sequence search implementations were developed to address this issue. By partitioning databases into work units for distributed computation, applications like mpiBLAST are able to achieve super-linear speedup over their sequential counterparts. However, such implementations are limited to clusters and require significant effort to work in a grid environment. Further, their parallelization strategies are typically specific to the target sequence search, so future applications require a reimplementation if they wish to run in parallel. This thesis analyzes the performance of two versions of mpiBLAST, noting trends as well as differences between them. Results suggest that these embarrassingly parallel applications are dominated by the time required to search vast amounts of data, and not by the communication necessary to support such searches. Consequently, a framework named gridRuby is introduced which alleviates two main issues with current parallel sequence search applications; namely, the requirement of a tightly knit computing environment and the specific, hand-crafted nature of parallelization. Results show that gridRuby can parallelize an application across a set of machines through minimal implementation effort, and can still exhibit super-linear speedup.
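The work-unit partitioning that mpiBLAST-style systems rely on can be illustrated with a short, hedged Java sketch: sequences are greedily packed into fragments of roughly equal residue count so each worker searches a similar amount of data. The types (Sequence, WorkUnit, DbPartitioner) are invented for illustration; FASTA parsing is elided.

```java
import java.util.*;

// Hedged sketch of database partitioning for parallel sequence search.
public class DbPartitioner {
    record Sequence(String id, String residues) {}
    record WorkUnit(List<Sequence> fragment) {}

    // Greedy packing into ~equal-sized fragments by total residue count.
    static List<WorkUnit> partition(List<Sequence> db, int workers) {
        long total = db.stream().mapToLong(s -> s.residues().length()).sum();
        long target = Math.max(1, total / workers);
        List<WorkUnit> units = new ArrayList<>();
        List<Sequence> current = new ArrayList<>();
        long acc = 0;
        for (Sequence s : db) {
            current.add(s);
            acc += s.residues().length();
            if (acc >= target && units.size() < workers - 1) {
                units.add(new WorkUnit(current)); // fragment full; start a new one
                current = new ArrayList<>();
                acc = 0;
            }
        }
        if (!current.isEmpty()) units.add(new WorkUnit(current));
        return units;
    }

    public static void main(String[] args) {
        List<Sequence> db = List.of(new Sequence("s1", "MKV"), new Sequence("s2", "GATTACA"),
                new Sequence("s3", "MSTN"), new Sequence("s4", "AACCGG"));
        partition(db, 2).forEach(u -> System.out.println(u.fragment().size() + " sequences"));
    }
}
```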
- Analysis of the Relationships between Changes in Distributed System Behavior and Group Dynamics. Lazem, Shaimaa (Virginia Tech, 2012-04-06). The rapid evolution of portable devices and social media has enabled pervasive forms of distributed cooperation. A group could perform a task using a heterogeneous set of devices (desktop, mobile), connections (wireless, wired, 3G), and software clients. We call this form of system Distributed Dynamic Cooperative Environments (DDCEs). Content in DDCEs is created and shared by the users. The content could be static (e.g., video or audio), dynamic (e.g., wikis), and/or Objects with behavior, which are programmed objects that take advantage of available computational services (e.g., cloud-based services). Providing a desired Quality of Experience (QoE) in DDCEs is a challenge for cooperative systems designers. DDCEs are expected to provide groups with the utmost flexibility in conducting their cooperative activities. More flexibility at the user side means less control and predictability of the groups' behavior at the system side. Due to the lack of Quality of Service (QoS) guarantees in DDCEs, groups may experience changes in the system behavior that are usually manifested as delays and inconsistencies in the shared state. We question the extent to which cooperation among group members is sensitive to system changes in DDCEs. We argue that a QoE definition for groups should account for cooperation emergence and sustainability. An experiment was conducted in which fifteen groups performed a loosely-coupled task that simulates social traps in a 3D virtual world. The groups were exposed to two forms of system delays: exo-content delays, which are exogenous to the provided content (e.g., network delay), and endo-content delays, which are endogenous to the provided content (e.g., delay in processing time for Objects with behavior). The groups' performance in the experiment and their verbal communication were recorded and analyzed. The results demonstrate the nonlinearity of group behavior when dealing with endo-content delays. System interventions are needed to maintain QoE, even though that may increase the cost or the required resources. Systems are designed to be used rather than understood by users. When the system behavior changes, designers have two choices. The first is to expect the users to understand the system behavior and adjust their interaction accordingly. That did not happen in our experiment: the groups' understanding of the system behavior informed their actions and partially influenced whether they succeeded or failed in accomplishing their goals. The second choice is to understand the semantics of the application and provide guarantees based on those semantics. Based on our results, we introduce the following design guidelines for QoE provision in DDCEs. • If possible, the system should keep track of information about group goals and add guarding constraints to protect these goals. • QoE guarantees should be provided based on the semantics of the user-generated content that constitutes the group activity. • Users should be given the option to define the content that is sensitive to system changes (e.g., Objects with behavior that are sensitive to delays or require intensive computations) to avoid the negative impacts of endo-content delays. • Users should define the Objects with behavior that contribute to the shared state in order for the system to maintain the consistency of the shared state.
• Endo-content delays had significantly more negative impacts on the groups in our experiment than exo-content delays. We argue that system designers, if they have the choice, should trade the processing time needed for Objects with behavior for exo-content delay.
- An Application-Attuned Framework for Optimizing HPC Storage Systems. Paul, Arnab Kumar (Virginia Tech, 2020-08-19). High performance computing (HPC) is routinely employed in diverse domains, such as the life sciences and geology, to simulate and understand the behavior of complex phenomena. Big-data-driven scientific simulations are resource intensive and require both computing and I/O capabilities at scale. There is a crucial need for revisiting the HPC I/O subsystem to better optimize for, and manage, the increased pressure on the underlying storage systems from big data processing. Extant HPC storage systems are designed and tuned for a specific set of applications targeting a range of workload characteristics, but they lack the flexibility to adapt to ever-changing application behaviors. The complex nature of modern HPC storage systems, along with ever-changing application behaviors, presents unique opportunities and engineering challenges. In this dissertation, we design and develop a framework for optimizing HPC storage systems by making them application-attuned. We select three different kinds of HPC storage systems: in-memory data analytics frameworks, parallel file systems, and object storage. We first analyze HPC application I/O behavior by studying real-world I/O traces. Next, we optimize parallelism for applications running in memory, then design data management techniques for HPC storage systems, and finally focus on low-level I/O load balance to improve the efficiency of modern HPC storage systems.
- Applying Dynamic Software Updates to Computationally-Intensive Applications. Kim, Dong Kwan (Virginia Tech, 2009-06-22). Dynamic software updates change the code of a computer program while it runs, thus saving the programmer's time and using computing resources more productively. This dissertation establishes the value of, and recommends practices for, applying dynamic software updates to computationally-intensive applications, a computing domain characterized by long-running computations, expensive computing resources, and a tedious deployment process. This dissertation argues that updating computationally-intensive applications dynamically can reduce their time-to-discovery metrics (the total time it takes from posing a problem to arriving at a solution) and, as such, should become an intrinsic part of their software lifecycle. To support this claim, this dissertation presents the following technical contributions: (1) a distributed consistency algorithm for synchronizing dynamic software updates in a parallel HPC application, (2) an implementation of the Proxy design pattern that is more efficient than the existing implementations, and (3) a dynamic update approach for Java Virtual Machine (JVM)-based applications that uses the Proxy pattern to offer flexibility and efficiency advantages, making it suitable for computationally-intensive applications. The contributions of this dissertation are validated through performance benchmarks and case studies involving computationally-intensive applications from the bioinformatics and molecular dynamics simulation domains.
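The Proxy-based dynamic update idea can be sketched with the JDK's standard java.lang.reflect.Proxy: callers keep a stable reference to the proxy while the delegate behind it is swapped mid-run. This is a minimal illustration of the general pattern only, not the dissertation's optimized implementation; the Simulation interface and version classes are invented examples.

```java
import java.lang.reflect.*;
import java.util.concurrent.atomic.AtomicReference;

// Hedged sketch: a dynamic proxy whose delegate can be replaced at runtime,
// so long-running callers pick up the new version on their next call.
public class Updatable {
    interface Simulation { double step(double state); }

    static class V1 implements Simulation { public double step(double s) { return s + 1.0; } }
    static class V2 implements Simulation { public double step(double s) { return s * 1.01; } }

    public static void main(String[] args) {
        AtomicReference<Simulation> target = new AtomicReference<>(new V1());
        Simulation proxy = (Simulation) Proxy.newProxyInstance(
                Simulation.class.getClassLoader(),
                new Class<?>[]{Simulation.class},
                (p, method, a) -> method.invoke(target.get(), a)); // forward to current version

        System.out.println(proxy.step(10.0)); // 11.0 (V1)
        target.set(new V2());                 // dynamic update: swap the implementation
        System.out.println(proxy.step(10.0)); // 10.1 (V2), same proxy reference
    }
}
```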
- Applying Natural Language Processing and Deep Learning Techniques for Raga Recognition in Indian Classical Music. Peri, Deepthi (Virginia Tech, 2020-08-27). In Indian Classical Music (ICM), the Raga is a musical piece's melodic framework. It encompasses the characteristics of a scale, a mode, and a tune, with none of them fully describing it, rendering the Raga a unique concept in ICM. The Raga provides musicians with a melodic fabric within which all compositions and improvisations must take place. Identifying and categorizing the Raga is challenging due to its dynamism and complex structure, as well as the polyphonic nature of ICM. Hence, Raga recognition, the task of identifying the constituent Raga in an audio file, has become an important problem in music informatics with several known prior approaches. Advancing the state of the art in Raga recognition paves the way to improving other Music Information Retrieval tasks in ICM, including transcribing notes automatically, recommending music, and organizing large databases. This thesis presents a novel melodic pattern-based approach to recognizing Ragas by representing this task as a document classification problem, solved by applying a deep learning technique. A digital audio excerpt is hierarchically processed and split into subsequences and gamaka sequences to mimic a textual document structure, so our model can learn the resulting tonal and temporal sequence patterns using a Recurrent Neural Network. Although we train and test on these smaller sequences, we predict the Raga for the entire audio excerpt, achieving an accuracy of 90.3% on the Carnatic Music Dataset and 95.6% on the Hindustani Music Dataset, thus outperforming prior approaches in Raga recognition.
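The hierarchical splitting step can be pictured in Java (kept for consistency with the other sketches here): a long pitch sequence is cut into fixed-length subsequences that play the role of "documents," and per-subsequence predictions are lifted to an excerpt-level label. The window/stride values and the majority vote are plausible guesses for illustration, not necessarily the thesis's exact choices.

```java
import java.util.*;

// Hedged sketch of the subsequence preprocessing and excerpt-level voting.
public class SequenceSplitter {
    // Cut a pitch sequence into overlapping fixed-length windows.
    static List<List<Integer>> split(List<Integer> pitches, int window, int stride) {
        List<List<Integer>> subsequences = new ArrayList<>();
        for (int start = 0; start + window <= pitches.size(); start += stride) {
            subsequences.add(pitches.subList(start, start + window));
        }
        return subsequences;
    }

    // One plausible way to lift per-subsequence predictions to an
    // excerpt-level Raga label: majority vote.
    static String vote(List<String> perSubsequenceRagas) {
        Map<String, Long> counts = new HashMap<>();
        perSubsequenceRagas.forEach(r -> counts.merge(r, 1L, Long::sum));
        return Collections.max(counts.entrySet(), Map.Entry.comparingByValue()).getKey();
    }

    public static void main(String[] args) {
        List<Integer> pitches = List.of(60, 62, 64, 65, 67, 69, 71, 72);
        System.out.println(split(pitches, 4, 2)); // three windows of length 4
        System.out.println(vote(List.of("Bhairavi", "Kalyani", "Bhairavi"))); // Bhairavi
    }
}
```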
- Architectural Refactoring for Fast and Modular Bioinformatics Sequence Search. Archuleta, Jeremy; Tilevich, Eli; Feng, Wu-chun (Department of Computer Science, Virginia Polytechnic Institute & State University, 2006-09-01). Bioinformaticists use the Basic Local Alignment Search Tool (BLAST) to characterize an unknown sequence by comparing it against a database of known sequences, thus detecting evolutionary relationships and biological properties. mpiBLAST is a widely-used, high-performance, open-source parallelization of BLAST that runs on a computer cluster delivering super-linear speedups. However, the Achilles heel of mpiBLAST is its lack of modularity, adversely affecting maintainability and extensibility; an effective architectural refactoring will benefit both users and developers. This paper describes our experiences in the architectural refactoring of mpiBLAST into a modular, high-performance software package. Our evaluation of five component-oriented designs culminated in a design that enables modularity while retaining high performance. Furthermore, we achieved this refactoring effectively and efficiently using eXtreme Programming techniques. These experiences will be of value to software engineers faced with the challenge of creating maintainable, extensible, high-performance bioinformatics software.
- Assessing Agile Methods: Investigating Adequacy, Capability, and Effectiveness (An Objectives, Principles, Strategies Approach). Soundararajan, Shvetha (Virginia Tech, 2013-06-10). Agile methods provide an organization or a team with the flexibility to adopt a selected subset of principles and practices based on their culture, their values, and the types of systems that they develop. More specifically, every organization or team implements a customized agile method, tailored to better accommodate its needs. However, the extent to which a customized method supports the organizational objectives, i.e., the 'goodness' of that method, should be demonstrable. Existing agile assessment approaches focus on comparative analyses, or are limited in scope and application. In this research, we propose a systematic, comprehensive approach to assessing the 'goodness' of agile methods. We examine an agile method based on (1) its adequacy, (2) the capability of the organization to support the adopted principles and strategies specified by the method, and (3) the method's effectiveness. We propose the Objectives, Principles and Strategies (OPS) Framework to guide our assessment process. The Framework identifies (a) objectives of the agile philosophy, (b) principles that support the objectives, and (c) strategies that implement the principles. It also defines (d) linkages that relate objectives to principles and principles to strategies, and finally, (e) indicators for assessing the extent to which an organization supports the implementation and effectiveness of those strategies. The propagation of indicator values along the linkages provides a multi-level assessment view of the agile method. In this dissertation, we present our assessment methodology, the guiding Framework, our validation approach, results and findings, and future directions.
- Automated Adaptive Software Maintenance: A Methodology and Its Applications. Tansey, Wesley (Virginia Tech, 2008-05-22). In modern software development, maintenance accounts for the majority of the total cost and effort in a software project. Especially burdensome are those tasks which require applying a new technology in order to adapt an application to changed requirements or a different environment. This research explores methodologies, techniques, and approaches for automating such adaptive maintenance tasks. By combining high-level specifications and generative techniques, a new methodology shapes the design of approaches to automating adaptive maintenance tasks in the application domains of high performance computing (HPC) and enterprise software. Despite the vast differences between these domains and their respective requirements, each approach is shown to be effective at alleviating their adaptive maintenance burden. This thesis proves that it is possible to effectively automate tedious and error-prone adaptive maintenance tasks in a diverse set of domains by exploiting high-level specifications to synthesize specialized low-level code. The specific contributions of this thesis are as follows: (1) a common methodology for designing automated approaches to adaptive maintenance, (2) a novel approach to automating the generation of efficient marshaling logic for HPC applications from a high-level visual model, and (3) a novel approach to automatically upgrading legacy enterprise applications to use annotation-based frameworks. The technical contributions of this thesis have been realized in two software tools for automated adaptive maintenance: MPI Serializer, a marshaling logic generator for MPI applications, and Rosemari, an inference and transformation engine for upgrading enterprise applications. This thesis is based on research papers accepted to IPDPS '08 and OOPSLA '08.
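The second kind of upgrade Rosemari targets can be pictured as a before/after pair: configuration that used to live in an external XML descriptor is re-expressed as source-level annotations. The sketch below uses JPA annotations purely as an example target framework (assuming the JPA API on the classpath); the thesis's actual subject frameworks and inference rules may differ.

```java
// Before: a POJO whose persistence mapping lives in an XML descriptor, e.g.
//   <class name="Person" table="PERSON">
//     <id name="id" column="ID"/>
//     <property name="name" column="NAME"/>
//   </class>
import javax.persistence.*;

// After: the same mapping expressed with annotations inferred from the XML,
// the style of transformation an engine like Rosemari automates.
@Entity
@Table(name = "PERSON")
public class Person {
    @Id
    @Column(name = "ID")
    private long id;

    @Column(name = "NAME")
    private String name;
}
```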
- Automated Assessment of Student-written Tests Based on Defect-detection Capability. Shams, Zalia (Virginia Tech, 2015-05-05). Software testing is important, but judging whether a set of software tests is effective is difficult. This problem also appears in the classroom as educators more frequently include software testing activities in programming assignments. The most common measures used to assess student-written software tests are coverage criteria, which track how much of the student's code (in terms of statements or branches) is exercised by the corresponding tests. However, coverage criteria have limitations and sometimes overestimate the true quality of the tests. This dissertation investigates alternative measures of test quality based on how many defects the tests can detect, either from code written by other students (all-pairs execution) or from artificially injected changes (mutation analysis). We also investigate a new potential measure called checked code coverage that calculates coverage from the dynamic backward slices of test oracles, i.e., all statements that contribute to the checked result of any test. Adoption of these alternative approaches in automated classroom grading systems requires overcoming a number of technical challenges. This research addresses these challenges and experimentally compares different methods in terms of how well they predict the defect-detection capabilities of student-written tests when run against over 36,500 known, authentic, human-written errors. For data collection, we use CS2 assignments and evaluate students' tests with 10 different measures: all-pairs execution, mutation testing with four different sets of mutation operators, checked code coverage, and four coverage criteria. Experimental results encompassing 1,971,073 test runs show that all-pairs execution is the most accurate predictor of the underlying defect-detection capability of a test suite. The second best predictor is mutation analysis with the statement deletion operator. Further, no strong correlation was found between defect-detection capability and coverage measures.
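All-pairs execution can be sketched compactly: run a student's test suite against every peer implementation and score the suite by the fraction of peers it rejects. In the hedged Java toy below, tests are modeled as predicates over a tiny interface purely to show the scoring loop; the real system runs actual student JUnit suites in a sandbox, and its exact scoring may differ.

```java
import java.util.*;
import java.util.function.Predicate;

// Hedged sketch of all-pairs execution scoring (names invented).
public class AllPairs {
    interface Counter { int increment(int x); }

    // A suite "flags" a peer implementation if at least one test fails on it.
    static double defectDetectionScore(List<Predicate<Counter>> suite, List<Counter> peerImpls) {
        long flagged = peerImpls.stream()
                .filter(impl -> suite.stream().anyMatch(t -> !t.test(impl)))
                .count();
        return (double) flagged / peerImpls.size(); // fraction of peers rejected
    }

    public static void main(String[] args) {
        Counter correct = x -> x + 1;
        Counter buggy = x -> x + 2; // a peer solution with a defect
        List<Predicate<Counter>> suite = List.of(
                c -> c.increment(0) == 1,
                c -> c.increment(5) == 6);
        System.out.println(defectDetectionScore(suite, List.of(correct, buggy))); // 0.5
    }
}
```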
- Automated Cross-Platform Code Synthesis from Web-Based Programming Resources. Byalik, Antuan (Virginia Tech, 2015-08-04). For maximal market penetration, popular mobile applications are typically supported on all major platforms, including Android and iOS. Despite the vast differences in the look-and-feel of major mobile platforms, applications running on these platforms in essence provide the same core functionality. As an application is maintained and evolved, programmers need to replicate the resulting changes on all the supported platforms, a tedious and error-prone process. Commercial automated source-to-source translation tools prove inadequate due to the structural and idiomatic differences in how functionalities are expressed across major platforms. In this thesis, we present a new approach, Native-2-Native, that automatically synthesizes code for a mobile application to make use of native resources on one platform, based on the equivalent program transformations performed on another platform. First, the programmer modifies a mobile application's Android version to make use of some native resource, with a plugin capturing code changes. Based on the changes, the system then parameterizes a web search query over popular programming resources (e.g., Google Code and StackOverflow) to discover equivalent iOS code blocks with the closest similarity to the programmer-written Android code. The discovered iOS code block is then presented to the programmer as an automatically synthesized Swift source file to further fine-tune and subsequently integrate into the mobile application's iOS version. Our evaluation, enhancing mobile applications to make use of common native resources, shows that the presented approach can correctly synthesize more than 86% of the Swift code for the subject applications' iOS versions.
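The query-parameterization step can be sketched as straightforward string construction from the captured Android-side change. The abstract does not publish Native-2-Native's actual query template, so the one below is an invented illustration.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hedged sketch: building a web search query for equivalent iOS code,
// parameterized by the Android API the programmer just used.
public class QueryBuilder {
    static String query(String androidApi, String feature) {
        String q = "iOS Swift equivalent of Android " + androidApi + " " + feature
                 + " site:stackoverflow.com"; // invented template, not the tool's
        return "https://www.google.com/search?q=" + URLEncoder.encode(q, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(query("MediaPlayer", "audio playback"));
    }
}
```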
- Automated Identification and Application of Code Refactoring in Scratch to Promote the Culture of Quality from the Ground Up. Techapalokul, Peeratham (Virginia Tech, 2020-06-04). Much of software engineering research and practice is concerned with improving software quality. While enormous prior efforts have focused on improving the quality of programs, this dissertation instead provides the means to educate the next generation of programmers who care deeply about software quality. If they embrace the culture of quality, these programmers would be positioned to drastically improve the quality of the software ecosystem. This dissertation describes novel methodologies, techniques, and tools for introducing novice programmers to software quality and its systematic improvement. This research builds on the success of Scratch, a popular novice-oriented block-based programming language, to support the learning of code quality and its improvement. This dissertation improves the understanding of the quality problems of novice programmers, creates analysis and quality improvement technologies, and develops instructional approaches for teaching quality improvement. The contributions of this dissertation are as follows. (1) We identify twelve code smells endemic to Scratch, show their prevalence in a large representative codebase, and demonstrate how they hinder project reuse and communal learning. (2) We introduce four new refactorings for Scratch, develop an infrastructure to support them in the Scratch programming environment, and evaluate their effectiveness for the target audience. (3) We study the impact of introducing code quality concepts alongside the fundamentals of programming, with and without automated refactoring support. Our findings confirm that it is not only feasible but also advantageous to promote the culture of quality from the ground up. The contributions of this dissertation can benefit both novice programmers and introductory computing educators.
- Automatic Restoration and Management of Computational Notebooks. Venkatesan, Satish (Virginia Tech, 2022-03-03). Computational notebook platforms are very commonly used by programmers and data scientists. However, due to the interactive development environment of notebooks, developers struggle to maintain effective code organization, which has an adverse effect on their productivity. In this thesis, we research and develop techniques to help solve the code organization issues that developers face, in an effort to improve productivity. Notebooks are often executed out of order, which adversely affects their portability. To determine cell execution orders in computational notebooks, we develop a technique that determines the execution order for a given cell and, if need be, attempts to rearrange the cells to match the intended execution order. With such a tool, users would not need to determine the execution orders manually. In a user study with 9 participants, our approach on average saves users about 95% of the time required to determine execution orders manually. We also developed a technique to support insertion of cells in rows, in addition to the standard column insertion, to better represent multiple contexts. In a user study with 9 participants, this technique was rated 8.44 on a ten-point scale for representing multiple contexts, compared with 4.77 for the standard view.
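Determining a feasible cell execution order is essentially a topological sort over def-use dependencies: a cell must run after any cell that defines a variable it uses. Below is a minimal Java sketch under that assumption; extracting defs/uses from real notebook code is elided, and the thesis's actual technique may differ in details.

```java
import java.util.*;

// Hedged sketch: order notebook cells by def-use dependencies via DFS.
public class CellOrderer {
    record Cell(String id, Set<String> defs, Set<String> uses) {}

    static List<Cell> order(List<Cell> cells) {
        Map<String, Cell> definer = new HashMap<>();
        cells.forEach(c -> c.defs().forEach(v -> definer.put(v, c)));
        List<Cell> sorted = new ArrayList<>();
        Set<Cell> visiting = new HashSet<>(), done = new HashSet<>();
        for (Cell c : cells) visit(c, definer, visiting, done, sorted);
        return sorted;
    }

    static void visit(Cell c, Map<String, Cell> definer, Set<Cell> visiting,
                      Set<Cell> done, List<Cell> out) {
        if (done.contains(c)) return;
        if (!visiting.add(c)) throw new IllegalStateException("cyclic dependency at " + c.id());
        for (String v : c.uses()) {             // visit every cell this one depends on
            Cell dep = definer.get(v);
            if (dep != null && dep != c) visit(dep, definer, visiting, done, out);
        }
        visiting.remove(c);
        done.add(c);
        out.add(c);                              // emit after all dependencies
    }

    public static void main(String[] args) {
        Cell load = new Cell("load", Set.of("df"), Set.of());
        Cell clean = new Cell("clean", Set.of("df2"), Set.of("df"));
        Cell plot = new Cell("plot", Set.of(), Set.of("df2"));
        order(List.of(plot, load, clean)).forEach(c -> System.out.print(c.id() + " "));
        // prints: load clean plot
    }
}
```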
- Automatically Generating Tests from Natural Language Descriptions of Software Behavior. Sunil Kamalakar, FNU (Virginia Tech, 2013-10-18). Behavior-Driven Development (BDD) is an emerging agile development approach where all stakeholders (including developers and customers) work together to write user stories in structured natural language to capture a software application's functionality in terms of required "behaviors". Developers then manually write "glue" code so that these scenarios can be executed as software tests. This glue code represents individual steps within unit and acceptance test cases, and tools exist that automate the mapping from scenario descriptions to manually written code steps (typically using regular expressions). Instead of requiring programmers to write manual glue code, this thesis investigates a practical approach to convert natural language scenario descriptions into executable software tests fully automatically. To show feasibility, we developed a tool called Kirby that uses natural language processing techniques, code information extraction, and probabilistic matching to automatically generate executable software tests from structured English scenario descriptions. Kirby relieves the developer from the laborious work of writing code for the individual steps described in scenarios, so that developers and customers can both focus on the scenarios as pure behavior descriptions (understandable to all, not just programmers). Results from assessing the performance and accuracy of this technique are presented.
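For contrast, here is a minimal Java sketch of the conventional regex-based glue mechanism that Kirby aims to make unnecessary: scenario steps are matched against handler patterns, and the captured groups become test inputs. The step texts and handlers are invented examples, not Kirby's output.

```java
import java.util.*;
import java.util.function.Consumer;
import java.util.regex.*;

// Hedged sketch of hand-written BDD "glue": regex patterns bound to step bodies.
public class GlueRunner {
    record Step(Pattern pattern, Consumer<Matcher> body) {}

    public static void main(String[] args) {
        List<Step> glue = List.of(
            new Step(Pattern.compile("^I deposit (\\d+) dollars$"),
                     m -> System.out.println("deposit(" + m.group(1) + ")")),
            new Step(Pattern.compile("^my balance should be (\\d+)$"),
                     m -> System.out.println("assertBalance(" + m.group(1) + ")")));

        // Each scenario line is dispatched to the first handler whose regex matches.
        for (String line : List.of("I deposit 40 dollars", "my balance should be 40")) {
            glue.stream()
                .map(s -> Map.entry(s, s.pattern().matcher(line)))
                .filter(e -> e.getValue().matches())
                .findFirst()
                .ifPresent(e -> e.getKey().body().accept(e.getValue()));
        }
    }
}
```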
- A Campus Situational Awareness and Emergency Response Management System Architecture. Chigani, Amine (Virginia Tech, 2011-04-06). The history of university, college, and high school campuses is marked by man-made tragedies that have caused tremendous loss of life. Virginia Tech's April 16 shooting ignited the discussion about balancing openness and safety in open campus environments. Existing campus safety solutions are characterized by addressing bits and pieces of the problem. The perfect example is the recent influx in demand for Electronic Notification Systems (ENS) by many educational institutions following the tragedies at Virginia Tech and Northern Illinois University. Installing such systems is important, as it is an essential part of an overall solution. However, without a comprehensive, innovative understanding of the requirements for an institution-wide solution that enables effective security control and efficient emergency response, the proposed solutions will always fall short. This dissertation describes an architecture for SINERGY (campuS sItuational awareNess and Emergency Response manaGement sYstem), a Service-Oriented Architecture (SOA)-based, network-centric system of systems that provides a comprehensive, institution-wide, software-based solution for balancing safety and openness in any campus environment. The SINERGY architecture addresses three main capabilities: situational awareness (SA), security control (SC), and emergency response management (ERM). A safe and open campus environment can be realized through the development of a network-centric system that enables the creation of a common operating picture (COP) of the campus environment shared by all campus entities. Having a COP of what goes on campus at any point in time is key to enabling effective SC measures to be put in place. Finally, common SA and effective SC lay the foundation for efficient and successful ERM in the case of a man-made tragedy. Because this research employs service orientation principles to architect SINERGY, this dissertation also addresses a critical area of research with regard to SOA: SOA security. Security has become a critical concern when it comes to SOA-based network-centric systems of systems due to the nature of business practices today, which emphasize dynamic sharing of information and services among independent partners. As a result, the line between internal and external organization networks and services has been blurred, making it difficult to assess the security quality of SOA environments. In order to do this evaluation effectively, a hierarchy of security indicators is developed. The proposed hierarchy is incorporated into a well-established evaluation methodology to provide a structured approach for assessing the security of an SOA-based network-centric system of systems. Another area of focus in this dissertation is the architecting process. With the advent of potent network technology, software/system engineering has evolved from a traditional platform-centric focus into a network-centric paradigm where the "system of systems" perspective is the norm. Under this paradigm, architecting has become a critical process in the life cycle of software/system engineering. The need for a structured description of the architecting process is undeniable. This dissertation fulfills that need and provides a structured description of the process of architecting a software-based network-centric system of systems.
The architecting process is described using a set of goals that are specific to architecting, along with the associated specific practices that enable the realization of these goals. The architecting process description presented herein is intended to guide software/system architects.
- Checking Metadata Usage for Enterprise Applications. Zhang, Yaxuan (Virginia Tech, 2021-05-20). It is becoming more and more common for developers to build enterprise applications on Spring or other Java frameworks. While developers enjoy the convenient implementations of web frameworks, they should pay attention to configuration deployment with metadata usage (i.e., Java annotations and XML deployment descriptors). Different formats of metadata can correspond to each other, and metadata usually exist in multiple files. Maintaining such metadata is challenging and time-consuming. Current compilers and research tools rarely inspect the XML files, let alone the corresponding relationship between Java annotations and XML files. To help developers ensure the quality of metadata, this work presents a Domain Specific Language, RSL, and its engine, MeEditor. RSL facilitates pattern definition for correct metadata usage. Developers can define rules with RSL that take metadata usage into account and then run the RSL scripts with MeEditor, which checks Java projects for any rule violations. We extracted 9 rules from the Spring specification and wrote them in RSL. To evaluate the effectiveness and usefulness of MeEditor, we mined 180 plus 500 open-source projects from GitHub and conducted our evaluation in two steps. First, we evaluated the effectiveness of MeEditor by constructing a known ground-truth data set. In experiments on this ground-truth data set, MeEditor identified metadata misuse and detected bugs with 94% precision, 94% recall, and 94% accuracy. Second, we evaluated the usefulness of MeEditor by applying it to real-world projects (500 projects in total). For the latest versions of these 500 projects, MeEditor gave 79% precision according to our manual inspection. Then, we applied MeEditor to the version histories of the rule-adopted projects, which adopt the rules and are identified as correct in their latest versions. MeEditor identified 23 bugs, which were later fixed by the developers.
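The flavor of rule MeEditor checks can be sketched as follows: a class marked as a bean by an annotation should also be registered in the XML descriptor. The annotation and the pre-parsed XML map below are stand-ins invented for illustration; MeEditor works on real Spring metadata via RSL rules.

```java
import java.lang.annotation.*;
import java.util.*;

// Hedged sketch of an annotation/XML consistency check (names invented).
public class MetadataChecker {
    @Retention(RetentionPolicy.RUNTIME)
    @interface Bean { String id(); }

    @Bean(id = "orderService")
    static class OrderService {}

    // Flag annotated beans that are missing from the XML-declared bean map.
    static List<String> check(Class<?> cls, Map<String, String> xmlBeans) {
        List<String> violations = new ArrayList<>();
        Bean b = cls.getAnnotation(Bean.class);
        if (b != null && !xmlBeans.containsKey(b.id())) {
            violations.add("Bean '" + b.id() + "' is annotated but missing from XML");
        }
        return violations;
    }

    public static void main(String[] args) {
        // XML parsing elided: assume only <bean id="cartService" .../> was found.
        System.out.println(check(OrderService.class, Map.of("cartService", "CartService")));
    }
}
```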
- CLES: A Universal Wrench for Embedded Systems Communication and Coordination. Davis, Jason; Tilevich, Eli (Springer, 2022-01-01). Modern embedded systems (autonomous vehicle-to-vehicle communication, smart cities, and military Joint All-Domain Operations) feature increasingly heterogeneous distributed components. As a result, existing communication methods, tightly coupled with specific networking layers and individual applications, can no longer balance the flexibility of modern data distribution with the traditional constraints of embedded systems. To address this problem, this paper presents a domain-specific language designed around the Representational State Transfer (REST) architecture, most famously used on the web. Our language, called the Communication Language for Embedded Systems (CLES), supports both traditional point-to-point data communication and allocation of decentralized distributed tasks. To meet the traditional constraints of embedded execution, CLES's novel runtime allocates decentralized distributed tasks across a heterogeneous network of embedded devices, overcoming the limitations of centralized management and limited operating system integration. We evaluated CLES with performance micro-benchmarks, an implementation of distributed stochastic gradient descent, and by applying it to the design of versatile stateless services for vehicle-to-vehicle communication and military Joint All-Domain Command and Control, thus meeting the data distribution needs of realistic cyber-physical embedded systems.
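The REST-inspired core of CLES can be pictured as a tiny verb-plus-path dispatch table decoupled from any particular transport. The following Java sketch is an invented illustration of that model, not CLES's actual DSL syntax or runtime.

```java
import java.util.*;
import java.util.function.Function;

// Hedged sketch: resources addressed by (verb, path), transport-agnostic.
public class TinyRest {
    record Request(String verb, String path, String body) {}
    private final Map<String, Function<Request, String>> routes = new HashMap<>();

    void route(String verb, String path, Function<Request, String> handler) {
        routes.put(verb + " " + path, handler);
    }

    String dispatch(Request r) {
        return routes.getOrDefault(r.verb() + " " + r.path(),
                x -> "404 no such resource").apply(r);
    }

    public static void main(String[] args) {
        TinyRest node = new TinyRest();
        node.route("GET", "/vehicle/speed", r -> "42.0");             // point-to-point read
        node.route("PUT", "/tasks/sgd", r -> "accepted:" + r.body()); // distributed task drop-off
        System.out.println(node.dispatch(new Request("GET", "/vehicle/speed", "")));
    }
}
```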
- The Client Insourcing Refactoring to Facilitate the Re-engineering of Web-Based Applications. An, Kijin (Virginia Tech, 2021-05-19). Developers often need to re-engineer distributed applications to address changes in requirements, made only after deployment. Much of the complexity of inspecting and evolving distributed applications lies in their distributed nature, while the majority of mature program analysis and transformation tools work only with centralized software. Inspired by business process re-engineering, in which remote operations can be insourced back in house to be restructured and outsourced anew, this dissertation brings an analogous approach to the re-engineering of distributed applications. Our approach introduces a novel automatic refactoring, Client Insourcing, that creates a semantically equivalent centralized version of a distributed application. This centralized version is then inspected, modified, and redistributed to meet new requirements. This dissertation demonstrates the utility of Client Insourcing in helping meet changed requirements in performance, reliability, and security. We implemented Client Insourcing in the important domain of full-stack JavaScript applications, in which both the client and server parts are written in JavaScript, and applied our implementation to re-engineer mobile web applications. Client Insourcing reduces the complexity of inspecting and evolving distributed applications, thereby facilitating their re-engineering. This dissertation is based on 4 conference papers and 2 doctoral symposium papers, presented at ICWE 2019, SANER 2020, WWW 2020, and ICWE 2021.
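What Client Insourcing achieves can be illustrated with a before/after pair, transplanted to Java for consistency with the other sketches here (the dissertation itself targets full-stack JavaScript): a remote call is replaced by a semantically equivalent local implementation behind the same interface, yielding a centralized program that standard analysis tools can handle. The interface and tax logic are invented examples.

```java
import java.net.URI;
import java.net.http.*;

// Hedged sketch of the before/after shape of Client Insourcing.
public class Insourcing {
    interface TaxService { double tax(double amount); }

    // Before: the computation lives on the server and is reached over HTTP.
    static class RemoteTax implements TaxService {
        public double tax(double amount) {
            try {
                HttpResponse<String> r = HttpClient.newHttpClient().send(
                        HttpRequest.newBuilder(
                                URI.create("http://example.com/tax?amount=" + amount)).build(),
                        HttpResponse.BodyHandlers.ofString());
                return Double.parseDouble(r.body());
            } catch (Exception e) { throw new RuntimeException(e); }
        }
    }

    // After insourcing: the same logic runs locally behind the same interface,
    // so it can be inspected, profiled, and refactored before redistribution.
    static class LocalTax implements TaxService {
        public double tax(double amount) { return amount * 0.07; } // server logic moved in-house
    }

    public static void main(String[] args) {
        System.out.println(new LocalTax().tax(100.0)); // 7.0, same contract as RemoteTax
    }
}
```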