Browsing by Author "Servant Cortes, Francisco Javier"
Now showing 1 - 20 of 32
- Automated Identification and Application of Code Refactoring in Scratch to Promote the Culture Quality from the Ground up. Techapalokul, Peeratham (Virginia Tech, 2020-06-04). Much of software engineering research and practice is concerned with improving software quality. While enormous prior efforts have focused on improving the quality of programs, this dissertation instead provides the means to educate the next generation of programmers who care deeply about software quality. If they embrace the culture of quality, these programmers would be positioned to drastically improve the quality of the software ecosystem. This dissertation describes novel methodologies, techniques, and tools for introducing novice programmers to software quality and its systematic improvement. This research builds on the success of Scratch, a popular novice-oriented block-based programming language, to support the learning of code quality and its improvement. This dissertation improves the understanding of quality problems of novice programmers, creates analysis and quality improvement technologies, and develops instructional approaches for teaching quality improvement. The contributions of this dissertation are as follows. (1) We identify twelve code smells endemic to Scratch, show their prevalence in a large representative codebase, and demonstrate how they hinder project reuse and communal learning. (2) We introduce four new refactorings for Scratch, develop an infrastructure to support them in the Scratch programming environment, and evaluate their effectiveness for the target audience. (3) We study the impact of introducing code quality concepts alongside the fundamentals of programming with and without automated refactoring support. Our findings confirm that it is not only feasible but also advantageous to promote the culture of quality from the ground up. The contributions of this dissertation can benefit both novice programmers and introductory computing educators.
- Checking Metadata Usage for Enterprise Applications. Zhang, Yaxuan (Virginia Tech, 2021-05-20). It is becoming more and more common for developers to build enterprise applications on the Spring framework or other Java frameworks. While developers enjoy the convenience of these web frameworks, they must pay attention to configuration deployment with metadata (i.e., Java annotations and XML deployment descriptors). Metadata in different formats can correspond to each other, and metadata is usually spread across multiple files, so maintaining it is challenging and time-consuming. Current compilers and research tools rarely inspect XML files, let alone the correspondence between Java annotations and XML files. To help developers ensure the quality of metadata, this work presents a domain-specific language, RSL, and its engine, MeEditor. RSL facilitates the definition of patterns for correct metadata usage, and MeEditor takes in the specified rules and checks Java projects for any rule violations. Developers define rules about metadata usage in RSL and then run the RSL scripts with MeEditor. Nine rules were extracted from the Spring specification and written in RSL. To evaluate the effectiveness, correctness, and usefulness of MeEditor, we mined 180 plus 500 open-source projects from GitHub and conducted a two-step evaluation. First, we evaluated the effectiveness of MeEditor by constructing a known ground-truth data set. In experiments on this data set, MeEditor identified metadata misuse, detecting bugs with 94% precision, 94% recall, and 94% accuracy. Second, we evaluated the usefulness of MeEditor by applying it to 500 real-world projects. For the latest versions of these 500 projects, MeEditor achieved 79% precision according to our manual inspection.
Then, we applied MeEditor to the version histories of rule-adopted projects, that is, projects that adopt a rule and whose latest version was identified as correct. MeEditor identified 23 bugs, which were later fixed by developers.
- The Client Insourcing Refactoring to Facilitate the Re-engineering of Web-Based Applications. An, Kijin (Virginia Tech, 2021-05-19). Developers often need to re-engineer distributed applications to address changes in requirements made only after deployment. Much of the complexity of inspecting and evolving distributed applications lies in their distributed nature, while most mature program analysis and transformation tools work only with centralized software. Inspired by business process re-engineering, in which remote operations can be insourced back in house to restructure and outsource anew, this dissertation brings an analogous approach to the re-engineering of distributed applications. Our approach introduces a novel automatic refactoring, Client Insourcing, that creates a semantically equivalent centralized version of a distributed application. This centralized version is then inspected, modified, and redistributed to meet new requirements. This dissertation demonstrates the utility of Client Insourcing in helping meet changed requirements in performance, reliability, and security. We implemented Client Insourcing in the important domain of full-stack JavaScript applications, in which both the client and server parts are written in JavaScript, and applied our implementation to re-engineer mobile web applications. Client Insourcing reduces the complexity of inspecting and evolving distributed applications, thereby facilitating their re-engineering. This dissertation is based on 4 conference papers and 2 doctoral symposium papers, presented at ICWE 2019, SANER 2020, WWW 2020, and ICWE 2021.
- Code Reading Dojo: Designing an Educationally-oriented Mobile Application Aimed at Promoting Code Reading Skills. Ghaed, Zahra (Virginia Tech, 2017-06-07). In recent years, much attention has been directed to the use of educational games for learning computer science concepts. The motivational value of game-based learning through positive experiences has been studied extensively in the literature, but game design for improving code reading skills has much room for improvement. Being good at reading code is important for a professional developer. To address this issue, we defined a new educationally-oriented mobile game application aimed at promoting the development of code reading skills in a new and fun way. The strategy of the game is to find errors in pieces of code. At each level, students must find all syntactic and semantic errors in the code within a time limit in order to advance to the next level. Of the numerous programming languages, we chose Java because it is one of the most popular programming languages, and in many colleges Java plays a major role in introductory courses. Our vision is to allow instructors to employ the game in their introductory Java programming courses. In addition, we hope it could be adapted for use in introductory courses that use other programming languages. Data collected during the project help us evaluate the impact of game-based learning on code reading in programming languages. We asked undergraduate students in the Department of Computer Science at Virginia Tech to play the game during the Spring 2017 semester. We analyzed the collected data: students believe that Code Reading Dojo improves their code reading skills in Java and their overall programming ability, in addition to helping them find errors in their own programs.
- Cost-saving in Continuous Integration: Development, Improvement, and Evaluation of Build Selection Approaches. Jin, Xianhao (Virginia Tech, 2022-05-24). Continuous integration (CI) is a widely used practice in modern software engineering. Unfortunately, it is also an expensive practice: Google and Mozilla estimate the cost of their CI systems in the millions of dollars. In this dissertation, I propose a collection of novel build selection approaches that reduce the cost of CI. I also present the first exhaustive comparison of techniques to improve CI, including build and test granularity approaches. I first design a build selection approach (SMARTBUILDSKIP) that reduces CI cost while balancing cost savings against delayed failure observation. The evaluation of SMARTBUILDSKIP shows that, under its most conservative configuration, it can save a median of 30% of builds while incurring only a median delay of 1 build in a median of 15% of failing builds. To minimize delayed failure observation, I then propose a second build selection approach (PRECISEBUILDSKIP) that saves cost without delaying failure observation. The evaluation shows that PRECISEBUILDSKIP can save a median of 5.5% of builds while capturing the majority of failing builds (a median of 100%). After that, I evaluate the strengths and weaknesses of 10 techniques that can improve CI, including SMARTBUILDSKIP. The findings of this comparison motivate my next work: a hybrid technique (HYBRIDBUILDSKIP) that combines these techniques to achieve greater cost savings while keeping the proportion of failing builds whose observation is delayed low. Finally, I design an experiment to understand how the share of test duration within the total build duration influences the cost savings of build and test selection techniques.
- DR_BEV: Developer Recommendation Based on Executed Vocabulary. Bendelac, Alon (Virginia Tech, 2020-05-28). Bug-fixing, or fixing known errors in computer software, makes up a large portion of software development expenses. Once a bug is discovered, it must be assigned to an appropriate developer who has the necessary expertise to fix it. This bug-assignment task has traditionally been done manually, but the manual task is time-consuming, error-prone, and tedious. Therefore, automatic bug assignment techniques have been developed to facilitate this task. Most existing techniques are report-based; that is, they work on bugs that are textually described in bug reports. However, only a subset of the bugs that are observed as a faulty program execution are also described textually. Certain bugs, such as security vulnerability bugs, are represented only by a faulty program execution and are not described textually. In other words, these bugs are represented by code coverage, which indicates which lines of source code were executed in the faulty program execution. Promptly fixing these software security vulnerability bugs is necessary in order to manage security threats. Accordingly, execution-based bug assignment techniques, which model a bug with a faulty program execution, are an important tool in fixing software security bugs. In this thesis, we compare WhoseFault, an existing execution-based bug assignment technique, to report-based techniques. Additionally, we propose DR_BEV (Developer Recommendation Based on Executed Vocabulary), a novel execution-based technique that models developer expertise based on the vocabulary of each developer's source code contributions, and we demonstrate that this technique outperforms the current state-of-the-art execution-based technique. Our observations indicate that report-based techniques perform better than execution-based techniques, but not by a wide margin.
Therefore, while a report-based technique should be used if a report exists for a bug, our results should provide confidence in the scenarios in which only execution-based techniques are applicable.
- Enhancing CryptoGuard's Deployability for Continuous Software Security Scanning. Frantz, Miles Eugene (Virginia Tech, 2020-05-21). The increasing development speed enabled by Agile can cause security steps to be overlooked in the process, as happened with the Iowa Caucus application. Protecting confidential information such as social security numbers requires security at all levels, including in any connected applications. CryptoGuard is a static code analyzer for Java that verifies that developers do not leave vulnerabilities in their applications. It aids developers by identifying cryptographic misuses such as hard-coded keys, weak hashes, and the use of insecure protocols. In my master's thesis work, I made several important contributions to improving the deployability, accessibility, and usability of CryptoGuard. I extended CryptoGuard to scan both source and compiled code, created live documentation, and supported both a cloud and a local tool suite. I also created build tool plugins and a program aid for CryptoGuard. In addition, I analyzed several Java-related surveys encompassing more than 50,000 developers and reported interesting current practices of real-world software developers.
- Enhancing Fault Localization with Cost Awareness. Nachimuthu Nallasamy, Kanagaraj (Virginia Tech, 2019-06-24). Debugging is a challenging and time-consuming process in the software life cycle. This thesis focuses on improving the accuracy of existing fault localization (FL) techniques. We experimented with several source-code line-level features, such as line commit size, line recency, and line length, to arrive at a new fault localization technique. Based on our experiments, we propose a novel enhanced cost-aware fault localization (ECFL) technique that combines line length with selected existing baseline fault localization techniques. ECFL improves the accuracy of DStar (Baseline 1), CombineFastestFL (Baseline 2), and CombineFL (Baseline 3), locating 81%, 58%, and 30% more real faults, respectively, in the Top-1 evaluation metric. Compared with the baseline techniques, ECFL requires only marginal additional time (5 seconds per bug on average) and data while providing a significant improvement in accuracy. The source code line features also improve the baseline fault localization techniques when a "learning to rank" SVM approach is used to combine the features. We also provide an infrastructure to facilitate future research on combining new source code line features with other fault localization techniques.
- An Experimental Study of the Performance, Energy, and Programming Effort Trade-offs of Android Persistence Frameworks. Pu, Jing (Virginia Tech, 2016-08-16). One of the fundamental building blocks of a mobile application is the ability to persist program data between different invocations. Referred to as persistence, this functionality is commonly implemented by means of persistence frameworks. Android, the most popular mobile platform, offers developers a wide variety of persistence frameworks to choose from. Unfortunately, the energy, performance, and programming effort trade-offs of these frameworks are poorly understood, leaving Android developers in the dark when trying to select the most appropriate option for their applications. To address this problem, this thesis reports on the results of the first systematic study of six Android persistence frameworks (i.e., ActiveAndroid, greenDAO, OrmLite, Sugar ORM, Android SQLite, and Realm Java) in their application to and performance with popular benchmarks, such as DaCapo. Having measured and analyzed the energy, performance, and programming effort trade-offs for each framework, we present a set of practical guidelines to help developers choose among Android persistence frameworks. Our findings can also help framework developers optimize their products to meet the desired design objectives.
- Exploring the Process and Challenges of Programming with Regular Expressions. Michael, Louis Guy IV (Virginia Tech, 2019-06-27). Regular expressions (regexes) are a powerful mechanism for solving string-matching problems and are supported by all modern programming languages. While regular expressions are highly expressive, they are also often perceived to be highly complex and hard to read. Although existing studies have focused on improving the readability of regular expressions, little is known about any other difficulties that developers face when programming with regular expressions. In this paper, we aim to provide a deeper understanding of the process of programming regular expressions by studying: (1) how developers make decisions throughout the process, (2) what difficulties they face, and (3) how aware they are of the serious risks involved in programming regexes. We surveyed 158 professional developers from a diversity of backgrounds, and we conducted a series of interviews to learn more about the difficulties that participants face in this process and the solutions they adopt. This mixed-methods approach revealed that some of the difficulties of regexes include the inability to effectively search for, fully validate, and document them. Developers also reported the cascading impacts of poor readability, lack of universal portability, and struggles with overall problem comprehension. The majority of the developers we studied were unaware of critical security risks that can occur when using regexes, and those who were aware of potential problems felt that they lacked the ability to identify problematic regexes. Our findings provide multiple implications for future work, including the development of semantic regex search engines for regex reuse and improved input generators for regex validation.
- Fast and accurate incremental feedback for students' software tests using selective mutation analysis. Kazerouni, Ayaan M.; Davis, James C.; Basak, Arinjoy; Shaffer, Clifford A.; Servant Cortes, Francisco Javier; Edwards, Stephen H. (2021-05). As incorporating software testing into programming assignments becomes routine, educators have begun to assess not only the correctness of students' software, but also the adequacy of their tests. In practice, educators rely on code coverage measures, though their shortcomings are widely known. Mutation analysis is a stronger measure of test adequacy, but it is too costly to be applied beyond the small programs developed in introductory programming courses. We demonstrate how to adapt mutation analysis to provide rapid automated feedback on software tests for complex projects in large programming courses. We study a dataset of 1389 student software projects ranging from trivial to complex. We begin by showing that although the state of the art in mutation analysis is practical for providing rapid feedback on projects in introductory courses, it is prohibitively expensive for the more complex projects in subsequent courses. To reduce this cost, we use a statistical procedure to select a subset of mutation operators that maintains accuracy while minimizing cost. We show that with only 2 operators, costs can be reduced by a factor of 2-3 with negligible loss in accuracy. Finally, we evaluate our approach on open-source software and report that our findings may generalize beyond our educational context. (c) 2021 The Author(s). Published by Elsevier Inc. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
- Helping Developers Migrate their Code across Programming Languages. Elarnaoty, Mohammed Elsayed (Virginia Tech, 2024-10-15). Migrating source code from one programming language to another is a common task in software development. This migration can be done by completely rewriting the code in the target language, or it can be facilitated through code-reuse or automation techniques. This thesis explores both approaches. For code reuse, two new cross-language code search techniques are proposed that enable developers to search for code in one language using code from another. These techniques address the limitations of existing methods in the context of code migration. The first technique leverages a Siamese network combined with Word2Vec embeddings, while the second employs transformers. For code automation, the concept of Translation Types is introduced to categorize code translations. An empirical study was conducted to analyze the differences between human-translated and machine-translated code. Based on these findings, two multi-output code translation techniques were developed that produce multiple translations aligned with the different styles that developers use when translating their code. The first tool employs a denoising autoencoder and a blueprint-guided beam search algorithm to generate translations of specific types. This algorithm mimics the translation operations that developers apply in similar software projects. The second tool utilizes GPT-4 with a specialized prompt to generate translations tailored to the requested types. In the evaluation, these approaches produced automated code translations that aligned better with developer preferences than existing methods while maintaining correctness.
- Intelligent Goal-Oriented Feedback for Java Programming Assignments. Kandru, Nischel (Virginia Tech, 2018-07-12). Within computer science education, goal-oriented feedback motivates beginners to engage in learning programming. As the number of students increases, it becomes challenging for teaching assistants to address all of the students' questions and to provide goals. This problem is addressed by intelligent visual feedback that guides beginners to formulate effective goals for resolving all the errors they encounter while solving a programming assignment. Most current automated feedback mechanisms provide feedback without categorization, prioritization, or goal formulation in mind. Students may overlook important issues, and high-priority issues might be hidden among other issues. Also, beginners are not well equipped to formulate goals for resolving the issues reported in the feedback. In this research, we address the problem of providing effective, intelligent, goal-oriented feedback on students' code so that they can resolve all the issues in their code while ensuring that the code is well tested. The goal-oriented feedback is intended to implicitly guide students toward a logically correct solution. The code feedback is summarized into four categories in descending order of priority: Coding, Student's Testing, Behavior, and Style. Each category is further divided into subcategories, and a simple visual summary of the student's code is also provided. Each category includes detailed feedback on every error in that category to give students a better understanding of the errors. We also offer enhanced error messages and diagnoses of errors to make the feedback more useful. This intelligent feedback has been integrated into Web-CAT, an open-source automated grading tool developed at Virginia Tech that is used by many universities.
We surveyed students after they had used this feedback on a couple of programming assignments, and the promising results suggest that our intelligent feedback is effective.
- Investigating and Recommending Co-Changed Entities for JavaScript Programs. Jiang, Zijian (Virginia Tech, 2020). JavaScript (JS) is one of the most popular programming languages due to its flexibility and versatility, but debugging JS code is tedious and error-prone. In our research, we conducted an empirical study to characterize the relationship between co-changed software entities (e.g., functions and variables), and built a machine learning (ML)-based approach to recommend additional entities to edit given developers' code changes. Specifically, we first crawled 14,747 commits in 10 open-source projects; for each commit, we created one or more change dependency graphs (CDGs) to model the referencer-referencee relationship between co-changed entities. Next, we extracted the common subgraphs between CDGs to locate recurring co-change patterns between entities. Finally, based on those patterns, we extracted code features from co-changed entities and trained an ML model that recommends entities-to-change given a program commit. According to our empirical investigation, (1) 50% of the crawled commits involve multi-entity edits (i.e., edits that touch multiple entities simultaneously); (2) three recurring patterns commonly exist in all projects; (3) 80–90% of co-changed function pairs either invoke the same function(s), access the same variable(s), or contain similar statement(s); and (4) our ML-based approach CoRec recommended entity changes with high accuracy. This research can help improve programmer productivity and software quality.
- An Investigation into Code Search Engines: The State of the Art Versus Developer Expectations. Li, Shuangyi (Virginia Tech, 2022-07-15). An essential software development tool, code search engines are expected to provide superior accuracy, usability, and performance. However, prior research has neither (1) summarized, categorized, and compared representative code search engines, nor (2) analyzed the actual expectations that developers have for code search engines. This missing knowledge could empower developers to benefit fully from search engines, academic researchers to uncover promising research directions, and industry practitioners to properly marshal their efforts. This thesis fills the aforementioned gaps by drawing a comprehensive picture of code search engines, including their definition, standard processes, existing solutions, common alternatives, and developers' perspectives. We first study the state of the art in code search engines by analyzing academic papers, industry releases, and open-source projects. We then survey more than 100 software developers to ascertain their usage of and preferences for code search engines. Finally, we juxtapose the results of our study and survey to synthesize a call to action for researchers and industry practitioners to better meet the demands software developers make on code search engines. We present the first comprehensive overview of state-of-the-art code search engines by categorizing and comparing them based on their respective search strategies, applicability, and performance. Our user survey revealed a surprising lack of awareness of code search engines among many developers, with a strong preference for using general-purpose search engines (e.g., Google) or code repositories (e.g., GitHub) to search for code. Our results also clearly identify typical usage scenarios and sought-after properties of code search engines.
Our findings can guide software developers in selecting code search engines most suitable for their programming pursuits, suggest new research directions for researchers, and help programming tool builders in creating effective code search engine solutions.
- Measuring the Software Development Process to Enable Formative Feedback. Kazerouni, Ayaan Mehdi (Virginia Tech, 2020-04-16). Graduating CS students face well-documented difficulties upon entering the workforce, with reports of a gap between what they learn and what is expected of them in industry. Project management, software testing, and debugging have been repeatedly listed as common "knowledge deficiencies" among newly hired CS graduates. Similar difficulties manifest themselves on a smaller scale in upper-level CS courses, like the Data Structures and Algorithms course at Virginia Tech: students are required to develop large and complex projects over a three- to four-week lifecycle, and it is common to see close to a quarter of the students drop or fail the course, largely due to the difficult and time-consuming nature of the projects. My research is driven by the hypothesis that regular feedback about the software development process, delivered during development, will help ameliorate these difficulties. Assessment of software currently tends to focus on qualities like correctness, code coverage from test suites, and code style, while little attention or tooling has been devoted to assessing the software development process. I use empirical software engineering methods like IDE-log analysis, software repository mining, and semi-structured interviews with students to identify effective and ineffective software practices. Using the results of these analyses, I have worked on assessing students' development in terms of time management, test writing, test quality, and other "self-checking" behaviours like running the program locally or submitting to an oracle of instructor-written test cases. The goal is to use this information to formulate formative feedback about the software development process.
In addition to educators, this research is relevant to software engineering researchers and practitioners, since the results from these experiments are based on the work of upper-level students who grapple with issues of design and work-flow that are not far removed from those faced by professionals in industry.
- Methodologies, Techniques, and Tools for Understanding and Managing Sensitive Program Information. Liu, Yin (Virginia Tech, 2021-05-20). Exfiltrating or tampering with certain business logic, algorithms, and data can harm the security and privacy of both organizations and end users. Collectively referred to as sensitive program information (SPI), these building blocks are part and parcel of modern software systems in domains ranging from enterprise applications to cyberphysical setups. Hence, protecting SPI has become one of the most salient challenges of modern software development. However, several fundamental obstacles stand in the way of effective SPI protection: (1) understanding and locating the SPI in any realistically sized codebase by hand is hard; (2) manually isolating SPI to protect it is burdensome and error-prone; (3) if SPI is passed across distributed components within and across devices, it becomes vulnerable to security and privacy attacks. To address these problems, this dissertation research innovates in the realm of automated program analysis, code transformation, and novel programming abstractions to improve the state of the art in SPI protection. Specifically, this dissertation comprises three interrelated research thrusts that: (1) design and develop program analysis and programming support for inferring the usage semantics of program constructs, with the goal of helping developers understand and identify SPI; (2) provide powerful programming abstractions and tools that transform code automatically, with the goal of helping developers effectively isolate SPI from the rest of the codebase; (3) provide a programming mechanism for distributed managed execution environments that hides SPI, with the goal of enabling components to exchange SPI safely and securely.
The novel methodologies, techniques, and software tools, supported by programming abstractions, automated program analysis, and code transformation of this dissertation research lay the groundwork for establishing a secure, understandable, and efficient foundation for protecting SPI. This dissertation is based on 4 conference papers, presented at TrustCom'20, GPCE'20, GPCE'18, and ManLang'17, as well as 1 journal paper, published in Journal of Computer Languages (COLA).
- Modeling Software Developer Expertise and Inexpertise to Handle Diverse Information Needs. Claytor, Frank L. (Virginia Tech, 2018-06-08). Expert software developer recommendation is a mature research field, with many different techniques developed to help automate the search for experts who can assist with development tasks and questions. However, all previous research on recommending expert developers has had two persistent restrictions. First, all previous expert recommendation work assumed that developers demonstrate only positive expertise. But developers can also make mistakes and demonstrate negative expertise, referred to as inexpertise, which reveals concepts they do not know well. Previous research on developer expertise has not taken inexpertise into account. Second, all previous expert developer recommendation research has focused on recommending developers for a single development task or expertise need, such as fixing a bug report or helping with a change request. But not all expertise needs can be easily classified into one of these groups, and having different techniques for every possible task type would be difficult and confusing to maintain and use. We find that inexpertise exists, can be measured, and can be used to direct inspection effort toward potentially incorrect or buggy commits. Additionally, we investigate how different expertise-finding techniques perform on a diverse set of long and short expertise queries and develop new techniques that achieve more consistent cross-query performance.
- On the Impact and Defeat of Regular Expression Denial of Service. Davis, James Collins (Virginia Tech, 2020-05-28). Regular expressions (regexes) are a widely used yet little-studied software component. Engineers use regexes to match domain-specific languages of strings. Unfortunately, many regex engine implementations perform these matches with worst-case polynomial or exponential time complexity in the length of the string. Because regexes are commonly used in user-facing contexts, super-linear regexes are a potential denial-of-service vector known as Regular expression Denial of Service (ReDoS). Part I provides the background needed to understand this problem. In Part II of this dissertation, I present the first large-scale empirical studies of super-linear regex use. Guided by case studies of ReDoS issues in practice (Chapter 3), I report that the risk of ReDoS affects up to 10% of the regexes used in practice (Chapter 4), and that these findings generalize to software written in eight popular programming languages (Chapter 5). ReDoS appears to be a widespread vulnerability, motivating the consideration of defenses. In Part III, I present the first systematic comparison of ReDoS defenses. Based on the necessary conditions for ReDoS, a defense can be erected at the application level, the regex engine level, or the framework/runtime level. In my experiments, I report that application-level defenses are difficult and error-prone to implement (Chapter 6), that finding a compatible, higher-performing regex engine is unlikely (Chapter 7), that optimizing an existing regex engine using memoization incurs (perhaps acceptable) space overheads (Chapter 8), and that incorporating resource caps into the framework or runtime is feasible but faces barriers to adoption (Chapter 9). In Part IV of this dissertation, we reflect on our findings.
By leveraging empirical software engineering techniques, we have exposed the scope of potential ReDoS vulnerabilities and provided strong motivation for a solution. To assist practitioners, we have conducted a systematic evaluation of the solution space. We hope that our findings help eliminate ReDoS and, more generally, that we have provided a case study in the value of data-driven software engineering.
- Promoting Systematic Practices for Designing and Developing Edge Computing Applications via Middleware Abstractions and Performance Estimation. Dantas Cruz, Breno (Virginia Tech, 2021-04-09). Mobile, IoT, and wearable devices have been transitioning from passive consumers to active generators of massive amounts of user-generated data. Edge-based processing eliminates network bottlenecks and improves data privacy. However, developing edge applications remains hard, and developers often have to employ ad hoc software development practices to meet their requirements. In doing so, they introduce low-level code into the codebase that is error-prone, expensive to maintain, and vulnerable to security flaws. The thesis of this research is that modular middleware abstractions, exemplar use cases, and ML-based performance estimation can make the design and development of edge applications more systematic. To prove this thesis, this dissertation comprises three research thrusts: (1) understand the characteristics of edge-based applications in terms of their runtime, architecture, and performance; (2) provide exemplary use cases to support the development of edge-based applications; and (3) innovate in the realm of middleware to address the unique challenges of edge-based data transfer and processing. We provide programming support and performance estimation methodologies to help edge-based application developers improve their software development practices. This dissertation is based on three conference papers, presented at MOBILESoft 2018, VTC 2020, and IEEE SMDS 2020.
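The ReDoS abstract above attributes super-linear matching to backtracking engines and mentions memoization as an engine-level defense (Chapter 8). The following is a minimal illustrative sketch, not the dissertation's implementation: it hand-builds an NFA for the ambiguous regex `^(a|a)*$`, matches it by depth-first backtracking, and optionally records failed (NFA state, input position) pairs so that no pair is explored twice. The names `NFA`, `backtrack`, and `full_match` are hypothetical, and the toy matcher supports only literal characters, alternation/loops (as "split" states), and an end-of-input anchor.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical hand-built NFA for the ambiguous regex ^(a|a)*$.
# Each state is ("char", c, next), ("split", first, second), or ("match",).
NFA = {
    0: ("split", 1, 4),   # (...)* : try another iteration first, else exit the loop
    1: ("split", 2, 3),   # a|a : two alternatives that match the same character
    2: ("char", "a", 0),
    3: ("char", "a", 0),
    4: ("match",),        # $ : accept only at the end of the input
}


def backtrack(state: int, pos: int, text: str,
              memo: Optional[Set[Tuple[int, int]]], steps: List[int]) -> bool:
    """Depth-first backtracking search; steps[0] counts every call."""
    steps[0] += 1
    if memo is not None and (state, pos) in memo:
        return False  # this (state, position) pair already failed once
    node = NFA[state]
    if node[0] == "match":
        ok = pos == len(text)
    elif node[0] == "char":
        ok = (pos < len(text) and text[pos] == node[1]
              and backtrack(node[2], pos + 1, text, memo, steps))
    else:  # "split": explore the first branch, then fall back to the second
        ok = (backtrack(node[1], pos, text, memo, steps)
              or backtrack(node[2], pos, text, memo, steps))
    if not ok and memo is not None:
        memo.add((state, pos))  # safe: without backreferences, a failed pair always fails
    return ok


def full_match(text: str, use_memo: bool) -> Tuple[bool, int]:
    """Return (matched?, number of backtracking calls)."""
    steps = [0]
    memo: Optional[Set[Tuple[int, int]]] = set() if use_memo else None
    return backtrack(0, 0, text, memo, steps), steps[0]


if __name__ == "__main__":
    for n in (8, 12, 16):
        attack = "a" * n + "b"  # almost matches, forcing exhaustive backtracking
        print(n, full_match(attack, use_memo=False), full_match(attack, use_memo=True))
```

On the input `"a" * n + "b"`, every way of splitting the `a`s between the two identical alternatives is tried before failure, so the unmemoized search makes 5 · (2^(n+1) − 1) calls (10,235 for n = 10), while the memoized search makes 6n + 5 (65 for n = 10). The memo table is the space cost of this approach: it can grow to (number of NFA states) × (input length + 1) entries, which is the kind of space overhead the abstract refers to.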