Recommending TEE-based Functions Using a Deep Learning Model
MetadataShow full item record
Trusted execution environments (TEEs) are an emerging technology that provides a protected hardware environment for processing and storing sensitive information. By using TEEs, developers can bolster the security of software systems. However, incorporating TEE into existing software systems can be a costly and labor-intensive endeavor. Software maintenance—changing software after its initial release—is known to contribute the majority of the cost in the software development lifecycle. The first step of making use of a TEE requires that developers accurately identify which pieces of code would benefit from being protected in a TEE. For large code bases, this identification process can be quite tedious and time-consuming. To help reduce the software maintenance costs associated with introducing a TEE into existing software, this thesis introduces ML-TEE, a recommendation tool that uses a deep learning model to classify whether an input function handles sensitive information or sensitive code. By applying ML-TEE, developers can reduce the burden of manual code inspection and analysis. ML-TEE's model was trained and tested on functions from GitHub repositories that use Intel SGX and on an imbalanced dataset. The accuracy of the final model used in the recommendation system has an accuracy of 98.86% and an F1 score of 80.00%. In addition, we conducted a pilot study, in which participants were asked to identify functions that needed to be placed inside a TEE in a third-party project. The study found that on average, participants who had access to the recommendation system's output had a 4% higher accuracy and completed the task 21% faster.
General Audience Abstract
Improving the security of software systems has become critically important. A trusted execution environment (TEE) is an emerging technology that can help secure software that uses or stores confidential information. To make use of this technology, developers need to identify which pieces of code handle confidential information and should thus be placed in a TEE. However, this process is costly and laborious because it requires the developers to understand the code well enough to make the appropriate changes in order to incorporate a TEE. This process can become challenging for large software that contains millions of lines of code. To help reduce the cost incurred in the process of identifying which pieces of code should be placed within a TEE, this thesis presents ML-TEE, a recommendation system that uses a deep learning model to help reduce the number of lines of code a developer needs to inspect. Our results show that the recommendation system achieves high accuracy as well as a good balance between precision and recall. In addition, we conducted a pilot study and found that participants from the intervention group who used the output from the recommendation system managed to achieve a higher average accuracy and perform the assigned task faster than the participants in the control group.
- Masters Theses