Neural Network-based Methodologies for Securing Cryptographic Code

Xiao, Ya

dc.contributor.author	Xiao, Ya	en
dc.contributor.committeechair	Yao, Danfeng	en
dc.contributor.committeemember	Hicks, Matthew	en
dc.contributor.committeemember	Ge, Xinyang	en
dc.contributor.committeemember	Ramakrishnan, Narendran	en
dc.contributor.committeemember	McDaniel, Patrick Drew	en
dc.contributor.department	Computer Science and Applications	en
dc.date.accessioned	2022-08-18T08:00:10Z	en
dc.date.available	2022-08-18T08:00:10Z	en
dc.date.issued	2022-08-17	en
dc.description.abstract	Many studies show that manual code generation is error-prone and results in vulnerabilities. Vulnerability fixing has been shown to be the most time-consuming of the multiple steps of code repair. To help developers repair these security vulnerabilities, my dissertation aims to develop an automatic or semi-automatic secure code generation system using neural network-based approaches. I expect a neural network trained on large amounts of good-quality code to learn secure usage and produce correct code suggestions. Despite the great success of neural networks, the vision of comprehending and generating programming languages through neural networks has not been fully realized, and many fundamental questions remain to be answered. These questions include: 1) What are the accuracy impacts of the various choices in code embedding? 2) How can the accuracy challenges caused by programming language-specific properties be addressed in the task of secure code suggestion? My dissertation answers these two questions with a systematic measurement study and specialized neural network designs. My experiments show that program analysis is a necessary preprocessing step to guide code embedding, resulting in a 36.1% accuracy improvement. Furthermore, I identify two previously unreported deficiencies in the cryptographic API suggestion task. To close the gap, I invent a highly accurate API method suggestion solution, referred to as Multi-HyLSTM, with specialized neural network designs that recognize unique programming language characteristics. My work points out important differences between natural languages and programming languages that purely data-driven learning approaches may not recognize.	en
dc.description.abstractgeneral	Neural network techniques that automatically learn rules from data show great potential to provide vulnerability-agnostic solutions for securing code. The research community has recently witnessed rapid progress in neural network techniques across application domains such as computer vision and natural language processing. However, how to harness the success of neural network-based approaches for dealing with programs is still largely unknown, and many fundamental questions remain to be answered. This dissertation aims to provide neural network-based solutions that help developers write secure code, and to answer several important open research questions about specializing neural network-based approaches for the programming language domain. Learning from Java cryptographic code, I explore the accuracy challenges neural networks face in understanding secure API usage rules and generating appropriate suggestions based on them. One focus of my research is how to express code in a way that neural networks can comprehend, also known as code embedding. Code embedding is the process of transforming code into numeric vectors. It is important for accuracy because all subsequent neural network calculations are performed on these vectors. I conduct a systematic comparison to evaluate several key embedding design choices and reveal their impacts on accuracy. To further improve accuracy, I focus on the challenges of one specific task: generating API suggestions with neural networks. I identify previously unreported challenges specific to program dependencies and present several specialized neural network designs to address them.	en
dc.description.degree	Doctor of Philosophy	en
dc.format.medium	ETD	en
dc.identifier.other	vt_gsexam:35329	en
dc.identifier.uri	http://hdl.handle.net/10919/111543	en
dc.language.iso	en	en
dc.publisher	Virginia Tech	en
dc.rights	In Copyright	en
dc.rights.uri	http://rightsstatements.org/vocab/InC/1.0/	en
dc.subject	Code embedding	en
dc.subject	Code suggestion	en
dc.subject	Neural networks	en
dc.subject	Cryptographic APIs	en
dc.title	Neural Network-based Methodologies for Securing Cryptographic Code	en
dc.type	Dissertation	en
thesis.degree.discipline	Computer Science & Applications	en
thesis.degree.grantor	Virginia Polytechnic Institute and State University	en
thesis.degree.level	doctoral	en
thesis.degree.name	Doctor of Philosophy	en

Files

Original bundle

Name: Xiao_Y_D_2022.pdf
Size: 3.16 MB
Format: Adobe Portable Document Format

Collections

Doctoral Dissertations