Neural Network-based Methodologies for Securing Cryptographic Code

dc.contributor.authorXiao, Yaen
dc.contributor.committeechairYao, Danfengen
dc.contributor.committeememberHicks, Matthewen
dc.contributor.committeememberGe, Xinyangen
dc.contributor.committeememberRamakrishnan, Narendranen
dc.contributor.committeememberMcDaniel, Patrick Drewen
dc.contributor.departmentComputer Science and Applicationsen
dc.date.accessioned2022-08-18T08:00:10Zen
dc.date.available2022-08-18T08:00:10Zen
dc.date.issued2022-08-17en
dc.description.abstractMany studies show that manual code generation is error-prone and results in vulnerabilities. Vulnerability fixing has been shown as the most time-consuming process among multiple steps of code repair. To help developers repair these security vulnerabilities, my dissertation aims to develop an automatic or semi-automatic secure code generation system with neural network based approaches. Trained with huge amounts of good-quality code, I expect the neural network to learn the secure usage and produce the correct code suggestions. Despite the great success of neural networks, the vision of comprehending and generating programming languages through neural networks has not been fully realized. There are many fundamental questions that need to be answered. These questions include 1) what are the accuracy impacts of the various choices in code embedding? 2) How to address the accuracy challenges caused by the programming language specific properties in the task of secure code suggestion? My dissertation work answers the two questions with a systematical measurement study and specialized neural network designs. My experiments show that program analysis is a necessary preprocessing step to guide the code embedding – resulting in a 36.1% accuracy improvement. Furthermore, I identify two previously unreported deficiencies in the cryptographic API suggestion task. To close the gap, I invent a highly accurate API method suggestion solution, referred to as Multi-HyLSTM, with specialized neural network designs to recognize unique programming language characteristics. My work points out the important differences between natural languages and programming languages, which pure data-driven learning approaches may not recognize.en
dc.description.abstractgeneralNeural network techniques that automatically learn rules from data show great potential to provide vulnerability-agnostic solutions for securing code. Recent research community has witnessed the rapid progress of neural network techniques in various application domains, such as computer vision, natural language processing, etc. However, how to harness the success of neural network based approaches for dealing with programs is still largely unknown. Many fundamental questions are required to be answered. This dissertation aims to provide neural network based solutions to help developers write secure code, as well as answer several important but unknown research questions about promoting neural network based approaches specialized for the programming language domain. Learning from Java cryptographic code, I explore the accuracy challenges for neural networks to understand the secure API usage rules and generate appropriate suggestions based on them. One of my research focuses is on how to express code in a way that neural networks can comprehend, aka code embedding. Code embedding is the process of transforming code into numeric vectors. It is important for accuracy as all the subsequent neural network calculation is performed on it. I conduct a systematic comparison to evaluate several key embedding design choices and reveal their impacts on accuracy improvements. To further improve the accuracy, I focus on the accuracy challenges in the specific task, generating API suggestions by neural networks. I identify the unreported program dependency specific challenges and present several specialized neural network designs to address them.en
dc.description.degreeDoctor of Philosophyen
dc.format.mediumETDen
dc.identifier.othervt_gsexam:35329en
dc.identifier.urihttp://hdl.handle.net/10919/111543en
dc.language.isoenen
dc.publisherVirginia Techen
dc.rightsIn Copyrighten
dc.rights.urihttp://rightsstatements.org/vocab/InC/1.0/en
dc.subjectCode embeddingen
dc.subjectCode suggestionen
dc.subjectNeural networksen
dc.subjectCryptographic APIsen
dc.titleNeural Network-based Methodologies for Securing Cryptographic Codeen
dc.typeDissertationen
thesis.degree.disciplineComputer Science & Applicationsen
thesis.degree.grantorVirginia Polytechnic Institute and State Universityen
thesis.degree.leveldoctoralen
thesis.degree.nameDoctor of Philosophyen

Files

Original bundle
Now showing 1 - 1 of 1
Loading...
Thumbnail Image
Name:
Xiao_Y_D_2022.pdf
Size:
3.16 MB
Format:
Adobe Portable Document Format
Description: