Toward Bridging the Semantic Gap Between x86-64 Software Binaries and Abstract Languages for Formal Verification and Security

dc.contributor.authorRoessle, Ian Henryen
dc.contributor.committeechairTront, Joseph G.en
dc.contributor.committeememberAdams, Stephen Conwayen
dc.contributor.committeememberBeling, Peter A.en
dc.contributor.committeememberGaeddert, Joseph Danielen
dc.contributor.committeememberJin, Mingen
dc.contributor.departmentElectrical and Computer Engineeringen
dc.date.accessioned2025-01-14T09:00:21Zen
dc.date.available2025-01-14T09:00:21Zen
dc.date.issued2025-01-13en
dc.description.abstractFormal verification of software systems has historically focused on verifying properties at the source code (abstract language) level (e.g., C++, Java, Python), as the language semantics are more amenable to use by mathematical libraries. However, this approach comes with limitations. Firstly, not all software has source code. Secondly, the source code may be unavailable for licensing reasons. Thirdly, even if the source code is available, formal verification at the source code level generally assumes that the compiler is trustworthy and mathematically sound. This dissertation addresses challenges with lifting the abstraction of software binaries to support formal verification and reasoning toward bridging the semantic gap. The semantic gap is the loss and obfuscation of program meaning that occurs as a result of compiling source code into software binaries. This work's primary application is verifying safety-critical systems where source code is unavailable. The x86-64 Instruction Set Architecture (ISA) was chosen due to its applicability to a wide range of relevant deployed software. As a necessary dependency for the formal verification of software binaries, a machine model of the ISA is required. The machine model is analogous to the language semantics necessary when reasoning over source code. The x86-64 ISA was also chosen because, in addition to being complex, it lacked a formal specification, making it a more challenging target. Historically, much research in the area of reasoning over x86 software binaries has focused on small fragments of code, utilizing a small subset of the ISA that could be handwritten formally, limiting research to a small subset of real-world software. The first contribution of this dissertation addresses these limitations, presenting a robust hardware-derived formal executable machine model of the x86-64 ISA, supporting 1,625 instruction variants expressed as small-step operational semantics embedded within an Interactive Theorem Prover (ITP) (a 4-fold increase over prior work). The second contribution of this dissertation presents a largely automated methodology for generating formally proven equivalence theorems between decompiled x86-64 machine code and big-step operational semantics. These formally proven abstract representations of software binaries allow one to construct correctness proofs per block. To support this, we developed improved Decompilation-into-logic and formal symbolic execution techniques. We then applied this methodology to several case studies, including examples heavily relying on floating-point instructions and real-world examples. This dissertation's third contribution presents efforts to bridge the semantic gap further and tie the results to formal verification techniques commonly utilized over source code. It presents a first-of-the-art approach to leveraging a hardware-derived model derived from machine learning in software binary formal verification efforts. It introduces a library of 628 proven simplification lemmas designed to lift the abstraction of big-step operational semantics to a form more useful to proof engineers. A methodology for performing axiomatic reasoning across blocks is presented, with case studies demonstrating, for software binaries, standard formal verification techniques typically applicable to source code. Lastly, this dissertation explores a methodology for sound software binary diversity to employ a moving target defense strategy with associated security benefits. Toward bridging the semantic gap between software binaries and source code, this work demonstrates large-scale black-box software binary formal verification in settings where source code is unavailable, on a hardware-derived machine model, as well as a sound software binary diversity approach for moving target defense.en
dc.description.abstractgeneralBetween the optimization and obfuscation of compilers and a lack of comprehensive small-step operational semantics, lifting x86-64 software binaries and formally verifying them to a specification has been difficult. As a result, most formal verification methodologies operate on source code instead of software binaries. Operating systems and other safety-critical software, however, often leverage custom assembly code or a hybrid of C++ and assembly for performance considerations, which would not work with standard formal verification methodologies that operate on source code. This dissertation demonstrates methods for lifting the abstraction of software binaries, leveraging a hardware-derived machine model toward bridging the semantic gap to enable more traditional formal verification methodologies to work, as well as a sound software binary diversity approach for moving target defense.en
dc.description.degreeDoctor of Philosophyen
dc.format.mediumETDen
dc.identifier.othervt_gsexam:41947en
dc.identifier.urihttps://hdl.handle.net/10919/124175en
dc.language.isoenen
dc.publisherVirginia Techen
dc.rightsIn Copyrighten
dc.rights.urihttp://rightsstatements.org/vocab/InC/1.0/en
dc.subjectx86-64 Assemblyen
dc.subjectCode Abstractionen
dc.subjectSymbolic Executionen
dc.subjectFormal Verificationen
dc.subjectBinary Analysisen
dc.subjectBinary Diversityen
dc.subjectMoving Target Defenseen
dc.subjectCyber Resilienceen
dc.titleToward Bridging the Semantic Gap Between x86-64 Software Binaries and Abstract Languages for Formal Verification and Securityen
dc.typeDissertationen
thesis.degree.disciplineComputer Engineeringen
thesis.degree.grantorVirginia Polytechnic Institute and State Universityen
thesis.degree.leveldoctoralen
thesis.degree.nameDoctor of Philosophyen

Files

Original bundle
Now showing 1 - 2 of 2
Name:
Roessle_IH_D_2025.pdf
Size:
877.84 KB
Format:
Adobe Portable Document Format
Name:
Roessle_IH_D_2025_support_1.docx
Size:
13.55 KB
Format:
Microsoft Word XML
Description:
Supporting documents