Toward Bridging the Semantic Gap Between x86-64 Software Binaries and Abstract Languages for Formal Verification and Security
dc.contributor.author | Roessle, Ian Henry | en |
dc.contributor.committeechair | Tront, Joseph G. | en |
dc.contributor.committeemember | Adams, Stephen Conway | en |
dc.contributor.committeemember | Beling, Peter A. | en |
dc.contributor.committeemember | Gaeddert, Joseph Daniel | en |
dc.contributor.committeemember | Jin, Ming | en |
dc.contributor.department | Electrical and Computer Engineering | en |
dc.date.accessioned | 2025-01-14T09:00:21Z | en |
dc.date.available | 2025-01-14T09:00:21Z | en |
dc.date.issued | 2025-01-13 | en |
dc.description.abstract | Formal verification of software systems has historically focused on verifying properties at the source code (abstract language) level (e.g., C++, Java, Python), as the language semantics are more amenable to use by mathematical libraries. However, this approach comes with limitations. Firstly, not all software has source code. Secondly, the source code may be unavailable for licensing reasons. Thirdly, even if the source code is available, formal verification at the source code level generally assumes that the compiler is trustworthy and mathematically sound. This dissertation addresses challenges with lifting the abstraction of software binaries to support formal verification and reasoning toward bridging the semantic gap. The semantic gap is the loss and obfuscation of program meaning that occurs as a result of compiling source code into software binaries. This work's primary application is verifying safety-critical systems where source code is unavailable. The x86-64 Instruction Set Architecture (ISA) was chosen due to its applicability to a wide range of relevant deployed software. As a necessary dependency for the formal verification of software binaries, a machine model of the ISA is required. The machine model is analogous to the language semantics necessary when reasoning over source code. The x86-64 ISA was also chosen because, in addition to being complex, it lacked a formal specification, making it a more challenging target. Historically, much research in the area of reasoning over x86 software binaries has focused on small fragments of code, utilizing a small subset of the ISA that could be handwritten formally, limiting research to a small subset of real-world software. The first contribution of this dissertation addresses these limitations, presenting a robust hardware-derived formal executable machine model of the x86-64 ISA, supporting 1,625 instruction variants expressed as small-step operational semantics embedded within an Interactive Theorem Prover (ITP) (a 4-fold increase over prior work). The second contribution of this dissertation presents a largely automated methodology for generating formally proven equivalence theorems between decompiled x86-64 machine code and big-step operational semantics. These formally proven abstract representations of software binaries allow one to construct correctness proofs per block. To support this, we developed improved Decompilation-into-logic and formal symbolic execution techniques. We then applied this methodology to several case studies, including examples heavily relying on floating-point instructions and real-world examples. This dissertation's third contribution presents efforts to bridge the semantic gap further and tie the results to formal verification techniques commonly utilized over source code. It presents a first-of-the-art approach to leveraging a hardware-derived model derived from machine learning in software binary formal verification efforts. It introduces a library of 628 proven simplification lemmas designed to lift the abstraction of big-step operational semantics to a form more useful to proof engineers. A methodology for performing axiomatic reasoning across blocks is presented, with case studies demonstrating, for software binaries, standard formal verification techniques typically applicable to source code. Lastly, this dissertation explores a methodology for sound software binary diversity to employ a moving target defense strategy with associated security benefits. Toward bridging the semantic gap between software binaries and source code, this work demonstrates large-scale black-box software binary formal verification in settings where source code is unavailable, on a hardware-derived machine model, as well as a sound software binary diversity approach for moving target defense. | en |
dc.description.abstractgeneral | Between the optimization and obfuscation of compilers and a lack of comprehensive small-step operational semantics, lifting x86-64 software binaries and formally verifying them to a specification has been difficult. As a result, most formal verification methodologies operate on source code instead of software binaries. Operating systems and other safety-critical software, however, often leverage custom assembly code or a hybrid of C++ and assembly for performance considerations, which would not work with standard formal verification methodologies that operate on source code. This dissertation demonstrates methods for lifting the abstraction of software binaries, leveraging a hardware-derived machine model toward bridging the semantic gap to enable more traditional formal verification methodologies to work, as well as a sound software binary diversity approach for moving target defense. | en |
dc.description.degree | Doctor of Philosophy | en |
dc.format.medium | ETD | en |
dc.identifier.other | vt_gsexam:41947 | en |
dc.identifier.uri | https://hdl.handle.net/10919/124175 | en |
dc.language.iso | en | en |
dc.publisher | Virginia Tech | en |
dc.rights | In Copyright | en |
dc.rights.uri | http://rightsstatements.org/vocab/InC/1.0/ | en |
dc.subject | x86-64 Assembly | en |
dc.subject | Code Abstraction | en |
dc.subject | Symbolic Execution | en |
dc.subject | Formal Verification | en |
dc.subject | Binary Analysis | en |
dc.subject | Binary Diversity | en |
dc.subject | Moving Target Defense | en |
dc.subject | Cyber Resilience | en |
dc.title | Toward Bridging the Semantic Gap Between x86-64 Software Binaries and Abstract Languages for Formal Verification and Security | en |
dc.type | Dissertation | en |
thesis.degree.discipline | Computer Engineering | en |
thesis.degree.grantor | Virginia Polytechnic Institute and State University | en |
thesis.degree.level | doctoral | en |
thesis.degree.name | Doctor of Philosophy | en |