An, Xiaoxin2022-06-162022-06-162022-06-15vt_gsexam:35073http://hdl.handle.net/10919/110791The translation of binary code to higher-level models has wide applications, including decompilation, binary analysis, and binary rewriting. This calls for high reliability of the underlying trusted computing base (TCB) of the translation methodology. A key challenge is to reduce the TCB by validating its soundness. Both the definition of soundness and the validation method heavily depend on the context: what is in the TCB and how to prove it. This dissertation presents three research contributions. The first two contributions include reducing the TCB in binary verification, and the last contribution includes a binary verification process that leverages a reduced TCB. The first contribution targets the validation of OCaml-to-PVS translation -- commonly used to translate instruction-set-architecture (ISA) specifications to PVS -- where the destination language is non-executable. We present a methodology called OPEV to validate the translation between OCaml and PVS, supporting non-executable semantics. The validation includes generating large-scale tests for OCaml implementations, generating test lemmas for PVS, and generating proofs that automatically discharge these lemmas. OPEV incorporates an intermediate type system that captures a large subset of OCaml types, employing a variety of rules to generate test cases for each type. To prove the PVS lemmas, we develop automatic proof strategies and discharge the test lemmas using PVS Proof-Lite, a powerful proof scripting utility of the PVS verification system. We demonstrate our approach in two case studies that include 259 functions selected from the Sail and Lem libraries. For each function, we generate thousands of test lemmas, all of which are automatically discharged. The dissertation's second contribution targets the soundness validation of a disassembly process where the source language does not have well-defined semantics. Disassembly is a crucial step in binary security, reverse engineering, and binary verification. Various studies in these fields use disassembly tools and hypothesize that the reconstructed disassembly is correct. However, disassembly is an undecidable problem. State-of-the-art disassemblers suffer from issues ranging from incorrectly recovered instructions to incorrectly assessing which addresses belong to instructions and which to data. We present DSV, a systematic and automated approach to validate whether the output of a disassembler is sound with respect to the input binary. No source code, debugging information, or annotations are required. DSV defines soundness using a transition relation defined over concrete machine states: a binary is sound if, for all addresses in the binary that can be reached from the binary's entry point, the bytes of the (disassembled) instruction located at an address are the same as the actual bytes read from the binary. Since computing this transition relation is undecidable, DSV uses over-approximation by preventing false positives (i.e., the existence of an incorrectly disassembled reachable instruction but deemed unreachable) and allowing, but minimizing, false negatives. We apply DSV to 102 binaries of GNU Coreutils with eight different state-of-the-art disassemblers from academia and industry. DSV is able to find soundness issues in the output of all disassemblers. The dissertation's third contribution is WinCheck: a concolic model checker that detects memory-related properties of closed-source binaries. Bugs related to memory accesses are still a major issue for security vulnerabilities. Even a single buffer overflow or use-after-free in a large program may be the cause of a software crash, a data leak, or a hijacking of the control flow. Typical static formal verification tools aim to detect these issues at the source code level. WinCheck is a model-checker that is directly applicable to closed-source and stripped Windows executables. A key characteristic of WinCheck is that it performs its execution as symbolically as possible while leaving any information related to pointers concrete. This produces a model checker tailored to pointer-related properties, such as buffer overflows, use-after-free, null-pointer dereferences, and reading from uninitialized memory. The technique thus provides a novel trade-off between ease of use, accuracy, applicability, and scalability. We apply WinCheck to ten closed-source binaries available in a Windows 10 distribution, as well as the Windows version of the entire Coreutils library. We conclude that the approach taken is precise -- provides only a few false negatives -- but may not explore the entire state space due to unresolved indirect jumps.ETDenIn CopyrightTranslation ValidationDisassembly SoundnessBinary VerificationRandom TestingSymbolic ExecutionBounded Model CheckingOn Reducing the Trusted Computing Base in Binary VerificationDissertation