Low-Level Static Analysis for Memory Usage and Control Flow Recovery

Date

2023-03-07

Publisher

Virginia Tech

Abstract

Formal characterization of the memory used by a program is an important basis for security analyses, compositional verification, and identification of noninterference. However, soundly proving memory usage requires operating at the assembly level because of the semantic gap between high-level languages and the code that processors actually execute. Automated methods such as model checking cannot handle many interesting functions because memory usage is undecidable in general, and fully interactive methods do not scale well either.

Sound control flow recovery (CFR) is also important for binary decompilation, verification, patching, and security analysis. It lifts raw unstructured data into a form that allows reasoning over behavior and semantics. However, doing so requires interpreting the behavior of the program when indirect or dynamic control flow exists, creating a recursive dependency.
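The dependency between control flow and program behavior can be illustrated with a minimal sketch. The toy instruction set, the `PROGRAM` and `JUMP_TABLES` data, and the `recover_cfg` function below are hypothetical and are not taken from the dissertation's tools. Successors of an indirect jump become known only once the analysis has a value for its operand, and a jump that cannot be resolved is reported rather than guessed.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical toy program: address -> (mnemonic, operand).
PROGRAM: Dict[int, Tuple[str, Optional[object]]] = {
    0: ("nop", None),
    1: ("jmp_table", "T"),  # indirect jump: targets come from table "T"
    2: ("ret", None),
    3: ("jmp", 2),          # direct jump to address 2
}

# Hypothetical jump tables that a value analysis has already resolved.
JUMP_TABLES: Dict[str, List[int]] = {"T": [2, 3]}


def recover_cfg(program, tables, entry=0):
    """Explore from `entry`; return (edges, unresolved indirect jumps)."""
    edges: Set[Tuple[int, int]] = set()
    unresolved: Set[int] = set()
    seen: Set[int] = set()
    worklist = [entry]
    while worklist:
        addr = worklist.pop()
        if addr in seen or addr not in program:
            continue
        seen.add(addr)
        op, arg = program[addr]
        if op == "ret":
            succs = []
        elif op == "jmp":
            succs = [arg]
        elif op == "jmp_table":
            if arg in tables:
                # Targets are known only because the table was resolved.
                succs = list(tables[arg])
            else:
                # Do not guess: record the jump as unresolved.
                unresolved.add(addr)
                succs = []
        else:
            succs = [addr + 1]  # fall through to the next address
        for s in succs:
            edges.add((addr, s))
            worklist.append(s)
    return edges, unresolved


edges, unresolved = recover_cfg(PROGRAM, JUMP_TABLES)
```

In this sketch, removing the entry for `"T"` from the tables stops exploration at address 1 and places that address in `unresolved`, which is why the instructions at addresses 2 and 3 are never reached in that case.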

This dissertation tackles the first property with two contributions that combine proof generation with interactive theorem proving in a semi-automated manner: an untrusted tool extracts as much information as it can from the functions under test and then generates all the necessary proofs to be completed in a theorem prover. The first, Floyd-style approach still requires significant manual effort but provides good flexibility and ensures that no path is analyzed more than once. In contrast, the second, Hoare-style approach gives up some of that flexibility, along with the guarantee against repeated path evaluation, in order to achieve much greater automation. However, neither approach can handle the dynamic control flow caused by indirect branching.

The second property is handled by the second set of contributions of this dissertation. These two contributions provide fully automated methods of recovering control flow from binaries even in the presence of indirect branching. When such dynamic control flow cannot be overapproximatively resolved, this is clearly noted in the resulting output. In the first approach to control flow recovery, a structured memory representation allows for general analysis of control flow in the presence of indirection, gaining scalability by utilizing context-free function analysis. It supports various aliasing conditions through nondeterminism, with multiple output states potentially being produced from a given input state. The second approach adds function context and a model, inspired by abstract interpretation, of the C++ exception handling (EH) application binary interface (ABI), allowing for the discovery of previously unknown paths while maintaining or increasing automation.
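The idea of producing several output states from one input state under uncertain aliasing can be sketched as follows. This is only an illustration under simplifying assumptions: the flat `State` dictionary, the `store` function, and the `maybe_aliases` parameter are hypothetical and do not reproduce the structured memory representation used in the dissertation.

```python
from typing import Dict, List

# Hypothetical state: symbolic pointer name -> value stored at that location.
State = Dict[str, int]


def store(state: State, ptr: str, value: int, maybe_aliases: List[str]) -> List[State]:
    """Return every successor state for the write `*ptr = value`.

    `maybe_aliases` lists pointers that may or may not refer to the same
    location as `ptr`; each possibility yields its own successor state.
    """
    # Case 1: `ptr` is separate from every candidate, so only it changes.
    separate = dict(state)
    separate[ptr] = value
    successors = [separate]

    # Case 2: `ptr` aliases one candidate, so that location changes as well.
    for other in maybe_aliases:
        aliased = dict(state)
        aliased[ptr] = value
        aliased[other] = value
        successors.append(aliased)
    return successors


successors = store({"p": 0, "q": 7}, "p", 1, ["q"])
```

Here a single write through `p` yields two successor states, one in which `q` keeps its old value and one in which it is overwritten, and a later analysis step would have to consider both.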

Keywords

Formal Verification, x86-64 Assembly, Interactive Theorem Proving, Static Binary Analysis, Memory Usage, Control Flow Recovery, Exception Handling
