RALLY: Retargetable Assembly Lifter to LLVM IR

Date

2026-06-22

Publisher

Virginia Tech

Abstract

Binary lifting to compiler-level intermediate representations enables powerful analysis, re-optimization, and cross-architecture porting of legacy software. However, traditional lifters that operate directly on raw binaries suffer from irreversible information loss. This forces them to rely on fragile heuristics that misclassify code and data, break position-independent code (PIC) semantics, and produce unsound control-flow graphs. In this paper, we present RALLY, a retargetable lifter that translates symbolized x86-64 NASM assembly into LLVM IR. By consuming symbolized assembly, such as the output of formally verified decompilation tools, RALLY sidesteps the error-prone disassembly and code/data separation phases that plague binary-first approaches. RALLY instead treats assembly as a first-class input and uses a deterministic grammar to preserve PIC constructs and symbolic references structurally from the outset. To bridge the semantic gap between assembly's implicit machine state and LLVM's explicit value semantics, RALLY introduces an integrated pipeline that recovers control flow, reconstructs ABI-compliant function signatures, and explicitly models architectural state. This approach produces semantically rich, optimizable LLVM IR without the heuristic compromises inherent in raw-binary lifting, enabling robust static analysis and retargetable compilation of modern position-independent software. To evaluate RALLY's effectiveness for recompilation and retargeting, we conducted experiments using NASM test-suite cases, algorithmic C programs, microbenchmarks, and real-world applications targeting x86-64 and ARM Linux. Our evaluation shows that RALLY achieves high semantic correctness and broad instruction-family coverage, and that its lifted IR, once recompiled, faithfully reproduces native execution behavior and enables cross-architecture recompilation.
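What "explicitly modeling architectural state" means can be illustrated with a small sketch. Assuming registers and status flags are represented as LLVM globals (the names `@rax`, `@zf`, `@sf`, `@cf`, `@of` and the helper `lift_add` are hypothetical and not taken from RALLY), lifting a single 64-bit `add` turns the implicit flag side effects of the x86-64 instruction into explicit IR values and stores:

```python
# Hypothetical sketch: lift the x86-64 instruction `add dst, src` (64-bit
# registers) into LLVM IR text. Registers and flags are modeled as explicit
# global state; the @-names are illustrative assumptions, not RALLY's scheme.

def lift_add(dst: str, src: str, tmp: int = 0) -> list[str]:
    """Return LLVM IR instructions for `add dst, src` with explicit flag updates."""
    a, b, r = f"%a{tmp}", f"%b{tmp}", f"%r{tmp}"
    return [
        # Read both operands from the modeled register file.
        f"{a} = load i64, ptr @{dst}",
        f"{b} = load i64, ptr @{src}",
        f"{r} = add i64 {a}, {b}",
        f"store i64 {r}, ptr @{dst}",
        # ZF: the result is zero.
        f"%zf{tmp} = icmp eq i64 {r}, 0",
        f"store i1 %zf{tmp}, ptr @zf",
        # SF: the sign bit of the result is set.
        f"%sf{tmp} = icmp slt i64 {r}, 0",
        f"store i1 %sf{tmp}, ptr @sf",
        # CF: unsigned carry out, i.e. the wrapped result is below an operand.
        f"%cf{tmp} = icmp ult i64 {r}, {a}",
        f"store i1 %cf{tmp}, ptr @cf",
        # OF: signed overflow, i.e. both operands differ in sign from the result.
        f"%x{tmp} = xor i64 {a}, {r}",
        f"%y{tmp} = xor i64 {b}, {r}",
        f"%o{tmp} = and i64 %x{tmp}, %y{tmp}",
        f"%of{tmp} = icmp slt i64 %o{tmp}, 0",
        f"store i1 %of{tmp}, ptr @of",
    ]


if __name__ == "__main__":
    print("\n".join(lift_add("rax", "rbx")))
```

Because every flag is stored explicitly, LLVM's dead-store elimination can later remove the flag computations that no subsequent instruction reads. This is one plausible motivation for modeling state this way rather than leaving it implicit.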

Keywords

Binary Lifting, Decompilation, Program Analysis, Data-Flow Analysis, Control-Flow Analysis
