RALLY: Retargetable Assembly Lifter to LLVM IR
| dc.contributor.author | Goel, Kartik | en |
| dc.contributor.committeechair | Ravindran, Binoy | en |
| dc.contributor.committeemember | Verbeek, Freek | en |
| dc.contributor.committeemember | Giles, Kendall Everett | en |
| dc.contributor.department | Electrical and Computer Engineering | en |
| dc.date.accessioned | 2026-06-23T08:01:06Z | en |
| dc.date.available | 2026-06-23T08:01:06Z | en |
| dc.date.issued | 2026-06-22 | en |
| dc.description.abstract | Binary lifting to compiler-level intermediate representations enables powerful analysis, re-optimization, and cross-architecture porting for legacy software. However, traditional lifters operating directly on raw binaries suffer from irreversible information loss, forcing reliance on fragile heuristics that misclassify code, break position-independent code (PIC) semantics, and produce unsound control-flow graphs. In this paper, we present RALLY, a retargetable lifter that translates symbolized x86-64 NASM assembly into LLVM IR. By consuming symbolized assembly, such as the output of formally verified decompilation tools, RALLY circumvents the error-prone disassembly and code/data separation phases that plague binary-first approaches. Instead, RALLY treats assembly as a first-class input, using a deterministic grammar to structurally preserve PIC constructs and symbolic references from the outset. To bridge the semantic gap between assembly's implicit machine state and LLVM's explicit value semantics, RALLY introduces an integrated pipeline that recovers control flow, reconstructs ABI-compliant function signatures, and explicitly models architectural state. This approach produces semantically rich, optimizable LLVM IR without the heuristic compromises inherent in raw-binary lifting, enabling robust static analysis and retargetable compilation for modern position-independent software. To understand RALLY's effectiveness for recompilation and retargetability, we conducted experimental studies using NASM suite test cases, algorithmic C programs, microbenchmarks, and real-world applications targeting x86-64 and ARM Linux. Our evaluations reveal that RALLY achieves high semantic correctness and broad instruction-family coverage, and produces lifted IR that faithfully reproduces native execution behavior while successfully enabling cross-architecture recompilation. | en |
| dc.description.abstractgeneral | Translating legacy software from raw machine code back into a flexible compiler format holds massive potential for updating, analyzing, and migrating old programs to modern processors, but traditional tools struggle because the original compilation process strips away critical details. To fill in these blanks, older tools rely on fragile guesswork that frequently misinterprets the code, breaks memory rules, and scrambles the program's logical flow. To overcome this, we developed RALLY, a new translation tool that works with enriched, clearly labeled assembly language, rather than raw machine code, to generate code for LLVM, a powerful modern compiler framework. By starting with this higher-quality input, which can be provided by highly reliable reverse-engineering software, RALLY completely bypasses the messy guessing stages that plague traditional methods. Instead, it uses a strict, rule-based system to perfectly preserve the program's original structure and memory references from the very beginning. RALLY then expertly bridges the gap between the hidden mechanics of hardware and the strict requirements of modern compilers, meticulously rebuilding the program's logical flow, restoring its original blueprints, and clearly defining its hardware interactions. The result is high-quality, easily optimizable code built entirely without guesswork, empowering developers to seamlessly modernize and port software. Extensive testing across various benchmarks and real-world applications confirmed that RALLY consistently produces code that behaves exactly like the original, while successfully adapting it to run on entirely new processor architectures, such as moving from standard x86-64 chips to ARM. | en |
| dc.description.degree | Master of Science | en |
| dc.format.medium | ETD | en |
| dc.identifier.other | vt_gsexam:47185 | en |
| dc.identifier.uri | https://hdl.handle.net/10919/143477 | en |
| dc.language.iso | en | en |
| dc.publisher | Virginia Tech | en |
| dc.rights | Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International | en |
| dc.rights.uri | http://creativecommons.org/licenses/by-nc-sa/4.0/ | en |
| dc.subject | Binary Lifting | en |
| dc.subject | Decompilation | en |
| dc.subject | Program Analysis | en |
| dc.subject | Data-Flow Analysis | en |
| dc.subject | Control-Flow Analysis | en |
| dc.title | RALLY: Retargetable Assembly Lifter to LLVM IR | en |
| dc.type | Thesis | en |
| thesis.degree.discipline | Computer Engineering | en |
| thesis.degree.grantor | Virginia Polytechnic Institute and State University | en |
| thesis.degree.level | masters | en |
| thesis.degree.name | Master of Science | en |