RALLY: Retargetable Assembly Lifter to LLVM IR

dc.contributor.author: Goel, Kartik [en]
dc.contributor.committeechair: Ravindran, Binoy [en]
dc.contributor.committeemember: Verbeek, Freek [en]
dc.contributor.committeemember: Giles, Kendall Everett [en]
dc.contributor.department: Electrical and Computer Engineering [en]
dc.date.accessioned: 2026-06-23T08:01:06Z [en]
dc.date.available: 2026-06-23T08:01:06Z [en]
dc.date.issued: 2026-06-22 [en]
dc.description.abstract: Binary lifting to compiler-level intermediate representations enables powerful analysis, re-optimization, and cross-architecture porting for legacy software. However, traditional lifters operating directly on raw binaries suffer from irreversible information loss, forcing reliance on fragile heuristics that misclassify code, break position-independent code (PIC) semantics, and produce unsound control-flow graphs. In this thesis, we present RALLY, a retargetable lifter that translates symbolized x86-64 NASM assembly into LLVM IR. By consuming symbolized assembly, such as the output of formally verified decompilation tools, RALLY circumvents the error-prone disassembly and code/data separation phases that plague binary-first approaches. Instead, RALLY treats assembly as a first-class input, using a deterministic grammar to structurally preserve PIC constructs and symbolic references from the outset. To bridge the semantic gap between assembly's implicit machine state and LLVM's explicit value semantics, RALLY introduces an integrated pipeline that recovers control flow, reconstructs ABI-compliant function signatures, and explicitly models architectural state. This approach produces semantically rich, optimizable LLVM IR without the heuristic compromises inherent in raw-binary lifting, enabling robust static analysis and retargetable compilation for modern position-independent software. To understand RALLY's effectiveness for recompilation and retargetability, we conducted experimental studies using NASM suite test cases, algorithmic C programs, microbenchmarks, and real-world applications targeting x86-64 and ARM Linux. Our evaluations reveal that RALLY achieves high semantic correctness and broad instruction-family coverage, and that its lifted IR faithfully reproduces native execution behavior while enabling cross-architecture recompilation. [en]
dc.description.abstractgeneral: Translating legacy software from raw machine code back into a flexible compiler format holds great potential for updating, analyzing, and migrating old programs to modern processors, but traditional tools struggle because the original compilation process strips away critical details. To fill in these blanks, older tools rely on fragile guesswork that frequently misinterprets the code, breaks memory rules, and scrambles the program's logical flow. To overcome this, we developed RALLY, a new translation tool that works with enriched, clearly labeled assembly language, rather than raw machine code, to generate code for LLVM, a modern compiler framework. By starting with this higher-quality input, which can be provided by highly reliable reverse-engineering software, RALLY bypasses the guessing stages that plague traditional methods. Instead, it uses a strict, rule-based system to preserve the program's original structure and memory references from the very beginning. RALLY then bridges the gap between the hidden mechanics of hardware and the strict requirements of modern compilers by rebuilding the program's logical flow, restoring its original blueprints, and clearly defining its hardware interactions. The result is high-quality, easily optimizable code built without guesswork, which helps developers modernize and port software. Testing across various benchmarks and real-world applications confirmed that RALLY consistently produces code that behaves like the original, while successfully adapting it to run on new processor architectures, such as moving from standard x86-64 chips to ARM. [en]
dc.description.degree: Master of Science [en]
dc.format.medium: ETD [en]
dc.identifier.other: vt_gsexam:47185 [en]
dc.identifier.uri: https://hdl.handle.net/10919/143477 [en]
dc.language.iso: en [en]
dc.publisher: Virginia Tech [en]
dc.rights: Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International [en]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-sa/4.0/ [en]
dc.subject: Binary Lifting [en]
dc.subject: Decompilation [en]
dc.subject: Program Analysis [en]
dc.subject: Data-Flow Analysis [en]
dc.subject: Control-Flow Analysis [en]
dc.title: RALLY: Retargetable Assembly Lifter to LLVM IR [en]
dc.type: Thesis [en]
thesis.degree.discipline: Computer Engineering [en]
thesis.degree.grantor: Virginia Polytechnic Institute and State University [en]
thesis.degree.level: masters [en]
thesis.degree.name: Master of Science [en]

Files

Original bundle
Name: Goel_K_T_2026.pdf
Size: 2.36 MB
Format: Adobe Portable Document Format
