Helping Developers Migrate their Code across Programming Languages

TR Number

Date

2024-10-15

Journal Title

Journal ISSN

Volume Title

Publisher

Virginia Tech

Abstract

Migrating source code from one programming language to another is a common task in software development. This migration can be done by completely rewriting the code in the target language, or it can be facilitated through code-reuse or automation techniques. This thesis explores both approaches. For code-reuse, two new cross-language code search techniques are proposed that enable developers to search for code in one language using code from another. These techniques address the limitations of existing methods in the context of code migration. The first technique leverages a Siamese network combined with Word2Vec embeddings, while the second employs transformers.

For code automation, the concept of Translation Types is introduced to categorize code translations. An empirical study was conducted to analyze the differences between human-translated and machine-translated code. Based on these findings, two multi-output code translation techniques were developed that produce multiple translations aligned with the different styles that developers use when translating their code. The first tool employs a denoising autoencoder and a blueprint-guided beam search algorithm to generate translations of specific types. This algorithm mimics the translation operations that developers apply in similar software projects. The second tool utilizes GPT-4 with a specialized prompt to generate translations tailored to the requested types. In the evaluation, these approaches produced automated code translations that better aligned with developer preferences while maintaining correctness compared to existing methods.

Description

Keywords

cross-language code migration, clone detection, code translation, software engineering, machine learning

Citation