Helping Developers Migrate their Code across Programming Languages
dc.contributor.author | Elarnaoty, Mohammed Elsayed | en |
dc.contributor.committeechair | Servant Cortes, Francisco Javier | en |
dc.contributor.committeemember | Poshyvanyk, Denys | en |
dc.contributor.committeemember | North, Christopher L. | en |
dc.contributor.committeemember | Prakash, Bodicherla Aditya | en |
dc.contributor.committeemember | Meng, Na | en |
dc.contributor.department | Computer Science and#38; Applications | en |
dc.date.accessioned | 2024-10-16T08:00:11Z | en |
dc.date.available | 2024-10-16T08:00:11Z | en |
dc.date.issued | 2024-10-15 | en |
dc.description.abstract | Migrating source code from one programming language to another is a common task in software development. This migration can be done by completely rewriting the code in the target language, or it can be facilitated through code-reuse or automation techniques. This thesis explores both approaches. For code-reuse, two new cross-language code search techniques are proposed that enable developers to search for code in one language using code from another. These techniques address the limitations of existing methods in the context of code migration. The first technique leverages a Siamese network combined with Word2Vec embeddings, while the second employs transformers. For code automation, the concept of Translation Types is introduced to categorize code translations. An empirical study was conducted to analyze the differences between human-translated and machine-translated code. Based on these findings, two multi-output code translation techniques were developed that produce multiple translations aligned with the different styles that developers use when translating their code. The first tool employs a denoising autoencoder and a blueprint-guided beam search algorithm to generate translations of specific types. This algorithm mimics the translation operations that developers apply in similar software projects. The second tool utilizes GPT-4 with a specialized prompt to generate translations tailored to the requested types. In the evaluation, these approaches produced automated code translations that better aligned with developer preferences while maintaining correctness compared to existing methods. | en |
dc.description.abstractgeneral | In the world of software development, it is often necessary to convert code written in one programming language into another. This process can be quite time-consuming, especially if developers have to rewrite everything from scratch. To make this task easier, this thesis explores two approaches: finding reusable code snippets in other languages and using automated tools to translate code. Firstly, this thesis presents two techniques that help developers search for similar code written in different programming languages. These techniques aim to accurately retrieve potential code snippets, ensuring that developers find what they need quickly, with the most relevant results appearing at the top of the list. The two techniques use machine learning models to understand and match code across languages. Additionally, this thesis explores ways to automate code translation by recognizing that different developers have their own style when translating code. A taxonomy of "Translation Types" is introduced to capture these differences. After studying how human and machine translations vary, two existing tools were adapted to generate translations. The first tool uses machine learning to create translations based on common developer patterns, while the second employs the powerful GPT-4 model to produce translations tailored to specific developer styles. Overall, the presented approaches in this thesis enable developers to convert code accurately and efficiently, reducing the time and effort needed for software migration. | en |
dc.description.degree | Doctor of Philosophy | en |
dc.format.medium | ETD | en |
dc.identifier.other | vt_gsexam:41590 | en |
dc.identifier.uri | https://hdl.handle.net/10919/121345 | en |
dc.language.iso | en | en |
dc.publisher | Virginia Tech | en |
dc.rights | In Copyright | en |
dc.rights.uri | http://rightsstatements.org/vocab/InC/1.0/ | en |
dc.subject | cross-language code migration | en |
dc.subject | clone detection | en |
dc.subject | code translation | en |
dc.subject | software engineering | en |
dc.subject | machine learning | en |
dc.title | Helping Developers Migrate their Code across Programming Languages | en |
dc.type | Dissertation | en |
thesis.degree.discipline | Computer Science & Applications | en |
thesis.degree.grantor | Virginia Polytechnic Institute and State University | en |
thesis.degree.level | doctoral | en |
thesis.degree.name | Doctor of Philosophy | en |
Files
Original bundle
1 - 1 of 1