Helping Developers Migrate their Code across Programming Languages

dc.contributor.author: Elarnaoty, Mohammed Elsayed [en]
dc.contributor.committeechair: Servant Cortes, Francisco Javier [en]
dc.contributor.committeemember: Poshyvanyk, Denys [en]
dc.contributor.committeemember: North, Christopher L. [en]
dc.contributor.committeemember: Prakash, Bodicherla Aditya [en]
dc.contributor.committeemember: Meng, Na [en]
dc.contributor.department: Computer Science & Applications [en]
dc.date.accessioned: 2024-10-16T08:00:11Z [en]
dc.date.available: 2024-10-16T08:00:11Z [en]
dc.date.issued: 2024-10-15 [en]
dc.description.abstract: Migrating source code from one programming language to another is a common task in software development. This migration can be done by completely rewriting the code in the target language, or it can be facilitated through code-reuse or automation techniques. This thesis explores both approaches. For code-reuse, two new cross-language code search techniques are proposed that enable developers to search for code in one language using code from another. These techniques address the limitations of existing methods in the context of code migration. The first technique leverages a Siamese network combined with Word2Vec embeddings, while the second employs transformers. For code automation, the concept of Translation Types is introduced to categorize code translations. An empirical study was conducted to analyze the differences between human-translated and machine-translated code. Based on these findings, two multi-output code translation techniques were developed that produce multiple translations aligned with the different styles that developers use when translating their code. The first tool employs a denoising autoencoder and a blueprint-guided beam search algorithm to generate translations of specific types. This algorithm mimics the translation operations that developers apply in similar software projects. The second tool utilizes GPT-4 with a specialized prompt to generate translations tailored to the requested types. In the evaluation, these approaches produced automated code translations that better aligned with developer preferences while maintaining correctness compared to existing methods. [en]
dc.description.abstractgeneral: In the world of software development, it is often necessary to convert code written in one programming language into another. This process can be quite time-consuming, especially if developers have to rewrite everything from scratch. To make this task easier, this thesis explores two approaches: finding reusable code snippets in other languages and using automated tools to translate code. Firstly, this thesis presents two techniques that help developers search for similar code written in different programming languages. These techniques aim to accurately retrieve potential code snippets, ensuring that developers find what they need quickly, with the most relevant results appearing at the top of the list. The two techniques use machine learning models to understand and match code across languages. Additionally, this thesis explores ways to automate code translation by recognizing that different developers have their own style when translating code. A taxonomy of "Translation Types" is introduced to capture these differences. After studying how human and machine translations vary, two existing tools were adapted to generate translations. The first tool uses machine learning to create translations based on common developer patterns, while the second employs the powerful GPT-4 model to produce translations tailored to specific developer styles. Overall, the presented approaches in this thesis enable developers to convert code accurately and efficiently, reducing the time and effort needed for software migration. [en]
dc.description.degree: Doctor of Philosophy [en]
dc.format.medium: ETD [en]
dc.identifier.other: vt_gsexam:41590 [en]
dc.identifier.uri: https://hdl.handle.net/10919/121345 [en]
dc.language.iso: en [en]
dc.publisher: Virginia Tech [en]
dc.rights: In Copyright [en]
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/ [en]
dc.subject: cross-language code migration [en]
dc.subject: clone detection [en]
dc.subject: code translation [en]
dc.subject: software engineering [en]
dc.subject: machine learning [en]
dc.title: Helping Developers Migrate their Code across Programming Languages [en]
dc.type: Dissertation [en]
thesis.degree.discipline: Computer Science & Applications [en]
thesis.degree.grantor: Virginia Polytechnic Institute and State University [en]
thesis.degree.level: doctoral [en]
thesis.degree.name: Doctor of Philosophy [en]

Files

Original bundle
Name: Elarnaoty_ME_D_2024.pdf
Size: 4.83 MB
Format: Adobe Portable Document Format