Measurement of Embedding Choices on Cryptographic API Completion Tasks

In this paper, we conduct a measurement study to comprehensively compare the accuracy impacts of multiple embedding options in cryptographic API completion tasks. Embedding is the process of automatically learning vector representations of program elements. Our measurement focuses on design choices of three important aspects, program analysis preprocessing, token-level embedding, and sequence-level embedding. Our findings show that program analysis is necessary even under advanced embedding. The results show 36.20% accuracy improvement on average when program analysis preprocessing is applied to transfer byte code sequences into API dependence paths. With program analysis and the token-level embedding training, the embedding dep2vec improves the task accuracy from 55.80% to 92.04%. Moreover, only a slight accuracy advantage (0.55% on average) is observed by training the expensive sequence-level embedding compared with the token-level embedding. Our experiments also suggest the differences made by the data. In the cross-app learning setup and a data scarcity scenario, sequence-level embedding is more necessary and results in a more obvious accuracy improvement (5.10%)

Persistent link

http://hdl.handle.net/10919/116585

Collections

Journal Articles, Association for Computing Machinery (ACM)
Scholarly Works, Computer Science

Full item page

Measurement of Embedding Choices on Cryptographic API Completion Tasks

Files

TR Number

Date

Authors

Journal Title

Journal ISSN

Volume Title

Publisher

Abstract

Description

Keywords

Citation

Persistent link

Collections