Measurement of Embedding Choices on Cryptographic API Completion Tasks

DC Field | Value | Language
dc.contributor.author | Xiao, Ya | en
dc.contributor.author | Song, Wenjia | en
dc.contributor.author | Ahmed, Salman | en
dc.contributor.author | Ge, Xinyang | en
dc.contributor.author | Viswanath, Bimal | en
dc.contributor.author | Meng, Na | en
dc.contributor.author | Yao, Danfeng | en
dc.date.accessioned | 2023-11-02T13:02:21Z | en
dc.date.available | 2023-11-02T13:02:21Z | en
dc.date.issued | 2023-10 | en
dc.date.updated | 2023-11-01T08:00:29Z | en
dc.description.abstract | In this paper, we conduct a measurement study to comprehensively compare the accuracy impacts of multiple embedding options in cryptographic API completion tasks. Embedding is the process of automatically learning vector representations of program elements. Our measurement focuses on design choices in three important aspects: program analysis preprocessing, token-level embedding, and sequence-level embedding. Our findings show that program analysis is necessary even under advanced embedding. The results show a 36.20% accuracy improvement on average when program analysis preprocessing is applied to transform bytecode sequences into API dependence paths. With program analysis and token-level embedding training, the embedding dep2vec improves the task accuracy from 55.80% to 92.04%. Moreover, training the expensive sequence-level embedding yields only a slight accuracy advantage (0.55% on average) over the token-level embedding. Our experiments also reveal the influence of the data. In the cross-app learning setup and a data-scarcity scenario, sequence-level embedding is more necessary and yields a more pronounced accuracy improvement (5.10%). | en
dc.description.version | Accepted version | en
dc.format.mimetype | application/pdf | en
dc.identifier.doi | https://doi.org/10.1145/3625291 | en
dc.identifier.uri | http://hdl.handle.net/10919/116585 | en
dc.language.iso | en | en
dc.publisher | ACM | en
dc.rights | In Copyright | en
dc.rights.holder | The author(s) | en
dc.rights.uri | http://rightsstatements.org/vocab/InC/1.0/ | en
dc.title | Measurement of Embedding Choices on Cryptographic API Completion Tasks | en
dc.type | Article - Refereed | en
dc.type.dcmitype | Text | en
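The abstract contrasts flat bytecode call sequences with API dependence paths and mentions token-level embedding training. The Python sketch below is a toy illustration under stated assumptions, not the paper's program analysis or its dep2vec implementation: the call-graph input format, the API names in the example, and the helpers `dependence_paths` and `skipgram_pairs` are all hypothetical. It enumerates root-to-sink paths over data-dependence edges, then draws word2vec-style skip-gram (target, context) pairs from each path, which is one common way to prepare training data for token-level embeddings.

```python
from typing import Dict, List, Tuple

# Hypothetical toy input: call id -> (API name, ids of earlier calls whose
# results this call uses). Ids follow the order in which calls appear in bytecode.
Calls = Dict[int, Tuple[str, List[int]]]


def dependence_paths(calls: Calls) -> List[List[str]]:
    """Enumerate root-to-sink paths along data-dependence edges of an acyclic graph."""
    successors: Dict[int, List[int]] = {cid: [] for cid in calls}
    for cid, (_, deps) in calls.items():
        for dep in deps:
            successors[dep].append(cid)
    paths: List[List[str]] = []

    def walk(cid: int, prefix: List[str]) -> None:
        path = prefix + [calls[cid][0]]
        if not successors[cid]:
            paths.append(path)  # reached a call that nothing else depends on
        for nxt in successors[cid]:
            walk(nxt, path)

    for cid in sorted(calls):
        if not calls[cid][1]:  # roots are calls with no dependencies
            walk(cid, [])
    return paths


def skipgram_pairs(path: List[str], window: int = 1) -> List[Tuple[str, str]]:
    """Return (target, context) pairs for tokens at most `window` positions apart."""
    pairs: List[Tuple[str, str]] = []
    for i, target in enumerate(path):
        for j in range(max(0, i - window), min(len(path), i + window + 1)):
            if j != i:
                pairs.append((target, path[j]))
    return pairs


# Hypothetical example, listed in bytecode order.
calls: Calls = {
    0: ("SecureRandom.<init>", []),
    1: ("Cipher.getInstance", []),
    2: ("StringBuilder.append", []),  # unrelated to the crypto calls
    3: ("SecureRandom.nextBytes", [0]),
    4: ("IvParameterSpec.<init>", [3]),
    5: ("Cipher.init", [1, 4]),
}

for p in dependence_paths(calls):
    print(p, skipgram_pairs(p))
```

In bytecode order, the unrelated `StringBuilder.append` call sits between `Cipher.getInstance` and `SecureRandom.nextBytes`, so a context window over the flat sequence would pair it with both. On the dependence paths it ends up on its own single-call path and contributes no pairs to the crypto-related paths.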

Files

Original bundle
Name: 3625291.pdf
Size: 1.54 MB
Format: Adobe Portable Document Format
Description: Accepted version
License bundle
Name: license.txt
Size: 0 B
Format: Item-specific license agreed upon to submission