How Reliable is the Crowdsourced Knowledge of Security Implementation?

dc.contributor.author: Chen, Mengsu
dc.contributor.committeechair: Meng, Na
dc.contributor.committeemember: Tilevich, Eli
dc.contributor.committeemember: Yao, Danfeng (Daphne)
dc.contributor.department: Computer Science
dc.date.accessioned: 2019-01-24T20:16:39Z
dc.date.available: 2019-01-24T20:16:39Z
dc.date.issued: 2018-12
dc.description.abstract: The successful crowdsourcing model and gamification design of the Stack Overflow (SO) Q&A platform have attracted many programmers to ask and answer technical questions, regardless of their level of expertise. Researchers have recently found evidence that security-vulnerable code snippets may have been copied from SO into production software. This inspired us to study how reliable SO is in providing secure coding suggestions. In this project, we automatically extracted answer posts related to Java security APIs from the entire SO site. Then, based on known misuses of these APIs, we manually labeled each extracted code snippet as secure or insecure. In total, we extracted 953 groups of code snippets, grouped by similarity as detected by clone detection tools, which correspond to 785 secure answer posts and 644 insecure answer posts. Counter-intuitively, compared with secure answers, insecure answers have higher view counts (36,508 vs. 18,713), higher scores (14 vs. 5), and more duplicates (3.8 vs. 3.0) on average. We also found that 34% of the answers provided by so-called trusted users, who have administrative privileges, are insecure. Our findings reveal that there are comparable numbers of secure and insecure answers, and that users cannot rely on community feedback to differentiate secure answers from insecure ones. Therefore, solutions need to be developed beyond the current mechanisms of SO, or for the safe utilization of SO in security-sensitive software development.
dc.description.abstractgeneral: Stack Overflow (SO), the most popular question-and-answer platform for programmers today, has accumulated, and continues to accumulate, a tremendous number of question and answer posts since its launch a decade ago. Contributed by numerous users all over the world, these posts are a form of crowdsourced knowledge, and in the past few years they have become a main reference source for software developers. Studies have shown that code snippets in answer posts are copied into production software. This is a dangerous sign, because the code snippets contributed by SO users are not guaranteed to be secure implementations of critical functions, such as transferring sensitive information over the internet. In this project, we conducted a comprehensive study of answer posts related to Java security APIs. By labeling code snippets as secure or insecure and contrasting their distributions over associated attributes such as post score and user reputation, we found that there is a significant number of insecure answers (644 insecure vs. 785 secure in our study) on Stack Overflow. Our statistical analysis also revealed the infeasibility of differentiating between secure and insecure posts by leveraging the current community feedback mechanisms (e.g., voting) of Stack Overflow.
dc.description.degree: Master of Science
dc.format.medium: ETD
dc.identifier.uri: http://hdl.handle.net/10919/86885
dc.language.iso: en_US
dc.publisher: Virginia Tech
dc.rights: In Copyright
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/
dc.subject: Stack Overflow
dc.subject: crowdsourced knowledge
dc.subject: social dynamics
dc.subject: security implementation
dc.subject: clone detection
dc.title: How Reliable is the Crowdsourced Knowledge of Security Implementation?
dc.type: Thesis
thesis.degree.discipline: Computer Science and Applications
thesis.degree.grantor: Virginia Polytechnic Institute and State University
thesis.degree.level: masters
thesis.degree.name: Master of Science
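
The abstract contrasts average view counts, scores, and duplicate counts between answers labeled secure and insecure. A minimal sketch of that kind of per-label comparison, using made-up example records (the field names and all numbers here are illustrative, not the thesis data):

```python
from statistics import mean

# Hypothetical labeled answer posts: each record carries the community
# attributes the study contrasts (view count, score, duplicate count).
answers = [
    {"label": "secure",   "views": 12000, "score": 4,  "duplicates": 2},
    {"label": "secure",   "views": 25000, "score": 6,  "duplicates": 4},
    {"label": "insecure", "views": 40000, "score": 15, "duplicates": 5},
    {"label": "insecure", "views": 33000, "score": 13, "duplicates": 3},
]

def averages(label):
    """Mean of each attribute over the answers carrying the given label."""
    group = [a for a in answers if a["label"] == label]
    return {k: mean(a[k] for a in group) for k in ("views", "score", "duplicates")}

secure, insecure = averages("secure"), averages("insecure")
print("secure:  ", secure)
print("insecure:", insecure)
```

On real SO data, the same grouping would be run over the 785 secure and 644 insecure posts the thesis identifies, before applying statistical tests to the two distributions.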

Files

Original bundle
- Name: Chen_M_T_2018.pdf
- Size: 479.75 KB
- Format: Adobe Portable Document Format

License bundle
- Name: license.txt
- Size: 1.5 KB
- Format: Item-specific license agreed upon to submission