How Reliable is the Crowdsourced Knowledge of Security Implementation?
The successful crowdsourcing model and gamification design of Stack Overflow (SO) Q&A platform have attracted many programmers to ask and answer technical questions, regardless of their level of expertise. Researchers have recently found evidence of security vulnerable code snippets being possibly copied from SO to production software. This inspired us to study how reliable is SO in providing secure coding suggestions.
In this project, we automatically extracted answer posts related to Java security APIs from the entire SO site. Then based on the known misuses of these APIs, we manually labeled each extracted code snippets as secure or insecure. In total, we extracted 953 groups of code snippets in terms of their similarity detected by clone detection tools, which corresponds to 785 secure answer posts and 644 insecure answer posts. Compared with secure answers, counter-intuitively, insecure answers has higher view counts (36,508 vs. 18,713), higher score (14 vs. 5), more duplicates (3.8 vs. 3.0) on average. We also found that 34% of answers provided by the so-called trusted users who have administrative privileges are insecure. Our finding reveals that there are comparable numbers of secure and insecure answers. Users cannot rely on community feedback to differentiate secure answers from insecure answers either. Therefore, solutions need to be developed beyond the current mechanism of SO or on the utilization of SO in security-sensitive software development.