Browsing by Author "Cai, Haipeng"
- DroidCat: Unified Dynamic Detection of Android Malware. Cai, Haipeng; Meng, Na; Ryder, Barbara G.; Yao, Danfeng (Daphne) (Department of Computer Science, Virginia Polytechnic Institute & State University, 2016). Various dynamic approaches have been developed to detect or categorize Android malware. These approaches execute software, collect call traces, and then detect abnormal system calls or sensitive API usage. Consequently, attackers can evade these approaches by intentionally obfuscating those calls under focus. Additionally, existing approaches treat detection and categorization of malware as separate tasks, although intuitively both tasks are relevant and could be performed simultaneously. This paper presents DroidCat, the first unified dynamic malware detection approach, which not only detects malware, but also pinpoints the malware family. DroidCat leverages supervised machine learning to train a multi-class classifier using diverse behavioral profiles of benign apps and different kinds of malware. Compared with prior heuristics-based machine learning-based approaches, the feature set used in DroidCat is decided purely based on a systematic dynamic characterization study of benign and malicious apps. All differentiating features that show behavioral differences between benign and malicious apps are included. In this way, DroidCat is robust to existing evasion attacks. We evaluated DroidCat using leave-one-out cross validation with 136 benign apps and 135 malicious apps. The evaluation shows that DroidCat provided an effective and scalable unified malware detection solution with 81% precision, 82% recall, and 92% accuracy.
- A First Look at Security and Privacy Risks in the RapidAPI Ecosystem. Liao, Song; Cheng, Long; Luo, Xiapu; Song, Zheng; Cai, Haipeng; Yao, Danfeng (Daphne); Hu, Hongxin (ACM, 2024-12-02). With the emergence of the open API ecosystem, third-party developers can publish their APIs on the API marketplace, significantly facilitating the development of cutting-edge features and services. The RapidAPI platform is currently the largest API marketplace and it provides over 40,000 APIs, which have been used by more than 4 million developers. However, such open APIs also raise security and privacy concerns associated with APIs hosted on the platform. In this work, we perform the first large-scale analysis of 32,089 APIs on the RapidAPI platform. By searching GitHub code and Android apps, we find that 3,533 RapidAPI keys, which are important and used in API request authorization, have been leaked in the wild. These keys can be exploited to launch various attacks, such as Resource Exhaustion Running, Theft of Service, Data Manipulation, and User Data Breach attacks. We also explore risks in API metadata that can be abused by adversaries. Due to the lack of a strict certification system, adversaries can manipulate the API metadata to perform typosquatting attacks on API URLs, impersonate other developers or renowned companies, and publish spamming APIs on the platform. Lastly, we analyze the privacy non-compliance of APIs and applications, e.g., Android apps, that call these APIs with data collection. We find that 1,709 APIs collect sensitive data and 94% of them don’t provide a complete privacy policy. For the Android apps that call these APIs, 50% of them in our study have privacy non-compliance issues.
- How are Multilingual Systems Constructed: Characterizing Language Use and Selection in Open-Source Multilingual Software. Li, Wen; Marino, Austin; Yang, Haoran; Meng, Na; Li, Li; Cai, Haipeng (ACM, 2023-12). For many years now, modern software has been known to be developed in multiple languages (hence termed multilingual or multi-language software). Yet to this date we still have only very limited knowledge about how multilingual software systems are constructed. For instance, it is not yet really clear how different languages are used, selected together, and why they have been so in multilingual software development. Given the fact that using multiple languages in a single software project has become a norm, understanding language use and selection (i.e., language profile) as a basic element of the multilingual construction in contemporary software engineering is an essential first step. In this paper, we set out to fill this gap with a large-scale characterization study on language use and selection in open-source multilingual software. We start with presenting an updated overview of language use in 7,113 GitHub projects spanning five past years by characterizing overall statistics of language profiles, followed by a deeper look into the functionality relevance/justification of language selection in these projects through association rule mining. We proceed with an evolutionary characterization of 1,000 GitHub projects for each of 10 past years to provide a longitudinal view of how language use and selection have changed over the years, as well as how the association between functionality and language selection has been evolving. Among many other findings, our study revealed a growing trend of using 3 to 5 languages in one multilingual software project and noticeable stableness of top language selections. We found a non-trivial association between language selection and certain functionality domains, which was less stable than that with individual languages over time.
In a historical context, we also have observed major shifts in these characteristics of multilingual systems both in contrast to earlier peer studies and along the evolutionary timeline. Our findings offer essential knowledge on the multilingual construction in modern software development. Based on our results, we also provide insights and actionable suggestions for both researchers and developers of multilingual systems.
- A Lightweight Approach of Human-Like Playtest for Android Apps. Zhao, Yan (Virginia Tech, 2022-02-01). Testing is recognized as a key and challenging factor that can either boost or halt game development in the mobile game industry. On one hand, manual testing is expensive and time-consuming; in particular, the wide spectrum of device hardware and software, so-called fragmentation, significantly increases the cost of testing applications on devices manually. On the other hand, automated testing is also very difficult because games pose more inherent technical issues than other mobile applications, such as non-native widgets, nondeterminism, complex game strategies, and so on. Current testing frameworks (e.g., Android Monkey, Record and Replay) are limited because they adopt no domain knowledge to test games. Learning-based tools (e.g., Wuji) require tremendous resources and manual efforts to train a model before testing any game. The high cost of manual testing and the lack of efficient testing tools for mobile games motivated the work presented in this thesis, which aims to explore easy approaches to test mobile games efficiently and effectively. A new Android mobile game testing tool, called LIT, has been developed. LIT is a lightweight approach to generalize playtest tactics from manual testing, and to adopt the tactics for automatic game testing. LIT has two phases: tactic generalization and tactic concretization. In Phase I, when a human tester plays an Android game G for a while (e.g., eight minutes), LIT records the tester's inputs and related scenes. Based on the collected data, LIT infers a set of context-aware, abstract playtest tactics that describe under what circumstances what actions can be taken. In Phase II, LIT tests G based on the generalized tactics.
Namely, given a randomly generated game scene, LIT tentatively matches that scene with the abstract context of any inferred tactic; if the match succeeds, LIT customizes the tactic to generate an action for playtest. Our evaluation with nine games shows LIT to outperform two state-of-the-art tools and a reinforcement learning (RL)-based tool, by covering more code and triggering more errors. This implies that LIT complements existing tools and helps developers better test certain games (e.g., match3).
- Understanding Application Behaviours for Android Security: A Systematic Characterization. Cai, Haipeng; Ryder, Barbara G. (Department of Computer Science, Virginia Polytechnic Institute & State University, 2016). In contrast to most existing research on Android focusing on specific security issues, there is little broad understanding of Android application run-time characteristics and their security implications. To mitigate this gap, we present the first dynamic characterization study of Android applications that targets such a broad understanding for Android security. Through lightweight method-level profiling, we have collected 33GB of traces of method calls and inter-component communication (ICC) from 114 popular Android applications on Google Play and 61 communicating pairs among them, which enabled an extensive empirical investigation of the run-time behaviours of Android applications. Our study revealed that (1) the Android framework was the target of 88.3% of all calls during application executions, (2) callbacks accounted for merely 3% of the total method calls, (3) 75% of ICCs did not carry any data payloads, with those doing so preferring bundles over URIs, and (4) 85% of sensitive data sources and sinks targeted one or two top categories of information or operations, which were also most likely to constitute data leaks. We discuss the security implications of our findings for secure development and effective security defense of modern Android applications.