Anomaly Detection Through System and Program Behavior Modeling
Vulnerabilities in software applications are easy targets for attackers, and modern exploits are growing steadily more sophisticated and stealthy. Code-reuse attacks such as return-oriented programming allow intruders to execute malicious instruction sequences on a victim machine without injecting external code. Successful exploitation leads to hijacked applications or the download of malicious software (a drive-by download attack), which usually happens without the user's notice or permission. In this dissertation, we address the problem of host-based system anomaly detection, specifically by predicting the expected behaviors of programs and detecting run-time deviations and anomalies. We first introduce an approach for detecting drive-by download attacks, one of the major vectors for malware infection. Our tool enforces the dependencies between user actions and system events, such as file-system access and process execution. It can be used to provide real-time protection of a personal computer, as well as to diagnose and evaluate untrusted websites for forensic purposes. We perform an extensive experimental evaluation, including a user study with 21 participants, thousands of legitimate websites (for testing false alarms), 84 malicious websites in the wild, and lab-reproduced exploits. Our solution demonstrates a usable host-based framework for controlling and enforcing access to system resources. Second, we present a new anomaly-based detection technique that probabilistically models and learns a program's control flows for high-precision behavioral reasoning and monitoring. Existing solutions suffer from either incomplete behavioral modeling (for dynamic models) or overestimation of the likelihood of call occurrences (for static models). We introduce a new probabilistic anomaly detection method for modeling program behaviors.
Its distinguishing feature is the ability to quantify the static control flow in programs and to integrate that control-flow information into probabilistic machine learning algorithms. The advantage of our technique is significantly improved detection accuracy: we observed an 11- to 28-fold improvement in detection accuracy compared to state-of-the-art anomaly models based on hidden Markov models (HMMs). We further integrate context information into our detection model, which achieves both strong flow sensitivity and context sensitivity. Our context-sensitive approach gives, on average, more than a 10-fold improvement for system call monitoring and three orders of magnitude for library call monitoring over existing regular HMM methods. Evaluated on a large number of program traces and real-world exploits, our findings confirm that probabilistic modeling of program dependences provides a significant source of behavioral information for building high-precision models for real-time system monitoring. Our model distinguishes abnormal traces (obtained by reproducing exploits and by synthesis) well from normal traces.
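To make the notion of HMM-based trace scoring concrete, the following is a minimal sketch of how a regular HMM, the kind of baseline the abstract compares against, might score a system call trace and flag it as abnormal. It does not implement the control-flow-aware or context-sensitive model of the dissertation. The call alphabet, the two hidden states, all probabilities, the threshold, and the function names `log_likelihood` and `is_anomalous` are assumptions chosen purely for illustration.

```python
import math

# Hypothetical two-state HMM over a tiny system-call alphabet.
# Every probability here is a made-up illustration value, not a learned one.
START = [0.9, 0.1]
TRANS = [[0.8, 0.2],
         [0.3, 0.7]]
EMIT = [
    {"open": 0.30, "read": 0.40, "write": 0.05, "close": 0.24, "execve": 0.01},
    {"open": 0.05, "read": 0.10, "write": 0.60, "close": 0.24, "execve": 0.01},
]


def log_likelihood(trace, start=START, trans=TRANS, emit=EMIT):
    """Scaled forward algorithm: returns log P(trace | model)."""
    if not trace:
        raise ValueError("trace must contain at least one call")
    n = len(start)
    # Initialisation: alpha_1(i) = pi_i * b_i(o_1)
    alpha = [start[i] * emit[i].get(trace[0], 0.0) for i in range(n)]
    log_prob = 0.0
    for t in range(len(trace)):
        if t > 0:
            # Induction: alpha_t(j) = sum_i alpha_{t-1}(i) * a_ij * b_j(o_t)
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n))
                * emit[j].get(trace[t], 0.0)
                for j in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # the model assigns zero probability
        log_prob += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return log_prob


def is_anomalous(trace, threshold=-2.5):
    """Flag a trace whose average per-call log-likelihood is below threshold."""
    return log_likelihood(trace) / len(trace) < threshold


normal = ["open", "read", "read", "write", "close"]
suspicious = ["open", "execve", "execve", "execve"]
print(is_anomalous(normal), is_anomalous(suspicious))
```

In this toy setting the threshold is applied to the average per-call log-likelihood so that traces of different lengths remain comparable. In practice the model parameters would be learned from normal traces and the threshold tuned against a target false-alarm rate.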