Robustifying Machine Learning based Security Applications
Jan, Steve T. K.
MetadataShow full item record
In recent years, machine learning (ML) has been explored and employed in many fields. However, there are growing concerns about the robustness of machine learning models. These concerns are further amplified in security-critical applications — attackers can manipulate the inputs (i.e., adversarial examples) to cause machine learning models to make a mistake, and it's very challenging to obtain a large amount of attackers' data. These make applying machine learning in security-critical applications difficult. In this dissertation, we present several approaches to robustifying three machine learning based security applications. First, we start from adversarial examples in image recognition. We develop a method to generate robust adversarial examples that remain effective in the physical domain. Our core idea is to use an image-to-image translation network to simulate the digital-to-physical transformation process for generating robust adversarial examples. We further show these robust adversarial examples can improve the robustness of machine learning models by adversarial retraining. The second application is bot detection. We show that the performance of existing machine learning models is not effective if we only have the limit attackers' data. We develop a data synthesis method to address this problem. The key novelty is that our method is distribution aware synthesis, using two different generators in a Generative Adversarial Network to synthesize data for the clustered regions and the outlier regions in the feature space. We show the detection performance using 1% of attackers' data is close to existing methods trained with 100% of the attackers' data. The third component of this dissertation is phishing detection. By designing a novel measurement system, we search and detect phishing websites that adopt evasion techniques not only at the page content level but also at the web domain level. The key novelty is that our system is built on the observation of the evasive behaviors of phishing pages in practice. We also study how existing browsers defenses against phishing websites that impersonate trusted entities at the web domain. Our results show existing browsers are not yet effective to detect them.
General Audience Abstract
Machine learning (ML) is computer algorithms that aim to identify hidden patterns from the data. In recent years, machine learning has been widely used in many fields. The range of them is broad, from natural language to autonomous driving. However, there are growing concerns about the robustness of machine learning models. And these concerns are further amplified in security-critical applications — Attackers can manipulate their inputs (i.e., adversarial examples) to cause machine learning models to predict wrong, and it's highly expensive and difficult to obtain a huge amount of attackers' data because attackers are rare compared to the normal users. These make applying machine learning in security-critical applications concerning. In this dissertation, we seek to build better defenses in three types of machine learning based security applications. The first one is image recognition, by developing a method to generate realistic adversarial examples, the machine learning models are more robust for defending against adversarial examples by adversarial retraining. The second one is bot detection, we develop a data synthesis method to detect malicious bots when we only have the limit malicious bots data. For phishing websites, we implement a tool to detect domain name impersonation and detect phishing pages using dynamic and static analysis.
- Doctoral Dissertations