Robustifying Machine Learning based Security Applications
In recent years, machine learning (ML) has been explored and employed in many fields. However, there are growing concerns about the robustness of machine learning models. These concerns are further amplified in security-critical applications — attackers can manipulate the inputs (i.e., adversarial examples) to cause machine learning models to make a mistake, and it's very challenging to obtain a large amount of attackers' data. These make applying machine learning in security-critical applications difficult.
In this dissertation, we present several approaches to robustifying three machine learning based security applications. First, we start from adversarial examples in image recognition. We develop a method to generate robust adversarial examples that remain effective in the physical domain. Our core idea is to use an image-to-image translation network to simulate the digital-to-physical transformation process for generating robust adversarial examples. We further show these robust adversarial examples can improve the robustness of machine learning models by adversarial retraining. The second application is bot detection. We show that the performance of existing machine learning models is not effective if we only have the limit attackers' data. We develop a data synthesis method to address this problem. The key novelty is that our method is distribution aware synthesis, using two different generators in a Generative Adversarial Network to synthesize data for the clustered regions and the outlier regions in the feature space. We show the detection performance using 1% of attackers' data is close to existing methods trained with 100% of the attackers' data. The third component of this dissertation is phishing detection. By designing a novel measurement system, we search and detect phishing websites that adopt evasion techniques not only at the page content level but also at the web domain level. The key novelty is that our system is built on the observation of the evasive behaviors of phishing pages in practice. We also study how existing browsers defenses against phishing websites that impersonate trusted entities at the web domain. Our results show existing browsers are not yet effective to detect them.