Robustifying Machine Learning based Security Applications

Jan, Steve T. K.

Robustifying Machine Learning based Security Applications

dc.contributor.author	Jan, Steve T. K.	en
dc.contributor.committeechair	Wang, Gang Alan	en
dc.contributor.committeechair	Viswanath, Bimal	en
dc.contributor.committeemember	Yao, Danfeng (Daphne)	en
dc.contributor.committeemember	Huang, Jia-Bin	en
dc.contributor.committeemember	Xing, Xinyu	en
dc.contributor.committeemember	Ramakrishnan, Naren	en
dc.contributor.department	Computer Science	en
dc.date.accessioned	2020-08-28T08:01:07Z	en
dc.date.available	2020-08-28T08:01:07Z	en
dc.date.issued	2020-08-27	en
dc.description.abstract	In recent years, machine learning (ML) has been explored and employed in many fields. However, there are growing concerns about the robustness of machine learning models. These concerns are further amplified in security-critical applications — attackers can manipulate the inputs (i.e., adversarial examples) to cause machine learning models to make a mistake, and it's very challenging to obtain a large amount of attackers' data. These make applying machine learning in security-critical applications difficult. In this dissertation, we present several approaches to robustifying three machine learning based security applications. First, we start from adversarial examples in image recognition. We develop a method to generate robust adversarial examples that remain effective in the physical domain. Our core idea is to use an image-to-image translation network to simulate the digital-to-physical transformation process for generating robust adversarial examples. We further show these robust adversarial examples can improve the robustness of machine learning models by adversarial retraining. The second application is bot detection. We show that the performance of existing machine learning models is not effective if we only have the limit attackers' data. We develop a data synthesis method to address this problem. The key novelty is that our method is distribution aware synthesis, using two different generators in a Generative Adversarial Network to synthesize data for the clustered regions and the outlier regions in the feature space. We show the detection performance using 1% of attackers' data is close to existing methods trained with 100% of the attackers' data. The third component of this dissertation is phishing detection. By designing a novel measurement system, we search and detect phishing websites that adopt evasion techniques not only at the page content level but also at the web domain level. The key novelty is that our system is built on the observation of the evasive behaviors of phishing pages in practice. We also study how existing browsers defenses against phishing websites that impersonate trusted entities at the web domain. Our results show existing browsers are not yet effective to detect them.	en
dc.description.abstractgeneral	Machine learning (ML) is computer algorithms that aim to identify hidden patterns from the data. In recent years, machine learning has been widely used in many fields. The range of them is broad, from natural language to autonomous driving. However, there are growing concerns about the robustness of machine learning models. And these concerns are further amplified in security-critical applications — Attackers can manipulate their inputs (i.e., adversarial examples) to cause machine learning models to predict wrong, and it's highly expensive and difficult to obtain a huge amount of attackers' data because attackers are rare compared to the normal users. These make applying machine learning in security-critical applications concerning. In this dissertation, we seek to build better defenses in three types of machine learning based security applications. The first one is image recognition, by developing a method to generate realistic adversarial examples, the machine learning models are more robust for defending against adversarial examples by adversarial retraining. The second one is bot detection, we develop a data synthesis method to detect malicious bots when we only have the limit malicious bots data. For phishing websites, we implement a tool to detect domain name impersonation and detect phishing pages using dynamic and static analysis.	en
dc.description.degree	Doctor of Philosophy	en
dc.format.medium	ETD	en
dc.identifier.other	vt_gsexam:27235	en
dc.identifier.uri	http://hdl.handle.net/10919/99862	en
dc.publisher	Virginia Tech	en
dc.rights	In Copyright	en
dc.rights.uri	http://rightsstatements.org/vocab/InC/1.0/	en
dc.subject	Machine learning	en
dc.subject	Security	en
dc.subject	Bot Detection	en
dc.subject	Phishing Attacks	en
dc.title	Robustifying Machine Learning based Security Applications	en
dc.type	Dissertation	en
thesis.degree.discipline	Computer Science and Applications	en
thesis.degree.grantor	Virginia Polytechnic Institute and State University	en
thesis.degree.level	doctoral	en
thesis.degree.name	Doctor of Philosophy	en

Files

Original bundle

Now showing 1 - 2 of 2

Name:: Jan_ST_D_2020.pdf
Size:: 5.39 MB
Format:: Adobe Portable Document Format

Download

Name:: Jan_ST_D_2020_support_1.pdf
Size:: 50.18 KB
Format:: Adobe Portable Document Format
Description:: Supporting documents

Download

Collections

Doctoral Dissertations