Characterizing and Detecting Online Deception via Data-Driven Methods

TR Number
Journal Title
Journal ISSN
Volume Title
Virginia Tech

In recent years, online deception has become a major threat to information security. Online deception that caused significant consequences is usually spear phishing. Spear-phishing emails come in a very small volume, target a small number of audiences, sometimes impersonate a trusted entity and use very specific content to redirect targets to a phishing website, where the attacker tricks targets sharing their credentials.

In this thesis, we aim at measuring the entire process. Starting from phishing emails, we examine anti-spoofing protocols, analyze email services' policies and warnings towards spoofing emails, and measure the email tracking ecosystem. With phishing websites, we implement a powerful tool to detect domain name impersonation and detect phishing pages using dynamic and static analysis. We also analyze credential sharing on phishing websites, and measure what happens after victims share their credentials. Finally, we discuss potential phishing and privacy concerns on new platforms such as Alexa and Google Assistant.

In the first part of this thesis (Chapter 3), we focus on measuring how email providers detect and handle forged emails. We also try to understand how forged emails can reach user inboxes by deliberately composing emails. Finally, we check how email providers warn users about forged emails. In the second part (Chapter 4), we measure the adoption of anti-spoofing protocols and seek to understand the reasons behind the low adoption rates. In the third part of this thesis (Chapter 5), we observe that a lot of phishing emails use email tracking techniques to track targets. We collect a large dataset of email messages using disposable email services and measure the landscape of email tracking. In the fourth part of this thesis (Chapter 6), we move on to phishing websites. We implement a powerful tool to detect squatting domains and train a machine learning model to classify phishing websites. In the fifth part (Chapter 7), we focus on the credential leaks. More specifically, we measure what happens after the targets' credentials are leaked. We monitor and measure the potential post-phishing exploiting activities. Finally, with new voice platforms such as Alexa becoming more and more popular, we wonder if new phishing and privacy concerns emerge with new platforms. In this part (Chapter 8), we systematically assess the attack surfaces by measuring sensitive applications on voice assistant systems.

My thesis measures important parts of the complete process of online deception. With deeper understandings of phishing attacks, more complete and effective defense mechanisms can be developed to mitigate attacks in various dimensions.

Spear-Phishing, Email Spoofing, Email Tracking, Domain Squatting, Password Reuse, Voice Personal Assistant