A Measurement Approach to Understanding the Data Flow of Phishing From Attacker and Defender Perspectives
MetadataShow full item record
Phishing has been a big concern due to its active roles in recent data breaches and state- sponsored attacks. While existing works have extensively analyzed phishing websites and detection methods, there is still a limited understanding of the data flow of the phishing process. In this thesis, we perform an empirical measurement to draw a clear picture of the data flow of phishing from both attacker and defender perspectives. First, from attackers' perspective, we want to know how attackers collect the sensitive information stolen from victims throughout the end-to-end phishing attack process. So we collected more than 179,000 real-world phishing URLs. Then we build a measurement tool to feed fake credentials to live phishing sites and monitor how the credential information is shared with the phishing server and potentially third-party collectors on the client side. Besides, we also obtain phishing kits to analyze how credentials are sent to attackers and third-parties on the server side. Then, from defenders' perspective, online scan engines such as VirusTotal are heavily used by phishing defenders to label phishing URLs, however, the data flow behind phishing detection by those scan engines is still unclear. So we build our own phishing websites, submit them to VirusTotal for scanning, to understand how VirusTotal works and the quality of its labels. Our study reveals the key mechanisms for information sharing during phishing attacks and the need for developing more rigorous methodologies to assess and make use of the labels obtained from VirusTotal.
General Audience Abstract
Phishing attack is the fraudulent attempt to lure the target users to give away sensitive information such as usernames, passwords and credit card details. Cybercriminals usually build phishing websites (mimicking a trustworthy entity), and trick users to reveal important credentials. However, the data flow of phishing process is still unclear. From attackers' per- spective, we want to know how attackers collect the sensitive information stolen by phishing websites. On the other hand, from defenders' perspective, we are trying to figure out how online scan engines (e.g., VirusTotal) detect phishing URLs and how reliable their detection results are. In this thesis, we perform an empirical measurement to help answer the two questions above. By monitoring and analyzing a large number of real-world phishing websites, we draw a clear picture of the credential sharing process during phishing attacks. Also, by building our own phishing websites and submitting to VirusTotal for scanning, we find that more rigorous methodologies to use VirusTotal labels are desperately needed.
- Masters Theses