A Measurement Approach to Understanding the Data Flow of Phishing From Attacker and Defender Perspectives
Phishing has become a major concern due to its active role in recent data breaches and state-sponsored attacks. While existing work has extensively analyzed phishing websites and detection methods, the data flow of the phishing process is still poorly understood. In this thesis, we perform an empirical measurement study to draw a clear picture of the data flow of phishing from both the attacker and the defender perspectives. First, from the attacker's perspective, we examine how attackers collect the sensitive information stolen from victims throughout the end-to-end phishing attack process. To this end, we collected more than 179,000 real-world phishing URLs and built a measurement tool that feeds fake credentials to live phishing sites and monitors how the credential information is shared with the phishing server and, potentially, with third-party collectors on the client side. In addition, we obtained phishing kits to analyze how credentials are sent to attackers and third parties on the server side. Second, from the defender's perspective, online scan engines such as VirusTotal are heavily used by phishing defenders to label phishing URLs; however, the data flow behind phishing detection by these scan engines remains unclear. We therefore built our own phishing websites and submitted them to VirusTotal for scanning, in order to understand how VirusTotal works and to assess the quality of its labels. Our study reveals the key mechanisms of information sharing during phishing attacks and highlights the need for more rigorous methodologies to assess and make use of the labels obtained from VirusTotal.