Liu, Fang2017-08-102017-08-102017-08-09vt_gsexam:12266http://hdl.handle.net/10919/78684Cyber security risk has been a problem ever since the appearance of telecommunication and electronic computers. In the recent 30 years, researchers have developed various tools to protect the confidentiality, integrity, and availability of data and programs. However, new challenges are emerging as the amount of data grows rapidly in the big data era. On one hand, attacks are becoming stealthier by concealing their behaviors in massive datasets. One the other hand, it is becoming more and more difficult for existing tools to handle massive datasets with various data types. This thesis presents the attempts to address the challenges and solve different security problems by mining security risks from massive datasets. The attempts are in three aspects: detecting security risks in the enterprise environment, prioritizing security risks of mobile apps and measuring the impact of security risks between websites and mobile apps. First, the thesis presents a framework to detect data leakage in very large content. The framework can be deployed on cloud for enterprise and preserve the privacy of sensitive data. Second, the thesis prioritizes the inter-app communication risks in large-scale Android apps by designing new distributed inter-app communication linking algorithm and performing nearest-neighbor risk analysis. Third, the thesis measures the impact of deep link hijacking risk, which is one type of inter-app communication risks, on 1 million websites and 160 thousand mobile apps. The measurement reveals the failure of Google's attempts to improve the security of deep links.ETDIn CopyrightCyber SecurityBig Data SecurityMobile SecurityData Leakage DetectionMining Security Risks from Massive DatasetsDissertation