Browsing by Author "Shu, Xiaokui"
- Data Leak Detection As a Service: Challenges and Solutions. Shu, Xiaokui; Yao, Danfeng (Daphne) (Department of Computer Science, Virginia Polytechnic Institute & State University, 2012). We describe a network-based data-leak detection (DLD) technique whose main feature is that detection does not require the data owner to reveal the content of the sensitive data. Instead, only a small number of specialized digests is needed. Our technique, referred to as the fuzzy fingerprint, can be used to detect accidental data leaks due to human errors or application flaws. The privacy-preserving feature of our algorithms minimizes the exposure of sensitive data and enables the data owner to safely delegate the detection to others. We describe how cloud providers can offer their customers data-leak detection as an add-on service with strong privacy guarantees. We perform extensive experimental evaluation of the privacy, efficiency, accuracy and noise tolerance of our techniques. Our evaluation results under various data-leak scenarios and setups show that our method can support accurate detection with a very small number of false alarms, even when the presentation of the data has been transformed. They also indicate that the detection accuracy does not degrade when partial digests are used. We further provide a quantifiable method to measure the privacy guarantee offered by our fuzzy fingerprint framework.
- Fast Detection of Transformed Data Leaks. Shu, Xiaokui; Zhang, Jing; Yao, Danfeng (Daphne); Feng, Wu-chun (IEEE, 2016-03-01).
- Hadoop Map-reduce. Shu, Xiaokui; Cohen, Ron (2010-12-10). Hadoop Map-Reduce is a software framework for writing applications for processing large amounts of data in parallel on commodity hardware.
- Natural Language Toolkit (NLTK). Shu, Xiaokui; Cohen, Ron (2010-10-25). This module is a teaching and studying platform for prototyping and building research systems on natural language processing (NLP), related to linguistics, cognitive science, artificial intelligence, information retrieval, and machine learning.
- Privacy-Preserving Scanning of Big Content for Sensitive Data Exposure with MapReduce. Liu, Fang; Shu, Xiaokui; Yao, Danfeng (Daphne); Butt, Ali R. (2015-02-06). The exposure of sensitive data in storage and transmission poses a serious threat to organizational and personal security. Data leak detection aims at scanning content (in storage or transmission) for exposed sensitive data. Because of the large content and data volume, such a screening algorithm needs to be scalable for timely detection. Our solution uses the MapReduce framework for detecting exposed sensitive content, because it can scale arbitrarily and utilize public resources, such as Amazon EC2, for the task. We design new MapReduce algorithms for computing collection intersection for data leak detection. Our prototype, implemented with the Hadoop system, achieves 225 Mbps analysis throughput with 24 nodes. Our algorithms support a useful privacy-preserving data transformation. This transformation minimizes the exposure of sensitive data during the detection and supports the secure outsourcing of data leak detection to untrusted MapReduce and cloud providers.
- Rapid Screening of Transformed Data Leaks with Efficient Algorithms and Parallel Computing. Shu, Xiaokui; Zhang, Jing; Yao, Danfeng (Daphne); Feng, Wu-chun (ACM, 2015-03). The leak of sensitive data on computer systems poses a serious threat to organizational security. Organizations need to identify the exposure of sensitive data by screening the content in storage and transmission, i.e., to detect sensitive information being stored or transmitted in the clear. However, detecting the exposure of sensitive information is challenging due to data transformation in the content. Transformations (such as insertion and deletion) result in highly unpredictable leak patterns. Existing automata-based string matching algorithms are impractical for detecting transformed data leaks because of their formidable complexity when modeling the required regular expressions. We design two new algorithms for detecting long and transformed data leaks. Our system achieves high detection accuracy in recognizing transformed leaks compared to the state-of-the-art inspection methods. We parallelize our prototype on a graphics processing unit and demonstrate the strong scalability of our detection solution required by a sizable organization.
- Threat Detection in Program Execution and Data Movement: Theory and Practice. Shu, Xiaokui (Virginia Tech, 2016-06-25). Program attacks are among the oldest and most fundamental cyber threats. They compromise the confidentiality of data, the integrity of program logic, and the availability of services. This threat becomes even more severe when followed by other malicious activities such as data exfiltration. The integration of primitive attacks constructs comprehensive attack vectors and forms advanced persistent threats. Along with the rapid development of defense mechanisms, program attacks and data leak threats survive and evolve. Stealthy program attacks can hide in long execution paths to avoid being detected. Sensitive data transformations weaken existing leak detection mechanisms. New adversaries, e.g., semi-honest service providers, emerge and form threats. This thesis presents theoretical analysis and practical detection mechanisms against stealthy program attacks and data leaks. The thesis presents a unified framework for understanding different branches of program anomaly detection and sheds light on possible future program anomaly detection directions. The thesis investigates modern stealthy program attacks hidden in long program executions and develops a program anomaly detection approach with data mining techniques to reveal the attacks. The thesis advances network-based data leak detection mechanisms by relaxing strong requirements in existing methods. The thesis presents practical solutions to outsource data leak detection procedures to semi-honest third parties and identify noisy or transformed data leaks in network traffic.
- Unearthing Stealthy Program Attacks Buried in Extremely Long Execution Paths. Shu, Xiaokui; Yao, Danfeng (Daphne); Ramakrishnan, Naren (ACM, 2015-10). Modern stealthy exploits can achieve attack goals without introducing illegal control flows, e.g., by tampering with noncontrol data and waiting for the modified data to propagate and alter the control flow legally. Existing program anomaly detection systems focusing on legal control flow attestation and short call sequence verification are inadequate to detect such stealthy attacks. In this paper, we point out the need to analyze program execution paths and discover event correlations in large-scale execution windows among millions of instructions. We propose an anomaly detection approach with two-stage machine learning algorithms to recognize diverse normal call-correlation patterns and detect program attacks at both inter- and intra-cluster levels. We implement a prototype of our approach and demonstrate its effectiveness against three real-world attacks and four synthetic anomalies with less than 0.01% false positive rates and 0.1 to 1.3 ms analysis overhead per behavior instance (1k to 50k function or system calls).
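
Several of the data-leak entries above describe detection that works on digests of the sensitive data rather than on the data itself, and on intersecting collections of such digests. The following is a minimal sketch of that general idea only; it is not the authors' fuzzy fingerprint or their MapReduce algorithms. The function names `shingle_digests` and `leak_score`, the window length of 8 characters, and the use of truncated SHA-256 are all assumptions made for illustration.

```python
import hashlib


def shingle_digests(text: str, n: int = 8) -> set[str]:
    """Return truncated SHA-256 digests of every n-character window of text.

    The window length and the 16-hex-character truncation are illustrative
    assumptions, not values taken from the listed papers.
    """
    if len(text) < n:
        return set()
    return {
        hashlib.sha256(text[i:i + n].encode("utf-8")).hexdigest()[:16]
        for i in range(len(text) - n + 1)
    }


def leak_score(sensitive_digests: set[str], content: str, n: int = 8) -> float:
    """Fraction of the sensitive digests that also occur in the content."""
    if not sensitive_digests:
        return 0.0
    content_digests = shingle_digests(content, n)
    return len(sensitive_digests & content_digests) / len(sensitive_digests)


# The data owner computes digests once and hands only these to the detector.
secret = "account 4111-1111-1111-1111 belongs to Alice Example"
digests = shingle_digests(secret)

# The detector scans content using the digests, never the plaintext secret.
leaky = "FYI: account 4111-1111-1111-1111 belongs to Alice Example, please update."
clean = "The quarterly report is attached for your review."
print(leak_score(digests, leaky))
print(leak_score(digests, clean))
```

In this sketch the detector only ever receives the digest set, so a high score flags likely exposure without the plaintext being shared. Plain hashing of short windows is not a strong privacy guarantee on its own, which is the gap the privacy-preserving transformations described in the abstracts aim to address.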