Rapid Screening of Transformed Data Leaks with Efficient Algorithms and Parallel Computing

Date
2015-03
Publisher
ACM
Abstract

The leak of sensitive data on computer systems poses a serious threat to organizational security. Organizations need to identify the exposure of sensitive data by screening content in storage and transmission, i.e., to detect sensitive information being stored or transmitted in the clear. However, detecting the exposure of sensitive information is challenging because the data may be transformed within the content. Transformations (such as insertion and deletion) result in highly unpredictable leak patterns. Existing automata-based string matching algorithms are impractical for detecting transformed data leaks because of the formidable complexity of modeling the required regular expressions. We design two new algorithms for detecting long and transformed data leaks. Our system achieves high detection accuracy in recognizing transformed leaks compared with state-of-the-art inspection methods. We parallelize our prototype on a graphics processing unit and demonstrate the strong scalability of our detection solution, as required by a sizable organization.
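
To illustrate why alignment tolerates insertions and deletions where exact matching does not, here is a minimal sketch of a textbook local alignment score (Smith-Waterman) computed by dynamic programming. It is not the paper's algorithm, which the abstract does not specify. The function name `local_alignment_score` and the scoring weights (`match`, `mismatch`, `gap`) are assumptions chosen for illustration only.

```python
def local_alignment_score(sensitive, content, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score between two strings.

    Generic Smith-Waterman recurrence, shown only as an illustration.
    The weights are hypothetical defaults, not values from the paper.
    """
    prev = [0] * (len(content) + 1)  # previous row of the DP table
    best = 0
    for i in range(1, len(sensitive) + 1):
        curr = [0] * (len(content) + 1)
        for j in range(1, len(content) + 1):
            same = sensitive[i - 1] == content[j - 1]
            diag = prev[j - 1] + (match if same else mismatch)
            # A cell never drops below 0, so an alignment may start anywhere.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    secret = "4111222233334444"
    # The hyphens are insertions that defeat an exact substring search.
    leaked = "card 4111-2222-3333-4444 ok"
    print(secret in leaked)                       # exact match fails
    print(local_alignment_score(secret, leaked))  # alignment still scores high
```

Under these assumed weights, an untransformed copy of an n-character string scores 2n, and each inserted or deleted character costs one point, so a threshold somewhat below 2n would flag lightly transformed copies. The table costs time proportional to the product of the two lengths, which is the kind of cost that sampling and parallel hardware are meant to reduce.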

Keywords
Data leak detection, content inspection, algorithm, sampling, alignment, dynamic programming, parallelism