Rapid Screening of Transformed Data Leaks with Efficient Algorithms and Parallel Computing

TR Number



Journal Title

Journal ISSN

Volume Title




The leak of sensitive data on computer systems poses a serious threat to organizational security. Organizations need to identify the exposure of sensitive data by screening the content in storage and transmission, i.e., to detect sensitive information being stored or transmitted in the clear. However, detecting the exposure of sensitive information is challenging due to data transformation in the content. Transformations (such as insertion, deletion) result in highly unpredictable leak patterns. Existing automata-based string matching algorithms are impractical for detecting transformed data leaks, because of its formidable complexity when modeling the required regular expressions. We design two new algorithms for detecting long and transformed data leaks. Our system achieves high detection accuracy in recognizing transformed leaks compared to the state-of-the-art inspection methods. We parallelize our prototype on graphics processing unit and demonstrate the strong scalability of our detection solution required by a sizable organization.



Data leak detection, content inspection, algorithm, sampling, alignment, dynamic programming, parallelism