Privacy-Preserving Scanning of Big Content for Sensitive Data Exposure with MapReduce

Field | Value | Language
dc.contributor | Virginia Tech | en
dc.contributor.author | Liu, Fang | en
dc.contributor.author | Shu, Xiaokui | en
dc.contributor.author | Yao, Danfeng (Daphne) | en
dc.contributor.author | Butt, Ali R. | en
dc.contributor.department | Computer Science | en
dc.date.accessioned | 2015-02-06T21:49:29Z | en
dc.date.available | 2015-02-06T21:49:29Z | en
dc.date.issued | 2015-02-06 | en
dc.description.abstract | The exposure of sensitive data in storage and transmission poses a serious threat to organizational and personal security. Data leak detection scans content (in storage or in transmission) for exposed sensitive data. Because the volume of content is large, the screening algorithm must scale to detect leaks in a timely manner. Our solution uses the MapReduce framework to detect exposed sensitive content because it can scale arbitrarily and use public resources, such as Amazon EC2, for the task. We design new MapReduce algorithms that compute collection intersection for data leak detection. Our prototype, implemented on Hadoop, achieves 225 Mbps analysis throughput with 24 nodes. Our algorithms support a privacy-preserving data transformation that minimizes the exposure of sensitive data during detection and supports the secure outsourcing of data leak detection to untrusted MapReduce and cloud providers. | en
dc.description.sponsorship | This work has been supported in part by the Security and Software Engineering Research Center (S2ERC), an NSF-sponsored multi-university Industry/University Cooperative Research Center (I/UCRC), and by ARO YIP W911NF-14-1-0535. | en
dc.format.mimetype | application/pdf | en
dc.identifier.doi | https://doi.org/10.1145/2699026.2699106 | en
dc.identifier.uri | http://hdl.handle.net/10919/51271 | en
dc.language.iso | en_US | en
dc.rights | In Copyright | en
dc.rights.uri | http://rightsstatements.org/vocab/InC/1.0/ | en
dc.subject | Data leak detection | en
dc.subject | MapReduce | en
dc.subject | Scalability | en
dc.subject | Collection intersection | en
dc.subject | Computer-Communication Networks: General - security and protection | en
dc.subject | Computer-Communication Networks: Distributed Systems - distributed applications | en
dc.title | Privacy-Preserving Scanning of Big Content for Sensitive Data Exposure with MapReduce | en
dc.type | Article - Refereed | en
dc.type.dcmitype | Text | en
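
The abstract in this record mentions computing collection intersection with MapReduce under a privacy-preserving transformation. As a rough illustration only, and not the authors' algorithm, the idea can be sketched in a single Python process. The sketch assumes that both the scanned content and the sensitive data are split into fixed-length byte shingles, and that a keyed HMAC stands in for the privacy-preserving transformation so the party doing the matching sees only digests. The function names, the shingle length, the tags, and the choice of HMAC-SHA256 are all hypothetical and are not taken from the record.

```python
import hashlib
import hmac
from collections import defaultdict


def shingles(data: bytes, n: int = 8):
    """Yield every n-byte sliding window of data (hypothetical shingle length)."""
    for i in range(len(data) - n + 1):
        yield data[i:i + n]


def map_phase(data: bytes, tag: str, key: bytes, n: int = 8):
    """Map step: emit (keyed digest of a shingle, collection tag) pairs.

    The keyed digest is an assumed stand-in for a privacy-preserving
    transformation; the worker never sees the raw shingle.
    """
    for s in shingles(data, n):
        yield hmac.new(key, s, hashlib.sha256).hexdigest(), tag


def shuffle(pairs):
    """Group values by key, as a MapReduce shuffle would."""
    groups = defaultdict(list)
    for digest, tag in pairs:
        groups[digest].append(tag)
    return groups


def reduce_phase(groups) -> int:
    """Reduce step: size of the multiset (collection) intersection.

    For each digest, the overlap is the smaller of its two multiplicities.
    """
    return sum(
        min(tags.count("content"), tags.count("sensitive"))
        for tags in groups.values()
    )


def leak_score(content: bytes, sensitive: bytes, key: bytes, n: int = 8) -> float:
    """Fraction of the sensitive shingle collection that appears in the content."""
    pairs = list(map_phase(content, "content", key, n))
    pairs += list(map_phase(sensitive, "sensitive", key, n))
    overlap = reduce_phase(shuffle(pairs))
    total = max(len(sensitive) - n + 1, 0)  # number of sensitive shingles
    return overlap / total if total else 0.0
```

In a real deployment the map and reduce steps would run as distributed jobs (for example on Hadoop), and the data owner would keep the key private. A score near 1 suggests that most of the sensitive item appears in the scanned content, while a score near 0 suggests little or no overlap.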

Files

Original bundle
Name: hadoop-DLD.pdf
Size: 1017.87 KB
Format: Adobe Portable Document Format
Description: Main Article
Name: ACM-CODASPY-OpenAccess_64_Record.pdf
Size: 71.76 KB
Format: Adobe Portable Document Format
Description: ACM Open Access Invoice: Privacy-Preserving Scanning of Big Content for Sensitive Data Exposure with MapReduce
License bundle
Name: license.txt
Size: 1.5 KB
Format: Item-specific license agreed upon to submission