
dc.contributor.author: Galad, Andrej
dc.date.accessioned: 2017-04-23T19:35:57Z
dc.date.available: 2017-04-23T19:35:57Z
dc.date.issued: 2016-12-13
dc.identifier.uri: http://hdl.handle.net/10919/77457
dc.description.abstract: This project expands upon the work of researcher Vinay Goel and of Jefferson Bailey (co-PI on two NSF-funded collaborative projects with Virginia Tech: IDEAL and GETAR) at the Internet Archive on the ArchiveSpark project, a framework for efficient Web archive access, extraction, and derivation. The main goal of the project is to evaluate ArchiveSpark quantitatively and qualitatively against mainstream Web archive processing solutions and to extend it as necessary for the processing of testing collections. This work also relates to an IMLS-funded project. This report describes the efforts and contributions made as part of this project. The primary focus is a comprehensive evaluation of ArchiveSpark against existing archive-processing solutions (pure Apache Spark with pre-installed Warcbase tools and HBase) in a variety of environments and setups, in order to compare the performance improvements that ArchiveSpark brings and to understand the shortcomings and tradeoffs of its usage under varying scenarios. (en_US)
dc.description.sponsorship: IMLS LG-71-16-0037-16: Developing Library Cyberinfrastructure Strategy for Big Data Sharing and Reuse (en_US)
dc.description.sponsorship: NSF IIS-1619028: III: Small: Collaborative Research: Global Event and Trend Archive Research (GETAR) (en_US)
dc.description.sponsorship: NSF IIS-1319578: III: Small: Integrated Digital Event Archiving and Library (IDEAL) (en_US)
dc.language.iso: en_US (en_US)
dc.publisher: Virginia Tech (en_US)
dc.subject: ArchiveSpark (en_US)
dc.subject: Big data (en_US)
dc.subject: IDEAL (en_US)
dc.subject: GETAR (en_US)
dc.subject: HBase (en_US)
dc.subject: WARC (en_US)
dc.subject: Internet Archive (en_US)
dc.subject: Spark (en_US)
dc.subject: Web Archiving (en_US)
dc.subject: IMLS (en_US)
dc.subject: CDX (en_US)
dc.title: ArchiveSpark - MS Independent Study Final Submission (en_US)
dc.type: Presentation (en_US)
dc.type: Report (en_US)
dc.type: Software (en_US)
dc.description.notes: Included are the final report (PDF and Word), the final presentation (PPTX and PDF), the ArchiveSpark demo in the form of a Jupyter Notebook, and the software developed during this project. (en_US)


