Virginia Tech

    ArchiveSpark - MS Independent Study Final Submission

    ArchiveSpark.zip (2.223Mb)
    Downloads: 22
    ArchiveSpark_Demo.ipynb (20.85Kb)
    Downloads: 27
    ArchiveSpark-FINAL.pdf (920.3Kb)
    Downloads: 373
    ArchiveSpark-FINAL.docx (917.5Kb)
    Downloads: 622
    ArchiveSpark.pptx (669.7Kb)
    Downloads: 63
    ArchiveSpark.pdf (630.5Kb)
    Downloads: 260
    Date
    2016-12-13
    Author
    Galad, Andrej
    Abstract
    This project expands upon the work at the Internet Archive by researcher Vinay Goel and by Jefferson Bailey (co-PI on two NSF-funded collaborative projects with Virginia Tech: IDEAL and GETAR) on ArchiveSpark, a framework for efficient Web archive access, extraction, and derivation. The main goal is to evaluate ArchiveSpark quantitatively and qualitatively against mainstream Web archive processing solutions and to extend it as needed for processing the test collections. The work also relates to an IMLS-funded project. This report describes the efforts and contributions made as part of the project. Its primary focus is a comprehensive evaluation of ArchiveSpark against existing archive-processing solutions (pure Apache Spark with pre-installed Warcbase tools and HBase) across a variety of environments and setups, in order to compare the performance improvements that ArchiveSpark provides and to understand the shortcomings and tradeoffs of using it under varying scenarios.
    URI
    http://hdl.handle.net/10919/77457
    Collections
    • Reports, Digital Library Research Laboratory [27]

    If you believe that any material in VTechWorks should be removed, please see our policy and procedure for Requesting that Material be Amended or Removed. All takedown requests will be promptly acknowledged and investigated.

    Virginia Tech | University Libraries | Contact Us
     

     
