Performance Measurement and Analysis of Transactional Web Archiving

TR Number
Date
2017-07-19
Journal Title
Journal ISSN
Volume Title
Publisher
Virginia Tech
Abstract

Web archiving is necessary to retain the history of the World Wide Web and to study its evolution. It is important for the cultural heritage community. Some organizations are legally obligated to capture and archive Web content. The advent of transactional Web archiving makes the archiving process more efficient, thereby aiding organizations to archive their Web content.

This study measures and analyzes the performance of transactional Web archiving systems. To conduct a detailed analysis, we construct a meaningful design space defined by the system specifications that determine the performance of these systems. SiteStory, a state-of-the-art transactional Web archiving system, and local archiving, an alternative archiving technique, are used in this research. We experimentally evaluate the performance of these systems using the Greek version of Wikipedia deployed on dedicated hardware on a private network. Our benchmarking results show that the local archiving technique uses a Web server’s resources more efficiently than SiteStory for one data point in our design space. Better performance than SiteStory in such scenarios makes our archiving solution favorable to use for transactional archiving. We also show that SiteStory does not impose any significant performance overhead on the Web server for the rest of the data points in our design space.

Description
Keywords
Web Archiving, Digital Preservation, Performance Benchmark
Citation
Collections