Performance Evaluation of Web Archiving Through In-Memory Page Cache

Date
2017-06-23
Publisher
Virginia Tech
Abstract

This study proposes and evaluates a new method for Web archiving that leverages the caching infrastructure already present in Web servers. Redis serves as the page cache, and its persistence mechanism is exploited for archiving. We experimentally evaluate the performance of this archival technique using the Greek version of Wikipedia deployed on Amazon cloud infrastructure. Archiving causes a slight increase in the latency of rendered pages. Server performance is comparable at larger page cache sizes, but at smaller cache sizes the maximum throughput the server can sustain decreases significantly, because archiving causes more disk write operations. Because the pages are dynamically rendered and Wikipedia's technology stack is used extensively in many Web applications, our results should have broad impact.
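
The page-cache idea in the abstract can be illustrated with a minimal cache-aside sketch. This is not the report's implementation: `InMemoryStore`, `serve_page`, `render_stub`, and the `page:` key prefix are hypothetical names introduced here, and the in-memory store only mimics the `get`/`set` calls of a Redis client so the example runs without a server. With a real Redis instance, persistence (for example an append-only file) would be configured on the server side so that cached pages also reach disk; how the report exploits that mechanism is not specified in the abstract.

```python
from typing import Callable, Dict, Optional


class InMemoryStore:
    """Hypothetical stand-in exposing the get/set subset of a Redis client."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def get(self, key: str) -> Optional[bytes]:
        # Returns None on a miss, as a Redis client's get() does.
        return self._data.get(key)

    def set(self, key: str, value: bytes) -> bool:
        self._data[key] = value
        return True


def serve_page(store, url: str, render: Callable[[str], bytes]) -> bytes:
    """Return the page for `url`, rendering it only on a cache miss."""
    key = "page:" + url  # assumed key scheme, for illustration only
    cached = store.get(key)
    if cached is not None:
        return cached
    html = render(url)
    # With a persistent Redis server, this write is what would end up on disk.
    store.set(key, html)
    return html


# Record each render so cache hits can be told apart from misses.
render_calls = []


def render_stub(url: str) -> bytes:
    render_calls.append(url)
    return ("<html>" + url + "</html>").encode("utf-8")


store = InMemoryStore()
first = serve_page(store, "/wiki/Example", render_stub)   # miss: renders
second = serve_page(store, "/wiki/Example", render_stub)  # hit: from cache
```

Because `serve_page` relies only on `get` and `set`, a real Redis client object could be passed in place of `InMemoryStore`. A smaller cache would evict and rewrite pages more often, which is consistent with the abstract's observation that disk writes increase at lower cache sizes.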

Keywords
Information Retrieval, Transactional Web Archiving, Caching, Benchmarking, Wikipedia