Performance Measurement and Analysis of Transactional Web Archiving

dc.contributor.author: Maharshi, Shivam (en)
dc.contributor.committeechair: Fox, Edward A. (en)
dc.contributor.committeemember: Xie, Zhiwu (en)
dc.contributor.committeemember: Lee, Dongyoon (en)
dc.contributor.department: Computer Science (en)
dc.date.accessioned: 2017-07-20T08:00:29Z (en)
dc.date.available: 2017-07-20T08:00:29Z (en)
dc.date.issued: 2017-07-19 (en)
dc.description.abstract: Web archiving is necessary to retain the history of the World Wide Web and to study its evolution. It is important for the cultural heritage community. Some organizations are legally obligated to capture and archive Web content. The advent of transactional Web archiving makes the archiving process more efficient, thereby helping organizations archive their Web content. This study measures and analyzes the performance of transactional Web archiving systems. To conduct a detailed analysis, we construct a meaningful design space defined by the system specifications that determine the performance of these systems. SiteStory, a state-of-the-art transactional Web archiving system, and local archiving, an alternative archiving technique, are used in this research. We experimentally evaluate the performance of these systems using the Greek version of Wikipedia deployed on dedicated hardware on a private network. Our benchmarking results show that the local archiving technique uses a Web server’s resources more efficiently than SiteStory for one data point in our design space. Outperforming SiteStory in such scenarios makes our archiving solution a favorable choice for transactional archiving. We also show that SiteStory does not impose any significant performance overhead on the Web server for the rest of the data points in our design space. (en)
dc.description.abstractgeneral: Web archiving is the process of preserving the information available on the World Wide Web in archives. This process provides historians and cultural heritage scholars access to the data that allows them to understand the evolution of the Internet and its usage. Web archiving is also essential for some organizations that are obligated to keep the records of online resource access for their customers. Transactional Web archiving is an archiving technique where the information available on the Web is archived by capturing a transaction between a user and the Web server processing the user’s request. Transactional Web archiving provides a more complete and accurate history of a Web server than the traditional Web archiving models. However, in some scenarios the transactional Web archiving solutions may cause performance issues for the Web server being archived. In this thesis, we conduct a detailed performance analysis of SiteStory, a state-of-the-art transactional Web archiving solution, in various experimental settings. Furthermore, we propose a novel transactional Web archiving approach and compare its performance with SiteStory. To conduct a realistic study, we analyze real-life traffic on the Greek Wikipedia website and generate similar traffic to perform our benchmarking experiments. Our benchmarking results show that our archiving technique uses a Web server’s resources more efficiently than SiteStory in some scenarios. Outperforming SiteStory in such scenarios makes our archiving solution a favorable choice for transactional archiving. We also show that SiteStory does not impose any significant performance overhead on the Web server in other scenarios. (en)
dc.description.degree: Master of Science (en)
dc.format.medium: ETD (en)
dc.identifier.other: vt_gsexam:10593 (en)
dc.identifier.uri: http://hdl.handle.net/10919/78371 (en)
dc.publisher: Virginia Tech (en)
dc.rights: In Copyright (en)
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/ (en)
dc.subject: Web Archiving (en)
dc.subject: Digital Preservation (en)
dc.subject: Performance Benchmark (en)
dc.title: Performance Measurement and Analysis of Transactional Web Archiving (en)
dc.type: Thesis (en)
thesis.degree.discipline: Computer Science and Applications (en)
thesis.degree.grantor: Virginia Polytechnic Institute and State University (en)
thesis.degree.level: masters (en)
thesis.degree.name: Master of Science (en)

Files

Original bundle
Name: Maharshi_S_T_2017.pdf
Size: 10.19 MB
Format: Adobe Portable Document Format

Name: Maharshi_S_T_2017_support_2.zip
Size: 40.56 MB
Format:
Description: Supporting documents