Performance Measurement and Analysis of Transactional Web Archiving

dc.contributor.author | Maharshi, Shivam | en
dc.contributor.committeechair | Fox, Edward A. | en
dc.contributor.committeemember | Xie, Zhiwu | en
dc.contributor.committeemember | Lee, Dongyoon | en
dc.contributor.department | Computer Science | en
dc.date.accessioned | 2017-07-20T08:00:29Z | en
dc.date.available | 2017-07-20T08:00:29Z | en
dc.date.issued | 2017-07-19 | en
dc.description.abstract | Web archiving is necessary to retain the history of the World Wide Web and to study its evolution. It is important for the cultural heritage community. Some organizations are legally obligated to capture and archive Web content. The advent of transactional Web archiving makes the archiving process more efficient, thereby aiding organizations to archive their Web content. This study measures and analyzes the performance of transactional Web archiving systems. To conduct a detailed analysis, we construct a meaningful design space defined by the system specifications that determine the performance of these systems. SiteStory, a state-of-the-art transactional Web archiving system, and local archiving, an alternative archiving technique, are used in this research. We experimentally evaluate the performance of these systems using the Greek version of Wikipedia deployed on dedicated hardware on a private network. Our benchmarking results show that the local archiving technique uses a Web server’s resources more efficiently than SiteStory for one data point in our design space. Better performance than SiteStory in such scenarios makes our archiving solution favorable to use for transactional archiving. We also show that SiteStory does not impose any significant performance overhead on the Web server for the rest of the data points in our design space. | en
dc.description.degree | Master of Science | en
dc.format.medium | ETD | en
dc.identifier.other | vt_gsexam:10593 | en
dc.identifier.uri | http://hdl.handle.net/10919/78371 | en
dc.publisher | Virginia Tech | en
dc.rights | In Copyright | en
dc.rights.uri | http://rightsstatements.org/vocab/InC/1.0/ | en
dc.subject | Web Archiving | en
dc.subject | Digital Preservation | en
dc.subject | Performance Benchmark | en
dc.title | Performance Measurement and Analysis of Transactional Web Archiving | en
dc.type | Thesis | en
thesis.degree.discipline | Computer Science and Applications | en
thesis.degree.grantor | Virginia Polytechnic Institute and State University | en
thesis.degree.level | masters | en
thesis.degree.name | Master of Science | en

Files

Original bundle
Name: Maharshi_S_T_2017.pdf
Size: 10.19 MB
Format: Adobe Portable Document Format

Name: Maharshi_S_T_2017_support_2.zip
Size: 40.56 MB
Description: Supporting documents