Nearline Web Archiving
dc.contributor.author | Xie, Zhiwu | en |
dc.contributor.author | Nayyar, Krati | en |
dc.contributor.author | Fox, Edward A. | en |
dc.date.accessioned | 2016-06-28T20:46:06Z | en |
dc.date.available | 2016-06-28T20:46:06Z | en |
dc.date.issued | 2016-06-23 | en |
dc.description.abstract | In this paper, we propose a modified approach to real-time transactional web archiving. It leverages the web caching infrastructure that is already prevalent on web servers. Instead of archiving web content at HTTP transaction time, our approach archives content when the cached copy expires and is about to be expunged. Before deletion, all expired cache copies are combined and then sent to the web archive in small batches. Since the cache is purged much less frequently than HTTP transactions occur, the archival workload is also much lower than that of transactional archiving. To further reduce the processing load on the origin server, deduplication of archival copies is carried out at the archive instead of at the origin server. Crucially, the cache purging process is separate from the processes that serve HTTP requests. It can be, and usually is, set to a lower priority. Archiving therefore occurs only when the server is not busy fulfilling its more mission-critical tasks, which makes it much less disruptive to the origin server. This approach, however, does not guarantee that the freshest copy is archived, although the cache purging policy may be adjusted to bound the freshness of the archive. | en |
dc.identifier.uri | http://hdl.handle.net/10919/71648 | en |
dc.language.iso | en_US | en |
dc.relation.ispartof | 3rd International Workshop on Web Archiving and Digital Libraries (WADL2016) | en |
dc.rights | Creative Commons Attribution-ShareAlike 3.0 United States | en |
dc.rights.uri | http://creativecommons.org/licenses/by-sa/3.0/us/ | en |
dc.subject | Web archiving | en |
dc.subject | Nearline web archiving | en |
dc.subject | Apache web server | en |
dc.subject | Web cache | en |
dc.title | Nearline Web Archiving | en |
dc.type | Article | en |
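The abstract describes a flow in which expired cache copies are collected at purge time, sent to the archive in small batches, and deduplicated at the archive rather than at the origin server. The Python sketch below illustrates that flow under simplifying assumptions. An in-memory dict stands in for the web cache, and `CacheEntry`, `purge_expired`, and `Archive` are hypothetical names, not the authors' Apache-based implementation. Deduplication here compares SHA-256 digests of successive copies of the same URL. That is one possible choice, not necessarily the one used in the paper.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class CacheEntry:
    # Hypothetical cached response: URL, body, and expiry time (epoch seconds).
    url: str
    body: bytes
    expires_at: float


def purge_expired(cache, now, batch_size):
    """Remove expired entries from `cache` (dict: url -> CacheEntry) and
    yield them in lists of at most `batch_size` before they are discarded."""
    expired = [url for url, entry in cache.items() if entry.expires_at <= now]
    batch = []
    for url in expired:
        batch.append(cache.pop(url))  # expunge from the cache...
        if len(batch) == batch_size:
            yield batch               # ...but hand it to the archiver first
            batch = []
    if batch:
        yield batch


@dataclass
class Archive:
    # Deduplication happens here, at the archive, not at the origin server.
    stored: list = field(default_factory=list)       # (url, digest, body)
    revisits: list = field(default_factory=list)     # (url, digest)
    last_digest: dict = field(default_factory=dict)  # url -> latest digest

    def ingest(self, batch):
        for entry in batch:
            digest = hashlib.sha256(entry.body).hexdigest()
            if self.last_digest.get(entry.url) == digest:
                # Content unchanged since the last stored copy: keep a pointer only.
                self.revisits.append((entry.url, digest))
            else:
                self.stored.append((entry.url, digest, entry.body))
                self.last_digest[entry.url] = digest


# Usage: two purge passes over a tiny hypothetical cache.
cache = {
    "/a": CacheEntry("/a", b"hello", expires_at=10),
    "/b": CacheEntry("/b", b"world", expires_at=99),
}
archive = Archive()
for batch in purge_expired(cache, now=50, batch_size=2):
    archive.ingest(batch)  # "/a" has expired and is archived; "/b" stays cached

cache["/a"] = CacheEntry("/a", b"hello", expires_at=60)  # re-cached, same content
for batch in purge_expired(cache, now=70, batch_size=2):
    archive.ingest(batch)  # "/a" expires again and is recorded as a revisit

print(len(archive.stored), len(archive.revisits), sorted(cache))  # 1 1 ['/b']
```

Running the purge loop in a separate, low-priority process (for example under `nice` on Unix) would mirror the abstract's point that archiving should consume resources only when the server is not busy. In this sketch, how often the purge runs is what bounds the freshness of the archive, while `batch_size` limits how much is sent to the archive per request.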
Files
Original bundle
- Name: 2016-WADL-nearline.pdf
- Size: 91.48 KB
- Format: Adobe Portable Document Format
- Description: Submitted version
- Name: 2016-WADL-nearline-slides.pdf
- Size: 780.71 KB
- Format: Adobe Portable Document Format
- Description: Slides for presentation at WADL 2016
- Name: ApachewithWARC.mp4
- Size: 36.92 MB
- Format: MP4 Container format for video files
- Description: Demonstration video
License bundle
- Name: license.txt
- Size: 1.5 KB
- Format: Item-specific license agreed to upon submission
- Description: