VT Web Archive Project

VTWebArchive is a project to archive, organize, and make publicly available historical versions of content hosted on vt.edu domains. The system combines several open source software packages into a public tool for searching and discovering earlier versions of content hosted on Virginia Tech websites. These packages include Heritrix, a highly customizable web crawler, as well as the Apache Tomcat web server and the Wayback Machine front end.

Description

In addition to the report and presentation files, this repository includes a Heritrix configuration file, 'Heritrix Configuration.xml', which contains a customized configuration for crawling the vt.edu domain. Support has been provided through: 1) Virginia Tech's Information Technology organization; 2) Qatar National Research Fund Project No. NPRP 4-029-1-007; 3) NSF IIS-1319578: Integrated Digital Event Archiving and Library (IDEAL).

Keywords

Archive, Internet archive, Heritrix, Wayback, Crawl, Crawler, Wayback Machine, WARC, Website archive, vt.edu, IDEAL, Qatar

Persistent link

http://hdl.handle.net/10919/47935

Collections

CS4624: Multimedia, Hypertext, and Information Access
