VT Web Archive Project

Abstract

VTWebArchive is a project to archive and organize historical versions of content hosted on vt.edu domains and to make them available to the public. The system combines several open source software packages into a publicly usable tool for searching and discovering historical versions of content hosted on Virginia Tech websites. These packages include Heritrix, a highly customizable web crawler; Apache Tomcat, a web server and servlet container; and the Wayback Machine front-end.

Description

In addition to the report and presentation files, this repository includes a Heritrix configuration file, 'Heritrix Configuration.xml', which contains a customized configuration for crawling the vt.edu domain. Support has been provided through: 1) Virginia Tech's Information Technology organization; 2) Qatar National Research Fund Project No. NPRP 4-029-1-007; 3) NSF IIS-1319578: Integrated Digital Event Archiving and Library (IDEAL).
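
As a rough illustration of what restricting a crawl to the vt.edu domain means, the following minimal Python sketch tests whether a URL falls inside that domain. It does not reproduce the contents of 'Heritrix Configuration.xml'; Heritrix expresses scope through its own configuration rules, and the names SCOPE_DOMAIN and in_scope are hypothetical, introduced here only for the example.

```python
from urllib.parse import urlsplit

# Assumption for illustration: the crawl is limited to vt.edu and its subdomains.
SCOPE_DOMAIN = "vt.edu"


def in_scope(url: str, domain: str = SCOPE_DOMAIN) -> bool:
    """Return True if url is an http(s) URL on the domain or one of its subdomains."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return False
    # hostname is already lowercased by urlsplit; drop any trailing root dot.
    host = (parts.hostname or "").rstrip(".")
    # Match the bare domain or a true subdomain, but not look-alikes such as notvt.edu.
    return host == domain or host.endswith("." + domain)


if __name__ == "__main__":
    for candidate in ("https://www.lib.vt.edu/", "https://notvt.edu/"):
        print(candidate, in_scope(candidate))
```

Matching on the ".vt.edu" suffix rather than on a plain substring keeps unrelated hosts that merely contain the string "vt.edu" out of scope.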

Keywords

Archive, Internet archive, Heritrix, Wayback, Crawl, Crawler, Wayback Machine, WARC, Website archive, vt.edu, IDEAL, Qatar

Citation