VT Web Archive Project

Rinaldi, Anthony
Mehta, Dev

2014-05-09

http://hdl.handle.net/10919/47935

In addition to the report and presentation files, this repository includes a Heritrix configuration file, 'Heritrix Configuration.xml', which contains a customized configuration for crawling the VT.edu domain. Support has been provided through: 1) Virginia Tech's Information Technology organization; 2) Qatar National Research Fund Project No. NPRP 4-029-1-007; 3) NSF IIS-1319578: Integrated Digital Event Archiving and Library (IDEAL).

VTWebArchive is a project to archive, organize, and make publicly available historical versions of content hosted on vt.edu domains. The system combines several open source software packages into a public tool for searching and discovering historical versions of content hosted on Virginia Tech websites. These packages include Heritrix, a highly customizable web crawler, the Apache Tomcat web server, and the Wayback Machine front end.

en-US

Creative Commons CC0 1.0 Universal Public Domain Dedication

Archive; Internet archive; Heritrix; Wayback; Crawl; Crawler; wayback machine; WARC; Website archive; vt.edu; IDEAL; Qatar

Presentation