VT Web Archive Project
dc.contributor.author | Rinaldi, Anthony | en |
dc.contributor.author | Mehta, Dev | en |
dc.date.accessioned | 2014-05-09T14:52:20Z | en |
dc.date.available | 2014-05-09T14:52:20Z | en |
dc.date.issued | 2014-05-09 | en |
dc.description | In addition to the report and presentation files, included in this repository is a Heritrix configuration file, 'Heritrix Configuration.xml'. This file contains a customized configuration for crawling the VT.edu domain. Support has been provided through: 1) Virginia Tech's Information Technology organization; 2) Qatar National Research Fund Project No. NPRP 4-029-1-007; 3) NSF IIS - 1319578: Integrated Digital Event Archiving and Library (IDEAL) | en |
dc.description.abstract | VTWebArchive is a project to archive, organize, and make publicly available historical versions of content hosted on vt.edu domains. The system combines several open source software packages into a public tool for searching and discovering historical versions of content hosted on Virginia Tech websites. These packages include Heritrix, a highly customizable web crawler; the Apache Tomcat web server; and the Wayback Machine front end. | en |
dc.description.sponsorship | Mohamed Magdy, (mmagdy@vt.edu) | en |
dc.description.sponsorship | Tarek Kanan, (tarekk@vt.edu) | en |
dc.description.sponsorship | Virginia Tech's Information Technology organization | en |
dc.description.sponsorship | Qatar National Research Fund Project No. NPRP 4-029-1-007 | en |
dc.description.sponsorship | NSF IIS - 1319578: Integrated Digital Event Archiving and Library (IDEAL) | en |
dc.identifier.uri | http://hdl.handle.net/10919/47935 | en |
dc.language.iso | en_US | en |
dc.rights | Creative Commons CC0 1.0 Universal Public Domain Dedication | en |
dc.rights.uri | http://creativecommons.org/publicdomain/zero/1.0/ | en |
dc.subject | Archive | en |
dc.subject | Internet archive | en |
dc.subject | Heritrix | en |
dc.subject | Wayback | en |
dc.subject | Crawl | en |
dc.subject | Crawler | en |
dc.subject | wayback machine | en |
dc.subject | WARC | en |
dc.subject | Website archive | en |
dc.subject | vt.edu | en |
dc.subject | IDEAL | en |
dc.subject | Qatar | en |
dc.title | VT Web Archive Project | en |
dc.type | Presentation | en |
Files
Original bundle
1 - 5 of 7
- Name:
- VTWebArchiving - Final Report.docx
- Size:
- 433.72 KB
- Format:
- Microsoft Word XML
- Description:
- Project Report (Word)
- Name:
- VTWebArchiving - Final Report.pdf
- Size:
- 297.44 KB
- Format:
- Adobe Portable Document Format
- Description:
- Project Report (PDF)
- Name:
- VTWebArchiving - Midterm Presentation.pptx
- Size:
- 158.45 KB
- Format:
- Microsoft Powerpoint XML
- Description:
- Project Presentation: 05MAR2014 (PowerPoint)
- Name:
- VTWebArchiving - Midterm Presentation.pdf
- Size:
- 190.21 KB
- Format:
- Adobe Portable Document Format
- Description:
- Project Presentation: 05MAR2014 (PDF)
- Name:
- Heritrix Configuration.xml
- Size:
- 29.66 KB
- Format:
- Extensible Markup Language
- Description:
- Heritrix Configuration File (XML)
License bundle
1 - 1 of 1
- Name:
- license.txt
- Size:
- 1.5 KB
- Format:
- Item-specific license agreed upon to submission