
CS4624: Environment - Virginia Water Resources Research Center (VWRRC) PDF Documents to VTechWorks

Abstract

Virginia Tech has many groups engaged in work related to the environment. In an effort to alleviate server strain for the Virginia Water Resources Research Center (VWRRC), we have begun to archive over 300 PDF documents into VTechWorks. This will make more than five decades of Virginia Tech’s water research more searchable and accessible than ever before. This permanent archive supports searching and browsing by issue date, author, title, subject, series, and more. It may lead to other efforts in support of the College of Natural Resources and Environment.

Description

This submission describes our efforts to move over 300 VWRRC PDF documents to VTechWorks. We wrote most of the code in Java, using the third-party libraries OpenCloud for metadata tagging and JSoup for metadata procurement. Additionally, Apache PDFBox was used to extract text from PDF documents dating back to the 1970s.
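As a rough illustration of the metadata tagging step, the sketch below picks candidate subject keywords from text that has already been extracted from a PDF. It is not the project's code: it uses plain Java word counts in place of OpenCloud, and the class name `KeywordSketch`, the method `topKeywords`, and the short stop-word list are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class KeywordSketch {
    // Hypothetical stop-word list; a real run would need a much longer one.
    private static final Set<String> STOP_WORDS =
            Set.of("the", "and", "for", "with", "from", "that", "this");

    /**
     * Returns up to `limit` of the most frequent words in `text`,
     * skipping stop words and words shorter than three letters.
     * Ties are broken alphabetically so the result is deterministic.
     */
    public static List<String> topKeywords(String text, int limit) {
        Map<String, Integer> counts = new HashMap<>();
        // Lowercase, then split on anything that is not an ASCII letter.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (token.length() < 3 || STOP_WORDS.contains(token)) {
                continue;
            }
            counts.merge(token, 1, Integer::sum);
        }

        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        // Highest count first, then alphabetical order for equal counts.
        entries.sort(Map.Entry.<String, Integer>comparingByValue(Comparator.reverseOrder())
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()));

        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(limit, entries.size()); i++) {
            result.add(entries.get(i).getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        // Stand-in for text pulled out of a PDF; the sentence is made up.
        String sample = "Water quality in Virginia streams. Water resources and stream quality.";
        System.out.println(topKeywords(sample, 3));
    }
}
```

In a pipeline like the one described, the input string would come from the PDF text extraction step, and the returned words would be reviewed before being used as subject metadata.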

Keywords

Water, links, vwrrc, pdf conversion, jsoup, opencloud, tag cloud, html parsing, resources

Citation