CS4624: Environment - Virginia Water Resources Research Center (VWRRC) PDF Documents to VTechWorks

Field | Value | Language
dc.contributor.author | Katz, Ben | en
dc.contributor.author | Hotinger, Eric | en
dc.date.accessioned | 2013-07-01T00:46:31Z | en
dc.date.available | 2013-07-01T00:46:31Z | en
dc.date.issued | 2013-06-30 | en
dc.description | This submission describes our efforts in moving over 300 VWRRC PDF documents to VTechWorks. We employed mostly Java code to do this, using the third-party libraries OpenCloud and JSoup for metadata tagging and procurement, respectively. Additionally, PDFBox by Apache was used to pull textual information out of PDF documents dating back to the 1970's. | en
dc.description.abstract | Virginia Tech has many groups engaged in work related to the environment. In an effort to alleviate server strain for the Virginia Water Resources Research Center (VWRRC), we have begun to archive over 300 PDF documents into VTechWorks. This will make more than five decades of Virginia Tech’s water research more searchable and accessible than ever before. This permanent archive supports searching and browsing by issue date, author, title, subject, series, and more. It may lead to other efforts in support of the College of Natural Resources and Environment. | en
dc.identifier.uri | http://hdl.handle.net/10919/23285 | en
dc.language.iso | en_US | en
dc.rights | Creative Commons CC0 1.0 Universal Public Domain Dedication | en
dc.rights.uri | http://creativecommons.org/publicdomain/zero/1.0/ | en
dc.subject | Water | en
dc.subject | links | en
dc.subject | vwrrc | en
dc.subject | pdf conversion | en
dc.subject | jsoup | en
dc.subject | opencloud | en
dc.subject | tag cloud | en
dc.subject | html parsing | en
dc.subject | resources | en
dc.title | CS4624: Environment - Virginia Water Resources Research Center (VWRRC) PDF Documents to VTechWorks | en
dc.type | Technical report | en
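
The description in this record says that OpenCloud was used for metadata tagging on text pulled from the PDFs. As a rough illustration of the general idea behind frequency-based tagging, here is a minimal sketch in plain Java. It does not use OpenCloud's API and is not necessarily how the authors generated their tags; the class name `TagSketch`, the short stop-word list, and the three-letter minimum are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class TagSketch {
    // Hypothetical stop-word list; a real run would need a much longer one.
    private static final Set<String> STOP_WORDS =
            Set.of("the", "and", "for", "with", "from", "that", "this", "are", "was", "were");

    /** Returns up to maxTags of the most frequent words in text, most frequent first. */
    public static List<String> topTags(String text, int maxTags) {
        Map<String, Integer> counts = new HashMap<>();
        // Lowercase the text and split on anything that is not a letter a-z.
        for (String word : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            // Skip empty fragments, very short words, and stop words.
            if (word.length() < 3 || STOP_WORDS.contains(word)) {
                continue;
            }
            counts.merge(word, 1, Integer::sum);
        }

        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        // Sort by descending count; break ties alphabetically so output is stable.
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });

        List<String> tags = new ArrayList<>();
        for (int i = 0; i < entries.size() && i < maxTags; i++) {
            tags.add(entries.get(i).getKey());
        }
        return tags;
    }

    public static void main(String[] args) {
        String sample = "Water quality and water supply: the water table and supply wells";
        System.out.println(topTags(sample, 2)); // prints [water, supply]
    }
}
```

In a pipeline like the one described, the input string would be the text extracted from each PDF, and the resulting words would be candidates for subject tags that still need human review.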

Files

Original bundle
Now showing 1 - 5 of 8
Name: vwrrc-parser-source.tar.gz
Size: 8.47 KB
Format: Unknown data format
Description: VWRRC Data Parser Source Code - contains Java source files that, when compiled, are capable of parsing all the different types of reports listed below.

Name: specialreports.txt
Size: 15.39 KB
Format: Plain Text
Description: A year-by-year list of all special items retrieved. These are typically educational reports about water.

Name: metadata.txt
Size: 149.72 KB
Format: Plain Text
Description: All metadata generated for all the PDFs on the VWRRC website. This involves tags, titles, and authors for each document.

Name: bulletins.txt
Size: 50.15 KB
Format: Plain Text
Description: A year-by-year list of all bulletins retrieved from the VWRRC website. Bulletins are typically news announcements concerning the water industry.

Name: CS4624S13P.docx
Size: 3.5 MB
Format: Microsoft Word XML
Description: Final Report (in Word's docx format): this describes our work on the project for VWRRC.

License bundle
Now showing 1 - 1 of 1
Name: license.txt
Size: 1.5 KB
Format: Item-specific license agreed upon to submission
Description: