CS4624: Environment - Virginia Water Resources Research Center (VWRRC) PDF Documents to VTechWorks
dc.contributor.author | Katz, Ben | en |
dc.contributor.author | Hotinger, Eric | en |
dc.date.accessioned | 2013-07-01T00:46:31Z | en |
dc.date.available | 2013-07-01T00:46:31Z | en |
dc.date.issued | 2013-06-30 | en |
dc.description | This submission describes our effort to move over 300 VWRRC PDF documents to VTechWorks. We did this mostly in Java, using the third-party libraries OpenCloud for metadata tagging and JSoup for metadata retrieval. Additionally, Apache PDFBox was used to extract text from PDF documents dating back to the 1970s. | en |
dc.description.abstract | Virginia Tech has many groups engaged in work related to the environment. In an effort to alleviate server strain for the Virginia Water Resources Research Center (VWRRC), we have begun to archive over 300 PDF documents into VTechWorks. This will make more than five decades of Virginia Tech’s water research more searchable and accessible than ever before. This permanent archive supports searching and browsing by issue date, author, title, subject, series, and more. It may lead to other efforts in support of the College of Natural Resources and Environment. | en |
dc.identifier.uri | http://hdl.handle.net/10919/23285 | en |
dc.language.iso | en_US | en |
dc.rights | Creative Commons CC0 1.0 Universal Public Domain Dedication | en |
dc.rights.uri | http://creativecommons.org/publicdomain/zero/1.0/ | en |
dc.subject | Water | en |
dc.subject | links | en |
dc.subject | vwrrc | en |
dc.subject | pdf conversion | en |
dc.subject | jsoup | en |
dc.subject | opencloud | en |
dc.subject | tag cloud | en |
dc.subject | html parsing | en |
dc.subject | resources | en |
dc.title | CS4624: Environment - Virginia Water Resources Research Center (VWRRC) PDF Documents to VTechWorks | en |
dc.type | Technical report | en |
Files
Original bundle
1 - 5 of 8
- Name:
- vwrrc-parser-source.tar.gz
- Size:
- 8.47 KB
- Format:
- Unknown data format
- Description:
- VWRRC Data Parser Source Code - contains Java source files that, when compiled, are capable of parsing all the different types of reports listed below.
- Name:
- specialreports.txt
- Size:
- 15.39 KB
- Format:
- Plain Text
- Description:
- A year-by-year list of all special items retrieved. These are typically educational reports about water.
- Name:
- metadata.txt
- Size:
- 149.72 KB
- Format:
- Plain Text
- Description:
- All metadata generated for the PDFs on the VWRRC website. This includes tags, titles, and authors for each document.
- Name:
- bulletins.txt
- Size:
- 50.15 KB
- Format:
- Plain Text
- Description:
- A year-by-year list of all bulletins retrieved from the VWRRC website. Bulletins are typically news announcements concerning the water industry.
- Name:
- CS4624S13P.docx
- Size:
- 3.5 MB
- Format:
- Microsoft Word XML
- Description:
- Final Report (in Word's docx format): this describes our work on the project for VWRRC.
License bundle
1 - 1 of 1
- Name:
- license.txt
- Size:
- 1.5 KB
- Format:
- Item-specific license agreed to upon submission