BTDImporter

Date
2014-05-07
Abstract

The BTD Importer is used to import bound theses and dissertations (BTDs) into the Electronic Theses and Dissertations (ETD) database. The process begins by scanning a hard-copy thesis into PDF form. Once the PDF exists, the importer script would locate it and extract the thesis's library call number, which is embedded in the PDF's file name. Using the call number, the script would fetch the thesis metadata, such as title and author, by scraping it from AirPAC Classic. The PDF would then be uploaded to the ETD database along with its metadata.
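As a rough illustration of the call-number extraction step, the Python sketch below recovers a call number from a PDF file name. The naming convention (the call number is the file stem, with spaces written as underscores) and the function name `call_number_from_filename` are hypothetical: the report states only that the call number is located in the file name, and the project itself was written in PHP.

```python
from pathlib import Path


def call_number_from_filename(pdf_path: str) -> str:
    """Recover a library call number from a scanned PDF's file name.

    Hypothetical convention: the file stem is the call number with spaces
    replaced by underscores, e.g. "LD5655.V855_1975.S65.pdf".
    """
    path = Path(pdf_path)
    # Reject anything that is not a PDF (case-insensitive extension check).
    if path.suffix.lower() != ".pdf":
        raise ValueError(f"not a PDF: {pdf_path}")
    # .stem drops only the final extension, so dots inside the call number survive.
    return path.stem.replace("_", " ")


print(call_number_from_filename("scans/LD5655.V855_1975.S65.pdf"))
```

Keeping the convention in one small function means a different naming scheme only requires changing this one place.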

The project deliverables called for a new importer script that would take new PDFs, look up their metadata through the Sierra APIs to query Addison directly, and construct an XML file containing that metadata. The script would then move the PDF and the new XML file into a new output directory structure, which would later be read by another stage of the new importer process. That final stage would upload the PDF and XML file to VTechWorks. The project required PHP skills, which Nathanael had, and SQL skills, which Adam and Scott had, so our group was confident we could complete it satisfactorily. The project specification also rated the project as high impact, since our work would be used to import roughly 13,000 BTDs into VTechWorks.
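The XML-construction step might look like the following Python sketch, which writes fetched metadata as a DSpace `dublin_core.xml` document: a `<dublin_core>` root with one `<dcvalue element="..." qualifier="...">` child per field. Targeting DSpace's Simple Archive Format is an assumption based on VTechWorks running DSpace; the Sierra lookup itself is not shown, and the function name `build_dublin_core` and the sample metadata are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_dublin_core(metadata):
    """Serialize (element, qualifier, value) triples as a DSpace-style dublin_core.xml string.

    Assumes DSpace Simple Archive Format; "none" is used when a field has no qualifier.
    """
    root = ET.Element("dublin_core")
    for element, qualifier, value in metadata:
        dc = ET.SubElement(root, "dcvalue", element=element, qualifier=qualifier)
        dc.text = value  # ElementTree escapes &, <, > on serialization
    return ET.tostring(root, encoding="unicode")


# Hypothetical metadata, standing in for the result of a Sierra API lookup.
sample = [
    ("title", "none", "A Study of Bridges"),
    ("contributor", "author", "Doe, Jane"),
    ("date", "issued", "1975"),
]
print(build_dublin_core(sample))
```

Building the tree with an XML library rather than string concatenation keeps titles containing characters such as `&` valid.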

We completed the project by dividing the work among the group members and meeting weekly to discuss milestones and next goals. We chose to stay with PHP, since the original importer script was written in it. PHP's libraries made it straightforward to construct an XML file and a directory structure.
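One way to produce the output directory structure is sketched below in Python (the project used PHP). It scans input directories for PDFs and writes one item directory per PDF, containing a copy of the file, a `contents` file naming it, and a `dublin_core.xml`. Following DSpace's Simple Archive Format is an assumption; `build_import_tree`, the `item_00000` naming, the file-name convention, the `element.qualifier` key format, and the `lookup` callback (standing in for the Sierra API query) are all hypothetical.

```python
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Callable, Dict, Iterable


def build_import_tree(input_dirs: Iterable[str], output_dir: str,
                      lookup: Callable[[str], Dict[str, str]]) -> int:
    """Write one Simple-Archive-Format-style item directory per PDF; return the item count."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    count = 0
    for d in input_dirs:
        for pdf in sorted(Path(d).glob("*.pdf")):
            item = out / f"item_{count:05d}"
            item.mkdir()  # raises FileExistsError rather than overwriting an earlier item
            shutil.copy2(pdf, item / pdf.name)
            # The 'contents' file lists the bitstreams belonging to this item.
            (item / "contents").write_text(pdf.name + "\n", encoding="utf-8")
            call_number = pdf.stem.replace("_", " ")  # hypothetical naming convention
            root = ET.Element("dublin_core")
            # Hypothetical key format: "element" or "element.qualifier".
            for key, value in lookup(call_number).items():
                element, _, qualifier = key.partition(".")
                dc = ET.SubElement(root, "dcvalue", element=element,
                                   qualifier=qualifier or "none")
                dc.text = value
            ET.ElementTree(root).write(item / "dublin_core.xml",
                                       encoding="utf-8", xml_declaration=True)
            count += 1
    return count
```

Because `item.mkdir()` refuses to reuse an existing directory, rerunning into a non-empty output directory fails loudly instead of silently mixing batches.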

Description
This is a project to process PDFs of bound theses for import into the VTechWorks system. The process scans a set of given directories and, for each PDF, creates an XML file containing its metadata, placing both in a directory structure set up for import into VTechWorks.
Keywords
PHP, XML, Shell Script, DSpace