Our team’s main objective was to expand the Virtuoso database by integrating a comprehensive dataset of 500,000 enriched Electronic Theses and Dissertations (ETDs), building on the preliminary framework of 200 XML records used for initial testing. This expansion would enable developers to perform more robust testing and analysis of the current Knowledge Graph (KG), which is built on the Virtuoso graph database system. Our team also focused on standardizing the data expansion process so that future developers have a consistent and reliable foundation for their work. We worked primarily on four steps to expand the KG database: inserting Object IDs into each element of the XML files, converting the XML files to RDF triples, uploading the RDF triples to the Virtuoso database, and URI resolution. To execute these steps, we used Python and its libraries (rdflib, SPARQLWrapper, requests, xmltodict, tkinter), together with Node.js, npm, a REST API, and Docker. We first tested the data expansion process on a local Virtuoso instance to verify the functionality and correctness of the expansion procedure, and we prepared to deploy the process on the Virtuoso database within the Endeavour cluster once that was confirmed. Although we successfully expanded the database by 333 ETDs, we were unable to reach our target of 500,000 ETDs due to a shortage of XML data. This limitation led us to refocus our efforts on refining the data expansion process for better standardization and future scalability. We streamlined the process by integrating Object ID insertion, data conversion, and data uploading into a single GUI application, creating a more straightforward and compact workflow. This visual interface is intended to enhance usability for future developers and teams.


RDF triples, Virtuoso, ETDs, Electronic Theses and Dissertations, XML, URI resolution