Knowledge Graph Building

dc.contributor.author: Hao, Qianxiang
dc.contributor.author: Xing, Haoran
dc.date.accessioned: 2024-05-09T21:07:58Z
dc.date.available: 2024-05-09T21:07:58Z
dc.date.issued: 2024-05-09
dc.description: KGBuildingReport.pdf: PDF version of the final report for the KG Building project. KGBuildingReport.docx: Microsoft Word version of the final report for the KG Building project. KGBuildingPresentation.pdf: PDF version of the presentation for the KG Building project. KGBuildingPresentation.pptx: Microsoft PowerPoint version of the presentation for the KG Building project.
dc.description.abstract: Our team’s main objective was to expand the Virtuoso database by integrating a comprehensive dataset of 500,000 enriched Electronic Theses and Dissertations (ETDs), building on the preliminary set of 200 XML records used for initial testing. This expansion would enable developers to test and analyze the current Knowledge Graph (KG) database more thoroughly. Our team also focused on standardizing the data expansion process so that future developers have a consistent and reliable foundation for their work. The current Knowledge Graph was established with the Virtuoso graph database system. We worked on four steps to expand the KG database: inserting Object IDs into each element in the XML files, converting the XML files to RDF triples, uploading the RDF triples to the Virtuoso database, and URI resolution. To carry out these steps we used Python and its libraries (rdflib, SPARQLWrapper, requests, xmltodict, tkinter), together with Node.js, npm, a REST API, and Docker. We first tested the data expansion process on a local Virtuoso instance to verify that the procedure was functional and correct, and we prepared to deploy it on the Virtuoso database within the Endeavour cluster once that was confirmed. Although we successfully expanded the database by 333 ETDs, we were unable to reach our target of 500,000 ETDs due to a shortage of XML data. This limitation led us to refocus our efforts on refining the data expansion process for better standardization and future scalability. We streamlined the process by integrating Object ID insertion, data conversion, and data uploading into a single GUI application, creating a simpler and more compact workflow. This visual interface should make the process easier to use for future developers and teams.
dc.identifier.uri: https://hdl.handle.net/10919/118938
dc.language.iso: en
dc.rights: Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/4.0/
dc.subject: RDF triples
dc.subject: Virtuoso
dc.subject: ETDs
dc.subject: Electronic Theses and Dissertations
dc.subject: XML
dc.subject: URI resolution
dc.title: Knowledge Graph Building
dc.type: Report
dc.type: Presentation
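
The first two steps named in the abstract (inserting Object IDs into each XML element, then converting the XML to RDF triples) can be illustrated with a minimal sketch. This is not the project's code: the `http://example.org/etd/` namespace, the `obj<n>` ID scheme, the `vocab/` predicate names, and the sample record are all assumptions chosen for illustration, and the sketch uses only the Python standard library rather than rdflib or xmltodict. Uploading to Virtuoso and URI resolution are not shown.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace for illustration only; not the project's real URIs.
BASE = "http://example.org/etd/"


def insert_object_ids(root, prefix="obj"):
    """Give every element a sequential 'id' attribute in document order."""
    for n, elem in enumerate(root.iter(), start=1):
        elem.set("id", f"{prefix}{n}")
    return root


def escape_literal(text):
    """Escape backslashes, quotes, and line breaks for an N-Triples literal."""
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\r", "\\r"))


def to_ntriples(root):
    """Emit one N-Triples line per text value and per parent/child link."""
    lines = []
    for parent in root.iter():
        subj = f"<{BASE}{parent.get('id')}>"
        text = (parent.text or "").strip()
        if text:
            # Assumed predicate name for an element's text content.
            lines.append(f'{subj} <{BASE}vocab/value> "{escape_literal(text)}" .')
        for child in parent:
            # The child's tag becomes the (assumed) predicate name.
            obj = f"<{BASE}{child.get('id')}>"
            lines.append(f"{subj} <{BASE}vocab/{child.tag}> {obj} .")
    return lines


# Made-up sample record; real ETD XML is richer than this.
sample = ("<etd><title>Knowledge Graph Building</title>"
          "<author>Hao, Qianxiang</author></etd>")
root = insert_object_ids(ET.fromstring(sample))
triples = to_ntriples(root)
for line in triples:
    print(line)
```

Assigning IDs before conversion gives every element a stable subject URI, so the same element maps to the same node each time the conversion is rerun.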

Files

Original bundle
KGBuildingReport.docx (9.46 MB, Microsoft Word XML)
KGBuildingReport.pdf (3.89 MB, Adobe Portable Document Format)
KGBuildingPresentation.pdf (4.1 MB, Adobe Portable Document Format)
KGBuildingPresentation.pptx (4.54 MB, Microsoft PowerPoint XML)

License bundle
license.txt (1.5 KB, Item-specific license agreed upon to submission)