MetadataShow full item record
The Conversation Facts project is a part of Dr. Fox's CS 4624: Multimedia, Hypertext, and Information Access class; it was proposed by Saurabh Chakravarty as a way to help his research in natural language processing. The goal of the Conversation Facts project is to be able to take a summary of a conversation and link it back to where it occurs in the conversation dialogue. We used the Argumentative Dialogue Summary Corpus: Version 1 from Natural Language and Dialog Systems as our dataset for this project. This project was created in Python due to its natural language processing libraries which include spaCy and the Natural Language Toolkit (NLTK) libraries. These two contained the methods and techniques used in the project to parse the data and process it into the parts of speech for us to work with. Our general method of approach for this project was to create knowledge graphs of the summaries and the conversation dialogues. This way, we could connect the two based on the entity-relation-entity (ERE) triples. We can then compare the summary triple which would point us back to a corresponding conversation triple. This will link back to the section in the dialogue text that the summary is referencing. Upon completion of the project, we have found that our methods outperform naïve implementations of simply running our data through industry standard software, but there are still many things that could be improved to get better results. Our program focuses on utilizing natural language processing techniques, but we believe that machine learning could be applied to the data set in order to increase accuracy. The report explains the requirements set for the team to accomplish, the overall design of the project, the implementation of said design, and evaluation of results. It also includes a User’s Manual and Developer’s Manual to help illustrate how to either run the source code or continue development on the project. Finally, we describe the lessons learned throughout completing the project and list the resources used.