Topic Modeling Toolkit
Abstract
The Topic Modeling Toolkit project began with an existing text mining toolkit and aimed to extend it with current topic modeling techniques. Specifically, BERTopic, CTM, and LDA were used to extract topics from a corpus of text documents. The resulting web-based platform provides a search engine, a recommendation system, and an interface for browsing and exploring these topics. Our team also implemented a text-filtering framework and redesigned the user interface using Tailwind CSS. The final deliverables include a fully functional website, user documentation, and an open-source toolkit that can be used to train machine learning models and to support browsing and searching of various text datasets. The toolkit is available as an open-source package that users can install and run on their own computers. Although the project originally focused on electronic theses and dissertations (ETDs), the resulting platform can be used to explore any corpus of text documents. The intended users are researchers, students, and others interested in exploring and understanding complex topics within a given corpus. The current version includes BERTopic, CTM, and LDA; future work could incorporate additional topic modeling methods.