Interactive Text Classification & Evaluation

Abstract

Text classification is a core task in natural language processing: it assigns predefined categories or labels to text documents. The task has grown more important as social media has multiplied the number of text documents, and it is far more practical for a machine to classify these documents than for a person to read and label each one manually. The free Python machine-learning library scikit-learn, which began as a Google Summer of Code project in 2007, provides many classification, regression, and clustering tools. This project builds on that library for the full text-classification workflow, from generating a model to outputting a prediction for a given text.
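As a minimal sketch of that workflow, from generating a model to outputting a prediction, the following assumes a TF-IDF vectorizer feeding a logistic-regression classifier. The report does not state which scikit-learn estimators the project uses, and the four training sentences and their labels are hypothetical placeholders.

```python
# Minimal sketch: TF-IDF features + logistic regression (assumed estimators).
# The training sentences and labels are hypothetical placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_texts = [
    "the team won the match in overtime",
    "the striker scored two goals",
    "the senate passed the new budget bill",
    "the president signed the election law",
]
train_labels = ["sports", "sports", "politics", "politics"]

# Generate the model: vectorize the text, then fit the classifier.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(train_texts, train_labels)

# Output a prediction for new, unseen text.
for text in ["striker scored goals", "senate budget bill"]:
    print(text, "->", model.predict([text])[0])
```

Bundling the vectorizer and classifier in one pipeline means the same vocabulary and weighting learned during training are applied automatically at prediction time.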

The goal of this project is to create an interactive text classifier, delivered as a web application together with user and developer manuals. Our team will work closely with a client to keep the application on track and suited to their needs. The main objective is a web application that lets users interact with a machine-learning text classification model and track the correctness of its predictions, following the principle of supervised machine learning. The UI should highlight the keywords the model used to classify the text. The application is interactive because users can classify the text themselves, mark whether the highlighted keywords are right or wrong, and save the document for future reference.

This is the first project of its kind: no group in the Spring 2023 semester or in earlier semesters has attempted a similar one. Our group must therefore start from scratch and use tools and technologies that are unfamiliar to us but that we are willing to learn. We build the application on a stack similar to MERN (MongoDB, Express.js, React, Node.js), but replace Express.js with Flask as the back-end server so that it can run our scikit-learn machine-learning script directly in Python. React serves as the front-end framework, and Node.js runs the application and provides supporting features. Finally, MongoDB stores the documents, their classifications, and other important attributes. We hope the project offers valuable insight into the effectiveness and power of machine learning, and that the user and developer manuals in this report will let anyone who wishes to continue the project do so with ease.
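The back-end part of the stack described above can be sketched as a single Flask route that runs the scikit-learn model on text posted by the React front end. The route path `/api/classify`, the JSON field names, and the toy training data are hypothetical, and an in-memory Python list stands in for the MongoDB collection so the example runs without a database.

```python
# Minimal sketch of the Flask back end, assuming a hypothetical
# POST /api/classify route. An in-memory list stands in for MongoDB.
from flask import Flask, jsonify, request
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

app = Flask(__name__)

# Train once at startup on hypothetical data (a real app would load a saved model).
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(
    [
        "the team won the match in overtime",
        "the striker scored two goals",
        "the senate passed the new budget bill",
        "the president signed the election law",
    ],
    ["sports", "sports", "politics", "politics"],
)

saved_documents = []  # placeholder for a MongoDB collection


@app.route("/api/classify", methods=["POST"])
def classify():
    """Classify posted text and store it with the user's optional label."""
    payload = request.get_json(force=True) or {}
    text = payload.get("text", "")
    record = {
        "text": text,
        "predicted_label": str(model.predict([text])[0]),
        "user_label": payload.get("user_label"),  # the user's own classification
    }
    saved_documents.append(record)
    return jsonify(record), 201
```

Replacing the list with a pymongo collection would turn the append into an `insert_one` call; note that `insert_one` adds an `_id` field to the dict it is given, so the record should be copied or the `_id` converted before it is returned as JSON.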

Keywords

Text, AI, Artificial Intelligence, ML, Machine Learning, Classification

Citation