Crisis Events One-Class Text Classification
Abstract
This project aims to design and develop a one-class text classification system that processes crisis-related web pages and extracts data insights with high precision. Unlike traditional binary classifiers, our approach addresses the practical challenge of classifying documents when examples of only one class (the crisis event and related articles) are available and the negative class is undefined or highly variable. One-class classification (OCC) suits this setting better because it treats non-crisis content as outliers or anomalies.
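As a minimal sketch of the one-class idea, the example below fits a One-Class SVM on crisis-related text only and then labels unseen text as inlier (+1) or outlier (-1). It assumes scikit-learn and TF-IDF features, neither of which the abstract specifies; the sample sentences, the `nu` value, and the linear kernel are illustrative choices, not the project's actual configuration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import OneClassSVM

# Hypothetical crisis-related snippets: the only class we have examples of.
crisis_docs = [
    "Hurricane makes landfall, thousands evacuated from coastal towns",
    "Emergency crews respond to flooding after the hurricane",
    "Hurricane damage leaves residents without power for days",
    "Shelters open as hurricane evacuation orders expand",
]

# Turn text into TF-IDF vectors (an assumed feature representation).
vectorizer = TfidfVectorizer(stop_words="english")
X_train = vectorizer.fit_transform(crisis_docs)

# nu is an upper bound on the fraction of training points treated as
# outliers; 0.1 is an illustrative value that would need tuning.
model = OneClassSVM(kernel="linear", nu=0.1)
model.fit(X_train)

# Unseen documents: no negative examples were needed for training.
new_docs = [
    "Hurricane evacuation shelters fill up as flooding spreads",
    "Local bakery wins award for best sourdough bread",
]
X_new = vectorizer.transform(new_docs)

predictions = model.predict(X_new)       # +1 = inlier, -1 = outlier
scores = model.decision_function(X_new)  # higher = more crisis-like

for doc, label, score in zip(new_docs, predictions, scores):
    print(f"{label:+d}  {score:.3f}  {doc}")
```

With so few training documents the labels are not meaningful; the point is only that training uses positive examples alone and that the decision score can be thresholded to trade recall for precision.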
The final deliverable will be an integrated web application that allows users to input URLs related to a crisis event. The backend will scrape, clean, and preprocess webpage content using tools such as requests and BeautifulSoup. The core machine learning engine, implemented with both a traditional OCC algorithm (One-Class SVM) and a deep learning method (DOCC, implemented in PyTorch), will evaluate each page for relevance. Results will be presented through a React-based user interface, supported by a FastAPI backend and a SQLite database for persistent storage and retrieval.
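The cleaning and storage steps described above might look roughly like the following sketch. It uses BeautifulSoup with Python's built-in `html.parser` and the standard-library `sqlite3` module; the function names `clean_html` and `store_page`, the `pages` table schema, and the list of stripped tags are hypothetical. The HTML is an inline sample so the example runs offline; in the real backend it would presumably come from something like `requests.get(url, timeout=10).text`.

```python
import sqlite3

from bs4 import BeautifulSoup


def clean_html(html: str) -> str:
    """Strip non-content tags and collapse whitespace (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    # Remove elements that rarely carry article text (an assumed list).
    for tag in soup(["script", "style", "nav", "footer"]):
        tag.decompose()
    # Join the remaining text nodes and normalize runs of whitespace.
    return " ".join(soup.get_text(separator=" ").split())


def store_page(conn: sqlite3.Connection, url: str, text: str) -> None:
    """Persist cleaned page text keyed by URL (hypothetical schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages (url TEXT PRIMARY KEY, content TEXT)"
    )
    conn.execute(
        "INSERT OR REPLACE INTO pages (url, content) VALUES (?, ?)",
        (url, text),
    )
    conn.commit()


# Inline sample standing in for a fetched page.
sample_html = """
<html>
  <head><style>p { color: red; }</style><script>var x = 1;</script></head>
  <body>
    <nav>Home</nav>
    <p>Flood warning issued   for the river valley.</p>
  </body>
</html>
"""

cleaned = clean_html(sample_html)

conn = sqlite3.connect(":memory:")  # in-memory DB for the example only
store_page(conn, "https://example.org/flood", cleaned)
row = conn.execute(
    "SELECT content FROM pages WHERE url = ?", ("https://example.org/flood",)
).fetchone()
print(row[0])
```

Keying the table on the URL means resubmitting the same page overwrites its stored text rather than creating duplicates, which is one plausible design choice among several.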
Our pipeline consists of data collection, preprocessing, model training, evaluation, and visualization, all integrated into a web app and validated through end-to-end testing. Having finalized the technology stack and divided roles, we have implemented the first version of our front end and ML model.
This project not only serves a practical societal need by identifying and surfacing timely crisis information but also deepens our understanding of anomaly detection and full-stack application development in a real-world setting.