Segmentation Algorithm
Abstract
We developed a dynamic temporal segmentation algorithm that wraps around topic modeling algorithms to identify change points, where significant shifts in topics occur. The algorithm's main task is to automatically partition the time period spanned by the documents in a collection so that segment boundaries mark important periods of temporal evolution and reorganization. Given a segmentation granularity (e.g., days, weeks, or months), the algorithm moves through the data in time order and evaluates two adjacent windows at each step. The appropriate granularity varies by application and is chosen by domain experts. We evaluate adjacent windows by comparing their underlying topic distributions, quantifying the terms they share and those terms' probabilities. We chose to quantify shared terms by their overlap, which can be captured in a contingency table.
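The sliding-window comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Normalized term frequencies stand in for the topic model the algorithm wraps around; in practice each window's distribution would come from a fitted topic model. Each window is summarized by its k most probable terms, and a 2x2 contingency table counts terms that appear in both windows' top-k sets, in only one, or in neither. Scoring the overlap as both / (both + left-only + right-only) and flagging a change point when the score falls below a threshold are assumptions. The function names (`term_distribution`, `top_terms`, `contingency_table`, `overlap_score`, `change_points`) and the parameters `window`, `k`, and `threshold` are hypothetical.

```python
from collections import Counter


def term_distribution(docs):
    """Stand-in for a topic model (assumption): normalized term frequencies."""
    counts = Counter(word for doc in docs for word in doc.lower().split())
    total = sum(counts.values())
    return {term: c / total for term, c in counts.items()} if total else {}


def top_terms(dist, k):
    """The k most probable terms; ties are broken alphabetically for determinism."""
    return set(sorted(dist, key=lambda t: (-dist[t], t))[:k])


def contingency_table(left, right, vocabulary):
    """2x2 counts over the vocabulary: (in both, left only, right only, in neither)."""
    both = len(left & right)
    left_only = len(left - right)
    right_only = len(right - left)
    neither = len(vocabulary) - both - left_only - right_only
    return both, left_only, right_only, neither


def overlap_score(left, right, vocabulary):
    """Hypothetical overlap measure: shared terms / terms present in either set."""
    both, left_only, right_only, _ = contingency_table(left, right, vocabulary)
    denom = both + left_only + right_only
    return both / denom if denom else 1.0


def change_points(docs_by_period, window=1, k=3, threshold=0.5):
    """Flag boundary t when the windows before and after t overlap below threshold."""
    points = []
    for t in range(window, len(docs_by_period) - window + 1):
        # Pool the documents of `window` consecutive periods on each side of t.
        left_docs = [d for period in docs_by_period[t - window:t] for d in period]
        right_docs = [d for period in docs_by_period[t:t + window] for d in period]
        left_dist = term_distribution(left_docs)
        right_dist = term_distribution(right_docs)
        vocab = set(left_dist) | set(right_dist)
        score = overlap_score(top_terms(left_dist, k), top_terms(right_dist, k), vocab)
        if score < threshold:
            points.append(t)
    return points


# Usage: each inner list holds the documents of one time period (e.g., one week).
periods = [
    ["storm flood rain", "rain flood warning"],
    ["flood rain storm damage"],
    ["election vote ballot", "vote election poll"],
    ["ballot vote election results"],
]
print(change_points(periods, window=1, k=3, threshold=0.5))  # [2]
```

Here the only boundary flagged is between periods 1 and 2, where the dominant vocabulary shifts from weather to elections. Keeping the four contingency counts, rather than only the shared-term count, leaves room to substitute other association measures computed from the same 2x2 table.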