Data Mining Academic Emails to Model Employee Behaviors and Analyze Organizational Structure
Abstract
Email correspondence has become the predominant method of communication for businesses. If not for the inherent privacy concerns, this electronically searchable data could be used to better understand how employees interact. After the Enron dataset was made available, researchers were able to gain considerable insight into employee behaviors from it, despite the many challenges with that dataset. The work in this thesis applies a suite of methods to an appropriately anonymized academic email dataset created from volunteers' email metadata. This new dataset, drawn from an internal email server, is first used to validate feature extraction and machine learning algorithms in order to generate insight into the interactions within the center. Based solely on email metadata, a random forest approach models behavior patterns and predicts employee job titles with