Visualizing Categorical Time Series Data with Applications to Computer and Communications Network Traces

Date

1997-04-04

Publisher

Virginia Tech

Abstract

Visualization tools allow scientists to comprehend very large data sets and to discover relationships that would otherwise be difficult to detect. Unfortunately, not all types of data can be visualized easily with existing tools. In particular, long sequences of nonnumeric data cannot be visualized adequately. Examples of this type of data include trace files of computer performance information, the nucleotides in a genetic sequence, a record of stocks traded over a period of years, and the sequence of words in this document. This work defines the term categorical time series to describe this family of data.

When visualizations designed for numerical time series are applied to categorical time series, the distortions that result from arbitrarily converting unordered categorical values into totally ordered numerical values can be profound. Examples of this phenomenon are presented and explained.
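To make the distortion concrete, here is a small hypothetical illustration (not taken from the dissertation): the same symbolic sequence, encoded as integers under two equally arbitrary orderings, yields different lag-1 autocorrelations, whereas a statistic that uses only category equality does not depend on the encoding at all. The trace, both encodings, and the helper names `lag_autocorr` and `match_rate` are assumptions made for this sketch.

```python
from statistics import mean

def lag_autocorr(xs, k):
    """Sample autocorrelation of a numeric series at lag k (hypothetical helper)."""
    m = mean(xs)
    num = sum((xs[i] - m) * (xs[i + k] - m) for i in range(len(xs) - k))
    den = sum((x - m) ** 2 for x in xs)
    return num / den

def match_rate(seq, k):
    """Fraction of positions i where seq[i] == seq[i + k] (hypothetical helper).
    It relies only on equality, so relabeling the categories cannot change it."""
    n = len(seq) - k
    return sum(seq[i] == seq[i + k] for i in range(n)) / n

trace = list("ABCABCABCABC")      # hypothetical categorical trace
enc1 = {"A": 0, "B": 1, "C": 2}   # one arbitrary numeric encoding
enc2 = {"A": 1, "B": 0, "C": 2}   # another, equally arbitrary encoding
x1 = [enc1[s] for s in trace]
x2 = [enc2[s] for s in trace]

print(lag_autocorr(x1, 1), lag_autocorr(x2, 1))    # -0.375 -0.5
print(match_rate(trace, 1), match_rate(trace, 3))  # 0.0 1.0
```

Neither integer ordering is more "correct" than the other, yet the two numeric results disagree; the equality-based measure, by contrast, gives the same answer for the raw symbols and for either encoding.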

Several new, general-purpose techniques for visualizing categorical time series data were developed as part of this work and incorporated into the Chitra performance analysis and visualization system. All of these new visualizations can be produced in O(n) time. They address aspects of categorical data that are commonly of interest, including periodicity, stationarity, cross-correlation, autocorrelation, and the detection of recurring patterns.
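As one illustration of how a measure over categorical data can be computed in a single linear pass, the sketch below records, for each observation, the distance back to the previous occurrence of the same category; in a strictly periodic sequence every such gap equals the period. This is a hypothetical example chosen for brevity, not the specific algorithm implemented in Chitra, and the function names `recurrence_gaps` and `dominant_gap` are assumptions.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence

def recurrence_gaps(seq: Sequence[Hashable]) -> list[Optional[int]]:
    """For each position i, return i - j, where j is the most recent earlier
    position holding the same category, or None on its first occurrence.
    A single pass with a last-seen dictionary gives O(n) expected time."""
    last_seen: dict = {}
    gaps: list[Optional[int]] = []
    for i, s in enumerate(seq):
        j = last_seen.get(s)
        gaps.append(None if j is None else i - j)
        last_seen[s] = i
    return gaps

def dominant_gap(seq: Sequence[Hashable]) -> Optional[int]:
    """Most frequent recurrence gap, a crude hint at a possible period
    (None if no category ever repeats)."""
    counts = Counter(g for g in recurrence_gaps(seq) if g is not None)
    return counts.most_common(1)[0][0] if counts else None

print(recurrence_gaps("ABCABCAB"))  # [None, None, None, 3, 3, 3, 3, 3]
print(dominant_gap("ABCABCAB"))     # 3
```

Plotting the gaps against position gives a simple picture: long flat runs suggest stable periodic behavior, while shifts in the gap distribution hint at nonstationarity. Ties in `most_common` are broken by first occurrence, so the "dominant" gap is only a heuristic.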

The effective use of these visualizations is demonstrated in a number of application domains, including performance analysis, World Wide Web traffic analysis, network routing simulations, document comparison, pattern detection, and the performance analysis of genetic algorithms.

Keywords

visualization, categorical data, time series, data mining, performance analysis, information visualization