A Deterministic Approach to Partitioning Neural Network Training Data for the Classification Problem

TR Number

Date

2006-08-07

Journal Title

Journal ISSN

Volume Title

Publisher

Virginia Tech

Abstract

The classification problem in discriminant analysis involves identifying a function that accurately classifies observations as originating from one of two or more mutually exclusive groups. Because no single classification technique works best for all problems, many different techniques have been developed. For business applications, neural networks have become the most commonly used classification technique and though they often outperform traditional statistical classification methods, their performance may be hindered because of failings in the use of training data. This problem can be exacerbated because of small data set size.

In this dissertation, we identify and discuss a number of potential problems with typical random partitioning of neural network training data for the classification problem and introduce deterministic methods to partitioning that overcome these obstacles and improve classification accuracy on new validation data. A traditional statistical distance measure enables this deterministic partitioning. Heuristics for both the two-group classification problem and k-group classification problem are presented. We show that these heuristics result in generalizable neural network models that produce more accurate classification results, on average, than several commonly used classification techniques.

In addition, we compare several two-group simulated and real-world data sets with respect to the interior and boundary positions of observations within their groups' convex polyhedrons. We show by example that projecting the interior points of simulated data to the boundary of their group polyhedrons generates convex shapes similar to real-world data group convex polyhedrons. Our two-group deterministic partitioning heuristic is then applied to the repositioned simulated data, producing results superior to several commonly used classification techniques.

Description

Keywords

convex sets, discriminant analysis, Neural networks, data partitioning

Citation