Increasing the Precision of Forest Area Estimates through Improved Sampling for Nearest Neighbor Satellite Image Classification
Blinn, Christine Elizabeth
The impacts of training data sample size and sampling method on the accuracy of forest/nonforest classifications of three mosaicked Landsat ETM+ images with the nearest neighbor decision rule were explored. Large training data pools of single pixels were used in simulations to create samples with three sampling methods (random, stratified random, and systematic) and eight sample sizes (25, 50, 75, 100, 200, 300, 400, and 500). Two forest area estimation techniques were used to estimate the proportion of forest in each image and to calculate forest area precision estimates. Training data editing was explored to remove problem pixels from the training data pools. All possible band combinations of the six non-thermal ETM+ bands were evaluated for every sample draw. Classification accuracies were compared to determine whether all six bands were needed. The utility of separability indices, minimum and average Euclidean distances, and cross-validation accuracies for the selection of band combinations, prediction of classification accuracies, and assessment of sample quality was determined. Larger training data sample sizes produced classifications with higher average accuracies and lower variability. All three sampling methods had similar performance. Training data editing improved the average classification accuracies by a minimum of 5.45%, 5.31%, and 3.47%, respectively, for the three images. Band combinations with fewer than all six bands almost always produced the maximum classification accuracy for a single sample draw. The number and combination of bands that maximized classification accuracy depended on the characteristics of the individual training data sample draw, the image, the sample size, and, to a lesser extent, the sampling method. None of the three band selection measures could select band combinations that produced higher accuracies on average than all six bands.
Cross-validation accuracies with sample size 500 had high correlations with classification accuracies, and provided an indication of sample quality. Collection of a high quality training data sample is key to the performance of the nearest neighbor classifier. Larger samples are necessary to guarantee classifier performance and the utility of cross-validation accuracies. Further research is needed to identify the characteristics of "good" training data samples.
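
As a rough illustration of the band-combination evaluation described above, the following Python sketch enumerates every non-empty subset of six bands and scores each with leave-one-out cross-validation under a one-nearest-neighbor rule. It is a minimal sketch under stated assumptions, not the procedure used in the dissertation: the band numbers (1-5 and 7), the use of leave-one-out as the cross-validation scheme, the squared Euclidean distance, and the synthetic pixel values produced by the hypothetical helper `make_synthetic_sample` are all illustrative choices.

```python
from itertools import combinations
import random

# Assumed band numbering for the six non-thermal ETM+ bands (illustrative).
BANDS = (1, 2, 3, 4, 5, 7)


def band_combinations(bands=BANDS):
    """Return every non-empty subset of the bands (2**6 - 1 = 63 for six bands)."""
    return [c for r in range(1, len(bands) + 1) for c in combinations(bands, r)]


def loo_accuracy(pixels, labels, combo, bands=BANDS):
    """Leave-one-out accuracy of a 1-nearest-neighbor rule restricted to `combo`."""
    idx = [bands.index(b) for b in combo]
    correct = 0
    for i, p in enumerate(pixels):
        best_j, best_d = None, float("inf")
        for j, q in enumerate(pixels):
            if j == i:
                continue  # hold out the pixel being classified
            d = sum((p[k] - q[k]) ** 2 for k in idx)  # squared Euclidean distance
            if d < best_d:
                best_j, best_d = j, d
        correct += labels[best_j] == labels[i]
    return correct / len(pixels)


def make_synthetic_sample(n=50, seed=0):
    """Hypothetical training sample: labels alternate 0/1, and two band
    positions are shifted for class 1. The values are invented for illustration."""
    rng = random.Random(seed)
    pixels, labels = [], []
    for i in range(n):
        label = i % 2
        pixel = [rng.gauss(50.0, 10.0) for _ in BANDS]
        if label == 1:
            pixel[2] -= 15.0  # third band position shifted for class 1
            pixel[4] -= 20.0  # fifth band position shifted for class 1
        pixels.append(tuple(pixel))
        labels.append(label)
    return pixels, labels


pixels, labels = make_synthetic_sample()
scores = {c: loo_accuracy(pixels, labels, c) for c in band_combinations()}
best = max(scores, key=scores.get)
print("best combination:", best, "LOO accuracy:", scores[best])
print("all six bands:", scores[BANDS])
```

Comparing `scores[best]` with `scores[BANDS]` for a single draw mirrors the comparison reported above between the best-performing subset and the full six-band set, although on synthetic data the numbers carry no meaning beyond illustrating the mechanics.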