Statistical characterization of area and distance in arc-node geographic information systems

Virginia Polytechnic Institute and State University


While Geographic Information Systems (GIS) have proven to be effective tools for managing and analyzing forest resource data, estimates of the reliability of the area and distance measures computed in a GIS have been lacking. Under fairly weak assumptions about the variability of point location errors, this work derives expressions for the mean, variance, and covariance of polygon area. It also derives an approximate distribution for distance.
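The following minimal Python sketch shows what such expressions look like in the simplest special case only. It assumes that every vertex coordinate carries an independent, zero-mean error of variance σ² in both X and Y, with no correlation between points. It is not the derivation in this work, which also allows adjacent-point correlation.

The shoelace area is bilinear in the coordinates, so the area error splits into a linear term and a product-of-errors term. With X and Y errors independent, the product term has mean zero, so area is unbiased. For an n-vertex polygon the variance is then exactly σ²/4 · Σ[(x_{i+1} − x_{i−1})² + (y_{i+1} − y_{i−1})²] + nσ⁴/2. The function names are hypothetical. A Monte Carlo run with normal errors serves as a check.

```python
import random
import statistics


def shoelace_area(xs, ys):
    """Signed polygon area; positive when vertices run counter-clockwise."""
    n = len(xs)
    return 0.5 * sum(xs[i] * ys[(i + 1) % n] - xs[(i + 1) % n] * ys[i]
                     for i in range(n))


def area_variance_independent(xs, ys, sigma):
    """Exact Var(area) for independent, zero-mean vertex errors with variance
    sigma**2 in both X and Y and no correlation between points (n >= 3).

    Linear term:    sigma^2/4 * sum[(x[i+1]-x[i-1])^2 + (y[i+1]-y[i-1])^2]
    Quadratic term: n * sigma^4 / 2   (products of X and Y errors)
    """
    n = len(xs)
    # xs[i - 1] wraps to the last vertex when i == 0, closing the ring.
    linear = sum((xs[(i + 1) % n] - xs[i - 1]) ** 2 +
                 (ys[(i + 1) % n] - ys[i - 1]) ** 2 for i in range(n))
    return sigma ** 2 * linear / 4 + n * sigma ** 4 / 2


def simulate_area(xs, ys, sigma, draws=20_000, seed=42):
    """Monte Carlo mean and variance of area with N(0, sigma^2) vertex errors."""
    rng = random.Random(seed)
    areas = []
    for _ in range(draws):
        px = [x + rng.gauss(0.0, sigma) for x in xs]
        py = [y + rng.gauss(0.0, sigma) for y in ys]
        areas.append(shoelace_area(px, py))
    return statistics.fmean(areas), statistics.variance(areas)


if __name__ == "__main__":
    xs, ys = [0.0, 1.0, 1.0, 0.0], [0.0, 0.0, 1.0, 1.0]  # hypothetical unit square
    print(area_variance_independent(xs, ys, 0.1))       # analytic variance
    print(simulate_area(xs, ys, 0.1))                   # simulated (mean, variance)
```

For the unit square with σ = 0.1, the analytic variance is 0.0202, and the simulated values should agree to within Monte Carlo error.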

The assumptions about point location errors include:

- unbiasedness;
- independence between X and Y coordinate errors;
- known and equal error variance in the X and Y coordinates;
- correlation between errors at adjacent points.

For the distance from a point to a line, errors are also assumed to be normally distributed. The derived variance of polygon area depends on where the centroid is located. A centroid location that minimizes this variance was therefore defined.
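Adjacent-point correlation can be illustrated with a first-order (delta-method) approximation. The minimal sketch below rests on its own assumptions:

- X and Y errors are independent, each with variance σ².
- The same correlation ρ links consecutive vertices around the ring, and all other pairs are uncorrelated.
- The σ⁴ product term is dropped.

For this ring structure, |ρ| < 1/2 keeps the covariance matrix positive definite. The centroid-dependent formulation of this work is not reproduced, and the function names are hypothetical.

```python
def area_gradient(xs, ys):
    """Partial derivatives of the shoelace area with respect to each x_i and y_i."""
    n = len(xs)
    gx = [0.5 * (ys[(i + 1) % n] - ys[i - 1]) for i in range(n)]
    gy = [0.5 * (xs[i - 1] - xs[(i + 1) % n]) for i in range(n)]
    return gx, gy


def ring_covariance(n, sigma, rho):
    """Covariance of one coordinate's errors (assumed structure): variance
    sigma**2 at each vertex, correlation rho between consecutive vertices
    (including last-to-first), zero otherwise. Requires n >= 3."""
    cov = [[0.0] * n for _ in range(n)]
    for i in range(n):
        cov[i][i] = sigma ** 2
        j = (i + 1) % n
        cov[i][j] = cov[j][i] = rho * sigma ** 2
    return cov


def quad_form(a, m):
    """Compute a' M a for a vector a and square matrix m."""
    n = len(a)
    return sum(a[i] * m[i][j] * a[j] for i in range(n) for j in range(n))


def area_variance_first_order(xs, ys, sigma, rho):
    """Delta-method Var(area) = gx' C gx + gy' C gy, with X and Y errors
    independent and sharing the same ring covariance C."""
    gx, gy = area_gradient(xs, ys)
    cov = ring_covariance(len(xs), sigma, rho)
    return quad_form(gx, cov) + quad_form(gy, cov)


if __name__ == "__main__":
    tri_x, tri_y = [0.0, 4.0, 0.0], [0.0, 0.0, 3.0]   # hypothetical triangle
    for rho in (0.0, 0.3):
        print(rho, area_variance_first_order(tri_x, tri_y, 1.0, rho))
```

In a triangle every pair of vertices is adjacent, and the gradients in each coordinate sum to zero. The result therefore reduces to (1 − ρ) times the uncorrelated value. Positive correlation lowers the variance because errors that move neighbouring vertices together shift the polygon more than they change its area.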

Once the mean and variance of polygon area were obtained, a simulation of errors in regular polygons showed polygon area to be approximately normally distributed. The distance between a point and a line involves two cases: the distance from the point to a vertex of the line, and the perpendicular distance to a line segment. When errors are assumed to be normal, the squared vertex distance was shown to be distributed as a non-central chi-square random variable. Under similar assumptions, the normal distribution was shown to be a reasonable approximation for perpendicular distance.
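The vertex-distance result can be sketched as follows. Suppose the X and Y components of the point-to-vertex offset each carry independent N(0, s²) error, and the true distance is d. Then D²/s² follows a non-central chi-square distribution with 2 degrees of freedom and non-centrality d²/s².

If the point and the vertex have independent errors of variance σ² each, then s² = 2σ². That parameterization is an assumption of this sketch. The code evaluates the CDF with the standard Poisson-mixture series using only the standard library. SciPy users could use `scipy.stats.ncx2` instead. The function names are hypothetical.

```python
import math
import random


def chi2_cdf_even(x, dof):
    """CDF of a central chi-square variable with an even number of degrees of freedom."""
    term, total = 1.0, 1.0
    for j in range(1, dof // 2):
        term *= (x / 2) / j
        total += term
    return 1.0 - math.exp(-x / 2) * total


def ncx2_cdf_2dof(x, nc, tol=1e-12, max_terms=2000):
    """CDF of a non-central chi-square with 2 degrees of freedom, evaluated as a
    Poisson(nc/2)-weighted mixture of central chi-squares with 2 + 2k dof."""
    if x <= 0:
        return 0.0
    half = nc / 2
    weight = math.exp(-half)      # Poisson weight for k = 0 (underflows for huge nc)
    total = 0.0
    for k in range(max_terms):
        total += weight * chi2_cdf_even(x, 2 + 2 * k)
        weight *= half / (k + 1)  # Poisson weight for k + 1
        if k > half and weight < tol:
            break
    return total


def prob_within(true_dist, radius, s):
    """P(observed point-to-vertex distance <= radius) when the X and Y offsets
    each carry independent N(0, s**2) error (normal-error assumption)."""
    return ncx2_cdf_2dof((radius / s) ** 2, (true_dist / s) ** 2)


def simulate_prob_within(true_dist, radius, s, draws=100_000, seed=1):
    """Monte Carlo check of prob_within; the true offset is placed on the X axis,
    which loses no generality because the errors are isotropic."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(draws):
        dx = true_dist + rng.gauss(0.0, s)
        dy = rng.gauss(0.0, s)
        hits += dx * dx + dy * dy <= radius * radius
    return hits / draws
```

When the true distance is zero, the series collapses to the Rayleigh CDF, 1 − exp(−r²/(2s²)). This gives a simple check of the implementation.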

As an application of the polygon variance and covariance formulas, the variability in the value of a tract of land was estimated from fixed per-acre values and assumptions about the variability of location errors. Under moderate assumptions about variability and correlation, the coefficient of variation of mean tract value was 8%. To demonstrate an application of the distance distribution, a probabilistic point-in-polygon analysis was performed on timber cruise plot locations within a timber stand map. Under the most liberal set of assumptions tested, over half of the plots were ambiguously located. The advantages and disadvantages of the models developed herein are discussed.
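Once area covariances and distance distributions are available, both applications reduce to short calculations. The minimal sketch below makes two assumptions:

1. Tract value is the sum of fixed per-acre values times stand areas, so its variance is vᵀΣv for an area covariance matrix Σ.
2. A plot's signed distance to the nearest stand boundary (positive inside) is approximately normal, as reported above for perpendicular distance. Plots nearest a vertex would instead need the non-central chi-square case.

The threshold `alpha`, the function names, and all input numbers are hypothetical and are not taken from this work.

```python
import math
from statistics import NormalDist


def tract_value_cv(areas, values_per_acre, area_cov):
    """Mean, SD and coefficient of variation of total value sum(v_i * A_i),
    given stand areas, fixed per-acre values and the area covariance matrix."""
    n = len(areas)
    mean = sum(v * a for v, a in zip(values_per_acre, areas))
    var = sum(values_per_acre[i] * values_per_acre[j] * area_cov[i][j]
              for i in range(n) for j in range(n))
    sd = math.sqrt(var)
    return mean, sd, sd / mean


def plot_status(signed_dist, dist_sd, alpha=0.05):
    """Classify a plot from its signed distance to the nearest stand boundary
    (positive = inside) and that distance's SD, using a normal approximation.
    alpha is a hypothetical ambiguity threshold."""
    p_inside = NormalDist().cdf(signed_dist / dist_sd)
    if p_inside >= 1 - alpha:
        return "inside"
    if p_inside <= alpha:
        return "outside"
    return "ambiguous"


if __name__ == "__main__":
    # Hypothetical two-stand tract: areas (acres), $/acre, area covariance.
    print(tract_value_cv([10.0, 20.0], [100.0, 200.0],
                         [[1.0, -0.5], [-0.5, 2.0]]))
    print([plot_status(d, 1.0) for d in (-5.0, 0.5, 5.0)])
```

Negative area covariances between neighbouring stands are plausible, because a shared boundary that moves enlarges one stand at the other's expense. Such covariances reduce the variance of total value.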