The problem of classifying members of a population on a continuous scale

Date
1964
Publisher
Virginia Polytechnic Institute
Abstract

Having available a vector of measurements for each individual in a random sample from a multivariate population, we assume in addition that these individuals can be ranked on some criterion of interest. As an example of this situation, we may have measured certain physiological characteristics (blood pressure, amounts of certain chemical substances in the blood, etc.) in a random sample of schizophrenics. After a series of treatments (perhaps shock treatments, doses of a tranquillizer, etc.) these individuals might be ranked on the basis of favorable response to treatment. We shall in general be interested in predicting which individuals in a new group would respond most favorably. Thus, in the example, we should wish to know which individuals would most likely benefit from the series of treatments.

Some difficulties in applying the classical discriminant function analysis to problems of this type are noted.

We have chosen to use the multiple correlation coefficient of ranks with measured variates as a statistic in testing whether ranks are associated with measurements. We give to this coefficient the name "quasi-rank multiple correlation coefficient", and proceed to find its first four exact moments under the assumption that the underlying probability distribution is multivariate normal.
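As an illustration of the statistic just described, the following is a minimal sketch, assuming the coefficient is computed as the ordinary sample multiple correlation of the rank vector with the measured variates via least squares. The function name `quasi_rank_multiple_correlation` and the use of NumPy are assumptions made for this example and do not come from the original work.

```python
import numpy as np

def quasi_rank_multiple_correlation(X, ranks):
    """Sample multiple correlation of ranks with the columns of X.

    X     : (n, p) array of measured variates (hypothetical input layout)
    ranks : length-n array of ranks on the criterion of interest
    """
    X = np.asarray(X, dtype=float)
    r = np.asarray(ranks, dtype=float)
    n = X.shape[0]

    # Least-squares regression of the ranks on the variates, with intercept.
    design = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(design, r, rcond=None)
    fitted = design @ coef

    # R^2 = 1 - (residual sum of squares) / (total sum of squares).
    ss_total = np.sum((r - r.mean()) ** 2)
    ss_resid = np.sum((r - fitted) ** 2)
    r_squared = 1.0 - ss_resid / ss_total

    # Guard against tiny negative values caused by rounding.
    return float(np.sqrt(max(r_squared, 0.0)))
```

With a single measured variate this reduces to the absolute value of the simple correlation between the ranks and that variate.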

Two methods are used to approximate the power of tests based on the quasi-rank multiple correlation coefficient in the case of just one measured variate. The agreement for a sample size of twenty is quite good.

The asymptotic relative efficiency of the squared quasi-rank coefficient vis-a-vis the squared standard multiple correlation coefficient is 9/π², a result which does not depend on the number of measured variates.
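For a sense of scale, the efficiency stated above works out numerically as follows (simple arithmetic on the stated value, not an additional result):

```latex
\[
  \frac{9}{\pi^{2}} \;=\; \left(\frac{3}{\pi}\right)^{2} \;\approx\; 0.912
\]
```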

If the null hypothesis that ranks are not associated with measurements is rejected, it is appropriate to use the measurements in some way to predict the ranks. The quasi-rank multiple correlation coefficient is, by definition, the maximized simple correlation of ranks with linear combinations of the measured variates. The maximizing linear combination of measured variates is taken as a discriminant function, and its values for subsequently chosen individuals are used to rank these individuals in order of merit.
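The ranking procedure described above can be sketched as follows. This is a hedged illustration, assuming the maximizing linear combination is taken from the least-squares regression of ranks on the measured variates (the standard way to obtain the combination that attains a multiple correlation). The helper names `to_ranks`, `fit_discriminant`, and `predict_ranks` are hypothetical, as is the convention that a larger score receives a larger rank number.

```python
import numpy as np

def to_ranks(values):
    """Ranks 1..n, smallest value gets rank 1 (assumes no ties)."""
    return np.argsort(np.argsort(values)) + 1

def fit_discriminant(X, ranks):
    """Weights of the linear combination of variates most correlated with ranks.

    X     : (n, p) array of measured variates
    ranks : length-n array of observed ranks
    """
    X = np.asarray(X, dtype=float)
    r = np.asarray(ranks, dtype=float)
    design = np.column_stack([np.ones(X.shape[0]), X])
    coef, *_ = np.linalg.lstsq(design, r, rcond=None)
    # Drop the intercept: it shifts every score equally and cannot change order.
    return coef[1:]

def predict_ranks(X_new, weights):
    """Rank new individuals by their discriminant scores."""
    scores = np.asarray(X_new, dtype=float) @ weights
    return to_ranks(scores)
```

A possible usage: fit `weights = fit_discriminant(X, ranks)` on the ranked sample, then call `predict_ranks(X_new, weights)` on the measurements of a new group. The predicted ranks need not reproduce the true ranks exactly, since the fit only maximizes a correlation.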

A demonstration study is included in which we employ a random sample of size twenty from a six-variate normal distribution of known structure (for which the population multiple correlation coefficient is .655). The null hypothesis of no association of ranks with measurements is rejected in a two-sided size .05 test. The discriminant function is obtained and is used to "predict" the true ranks of the twenty individuals in the sample. The predicted ranks represent the true ranks rather well, with no predicted rank more than four places from the true rank. For other populations in which the population multiple correlation coefficient is greater than .655 we should expect to obtain even better sets of predicted ranks.

In developing the moments of the quasi-rank multiple correlation coefficient it was necessary to obtain exact moments of a certain linear combination of quasi-ranges in a random sample from a normal population. Since this quasi-range statistic may be useful in other investigations, we include also its moment generating function and some derivatives of this moment generating function.
