On the analysis of paired ranked observations

Date
1957
Publisher
Virginia Polytechnic Institute
Abstract

The problem considered in this dissertation is the following: let π₁ and π₂ be two bivariate populations having unknown cumulative distribution functions F₁(x₁, x₂) and F₂(x₁, x₂), respectively. Assume that F₁ and F₂ are continuous and identical except possibly in location parameters. It is desired to test the null hypothesis

H₀: F₁(x₁, x₂) ≡ F₂(x₁, x₂)

against the alternative

H₁: F₁(x₁, x₂) ≠ F₂(x₁, x₂).

It cannot be assumed that the variables x₁ and x₂ are statistically independent.

Suppose there are n₁ pairs of observations $(x_{11}, x_{21}), \ldots, (x_{1n_1}, x_{2n_1})$ from population π₁ and n₂ pairs of observations $(x_{1,n_1+1}, x_{2,n_1+1}), \ldots, (x_{1N}, x_{2N})$ from population π₂, where N = n₁ + n₂. The x₁ᵢ (i = 1, 2, …, N) are ranked according to magnitude, the largest being assigned rank 1 and the smallest rank N. In a similar manner, ranks are assigned to the observations x₂ᵢ (i = 1, 2, …, N). It is assumed that there are no ties in ranks.
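The ranking convention described above (largest observation receives rank 1, no ties) can be sketched as follows. This is only an illustration: the function name `descending_ranks` and the sample values are hypothetical and do not come from the dissertation.

```python
def descending_ranks(values):
    """Rank values so that the largest gets rank 1 and the smallest rank N.

    Assumes no ties, as the text does; raises ValueError otherwise.
    """
    if len(set(values)) != len(values):
        raise ValueError("ties are not handled in this sketch")
    # Indices of the observations, ordered from largest to smallest value.
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return ranks


# Hypothetical pooled sample: first n1 = 3 pairs from pi_1, last n2 = 2 from pi_2.
x1 = [4.1, 2.7, 5.3, 1.9, 3.6]
x2 = [0.8, 1.4, 2.2, 0.3, 1.1]
u1 = descending_ranks(x1)  # ranks of the first variable over all N pairs
u2 = descending_ranks(x2)  # ranks of the second variable over all N pairs
```

Each variable is ranked separately over the pooled sample of N observations, so each list of ranks is a permutation of 1, …, N.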

Let u₁ᵢ and u₂ᵢ denote the ranks assigned to x₁ᵢ and x₂ᵢ if these observations belong to population π₁, and let u’₁ᵢ and u’₂ᵢ denote the ranks of the same observations if they belong to population π₂. Since the sum of the first N integers is (N(N+1))/2, it follows that

$$\sum_{k=1}^{n_1} u_{ik} + \sum_{k=n_1+1}^{N} u'_{ik} = \frac{N(N+1)}{2}, \qquad i = 1, 2.$$

If the N pairs of ranks are plotted on a plane, it is likely that the n₁ points from population π₁ and the n₂ points from population π₂ will be interspersed, forming a circular or elliptical pattern, under the assumption that F₁(x₁, x₂) and F₂(x₁, x₂) are identical. Under the alternative hypothesis, it is likely that the points will segregate into two groups. A test statistic, S₁², is constructed to measure the extent of this segregation.

The S₁²-statistic proposed here is based on the Euclidean distance between the centroids of the ranks belonging to π₁ and π₂; in particular,

S₁²= (ū₁-ū₁')² + (ū₂-ū₂')²

where

$$\bar{u}_i = n_1^{-1} \sum_{k=1}^{n_1} u_{ik}, \qquad \bar{u}'_i = n_2^{-1} \sum_{k=n_1+1}^{N} u'_{ik}.$$
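As a minimal sketch of the definition above, the statistic can be computed directly from the two lists of ranks. The function name `s1_squared`, its argument layout (first n₁ entries belong to π₁), and the example ranks are assumptions made for illustration. Exact fractions are used so that no rounding enters.

```python
from fractions import Fraction


def s1_squared(u1, u2, n1):
    """Squared distance between the rank centroids of the two samples.

    u1, u2: ranks of the first and second variable for all N pooled pairs,
            with the first n1 pairs taken to come from population pi_1.
    """
    n2 = len(u1) - n1
    if len(u1) != len(u2) or n1 <= 0 or n2 <= 0:
        raise ValueError("need equal-length rank lists and 0 < n1 < N")
    total = Fraction(0)
    for u in (u1, u2):
        mean_1 = Fraction(sum(u[:n1]), n1)  # centroid coordinate for pi_1
        mean_2 = Fraction(sum(u[n1:]), n2)  # centroid coordinate for pi_2
        total += (mean_1 - mean_2) ** 2
    return total


# Hypothetical ranks for N = 5 pairs, the first n1 = 3 from pi_1.
u1 = [2, 4, 1, 5, 3]
u2 = [4, 2, 1, 5, 3]
result = s1_squared(u1, u2, n1=3)
print(result)  # prints 50/9
```

Here both coordinates give centroid differences of 7/3 − 4 = −5/3, so the statistic is 2 × 25/9 = 50/9.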

The first two moments of S₁² are derived under the following conditional randomization procedure: keeping the ranks paired as given in the sample, n₁ pairs are selected at random (with equal probabilities) from among the N = n₁ + n₂ pairs and assigned to population π₁; the remaining n₂ pairs are assigned to population π₂. It is shown that

$$E(S_1^2) = \frac{N^2(N+1)}{6 n_1 n_2}$$

and

$$\sigma^2_{S_1^2} = a_{00} + a_{11}A_{11} + a_{12}A_{12} + a_{21}A_{21} + a_{22}A_{22} + a_{11,11}A_{11}^2,$$

where the $A_{rs} = \sum_{k=1}^{N} u_{1k}^{r} u_{2k}^{s}$ are parameters depending on the sample, and the coefficients a₀₀, a₁₁, a₁₂, a₂₁, a₂₂ and a₁₁,₁₁ have been tabulated for values of n₁ and n₂ up to 20.
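For small samples, the stated expectation can be checked by carrying out the randomization procedure exhaustively. The sketch below enumerates every equally likely choice of n₁ pairs and compares the exact mean of S₁² with N²(N+1)/(6n₁n₂). The function names and the example pairs of ranks are hypothetical. The variance is not checked, since the tabulated coefficients are not reproduced in the abstract.

```python
from fractions import Fraction
from itertools import combinations


def s1_squared(pairs, chosen):
    """S1^2 when the pairs indexed by `chosen` are assigned to pi_1."""
    n1, n2 = len(chosen), len(pairs) - len(chosen)
    total = Fraction(0)
    for coord in (0, 1):
        sum_all = sum(p[coord] for p in pairs)
        sum_1 = sum(pairs[k][coord] for k in chosen)
        diff = Fraction(sum_1, n1) - Fraction(sum_all - sum_1, n2)
        total += diff ** 2
    return total


def randomization_mean(pairs, n1):
    """Exact mean of S1^2 over all equally likely choices of n1 pairs."""
    values = [s1_squared(pairs, set(c))
              for c in combinations(range(len(pairs)), n1)]
    return sum(values, Fraction(0)) / len(values)


def stated_mean(N, n1):
    """The expectation quoted in the text: N^2 (N + 1) / (6 n1 n2)."""
    n2 = N - n1
    return Fraction(N * N * (N + 1), 6 * n1 * n2)


# Hypothetical paired ranks (u1k, u2k) for N = 5; each coordinate is a
# permutation of 1..5, and the pairing is kept fixed during randomization.
pairs = [(2, 4), (4, 2), (1, 1), (5, 5), (3, 3)]
exact = randomization_mean(pairs, n1=3)
print(exact, stated_mean(5, 3))
```

Full enumeration grows combinatorially with N, so this is only practical as a check on small cases.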

The exact sampling distribution of S₁² is unknown. However, it is shown that the distribution of $k\,E(S_1^2)/\sigma^2_{S_1^2}$ is approximately χ² with $2[E(S_1^2)]^2/\sigma^2_{S_1^2}$ degrees of freedom.
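The degrees of freedom quoted above are what a two-moment fit of a scaled χ² variable would give. The following is a sketch under that assumption; the abstract does not state how the approximation was derived, and the scale constant c and the symbol ν are introduced here only for illustration.

```latex
% Assume c S_1^2 is approximately chi-square with \nu degrees of freedom,
% and match the mean (\nu) and variance (2\nu) of that distribution:
\begin{align*}
  c\,E(S_1^2) &= \nu, &
  c^2\,\sigma^2_{S_1^2} &= 2\nu \\
  \Longrightarrow\quad
  c &= \frac{2\,E(S_1^2)}{\sigma^2_{S_1^2}}, &
  \nu &= \frac{2\,[E(S_1^2)]^2}{\sigma^2_{S_1^2}}.
\end{align*}
```

Under this assumption, ν agrees with the degrees of freedom given in the text.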

A rank analogue of Wald's modification of Hotelling's T² is given and its first two moments are obtained. Also, a multivariate extension is considered and a statistic, S₁²(k,2), is constructed. The expectation and variance of S₁²(k,2) are derived. A multi-population extension for the case of bivariate populations is given, and the expectation is derived for a statistic, S₁²(2,p). A statistic, S₁²(k,p), is constructed for the most general case and its expectation is given.

An alternative approach to the problem, also investigated, is by means of discriminant analysis. In this case simplified formulas are given for the calculation of the components of a vector which provides optimum discrimination. It is shown that this method is not a fruitful one for the construction of tests of significance pertaining to the original null hypothesis.
