Technical Report # KU-EC-08-5:
QR-Adjustment for Clustering Tests Based on Nearest Neighbor Contingency Tables

Elvan Ceyhan Department of Mathematics, Koç University, 34450 Sarıyer, Istanbul, Turkey
(July 28, 2025)
Abstract

The spatial interaction between two or more classes of points may cause spatial clustering patterns such as segregation or association, which can be tested using a nearest neighbor contingency table (NNCT). A NNCT is constructed using the frequencies of class types of points in nearest neighbor (NN) pairs. For tests based on NNCTs (i.e., NNCT-tests), the null pattern is either complete spatial randomness (CSR) of the points from two or more classes (called CSR independence) or random labeling (RL). The RL pattern implies that the locations of the points in the study region are fixed, while the CSR independence pattern implies that they are random. The distributions of the NNCT-test statistics depend on the number of reflexive NNs (denoted by $R$) and the number of shared NNs (denoted by $Q$), both of which depend on the allocation of the points. Hence $Q$ and $R$ are fixed quantities under RL, but random variables under CSR independence. However, given the difficulty in calculating the expected values of $Q$ and $R$ under CSR independence, one can use their observed values in NNCT analysis, which makes the distributions of the NNCT-test statistics conditional on $Q$ and $R$ under CSR independence. In this article, I use the empirically estimated expected values of $Q$ and $R$ under the CSR independence pattern to remove the conditioning of NNCT-tests (such a correction is called the QR-adjustment, henceforth). I present a Monte Carlo simulation study to compare the conditional NNCT-tests (i.e., tests in which the observed values of $Q$ and $R$ are used) and unconditional NNCT-tests (i.e., empirically QR-adjusted tests) under CSR independence and under segregation and association alternatives. I demonstrate that QR-adjustment does not significantly improve the empirical size estimates under CSR independence or the power estimates under segregation or association alternatives. For illustrative purposes, I apply the conditional and empirically corrected tests on two example data sets.

Keywords: Association; complete spatial randomness; conditional test; nearest neighbor contingency table; random labeling; spatial clustering; spatial pattern; segregation


corresponding author.
e-mail: elceyhan@ku.edu.tr (E. Ceyhan)

1 Introduction

Spatial patterns have been studied extensively and have important implications in many fields such as epidemiology, population biology, and ecology. It is of practical interest to study the univariate spatial patterns (i.e., patterns of only one class) as well as multivariate patterns (i.e., patterns of multiple classes) (Pielou, (1961), Whipple, (1980), and Dixon, (1994, 2002)). For convenience and generality, I call the different types of points as “classes”, but a class label can stand for any characteristic of a measurement at a particular location. For example, the spatial segregation pattern has been investigated for species (Diggle, (2003)), age classes of plants (Hamill and Wright, (1986)), fish species (Herler and Patzner, (2005)), and sexes of dioecious plants (Nanami et al., (1999)). Many of the epidemiological applications are for a two-class system of case and control labels (Waller and Gotway, (2004)).

In this article, for simplicity, I describe the spatial point patterns for two classes only; the extension to the multi-class case is straightforward. The null pattern is usually one of the two (random) pattern types: complete spatial randomness (CSR) of two or more classes or random labeling (RL) of a set of fixed points with two classes. That is, when the points from each class are assumed to be uniformly distributed over the region of interest, the null hypothesis is the CSR of points from two classes. This type of CSR pattern is also referred to as “population independence” in the literature (Goreaud and Pélissier, (2003)). In the univariate spatial analysis, CSR refers to the pattern in which locations of points from a single class are random over the study area. To distinguish the CSR of points from two classes and the CSR of points from one class, I call the former “CSR independence” and the latter “CSR”, henceforth. Note that CSR independence is equivalent to the case in which the RL procedure is applied to a given set of points from a CSR pattern, in the sense that after points are generated uniformly in the region, the class labels are assigned randomly. When only the labeling of a set of fixed points (the allocation of the points could be regular, aggregated, or clustered, or of lattice type) is random, the null hypothesis is the RL pattern.

Many tests of spatial segregation have been developed in the literature (Orton, (1982)). These include comparison of Ripley’s $K(t)$ or $L(t)$ functions (Ripley, (2004)), comparison of nearest neighbor (NN) distances (Diggle, (2003), Cuzick and Edwards, (1990)), and analysis of nearest neighbor contingency tables (Pielou, (1961), Meagher and Burdick, (1980)). Nearest neighbor contingency tables (NNCTs) are constructed using the frequencies of classes of points in NN pairs. Kulldorff, (2006) provides an extensive review of tests of spatial randomness that adjust for an inhomogeneity of the densities of the underlying populations. Pielou, (1961) proposed various tests, and Dixon, (1994) introduced an overall test of segregation as well as cell- and class-specific tests based on NNCTs for the two-class case, and extended his tests to the multi-class case (Dixon, (2002)). These tests based on NNCTs (i.e., NNCT-tests) were designed for testing the RL of points. Pielou, (1961) used the usual Pearson’s $\chi^{2}$-test of independence for detecting the segregation of the two classes. Due to the ease in computation and interpretation, Pielou’s test of segregation is frequently used for both CSR independence and RL patterns. However, it has been shown that Pielou’s test is not appropriate for testing RL (Meagher and Burdick, (1980), Dixon, (1994)). Dixon, (1994) derived the appropriate (asymptotic) sampling distribution of cell counts using Moran join count statistics (Moran, (1948)) and hence the appropriate test, which also has a $\chi^{2}$-distribution (Dixon, (1994)). For the two-class case, Ceyhan, (2006) compared these tests, extended the tests for testing CSR independence, and demonstrated that Pielou’s tests are only appropriate for a random sample of (base, NN) pairs. Furthermore, Ceyhan, (2007) proposed three new overall segregation tests.
Since Pielou’s test is not appropriate, NNCT-tests only refer to Dixon’s overall test and the three new segregation tests proposed by Ceyhan, (2007). However, the distributions of the NNCT-test statistics depend on the number of reflexive NNs (denoted by $R$) and the number of shared NNs (denoted by $Q$), both of which depend on the allocation of the points. Hence $Q$ and $R$ are fixed under RL, but random under CSR independence. The expectations of $Q$ and $R$ do not seem to be available analytically under CSR independence, so their observed values were used by Ceyhan, (2007). In this article, I replace the expectations of $Q$ and $R$ by their empirical estimates under CSR independence. Such a correction for removing the conditional nature of NNCT-tests is called “QR-adjustment”, henceforth.

The NNCT-tests are designed for testing a more general null hypothesis, namely, $H_{o}$: randomness in the NN structure, which usually results from CSR independence or RL. The distinction between CSR independence and RL is very important when defining the appropriate null model for each empirical case, i.e., the null model depends on the particular context. Goreaud and Pélissier, (2003) discuss the differences between these two null hypotheses and demonstrate that the misinterpretation is very common. They assert that under CSR independence the (locations of the points from) two classes are a priori the result of different processes (e.g., individuals of different species or age cohorts), whereas under RL some processes affect a posteriori the individuals of a single population (e.g., diseased versus non-diseased individuals of a single species). Notice that although CSR independence and RL are not the same, they lead to the same null model (i.e., randomness in NN structure) for NNCT-tests, since a NNCT does not require spatially-explicit information.

I consider two major types of (bivariate) spatial clustering patterns, namely, association and segregation as alternative patterns. Association occurs if the NN of an individual is more likely to be from another class. Segregation occurs if the NN of an individual is more likely to be of the same class as the individual; i.e., the members of the same class tend to be clumped or clustered (see, e.g., Pielou, (1961)). For more detail on these alternative patterns, see (Ceyhan, (2007)). I assess the effects of QR-adjustment on the size of the NNCT-tests under CSR independence and on the power of the tests under the segregation or association alternatives by an extensive Monte Carlo study.

Throughout the article I adopt the convention that random quantities are denoted by capital letters, while fixed quantities are denoted by lower case letters. I describe the construction of NNCTs in Section 2.1, provide Dixon’s tests in Sections 2.2 and 2.4, the new segregation tests in Sections 2.5 to 2.7, empirical significance levels of the tests in Section 3, empirical power analysis in Section 4, two illustrative examples in Section 5, and discussion and conclusions in Section 6.

2 Nearest Neighbor Contingency Tables and Related Tests

2.1 Construction of the Nearest Neighbor Contingency Tables

NNCTs are constructed using the NN frequencies of classes. I describe the construction of NNCTs for two classes; extension to the multi-class case is straightforward. Consider two classes with labels $\{1,2\}$. Let $N_{i}$ be the number of points from class $i$ for $i\in\{1,2\}$ and $n$ be the total sample size, so $n=N_{1}+N_{2}$. If I record the class of each point and the class of its NN, the NN relationships fall into four distinct categories: $(1,1),\,(1,2);\,(2,1),\,(2,2)$, where in cell $(i,j)$, class $i$ is the base class, while class $j$ is the class of its NN. That is, the $n$ points constitute $n$ (base, NN) pairs. Then each pair can be categorized with respect to the base label (row categories) and NN label (column categories). Denoting $N_{ij}$ as the frequency of cell $(i,j)$ for $i,j\in\{1,2\}$, I obtain the NNCT in Table 1, where $C_{j}$ is the sum of column $j$; i.e., the number of times class $j$ points serve as NNs for $j\in\{1,2\}$. Furthermore, $N_{ij}$ is the cell count for cell $(i,j)$, that is, the number of (base, NN) pairs with label $(i,j)$. Note also that $n=\sum_{i,j}N_{ij}$; $n_{i}=\sum_{j=1}^{2}\,N_{ij}$; and $C_{j}=\sum_{i=1}^{2}\,N_{ij}$. By construction, if $N_{ij}$ is larger (smaller) than expected, then class $j$ serves as NN more (less) to class $i$ than expected, which implies (lack of) segregation if $i=j$ and (lack of) association of class $j$ with class $i$ if $i\neq j$. Hence, the column sums and cell counts are random, while the row sums and the overall sum are fixed quantities in a NNCT.

                          NN class
                          class 1     class 2     sum
base class    class 1     $N_{11}$    $N_{12}$    $n_{1}$
              class 2     $N_{21}$    $N_{22}$    $n_{2}$
              sum         $C_{1}$     $C_{2}$     $n$
Table 1: The NNCT for two classes.
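The construction of Table 1 can be sketched in a few lines. The following is only an illustrative sketch under my own assumptions: the function names, the brute-force NN search, and the toy coordinates are not from the original analysis, and ties in NN distances are assumed not to occur (as is the case almost surely for continuous data).

```python
import math

def nearest_neighbor_indices(points):
    """Brute-force NN search; returns the index of the NN of each point."""
    nn = []
    for i, (xi, yi) in enumerate(points):
        best_j, best_d = -1, math.inf
        for j, (xj, yj) in enumerate(points):
            if j == i:
                continue
            d = (xi - xj) ** 2 + (yi - yj) ** 2  # squared distance suffices
            if d < best_d:
                best_j, best_d = j, d
        nn.append(best_j)
    return nn

def build_nnct(points, labels):
    """Cross-tabulate (base class, NN class) pairs for labels in {1, 2}."""
    table = [[0, 0], [0, 0]]
    for i, j in enumerate(nearest_neighbor_indices(points)):
        table[labels[i] - 1][labels[j] - 1] += 1
    return table

# Toy data (made up for illustration): two tight clumps plus one stray point.
points = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.1, 1.0), (0.5, 0.4)]
labels = [1, 1, 2, 2, 1]
table = build_nnct(points, labels)
row_sums = [sum(row) for row in table]        # n_1, n_2 (class sizes)
col_sums = [sum(col) for col in zip(*table)]  # C_1, C_2
print(table, row_sums, col_sums)
```

In this toy configuration every point has a NN of its own class, so all counts fall on the diagonal, which is the direction expected under segregation.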

Observe that, under segregation, the diagonal entries, $N_{ii}$ for $i=1,2$, tend to be larger than expected; under association, the off-diagonals tend to be larger than expected. The general alternative is that some cell counts are different than expected under CSR independence or RL.

In the two-class case, Pielou, (1961) used Pearson’s $\chi^{2}$-test of independence to detect any deviation from CSR independence or RL. But, under CSR independence or RL, this test is liberal, i.e., has larger size than the nominal level (Ceyhan, (2006)), and hence is not considered in this article. Dixon, (1994) proposed a series of tests for segregation based on NNCTs. He first devised four cell-specific tests in the two-class case, and then combined them to form an overall test. For his tests, the probability of an individual from class $j$ serving as a NN of an individual from class $i$ depends only on the class sizes (i.e., row sums), but not the total number of times class $j$ serves as NNs (i.e., column sums).

2.2 Dixon’s Cell-Specific Tests

The level of segregation is estimated by comparing the observed cell counts to the expected cell counts under RL of points that are fixed. Dixon demonstrates that under RL, one can write down the cell frequencies as Moran join count statistics (Moran, (1948)). He then derives the means, variances, and covariances of the cell counts (frequencies) in a NNCT (Dixon, (1994, 2002)).

The null hypothesis under RL is given by

$$H_{o}:\,\mathbf{E}[N_{ij}]=\begin{cases}\frac{n_{i}(n_{i}-1)}{(n-1)}&\text{if $i=j$,}\\ \frac{n_{i}\,n_{j}}{(n-1)}&\text{if $i\neq j$.}\end{cases} \tag{1}$$

Observe that the expected cell counts depend only on the size of each class (i.e., row sums), but not on column sums.
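As a small illustration of Equation (1), the expected cell counts can be computed from the class sizes alone. This is a sketch with a hypothetical function name and made-up class sizes.

```python
def expected_cell_counts(class_sizes):
    """Expected NNCT cell counts under RL, following Equation (1)."""
    n = sum(class_sizes)
    return [
        [(n_i * (n_i - 1) if i == j else n_i * n_j) / (n - 1)
         for j, n_j in enumerate(class_sizes)]
        for i, n_i in enumerate(class_sizes)
    ]

# Made-up class sizes n_1 = 3 and n_2 = 2.
expected = expected_cell_counts([3, 2])
print(expected)  # [[1.5, 1.5], [1.5, 0.5]]
```

Each row of the expected table sums to the corresponding class size $n_{i}$, as it must, since the row sums are fixed.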

The cell-specific test statistics suggested by Dixon are given by

$$Z^{D}_{ij}=\frac{N_{ij}-\mathbf{E}[N_{ij}]}{\sqrt{\mathbf{Var}[N_{ij}]}}, \tag{2}$$

where

$$\mathbf{Var}[N_{ij}]=\begin{cases}(n+R)\,p_{ii}+(2\,n-2\,R+Q)\,p_{iii}+(n^{2}-3\,n-Q+R)\,p_{iiii}-(n\,p_{ii})^{2}&\text{if $i=j$,}\\ n\,p_{ij}+Q\,p_{iij}+(n^{2}-3\,n-Q+R)\,p_{iijj}-(n\,p_{ij})^{2}&\text{if $i\neq j$,}\end{cases} \tag{3}$$

where $p_{xx}$, $p_{xxx}$, and $p_{xxxx}$ are the probabilities that a randomly picked pair, triplet, or quartet of points, respectively, are of the indicated classes, and are given by

$$\begin{aligned}
p_{ii}&=\frac{n_{i}\,(n_{i}-1)}{n\,(n-1)}, & p_{ij}&=\frac{n_{i}\,n_{j}}{n\,(n-1)},\\
p_{iii}&=\frac{n_{i}\,(n_{i}-1)\,(n_{i}-2)}{n\,(n-1)\,(n-2)}, & p_{iij}&=\frac{n_{i}\,(n_{i}-1)\,n_{j}}{n\,(n-1)\,(n-2)},\\
p_{iijj}&=\frac{n_{i}\,(n_{i}-1)\,n_{j}\,(n_{j}-1)}{n\,(n-1)\,(n-2)\,(n-3)}, & p_{iiii}&=\frac{n_{i}\,(n_{i}-1)\,(n_{i}-2)\,(n_{i}-3)}{n\,(n-1)\,(n-2)\,(n-3)}.
\end{aligned} \tag{4}$$

Furthermore, $Q$ is the number of points with shared NNs, which occur when two or more points share a NN, and $R$ is twice the number of reflexive pairs. Then $Q=2\,(Q_{2}+3\,Q_{3}+6\,Q_{4}+10\,Q_{5}+15\,Q_{6})$, where $Q_{k}$ is the number of points that serve as a NN to other points $k$ times. One-sided and two-sided tests are possible for each cell $(i,j)$ using the asymptotic normal approximation of $Z^{D}_{ij}$ given in Equation (2) (Dixon, (1994)). The test in Equation (2) is the same as Dixon’s $Z_{AA}$ when $i=j=1$, and the same as $Z_{BB}$ when $i=j=2$ (Dixon, (1994)). Note also that in Equation (2) four different tests are defined, as there are four cells and each is testing the deviation from the null case in the respective cell. These four tests are combined and used in defining an overall test of segregation in Section 2.4.
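The quantities $Q$ and $R$ can be read off directly from the list of NN indices. The sketch below is an assumption-laden illustration (the function name and the toy NN list are hypothetical); it uses the fact that a point serving as NN to $k$ other points contributes $k\,(k-1)$ to $Q$, which reproduces the coefficients $2, 6, 12, 20, 30$ in the formula above.

```python
from collections import Counter

def q_and_r(nn):
    """Compute Q and R from nn, where nn[i] is the index of the NN of point i."""
    # A point that is the NN of k points contributes k (k - 1) = 2 * C(k, 2).
    times_as_nn = Counter(nn)
    q = sum(k * (k - 1) for k in times_as_nn.values())
    # R counts the points in reflexive pairs (i is the NN of its own NN),
    # which is twice the number of reflexive pairs.
    r = sum(1 for i, j in enumerate(nn) if nn[j] == i)
    return q, r

# Toy NN list (made up): points 0 and 1 are mutual NNs, as are 2 and 3,
# and point 4 also has point 1 as its NN.
nn = [1, 0, 3, 2, 1]
q, r = q_and_r(nn)
print(q, r)  # 2 4
```

Here point 1 is the shared NN of points 0 and 4, so $Q_{2}=1$ and $Q=2$, while the two reflexive pairs give $R=4$.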

Under CSR independence, the null hypothesis, the test statistics, and the variances are as in the RL case for the cell-specific tests, except for the fact that the variances are conditional on $Q$ and $R$.
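To make Equations (2) to (4) concrete, the following sketch evaluates the variance and the cell-specific statistic for a small table. The function names, the example table, and the values of $Q$ and $R$ are hypothetical choices of mine, used only to exercise the formulas.

```python
import math

def falling(m, k):
    """Falling factorial m (m - 1) ... (m - k + 1)."""
    out = 1
    for t in range(k):
        out *= m - t
    return out

def cell_variance(i, j, sizes, Q, R):
    """Var[N_ij] as in Equation (3); i and j are 0-based class indices."""
    n, n_i, n_j = sum(sizes), sizes[i], sizes[j]
    common = n * n - 3 * n - Q + R
    if i == j:
        p_ii = falling(n_i, 2) / falling(n, 2)
        p_iii = falling(n_i, 3) / falling(n, 3)
        p_iiii = falling(n_i, 4) / falling(n, 4)
        return ((n + R) * p_ii + (2 * n - 2 * R + Q) * p_iii
                + common * p_iiii - (n * p_ii) ** 2)
    p_ij = n_i * n_j / falling(n, 2)
    p_iij = falling(n_i, 2) * n_j / falling(n, 3)
    p_iijj = falling(n_i, 2) * falling(n_j, 2) / falling(n, 4)
    return n * p_ij + Q * p_iij + common * p_iijj - (n * p_ij) ** 2

def cell_z(i, j, table, Q, R):
    """Cell-specific statistic Z^D_ij of Equation (2)."""
    sizes = [sum(row) for row in table]
    n = sum(sizes)
    mean = (falling(sizes[i], 2) if i == j else sizes[i] * sizes[j]) / (n - 1)
    return (table[i][j] - mean) / math.sqrt(cell_variance(i, j, sizes, Q, R))

table = [[3, 1], [1, 5]]  # hypothetical NNCT with n_1 = 4 and n_2 = 6
Q, R = 6, 6               # hypothetical values of Q and R
z11 = cell_z(0, 0, table, Q, R)
z12 = cell_z(0, 1, table, Q, R)
print(z11, z12)
```

In the two-class case the row sums are fixed, so $N_{12}=n_{1}-N_{11}$ and the two statistics in the first row differ only in sign; the sketch reproduces this.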

2.3 The Status of $Q$ and $R$ under CSR Independence and RL

Note the difference in status of the variables $Q$ and $R$ under CSR independence and RL models. Under RL, $Q$ and $R$ are fixed quantities; while under CSR independence, they are random. The quantities given in Equations (1), (3), and all the quantities depending on these expectations also depend on $Q$ and $R$. Hence these expressions are appropriate under the RL pattern. Under the CSR independence pattern they are conditional variances and covariances obtained by using the observed values of $Q$ and $R$. The unconditional variances and covariances can be obtained by replacing $Q$ and $R$ with their expectations.

Unfortunately, given the difficulty of calculating the expectations of $Q$ and $R$ under CSR independence, Ceyhan, (2007) employed the conditional variances and covariances (i.e., the variances and covariances for which observed $Q$ and $R$ values are used) even when assessing their behavior under the CSR independence pattern. Alternatively, I can estimate the values of $Q$ and $R$ empirically as follows. I generate $n\in\{10,20,30,40,50,100,500,1000\}$ points that are iid (independently and identically distributed) from $\mathcal{U}((0,1)\times(0,1))$, the uniform distribution on the unit square. I repeat this procedure $N_{mc}=1000000$ times. At each Monte Carlo replication, I calculate $Q$ and $R$ values, and record the ratios $Q/n$ and $R/n$. I plot these ratios in Figure 1 as a function of sample size $n$. Observe that the ratios seem to converge as $n$ increases. For the homogeneous planar Poisson pattern, I have $\mathbf{E}[Q/n]\approx 0.6327860$ and $\mathbf{E}[R/n]\approx 0.6211200$. Hence, I replace $Q$ and $R$ by $0.63\,n$ and $0.62\,n$, respectively, to obtain the QR-adjusted variances and covariances.
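The estimation procedure just described can be sketched as follows. This is a scaled-down illustration under my own assumptions, not the code used for the reported estimates: the function names are hypothetical, the NN search is brute force, and the number of replications is far smaller than $N_{mc}=1000000$ so that the sketch runs quickly.

```python
import math
import random
from collections import Counter

def nn_indices(points):
    """Brute-force NN index of each point (ties assumed not to occur)."""
    nn = []
    for i, (xi, yi) in enumerate(points):
        best_j, best_d = -1, math.inf
        for j, (xj, yj) in enumerate(points):
            if j != i:
                d = (xi - xj) ** 2 + (yi - yj) ** 2
                if d < best_d:
                    best_j, best_d = j, d
        nn.append(best_j)
    return nn

def q_and_r(nn):
    """Q from the shared-NN counts, R as the number of points in reflexive pairs."""
    q = sum(k * (k - 1) for k in Counter(nn).values())
    r = sum(1 for i, j in enumerate(nn) if nn[j] == i)
    return q, r

def estimate_ratios(n, n_mc, seed=0):
    """Monte Carlo averages of Q/n and R/n for n uniform points on the unit square."""
    rng = random.Random(seed)
    q_sum = r_sum = 0.0
    for _ in range(n_mc):
        pts = [(rng.random(), rng.random()) for _ in range(n)]
        q, r = q_and_r(nn_indices(pts))
        q_sum += q / n
        r_sum += r / n
    return q_sum / n_mc, r_sum / n_mc

# Small run for illustration only; the estimates are noisy at this scale.
q_ratio, r_ratio = estimate_ratios(n=100, n_mc=50)
print(q_ratio, r_ratio)
```

With so few replications the two averages should only be expected to land in the neighborhood of the values reported above, not to match them to several digits.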

Figure 1: Plotted are the empirically estimated expectations $\mathbf{E}[Q/n]$ (left) and $\mathbf{E}[R/n]$ (right) as a function of total sample size $n$.

2.4 Dixon’s Overall Segregation Test

Dixon’s overall test of segregation tests the hypothesis that expected cell counts in the NNCT are as in Equation (1). In the two-class case, he calculates $Z_{ii}=(N_{ii}-\mathbf{E}[N_{ii}])\big/\sqrt{\mathbf{Var}[N_{ii}]}$ for both $i\in\{1,2\}$ and combines these test statistics into a statistic that is asymptotically distributed as $\chi^{2}_{2}$ under RL (Dixon, (1994)). The suggested test statistic is given by

$$C=\mathbf{Y}^{\prime}\Sigma^{-1}\mathbf{Y}=\begin{bmatrix}N_{11}-\mathbf{E}[N_{11}]\\ N_{22}-\mathbf{E}[N_{22}]\end{bmatrix}^{\prime}\begin{bmatrix}\mathbf{Var}[N_{11}]&\mathbf{Cov}[N_{11},N_{22}]\\ \mathbf{Cov}[N_{11},N_{22}]&\mathbf{Var}[N_{22}]\end{bmatrix}^{-1}\begin{bmatrix}N_{11}-\mathbf{E}[N_{11}]\\ N_{22}-\mathbf{E}[N_{22}]\end{bmatrix}, \tag{5}$$

where $\mathbf{E}[N_{ii}]$ are as in Equation (1), $\mathbf{Var}[N_{ii}]$ are as in Equation (3), and

$$\mathbf{Cov}[N_{11},\,N_{22}]=(n^{2}-3\,n-Q+R)\,p_{1122}-n^{2}\,p_{11}\,p_{22}. \tag{6}$$

Dixon’s $C$ statistic given in Equation (5) can also be written as

$$C=\frac{Z_{AA}^{2}+Z_{BB}^{2}-2\,r\,Z_{AA}\,Z_{BB}}{1-r^{2}},$$

where $r=\mathbf{Cov}[N_{11},N_{22}]\Big/\sqrt{\mathbf{Var}[N_{11}]\,\mathbf{Var}[N_{22}]}$ (Dixon, (1994)).

Under CSR independence, the expected values, variances and covariances are as in the RL case. However, the variance and covariance terms include $Q$ and $R$, which are random under CSR independence and fixed under RL. Hence Dixon’s test statistic $C$ asymptotically has a $\chi^{2}_{2}$-distribution under CSR independence conditional on $Q$ and $R$. Replacing $Q$ and $R$ by their empirical estimates given in Section 2.3, I obtain the QR-adjusted version of Dixon’s test, which is denoted by $C_{qr}$.
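The two expressions for $C$ can be checked against each other numerically. The sketch below is illustrative only; the function names, the example table, and the values of $Q$ and $R$ are hypothetical. The $p$-value line uses the fact that the upper tail of a $\chi^{2}_{2}$ distribution at $x$ equals $e^{-x/2}$, with $\chi^{2}_{2}$ being the reference distribution stated above for RL. Passing $0.63\,n$ and $0.62\,n$ in place of the observed $Q$ and $R$ would give the QR-adjusted statistic $C_{qr}$.

```python
import math

def falling(m, k):
    """Falling factorial m (m - 1) ... (m - k + 1)."""
    out = 1
    for t in range(k):
        out *= m - t
    return out

def diagonal_moments(i, sizes, Q, R):
    """E[N_ii] from Equation (1) and Var[N_ii] from Equation (3)."""
    n, n_i = sum(sizes), sizes[i]
    p2 = falling(n_i, 2) / falling(n, 2)
    p3 = falling(n_i, 3) / falling(n, 3)
    p4 = falling(n_i, 4) / falling(n, 4)
    mean = n * p2  # equals n_i (n_i - 1) / (n - 1)
    var = ((n + R) * p2 + (2 * n - 2 * R + Q) * p3
           + (n * n - 3 * n - Q + R) * p4 - mean ** 2)
    return mean, var

def dixon_overall(table, Q, R):
    """Return C via the quadratic form (Equation (5)) and via the Z/r form."""
    sizes = [sum(row) for row in table]
    n = sum(sizes)
    e1, v1 = diagonal_moments(0, sizes, Q, R)
    e2, v2 = diagonal_moments(1, sizes, Q, R)
    p11 = falling(sizes[0], 2) / falling(n, 2)
    p22 = falling(sizes[1], 2) / falling(n, 2)
    p1122 = falling(sizes[0], 2) * falling(sizes[1], 2) / falling(n, 4)
    cov = (n * n - 3 * n - Q + R) * p1122 - n * n * p11 * p22  # Equation (6)
    y1, y2 = table[0][0] - e1, table[1][1] - e2
    det = v1 * v2 - cov ** 2
    c_quad = (y1 * y1 * v2 - 2 * y1 * y2 * cov + y2 * y2 * v1) / det
    z1, z2 = y1 / math.sqrt(v1), y2 / math.sqrt(v2)
    r = cov / math.sqrt(v1 * v2)
    c_z = (z1 * z1 + z2 * z2 - 2 * r * z1 * z2) / (1 - r * r)
    return c_quad, c_z

table = [[3, 1], [1, 5]]  # hypothetical NNCT
c_quad, c_z = dixon_overall(table, Q=6, R=6)  # hypothetical Q and R
p_value = math.exp(-c_quad / 2)  # upper tail of chi-square with 2 df
print(c_quad, c_z, p_value)
```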

2.5 Version I of the New Segregation Tests

Ceyhan, (2007) proposed tests based on the correct sampling distribution of the cell counts in a NNCT under CSR independence or RL. In defining the new segregation or clustering tests, I follow a track similar to that of Dixon’s (Dixon, (1994)) where he defines a cell-specific test statistic for each cell and then combines these four tests into an overall test.

For cell $(i,j)$, let

$$T^{I}_{ij}=N_{ij}-\frac{n_{i}\,C_{j}}{n}\quad\text{ and then let }\quad N^{I}_{ij}=\frac{T^{I}_{ij}}{\sqrt{n_{i}\,c_{j}/n}}=\frac{\left(N_{ij}-n_{i}\,c_{j}/n\right)}{\sqrt{n_{i}\,c_{j}/n}}. \tag{7}$$

Furthermore, let $\mathbf{N_{I}}$ be the vector of $N^{I}_{ij}$ values concatenated row-wise and let $\Sigma_{I}$ be the variance-covariance matrix of $\mathbf{N_{I}}$ based on the correct sampling distribution of the cell counts. That is, $\Sigma_{I}=\left(\mathbf{Cov}\left[N^{I}_{ij},N^{I}_{kl}\right]\right)$ where

$$\mathbf{Cov}\left[N^{I}_{ij},N^{I}_{kl}\right]=\frac{n}{\sqrt{n_{i}\,c_{j}\,n_{k}\,c_{l}}}\,\mathbf{Cov}\left[N_{ij},N_{kl}\right]$$

where $\mathbf{Cov}\left[N_{ij},N_{kl}\right]$ is as in Equation (3) if $(i,j)=(k,l)$ and as in Equation (6) if $(i,j)=(1,1)$ and $(k,l)=(2,2)$. Since $\Sigma_{I}$ is not invertible, I use its generalized inverse, which is denoted by $\Sigma_{I}^{-}$ (Searle, (2006)). Then the first version of the segregation tests suggested by Ceyhan, (2007) is

$$\mathcal{X}_{I}^{2}=\mathbf{N^{\prime}_{I}}\,\Sigma_{I}^{-}\,\mathbf{N_{I}} \tag{8}$$

which asymptotically has a $\chi^{2}_{1}$ distribution.

Under CSR independence, the expected values, variances, and covariances related to $\mathcal{X}_{I}^{2}$ are as in the RL case, except they are not only conditional on column sums (i.e., on $C_{j}=c_{j}$), but also conditional on $Q$ and $R$. Hence $\mathcal{X}_{I}^{2}$ has asymptotically a $\chi^{2}_{1}$ distribution conditional on column sums, $Q$, and $R$ under CSR independence. Replacing $Q$ and $R$ by their empirical estimates given in Section 2.3, I obtain the QR-adjusted version of this test, which is denoted by $\mathcal{X}_{I,qr}^{2}$ and is still conditional on column sums.
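The centering and scaling in Equation (7) are easy to compute from an observed NNCT. The sketch below uses a hypothetical function name and a made-up table; it also shows that the centered counts $T^{I}_{ij}$ sum to zero along every row and every column, which is one way to see why the covariance matrix of the scaled counts is singular and a generalized inverse is needed.

```python
import math

def version_one_cells(table):
    """T^I_ij and N^I_ij of Equation (7) for a 2 x 2 NNCT."""
    row = [sum(r) for r in table]        # n_i
    col = [sum(c) for c in zip(*table)]  # c_j (observed column sums)
    n = sum(row)
    T = [[table[i][j] - row[i] * col[j] / n for j in range(2)]
         for i in range(2)]
    N = [[T[i][j] / math.sqrt(row[i] * col[j] / n) for j in range(2)]
         for i in range(2)]
    return T, N

table = [[3, 1], [1, 5]]  # hypothetical NNCT
T, N = version_one_cells(table)
print(T)
print(N)
```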

2.6 Version II of the New Segregation Tests

For cell $(i,j)$, let

$$T^{II}_{ij}=N_{ij}-\frac{n_{i}\,n_{j}}{n}\quad\text{ and then let }\quad N^{II}_{ij}=\frac{T^{II}_{ij}}{\sqrt{n_{i}\,n_{j}/n}}=\frac{\left(N_{ij}-n_{i}\,n_{j}/n\right)}{\sqrt{n_{i}\,n_{j}/n}}. \tag{9}$$

Furthermore, let $\mathbf{N_{II}}$ be the vector of $N^{II}_{ij}$ values concatenated row-wise and let $\Sigma_{II}$ be the variance-covariance matrix of $\mathbf{N_{II}}$ based on the correct sampling distribution of the cell counts. That is, $\Sigma_{II}=\left(\mathbf{Cov}\left[N^{II}_{ij},N^{II}_{kl}\right]\right)$ where

$$\mathbf{Cov}\left[N^{II}_{ij},N^{II}_{kl}\right]=\frac{n}{\sqrt{n_{i}\,n_{j}\,n_{k}\,n_{l}}}\,\mathbf{Cov}\left[N_{ij},N_{kl}\right].$$

Since $\Sigma_{II}$ is not invertible, I use its generalized inverse $\Sigma_{II}^{-}$. Then the second version of the tests proposed by Ceyhan, (2007) is

$$\mathcal{X}_{II}^{2}=\mathbf{N^{\prime}_{II}}\,\Sigma_{II}^{-}\,\mathbf{N_{II}} \tag{10}$$

which asymptotically has a $\chi^{2}_{2}$ distribution under RL. Note that $\Sigma_{II}$ can be obtained from $\Sigma$ used in Equation (5) by multiplying $\Sigma$ entry-wise with the matrix $C^{II}_{M}=\left(\frac{n}{\sqrt{n_{i}\,n_{j}\,n_{k}\,n_{l}}}\right)$. This version of the segregation test is asymptotically equivalent to Dixon’s segregation test.

Under CSR independence, the expectations, variances, and covariances related to $\mathcal{X}^{2}_{II}$ are as in the RL case, but the variances and covariances are conditional on $Q$ and $R$. Hence, the asymptotic distribution of $\mathcal{X}^{2}_{II}$ is also conditional on $Q$ and $R$. Replacing $Q$ and $R$ with their empirical estimates, I obtain the QR-adjusted version of this test, which is denoted by $\mathcal{X}^{2}_{II,qr}$ and is not conditional any more.

2.7 Version III of the New Segregation Tests

Notice that version I is a conditional test (conditional on column sums), while version II is asymptotically equivalent to Dixon’s test. Furthermore, both Dixon’s test and version II incorporate only row sums (i.e., class sizes) in the NNCTs.

Ceyhan, (2007) suggests another test statistic which uses both the column sums (i.e., number of times a class serves as NN) and row sums and is not conditional on the column sums. Let

$$T^{III}_{ij}=\begin{cases}N_{ij}-\frac{(n_{i}-1)}{(n-1)}\,C_{j}&\text{if $i=j$,}\\ N_{ij}-\frac{n_{i}}{(n-1)}\,C_{j}&\text{if $i\neq j$.}\end{cases} \tag{11}$$

Let $\mathbf{N_{III}}$ be the vector of $T^{III}_{ij}$ values concatenated row-wise and let $\Sigma_{III}$ be the variance-covariance matrix of $\mathbf{N_{III}}$ based on the correct sampling distribution of the cell counts. That is, $\Sigma_{III}=\left(\mathbf{Cov}\left[T^{III}_{ij},T^{III}_{kl}\right]\right)$, where the explicit forms of $\mathbf{Cov}\left[T^{III}_{ij},T^{III}_{kl}\right]$ are provided in (Ceyhan, (2007)). Since $\Sigma_{III}$ is not invertible, I use its generalized inverse $\Sigma_{III}^{-}$. Then the test statistic proposed by (Ceyhan, (2007)) for overall segregation is the quadratic form $\mathcal{X}_{III}^{2}=\mathbf{N^{\prime}_{III}}\,\Sigma_{III}^{-}\,\mathbf{N_{III}}$, which asymptotically has a $\chi^{2}_{1}$ distribution.

Under CSR independence, the discussion related to and derivation of $\mathcal{X}^{2}_{III}$ are as in the RL case; however, the variance and covariance terms (hence the asymptotic distribution) are conditional on $Q$ and $R$. Replacing $Q$ and $R$ with their empirical estimates, I obtain the QR-adjusted version of this test, which is denoted by $\mathcal{X}^{2}_{III,qr}$.
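The centered counts of Equation (11) can be computed in the same way as for the other versions. The following sketch (hypothetical function name, made-up table) also verifies that, for each column $j$, the values $T^{III}_{ij}$ sum to zero over $i$, since $(n_{j}-1)+\sum_{i\neq j}n_{i}=n-1$. Such linear constraints are consistent with $\Sigma_{III}$ not being invertible.

```python
def version_three_cells(table):
    """T^III_ij of Equation (11) for a square NNCT given as a list of rows."""
    row = [sum(r) for r in table]        # n_i
    col = [sum(c) for c in zip(*table)]  # C_j
    n = sum(row)
    q = len(table)
    return [
        [table[i][j] - ((row[i] - 1) if i == j else row[i]) / (n - 1) * col[j]
         for j in range(q)]
        for i in range(q)
    ]

table = [[3, 1], [1, 5]]  # hypothetical NNCT
T3 = version_three_cells(table)
print(T3)
```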

Remark 2.1.

Extension to Multi-Class Case: So far, I have described the segregation tests for the two-class case, in which the corresponding NNCT is of dimension $2\times 2$. The cell counts for the diagonal cells have asymptotic normality. For the off-diagonal cells, although the asymptotic normality is supported by Monte Carlo simulation results (Dixon, (2002)), it is not rigorously proven yet. Nevertheless, if the asymptotic normality held for all $q^{2}$ cell counts in the NNCT with $q$ classes, then under RL, Dixon’s test and version II would asymptotically have a $\chi^{2}_{q(q-1)}$ distribution, and versions I and III would asymptotically have a $\chi^{2}_{(q-1)^{2}}$ distribution. Under CSR independence, these tests will have the corresponding asymptotic distributions conditional on $Q$ and $R$. The QR-adjusted versions can be obtained by replacing $Q$ and $R$ with their empirical estimates.

3 Empirical Significance Levels of NNCT-Tests under the CSR Independence

For the null case, $H_{o}$: CSR independence, I simulate the CSR case only with classes 1 and 2 (i.e., $X$ and $Y$) of sizes $n_{1}$ and $n_{2}$, respectively. At each of $N_{mc}=10000$ replicates, I generate data for some sample size combinations of $n_{1},n_{2}\in\{10,30,50,100\}$ points iid from $\mathcal{U}((0,1)\times(0,1))$. These sample size combinations are chosen so that one can examine the influence of small and large samples, and the relative abundance of the classes on the tests. The corresponding test statistics are recorded at each Monte Carlo replication for each sample size combination. Then I record how many times the $p$-value is at or below $\alpha=.05$ for each test to estimate the empirical size. I present the empirical sizes for NNCT-tests in Table 2, where $\widehat{\alpha}_{D}$ is the empirical significance level for Dixon’s test, $\widehat{\alpha}_{I},\,\widehat{\alpha}_{II}$ and $\widehat{\alpha}_{III}$ are for versions I, II, and III, respectively, and $\widehat{\alpha}_{D,qr}$, $\widehat{\alpha}_{I,qr},\,\widehat{\alpha}_{II,qr}$ and $\widehat{\alpha}_{III,qr}$ are for the corresponding QR-adjusted versions. The empirical sizes significantly smaller (larger) than .05 are marked with c ($\ell$), which indicates that the corresponding test is conservative (liberal). The asymptotic normal approximation to proportions is used in determining the significance of the deviations of the empirical size estimates from the nominal level of .05. For these proportion tests, I also use $\alpha=.05$ to test against empirical size being equal to .05. With $N_{mc}=10000$, empirical sizes less (greater) than .0464 (.0536) are deemed conservative (liberal) at the $\alpha=.05$ level.
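The cutoffs .0464 and .0536 follow from the normal approximation to a proportion. The sketch below reproduces them under the assumption, which is mine and is made because it matches the stated cutoffs, that a one-sided critical value at level .05 (about 1.645) is used in each direction. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def size_thresholds(nominal=0.05, n_mc=10000, level=0.05):
    """Cutoffs beyond which an empirical size is deemed conservative or liberal."""
    se = math.sqrt(nominal * (1 - nominal) / n_mc)  # standard error of a proportion
    z = NormalDist().inv_cdf(1 - level)             # one-sided critical value
    return nominal - z * se, nominal + z * se

lower, upper = size_thresholds()
print(round(lower, 4), round(upper, 4))  # 0.0464 0.0536
```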

Observe that the (unadjusted) NNCT-tests are about the desired level (or size) when $n_{1}$ and $n_{2}$ are both $\geq 30$, and mostly conservative otherwise. The same trend holds for the QR-adjusted versions. Furthermore, comparing the empirical sizes of QR-adjusted versions with those of unadjusted ones, I see that for almost all cases they are not significantly different (at $\alpha=.05$ based on tests on equality of the proportions).

Empirical significance levels of the NNCT-tests
conditional (i.e., unadjusted): columns 2 to 5; unconditional (i.e., QR-adjusted): columns 6 to 9
$(n_{1},n_{2})$ $\widehat{\alpha}_{D}$ $\widehat{\alpha}_{I}$ $\widehat{\alpha}_{II}$ $\widehat{\alpha}_{III}$ $\widehat{\alpha}_{D,qr}$ $\widehat{\alpha}_{I,qr}$ $\widehat{\alpha}_{II,qr}$ $\widehat{\alpha}_{III,qr}$
(10,10) .0432c .0593 .0461c .0439c .0470 .0595 .0486 .0365c,<
(10,30) .0440c .0451c .0421c .0410c .0411c .0465 .0381c .0461c,>
(10,50) .0482 .0335c .0423c .0397c .0497 .0345c .0411c .0431c
(30,10) .0390c .0411c .0383c .0391c .0402c .0423c .0379c .0436c
(30,30) .0464 .0544 .0476 .0427c .0492 .0552 .0478 .0409c
(30,50) .0454c .0507 .0481 .0504 .0411c .0517 .0464 .0515
(50,10) .0529 .0326c .0468 .0379c .0510 .0334c .0428c .0402c
(50,30) .0429c .0494 .0468 .0469 .0405c .0518 .0466 .0492
(50,50) .0508 .0494 .0497 .0499 .0528 .0494 .0524 .0488
(50,100) .0560 .0501 .0564 .0516 .0556 .0493 .0573 .0494
(100,50) .0483 .0463c .0492 .0479 .0495 .0457 .0501 .0460
(100,100) .0504 .0524 .0519 .0489 .0513 .0524 .0523 .0463c
Table 2: The empirical significance levels for Dixon’s test and the new versions of the NNCT-tests by (Ceyhan, (2007)), as well as their QR-adjusted versions, based on 10000 Monte Carlo simulations of the CSR independence pattern. $\widehat{\alpha}_{D}$ stands for the empirical significance level for Dixon’s test, $\widehat{\alpha}_{I},\,\widehat{\alpha}_{II}$ and $\widehat{\alpha}_{III}$ for versions I, II, and III, respectively; and $\widehat{\alpha}_{D,qr}$, $\widehat{\alpha}_{I,qr},\,\widehat{\alpha}_{II,qr}$ and $\widehat{\alpha}_{III,qr}$ stand for the corresponding QR-adjusted versions. (c ($\ell$): the empirical size is significantly smaller (larger) than .05; i.e., the test is conservative (liberal). < (>): the empirical size of the QR-adjusted version is significantly smaller (larger) than that of the unadjusted version.)

4 Empirical Power Analysis

To evaluate the power performance of the QR-adjusted and unadjusted NNCT-tests, I only consider alternatives against the CSR pattern. That is, the points are generated in such a way that they are from an inhomogeneous Poisson process in a region of interest (unit square in the simulations) for at least one class. Furthermore, the tests considered in this article seem to have the desired nominal level for large samples under CSR, and QR-adjustment is not necessary under the RL pattern. Hence I avoid the alternatives against the RL pattern; i.e., I do not consider non-random labeling of a fixed set of points that would result in segregation or association.

4.1 Empirical Power Analysis under Segregation Alternatives

For the segregation alternatives (against the CSR pattern), three cases are considered. I generate $X_{i}\stackrel{iid}{\sim}\mathcal{U}((0,1-s)\times(0,1-s))$ for $i=1,2,\ldots,n_{1}$ and $Y_{j}\stackrel{iid}{\sim}\mathcal{U}((s,1)\times(s,1))$ for $j=1,2,\ldots,n_{2}$. In the pattern generated, appropriate choices of $s$ will imply $X_{i}$ and $Y_{j}$ to be more segregated than expected under CSR. That is, it will be more likely to have $(X,X)$ NN pairs than mixed NN pairs (i.e., $(X,Y)$ or $(Y,X)$ pairs). The three values of $s$ I consider constitute the three segregation alternatives:

$$H_{S}^{I}:s=1/6,\;\;\;H_{S}^{II}:s=1/4,\text{ and }H_{S}^{III}:s=1/3. \tag{12}$$

Observe that, from $H_{S}^{I}$ to $H_{S}^{III}$ (i.e., as $s$ increases), the segregation gets stronger in the sense that $X$ and $Y$ points tend to form one-class clumps or clusters. By construction, the points are uniformly generated, hence exhibit homogeneity with respect to their supports for each class, but with respect to the unit square these alternative patterns are examples of departures from first-order homogeneity, which implies segregation of the classes $X$ and $Y$. The simulated segregation patterns are symmetric in the sense that the $X$ and $Y$ classes are equally segregated (or clustered) from each other.
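The generation scheme above can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used in the simulation study; the function names `generate_segregation` and `support_overlap` and the seed are assumptions introduced here. The second function computes the fraction of each class's support that is shared with the other class, $(1-2s)^{2}/(1-s)^{2}$, as one simple way to see why a larger $s$ gives stronger segregation.

```python
import random


def generate_segregation(n1, n2, s, rng):
    """Draw X_i iid uniform on (0,1-s)^2 and Y_j iid uniform on (s,1)^2."""
    xs = [(rng.uniform(0, 1 - s), rng.uniform(0, 1 - s)) for _ in range(n1)]
    ys = [(rng.uniform(s, 1), rng.uniform(s, 1)) for _ in range(n2)]
    return xs, ys


def support_overlap(s):
    """Area of the common region (s,1-s)^2 divided by the area (1-s)^2 of one support."""
    return (1 - 2 * s) ** 2 / (1 - s) ** 2


rng = random.Random(0)  # arbitrary seed, for reproducibility only
for name, s in [("H_S^I", 1 / 6), ("H_S^II", 1 / 4), ("H_S^III", 1 / 3)]:
    xs, ys = generate_segregation(100, 100, s, rng)
    print(name, len(xs), len(ys), round(support_overlap(s), 3))
```

For the three alternatives the shared fraction of each support is 0.64, about 0.44, and 0.25, respectively, so the two classes overlap less and less from $H_{S}^{I}$ to $H_{S}^{III}$.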

Figure 2: Three realizations for $H_{S}^{I}:s=1/6$, $H_{S}^{II}:s=1/4$, and $H_{S}^{III}:s=1/3$ with $n_{1}=100$ $X$ points (solid squares $\blacksquare$) and $n_{2}=100$ $Y$ points (triangles $\triangle$).

The power estimates against the sample size combinations are presented in Figure 3, where $\widehat{\beta}_{D}$ is for Dixon’s test, $\widehat{\beta}_{I}$, $\widehat{\beta}_{II}$, and $\widehat{\beta}_{III}$ are for versions I, II, and III, respectively, and the QR-adjusted versions are indicated by $qr$ in their subscripts. Observe that, as $n=(n_{1}+n_{2})$ gets larger, the power estimates get larger. For the same $n=(n_{1}+n_{2})$ values, the power estimate is larger for classes with similar sample sizes. Furthermore, as the segregation gets stronger, the power estimates get larger. The NNCT-tests have about the same power performance under these segregation alternatives. Notice also that for small samples the power estimates of the QR-adjusted versions are slightly larger but for other sample size combinations the power estimates for the QR-adjusted versions and the unadjusted versions are virtually indistinguishable.

Figure 3: Empirical power estimates for the QR-adjusted and unadjusted NNCT-tests based on 10000 Monte Carlo replications under the segregation alternatives. The numbers in the horizontal axis labels represent sample (i.e., class) size combinations: 1=(10,10), 2=(10,30), 3=(10,50), 4=(30,10), 5=(30,30), 6=(30,50), 7=(50,10), 8=(50,30), 9=(50,50), 10=(50,100), 11=(100,50), 12=(100,100).
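When comparing power curves it may help to keep the Monte Carlo error in mind. The sketch below assumes the 10000 replications are independent, so that each power estimate behaves like a binomial proportion; the function name `mc_standard_error` is hypothetical and the calculation is a generic one, not a procedure described in this article.

```python
import math


def mc_standard_error(p_hat, n_mc=10000):
    """Binomial standard error of a proportion estimated from n_mc replications."""
    return math.sqrt(p_hat * (1 - p_hat) / n_mc)


for p_hat in (0.1, 0.5, 0.9):
    print(p_hat, round(mc_standard_error(p_hat), 4))
```

Under this assumption the standard error is at most 0.005 (attained at a power of 0.5), so differences of a few thousandths between two curves are within simulation noise.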

4.2 Empirical Power Analysis under Association Alternatives

For the association alternatives (against the CSR pattern), I also consider three cases. First, I generate $X_{i}\stackrel{iid}{\sim}\mathcal{U}((0,1)\times(0,1))$ for $i=1,2,\ldots,n_{1}$. Then I generate $Y_{j}$ for $j=1,2,\ldots,n_{2}$ as follows. For each $j$, I pick an $i$ randomly, then generate $Y_{j}$ as $X_{i}+R_{j}\,(\cos T_{j},\sin T_{j})^{\prime}$ where $R_{j}\stackrel{iid}{\sim}\mathcal{U}(0,r)$ with $r\in(0,1)$ and $T_{j}\stackrel{iid}{\sim}\mathcal{U}(0,2\,\pi)$. In the pattern generated, appropriate choices of $r$ will imply $Y_{j}$ and $X_{i}$ are more associated than expected. That is, it will be more likely to have $(X,Y)$ NN pairs than self NN pairs (i.e., $(X,X)$ or $(Y,Y)$). The three values of $r$ I consider constitute the three association alternatives:

$$H_{A}^{I}:r=1/4,\;\;\;H_{A}^{II}:r=1/7,\text{ and }H_{A}^{III}:r=1/10. \tag{13}$$

Observe that, from $H_{A}^{I}$ to $H_{A}^{III}$ (i.e., as $r$ decreases), the association gets stronger in the sense that $X$ and $Y$ points tend to occur together more and more frequently. By construction, $X$ points are from a homogeneous Poisson process with respect to the unit square, while $Y$ points exhibit inhomogeneity in the same region. Furthermore, these alternative patterns are examples of departures from second-order homogeneity which implies association of the class $Y$ with class $X$.
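The association scheme can be sketched similarly. Again this is an illustrative sketch rather than the simulation code; the function name `generate_association` and the seed are assumptions. No clipping to the unit square is applied to the $Y$ points, since the description above does not mention any.

```python
import math
import random


def generate_association(n1, n2, r, rng):
    """X_i iid uniform on the unit square; each Y_j is placed around a randomly chosen X_i."""
    xs = [(rng.random(), rng.random()) for _ in range(n1)]
    ys = []
    for _ in range(n2):
        cx, cy = xs[rng.randrange(n1)]       # pick an i at random
        radius = rng.uniform(0, r)           # R_j ~ U(0, r)
        angle = rng.uniform(0, 2 * math.pi)  # T_j ~ U(0, 2*pi)
        ys.append((cx + radius * math.cos(angle), cy + radius * math.sin(angle)))
    return xs, ys


rng = random.Random(0)  # arbitrary seed, for reproducibility only
for name, r in [("H_A^I", 1 / 4), ("H_A^II", 1 / 7), ("H_A^III", 1 / 10)]:
    xs, ys = generate_association(20, 100, r, rng)
    print(name, len(xs), len(ys))
```

By construction every $Y$ point lies within distance $r$ of at least one $X$ point, so a smaller $r$ pulls the $Y$ points closer to the $X$ points.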

Figure 4: Three realizations for $H_{A}^{I}:r=1/4$, $H_{A}^{II}:r=1/7$, and $H_{A}^{III}:r=1/10$ with $n_{1}=20$ $X$ points (solid squares $\blacksquare$) and $n_{2}=100$ $Y$ points (triangles $\triangle$).

The power estimates under the association alternatives are presented in Figure 5, where labeling is as in Figure 3. Observe that, for similar sample sizes, as $n=(n_{1}+n_{2})$ gets larger, the power estimates get larger at each association alternative. Furthermore, as the association gets stronger, the power estimates get larger at each sample size combination. The NNCT-tests have about the same power estimates under these association alternatives. Furthermore, the QR-adjusted versions of the tests have virtually the same power estimates as the unadjusted versions; for the smaller samples the QR-adjusted versions have slightly lower power estimates.

Figure 5: Empirical power estimates for the QR-adjusted and unadjusted NNCT-tests under the association alternatives. The numbers in the horizontal axis labels represent sample (i.e., class) size combinations: 1=(10,10), 2=(10,30), 3=(10,50), 4=(30,10), 5=(30,30), 6=(30,50), 7=(50,10), 8=(50,30), 9=(50,50), 10=(50,100), 11=(100,50), 12=(100,100).
Remark 4.1.

Main Result of Monte Carlo Simulation Analysis: Based on the simulation results under CSR independence of the points, I observe that none of the NNCT-tests I consider has the desired level when at least one sample size is small so that the cell count(s) in the corresponding NNCT have a high probability of being $\leq 5$. This usually corresponds to the case that at least one sample size is $\leq 10$ or the sample sizes are very different in the simulation study. When sample sizes are small (hence the corresponding cell counts are $\leq 5$), the asymptotic approximation of the NNCT-tests is not appropriate. So Dixon, (1994) recommends Monte Carlo randomization for his test when some cell count(s) are $\leq 5$ in a NNCT. I extend this recommendation to all the NNCT-tests discussed in this article. Furthermore, among the NNCT-tests, Dixon’s and version III tests seem to be affected by the QR-adjustment more than the other tests in terms of empirical size. But QR-adjustment does not necessarily improve the results of the NNCT-analysis under CSR independence, as the empirical sizes of the adjusted and unadjusted versions are not significantly different. Furthermore, the QR-adjustment does not significantly improve the power performance under segregation and association alternatives. In fact, the power estimates of the QR-adjusted and unadjusted tests were about the same under these alternatives.

5 Examples

I illustrate the tests on two examples: an ecological data set, namely swamp tree data (Good and Whipple, (1982)), and an artificial data set.

5.1 Swamp Tree Data

Good and Whipple, (1982) considered the spatial patterns of tree species along the Savannah River, South Carolina, U.S.A. From this data, Dixon, (2002) used a single 50m $\times$ 200m rectangular plot to illustrate his tests. All live or dead trees with 4.5 cm or more dbh (diameter at breast height) were recorded together with their species. Hence it is an example of a realization of a marked multi-variate point pattern. The plot contains 13 different tree species, four of which comprise over 90 % of the 734 tree stems. The remaining tree stems were categorized as “other trees”. The plot consists of 215 water tupelo (Nyssa aquatica), 205 black gum (Nyssa sylvatica), 156 Carolina ash (Fraxinus caroliniana), 98 bald cypress (Taxodium distichum), and 60 stems of 8 additional species (i.e., other species). I will only consider live trees from the two most frequent tree species in this data set (i.e., water tupelos and black gums). So a $2\times 2$ NNCT-analysis is conducted for this data set. If segregation among the less frequent species were important, a more detailed $5\times 5$ or a $12\times 12$ NNCT-analysis should be performed. The locations of these trees in the study region are plotted in Figure 6 and the corresponding $2\times 2$ NNCT together with percentages based on row and grand sums are provided in Table 3. For example, for water tupelo as the base species and black gum as the NN species, the cell count is 54, which is 26 % of the 211 water tupelos (which in turn are 54 % of all 394 trees). Observe that the percentages and Figure 6 are suggestive of segregation for both tree species, since the observed percentages of species with themselves as the NN (74 % and 72 %) are much larger than the corresponding marginal row percentages (54 % and 46 %).

Figure 6: The scatter plot of the locations of water tupelos (circles $\circ$) and black gums (triangles $\triangle$).
                                NN species
                        W.T.          B.G.          sum
base species   W.T.     157 (74 %)    54 (26 %)     211 (54 %)
               B.G.     52 (28 %)     131 (72 %)    183 (46 %)
               sum      209 (53 %)    185 (47 %)    394 (100 %)
Table 3: The NNCT for swamp tree data and the corresponding percentages (in parentheses), where the cell percentages are with respect to the row sums and marginal percentages are with respect to the total size. W.T. = water tupelos and B.G. = black gums.
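The percentages in Table 3 can be reproduced from the four cell counts. The following is a small sketch of that arithmetic; the function name `nnct_percentages` is an assumption introduced here, and percentages are rounded to whole numbers as in the table.

```python
def nnct_percentages(table):
    """Return (cell % of row sums, row-sum % of total, column-sum % of total)."""
    total = sum(sum(row) for row in table)
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    cell_pct = [[round(100 * c / rs) for c in row] for row, rs in zip(table, row_sums)]
    row_pct = [round(100 * rs / total) for rs in row_sums]
    col_pct = [round(100 * cs / total) for cs in col_sums]
    return cell_pct, row_pct, col_pct


# Rows are base species (W.T., B.G.), columns are NN species (W.T., B.G.).
swamp_nnct = [[157, 54], [52, 131]]
cell_pct, row_pct, col_pct = nnct_percentages(swamp_nnct)
print(cell_pct)  # [[74, 26], [28, 72]]
print(row_pct)   # [54, 46]
print(col_pct)   # [53, 47]
```

The diagonal cell percentages (74 % and 72 %) exceed the marginal shares of the two species (54 % and 46 %), which is the informal indication of segregation noted in the text.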

The locations of the tree species can be viewed a priori as resulting from different processes, so the more appropriate null hypothesis is the CSR independence pattern. Hence the inference will be a conditional one (see Section 2.3) if I use the observed values of $Q$ and $R$. I observe $Q=270$ and $R=236$ for this data set, and the empirical estimates of their expected values for these sample sizes are $249.68$ for $Q$ and $244.95$ for $R$. I present the test statistics and the associated $p$-values for NNCT-tests in Table 4. Observe that the test statistics all decrease with the QR-adjustment; however, this decrease is not substantial enough to alter the conclusions. Based on the NNCT-tests, I find that the segregation between the two species is significant, since all the tests considered yield significant $p$-values, and the diagonal cell counts (i.e., cells $(1,1)$ and $(2,2)$) are larger than expected.
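As an illustration of how $Q$ and $R$ might be obtained from a point set and how their expected values might be estimated by simulation, consider the sketch below. It rests on assumptions not spelled out in this section: $R$ is taken as the number of points whose NN has them as its NN in return, and $Q$ as $\sum_{j} d_{j}(d_{j}-1)$, where $d_{j}$ is the number of points having point $j$ as their NN (my reading of the definitions in Dixon, (1994)). All function names, the seed, and the number of replications are hypothetical, and the NN search is brute force.

```python
import random


def nearest_neighbor_indices(points):
    """Index of each point's nearest neighbor (brute force; ties go to the lower index)."""
    nn = []
    for i, (xi, yi) in enumerate(points):
        best_j, best_d2 = None, float("inf")
        for j, (xj, yj) in enumerate(points):
            if i == j:
                continue
            d2 = (xi - xj) ** 2 + (yi - yj) ** 2
            if d2 < best_d2:
                best_j, best_d2 = j, d2
        nn.append(best_j)
    return nn


def q_and_r(points):
    """Return (Q, R) under the assumed definitions stated in the lead-in."""
    nn = nearest_neighbor_indices(points)
    r = sum(1 for i, j in enumerate(nn) if nn[j] == i)  # reflexive NNs
    counts = [0] * len(points)                          # d_j for each point j
    for j in nn:
        counts[j] += 1
    q = sum(d * (d - 1) for d in counts)                # shared NNs
    return q, r


def estimate_expected_q_r(n, n_sim, rng):
    """Average Q and R over n_sim uniform patterns of n points on the unit square."""
    total_q = total_r = 0
    for _ in range(n_sim):
        pts = [(rng.random(), rng.random()) for _ in range(n)]
        q, r = q_and_r(pts)
        total_q += q
        total_r += r
    return total_q / n_sim, total_r / n_sim


print(q_and_r([(0.0, 0.0), (1.0, 0.0), (2.5, 0.0)]))  # (2, 2)
print(estimate_expected_q_r(100, 50, random.Random(0)))
```

Class labels play no role in this sketch because, under these definitions, $Q$ and $R$ depend only on the locations of the pooled points. A study of the size reported here would use far more replications than the 50 shown.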

NNCT-test statistics and the associated $p$-values for swamp tree data

$C$           $\mathcal{X}_{I}^{2}$       $\mathcal{X}_{II}^{2}$      $\mathcal{X}_{III}^{2}$
52.72         52.08         52.14         52.66
($<.0001$)    ($<.0001$)    ($<.0001$)    ($<.0001$)

$C_{qr}$      $\mathcal{X}_{I,qr}^{2}$    $\mathcal{X}_{II,qr}^{2}$   $\mathcal{X}_{III,qr}^{2}$
51.98         51.35         51.41         51.92
($<.0001$)    ($<.0001$)    ($<.0001$)    ($<.0001$)
Table 4: Test statistics and the associated $p$-values (in parentheses) for NNCT-tests for the swamp tree data set. $C$ stands for Dixon’s overall test, $\mathcal{X}_{I}^{2}$, $\mathcal{X}_{II}^{2}$, and $\mathcal{X}_{III}^{2}$ stand for versions I, II, and III of the tests by Ceyhan, (2007). $C_{qr}$, $\mathcal{X}_{I,qr}^{2}$, $\mathcal{X}_{II,qr}^{2}$, and $\mathcal{X}_{III,qr}^{2}$ are the QR-adjusted versions of these tests.

5.2 Artificial Data Set

In the swamp tree example, although the test statistics for unadjusted and QR-adjusted versions are different for all four NNCT-tests and the $p$-values for QR-adjusted versions are larger than the unadjusted ones, I have the same conclusion: there is strong evidence for segregation of tree species. Below, I present an artificial example, a random sample of size 100 (with $50$ $X$-points and $50$ $Y$-points uniformly generated on the unit square). The question of interest is the spatial interaction between $X$ and $Y$ classes. I plot the locations of the points in Figure 7 and the corresponding NNCT together with percentages are provided in Table 5. Observe that the percentages are suggestive of mild segregation, with equal degree for both classes.

Figure 7: The scatter plot of the locations of $X$ (circles $\circ$) and $Y$ points (triangles $\triangle$) in the artificial data set.
                           NN class
                     $X$          $Y$          sum
base class   $X$     30 (60 %)    20 (40 %)    50 (50 %)
             $Y$     19 (38 %)    31 (62 %)    50 (50 %)
             sum     49 (49 %)    51 (51 %)    100 (100 %)
Table 5: The NNCT for the artificial data and the corresponding percentages (in parentheses), where the cell percentages are with respect to the row sums and marginal percentages are with respect to the total size.

The data is generated to resemble the CSR independence pattern, so I assume the null pattern is CSR independence, which implies that the inference will be a conditional one if I use the observed values of $Q$ and $R$. I observe $Q=70$ and $R=60$ for this data set, and the empirical estimates of their expected values for these sample sizes are $63.37$ for $Q$ and $62.17$ for $R$. I present the test statistics and the associated $p$-values for NNCT-tests in Table 6. Observe that the test statistics all decrease with the QR-adjustment; however, this decrease is not substantial enough to alter the conclusions. Based on the NNCT-tests, I find that the spatial interaction between $X$ and $Y$ is not significantly different from CSR independence.
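A quick consistency check on the numbers reported for the two examples: dividing the empirical estimates by the total sample size gives the same per-point values in both cases. The sketch below only re-uses the values quoted in the text; the dictionary layout and the function name `per_point_ratios` are assumptions, and the proportionality is an observation from these two cases, not a general claim.

```python
# Values as reported in the text for the two example data sets.
reported = {
    "swamp trees": {"n": 394, "Q_obs": 270, "R_obs": 236, "Q_est": 249.68, "R_est": 244.95},
    "artificial": {"n": 100, "Q_obs": 70, "R_obs": 60, "Q_est": 63.37, "R_est": 62.17},
}


def per_point_ratios(entry):
    """Empirical estimates of E[Q] and E[R] divided by the total sample size n."""
    return entry["Q_est"] / entry["n"], entry["R_est"] / entry["n"]


for name, entry in reported.items():
    q_ratio, r_ratio = per_point_ratios(entry)
    print(name, round(q_ratio, 4), round(r_ratio, 4))
```

Both data sets give roughly $0.6337$ for $Q$ and $0.6217$ for $R$ per point, while the observed values correspond to $270/394\approx 0.685$ and $236/394\approx 0.599$ for the swamp trees and $0.70$ and $0.60$ for the artificial data.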

In both examples, although QR-adjustment did not change the conclusions, it might make a difference if the pattern is a close call between CSR independence and segregation/association. That is, if a segregation test has a $p$-value about .05, the QR-adjustment might move it to either side of the significance threshold, depending on the case.

NNCT-test statistics and the associated $p$-values for the artificial data

$C$           $\mathcal{X}_{I}^{2}$       $\mathcal{X}_{II}^{2}$      $\mathcal{X}_{III}^{2}$
3.36          3.02          3.07          3.30
(.1868)       (.0825)       (.2152)       (.0693)

$C_{qr}$      $\mathcal{X}_{I,qr}^{2}$    $\mathcal{X}_{II,qr}^{2}$   $\mathcal{X}_{III,qr}^{2}$
3.32          2.97          3.04          3.25
(.1906)       (.0846)       (.2192)       (.0713)
Table 6: Test statistics and the associated $p$-values (in parentheses) for NNCT-tests for the artificial data set. The notation for the tests is as in Table 4.
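The $p$-values in Table 6 can be approximately recovered from the rounded test statistics. The sketch below assumes $\chi^{2}$ reference distributions with 2, 1, 2, and 1 degrees of freedom for $C$, $\mathcal{X}_{I}^{2}$, $\mathcal{X}_{II}^{2}$, and $\mathcal{X}_{III}^{2}$, respectively; this assignment is inferred from the tabulated values and is not stated in this section. The function name `chi2_sf` is hypothetical, and only the closed forms for 1 and 2 degrees of freedom are implemented.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable, for df = 1 or 2 only."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2))
    if df == 2:
        return math.exp(-x / 2)
    raise ValueError("only df = 1 or df = 2 is supported in this sketch")


# (statistic, assumed df, p-value reported in Table 6)
table6 = [
    (3.36, 2, 0.1868), (3.02, 1, 0.0825), (3.07, 2, 0.2152), (3.30, 1, 0.0693),
    (3.32, 2, 0.1906), (2.97, 1, 0.0846), (3.04, 2, 0.2192), (3.25, 1, 0.0713),
]
for stat, df, reported_p in table6:
    print(stat, df, reported_p, round(chi2_sf(stat, df), 4))
```

The recomputed values agree with the reported ones to within about 0.001; the small gaps are what one would expect from rounding the statistics to two decimals.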

6 Discussion and Conclusions

In this article, I discuss the effect of QR-adjustment on segregation or clustering tests based on nearest neighbor contingency tables (NNCTs). These tests include Dixon’s overall test (Dixon, (1994)), and the three new overall segregation tests introduced by (Ceyhan, (2007)). QR-adjustment is performed on these tests based on NNCTs (i.e., NNCT-tests) when the null case is the CSR of two classes of points (i.e., CSR independence), since under CSR independence, the NNCT-tests depend on the number of reflexive NNs (denoted by $R$) and the number of shared NNs (denoted by $Q$), both of which depend on the allocation of the points. When the observed values of $Q$ and $R$ are used, the NNCT-tests are conditional tests, which might bias the results of the analysis. Given the difficulty in calculating the expected values of $Q$ and $R$ under CSR independence, I estimate them empirically based on extensive Monte Carlo simulations, and substitute these estimates for expected values of $Q$ and $R$ (which is called the QR-adjustment in this article).

I compare the empirical sizes and power estimates of the NNCT-tests with extensive Monte Carlo simulations. Based on the Monte Carlo analysis, I find that QR-adjustment does not affect the empirical sizes of the tests. Moreover, QR-adjustment does not have a substantial influence on these NNCT-tests under the segregation or association alternatives. Thus, one can use the QR-adjusted or the unadjusted versions of the NNCT-tests.

References

  • Ceyhan, (2006) Ceyhan, E. (2006). On the use of nearest neighbor contingency tables for testing spatial segregation. Accepted for publication in Environmental and Ecological Statistics.
  • Ceyhan, (2007) Ceyhan, E. (2007). New tests for spatial segregation based on nearest neighbor contingency tables. Under review.
  • Cuzick and Edwards, (1990) Cuzick, J. and Edwards, R. (1990). Spatial clustering for inhomogeneous populations (with discussion). Journal of the Royal Statistical Society, Series B, 52:73–104.
  • Diggle, (2003) Diggle, P. J. (2003). Statistical Analysis of Spatial Point Patterns. Hodder Arnold Publishers, London.
  • Dixon, (1994) Dixon, P. M. (1994). Testing spatial segregation using a nearest-neighbor contingency table. Ecology, 75(7):1940–1948.
  • Dixon, (2002) Dixon, P. M. (2002). Nearest-neighbor contingency table analysis of spatial segregation for several species. Ecoscience, 9(2):142–151.
  • Good and Whipple, (1982) Good, B. J. and Whipple, S. A. (1982). Tree spatial patterns: South Carolina bottomland and swamp forests. Bulletin of the Torrey Botanical Club, 109:529–536.
  • Goreaud and Pélissier, (2003) Goreaud, F. and Pélissier, R. (2003). Avoiding misinterpretation of biotic interactions with the intertype $K_{12}$-function: population independence vs. random labelling hypotheses. Journal of Vegetation Science, 14(5):681–692.
  • Hamill and Wright, (1986) Hamill, D. M. and Wright, S. J. (1986). Testing the dispersion of juveniles relative to adults: A new analytical method. Ecology, 67(2):952–957.
  • Herler and Patzner, (2005) Herler, J. and Patzner, R. A. (2005). Spatial segregation of two common Gobius species (Teleostei: Gobiidae) in the Northern Adriatic Sea. Marine Ecology, 26(2):121–129.
  • Kulldorff, (2006) Kulldorff, M. (2006). Tests for spatial randomness adjusted for an inhomogeneity: A general framework. Journal of the American Statistical Association, 101(475):1289–1305.
  • Meagher and Burdick, (1980) Meagher, T. R. and Burdick, D. S. (1980). The use of nearest neighbor frequency analysis in studies of association. Ecology, 61(5):1253–1255.
  • Moran, (1948) Moran, P. A. P. (1948). The interpretation of statistical maps. Journal of the Royal Statistical Society, Series B, 10:243–251.
  • Nanami et al., (1999) Nanami, S. H., Kawaguchi, H., and Yamakura, T. (1999). Dioecy-induced spatial patterns of two codominant tree species, Podocarpus nagi and Neolitsea aciculata. Journal of Ecology, 87(4):678–687.
  • Orton, (1982) Orton, C. R. (1982). Stochastic process and archeological mechanism in spatial analysis. Journal of Archeological Science, 9:1–23.
  • Pielou, (1961) Pielou, E. C. (1961). Segregation and symmetry in two-species populations as studied by nearest-neighbor relationships. Journal of Ecology, 49(2):255–269.
  • Ripley, (2004) Ripley, B. D. (2004). Spatial Statistics. Wiley-Interscience, New York.
  • Searle, (2006) Searle, S. R. (2006). Matrix Algebra Useful for Statistics. Wiley-Interscience.
  • Waller and Gotway, (2004) Waller, L. A. and Gotway, C. A. (2004). Applied Spatial Statistics for Public Health Data. Wiley-Interscience, NJ.
  • Whipple, (1980) Whipple, S. A. (1980). Population dispersion patterns of trees in a Southern Louisiana hardwood forest. Bulletin of the Torrey Botanical Club, 107:71–76.