Daniel Turner
GIS Workshop Summer 2004
7.31.04
The measurement of spatial autocorrelation is an important capability of GIS systems. Statistical tools for the analysis of spatial autocorrelation provide an objective basis for evaluating the existence and significance of spatial patterns. One such measurement of spatial autocorrelation is the joint count. This project will develop a software add-on for ESRI’s ArcGIS 8.3 for generating global and local joint count statistics, as well as provide a simple analysis of historical
This project’s focus is the development of a software add-on for ESRI’s ArcGIS 8.3 that can generate global and local joint count statistics for the analysis of spatial autocorrelation. As a secondary objective and a “test bed” for the software, the project generated US presidential election Local Indicators of Spatial Association (LISA) data at the county level. A simple analysis of this data is provided, focusing on important aspects of local joint count interpretation.
The measurement of spatial autocorrelation, the correlation of a variable with itself through space, is an important capability of geographic information systems. It combines the key geographic notion that “everything is related to everything else, but near things are more related than distant things” (Tobler, 1970) with a means of evaluating the significance of that relation. GIS is indispensable for calculating spatial autocorrelation of large datasets, particularly analyses using LISA.
Often spatial autocorrelation is measured at the global scale, indicating only whether or not a given spatial dataset is autocorrelated. However, such analysis cannot describe where such autocorrelation occurs in the dataset. LISA statistics can describe the degree to which one areal unit is autocorrelated relative to its neighbors. Therefore, LISA can provide a much finer grained analysis, enabling the identification of patterns that global analysis cannot.
The joint count statistic is perhaps the simplest statistical measure of spatial autocorrelation. Taking areal units with a binary attribute (classically, black and white), joint count analysis counts the number of each of the possible “joints” between neighbors. The possible joints, in the case of black and white, are BB, WW, and BW/WB. Given the expected probability of each count and combination, the joint count statistic is calculated as a Z-score indicating the statistical significance of any spatial autocorrelation.
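The counting step described above can be sketched in a few lines of Python. This is only an illustration, not the tool's C# implementation: the function name, the dictionary-based adjacency structure, and the four-unit example are all assumptions made for the sketch. Each undirected joint is counted once, and BW and WB are pooled.

```python
from collections import Counter

def count_joins(labels, neighbors):
    """Count BB, WW and BW joints, visiting each undirected joint once.

    labels    -- hypothetical mapping of unit id -> "B" or "W"
    neighbors -- hypothetical mapping of unit id -> list of adjacent unit ids
    """
    counts = Counter({"BB": 0, "WW": 0, "BW": 0})
    seen = set()
    for unit, adjacent in neighbors.items():
        for other in adjacent:
            join = frozenset((unit, other))
            if unit == other or join in seen:
                continue  # skip self-joins and joints already counted
            seen.add(join)
            pair = labels[unit] + labels[other]
            # BW and WB are the same kind of joint
            counts["BW" if pair in ("BW", "WB") else pair] += 1
    return dict(counts)

# Made-up strip of four areal units: a - b - c - d
labels = {"a": "B", "b": "B", "c": "W", "d": "W"}
neighbors = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
joins = count_joins(labels, neighbors)
print(joins)
```

The observed counts produced this way are the inputs to the Z-scores; the expected counts and standard deviations come from the sampling assumption.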
The software will include the following functionality:
2. Extract to any folder on your hard drive.
3. Double-click “install.bat” in the installation folder.
4. Start ArcGIS 8.3.
5. Go to View->Toolbars->Customize.
6. Click the “New” button to create a new toolbar.
7. Click the “Commands” tab.
8. Under the “Categories” pane, select “UTD”.
9. Under the “Commands” pane, click and drag “Joint Count” to your new toolbar.
Note that the current version of the software does not include the following functionality:
As there is currently no software tool for generating local joint count statistics integrated into ArcGIS (or any other GIS, to my knowledge), this project aims to provide such a tool for researchers.
In regard to the analysis portion of the project, there are some issues in the interpretation of joint count results. I have attempted to highlight some of these issues using the 1968-2000 US presidential election LISA data generated by the tool as an example.
County-level US presidential election data for 1968-2000 was collected from http://www.uselectionatlas.org/ and provided the following items:
National-Level/Election Year
County-Level/Election Year
An additional field was added and calculated to designate the party of the winning candidate for each county. This field serves as the nominal binary field on which the joint count is calculated. Datasets for each election year were then joined by creating a key field composed of county name and state FIPS.
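As a rough illustration of this preparation step, the sketch below derives a binary winner field and a join key in Python. It assumes a simple two-party comparison, follows the later sections in coding Democratic wins as black ("B") and Republican wins as white ("W"), and uses a hypothetical key format; the vote counts are made up, and the handling of ties and third-party winners is not specified in the report.

```python
def winning_party(dem_votes, rep_votes):
    """Return "B" for a Democratic win and "W" for a Republican win.

    Assumption: ties fall to "W"; the report does not say how ties
    or third-party winners were coded.
    """
    return "B" if dem_votes > rep_votes else "W"

def county_key(county_name, state_fips):
    """Build a hypothetical key from state FIPS and county name."""
    return f"{str(state_fips).zfill(2)}_{county_name.strip().upper()}"

def join_years(tables):
    """Merge {year: {key: winner}} into {key: {year: winner}}."""
    joined = {}
    for year, table in tables.items():
        for key, winner in table.items():
            joined.setdefault(key, {})[year] = winner
    return joined

# Made-up vote counts for one county in two election years
key = county_key(" Example ", 1)
tables = {
    1992: {key: winning_party(1200, 900)},
    1996: {key: winning_party(800, 1100)},
}
joined = join_years(tables)
print(joined)
```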
The software was developed in C#, using Microsoft Visual Studio .NET 2003 running on Microsoft Windows 2000. The target deployment platform is ESRI’s ArcGIS 8.3.
The joint count statistic for a free sampling case is calculated as
Z = (Observed # Joins – Expected # Joins) / Standard Deviation of Expected
for Z(Jbb), Z(Jww), and Z(Jwb). Expected is calculated as

where k = the total number of joins. The standard deviations are

where

Note that the equations for the non-free sampling case are different. Non-free sampling functionality is unimplemented in the tool.
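For reference, the standard free sampling expressions (Cliff and Ord 1973) can be written as follows. This is a sketch assuming the conventional forms; p_B and p_W are the probabilities of black and white, k is the total number of joins, and k_i, the number of joins of areal unit i, is a symbol introduced here.

```latex
\begin{aligned}
E(J_{BB}) &= k\,p_B^{2}, \qquad
E(J_{WW}) = k\,p_W^{2}, \qquad
E(J_{BW}) = 2k\,p_B\,p_W \\[4pt]
\sigma_{BB} &= \sqrt{k\,p_B^{2} + 2m\,p_B^{3} - (k + 2m)\,p_B^{4}} \\
\sigma_{WW} &= \sqrt{k\,p_W^{2} + 2m\,p_W^{3} - (k + 2m)\,p_W^{4}} \\
\sigma_{BW} &= \sqrt{2(k + m)\,p_B\,p_W - 4(k + 2m)\,p_B^{2}\,p_W^{2}} \\[4pt]
m &= \tfrac{1}{2}\sum_{i} k_i\,(k_i - 1)
\end{aligned}
```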
The compiled 1968-2000 US presidential election shapefile data is available here. The compiled county-level LISA output is available here.
Positive spatial autocorrelation indicates the existence of clustering behavior among like phenomena, where near observations are similar (Figure 1). Negative autocorrelation indicates the existence of dispersion of like phenomena, where near observations are dissimilar. Random distribution is indicated by Z-scores which are not statistically significant.

Figure 1: Types of Spatial Autocorrelation
Calculation of joint count statistics produces three results: a Z-score indicating the magnitude of autocorrelation for each of Jbb, Jww, and Jwb. The Jbb and Jww scores are interpreted as measures of positive spatial autocorrelation for black and white joints respectively. The Jwb Z-score is interpreted as a measure of negative spatial autocorrelation. The researcher should be mindful of all three statistics when interpreting results.
In 1992,

Figure 2

Figure 3: Positive Spatial Autocorrelation (Democrat)

Figure 4: Positive Spatial Autocorrelation (Republican)

Figure 5: Negative Spatial Autocorrelation
Deriving weights from the popular vote,
Pb = 44,909,806 / (44,909,806 + 39,104,550) = probability Democratic = .535
Pw = 1 – Pb = probability Republican = .465
the LISA results for the election are displayed in Figures 3-5. The LISA Jbb results for the Democrats (Figure 3) are notable because not a single county indicates positive spatial autocorrelation at the .05 level of statistical significance (Z >= 1.96). The southern tip of
The apparent imbalance in the Z(Jbb) and Z(Jww) results is largely explained by the weighting. High Z(Jbb) results are flattened by the weighting; since it was more likely a given county was won by the Democratic candidate, a cluster of counties won by the Democrats is less likely to be statistically significant. Conversely, since it was less likely a given county was won by the Republican candidate, the statistical significance of a Republican cluster of counties is exaggerated. The results for Z(Jwb) can be used to partially reconcile the Z(Jbb) and Z(Jww) results, however. Referring to Figure 5, we see statistically significant positive spatial autocorrelation (in this case, indicated by Z-scores <= -1.96) for regions won by
Z(Jwb) = (Actual(Jwb) – Expected(Jwb)) / Standard Deviation(Jwb) = (0 - .995) / .711 = -1.4
Given the weightings, a Democratic county must have at least 4 neighbors, all of them Democratic, to achieve a Z(Jwb) score of -1.96 or less.[1]
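The neighbor threshold can be checked with a short Python sketch. It assumes the conventional free sampling variance for Jwb (Cliff and Ord 1973) and a star-shaped local neighborhood in which only the joins between the county and its neighbors are counted; both are assumptions about how the local statistic was computed, and the function name is hypothetical. The standard deviation it produces for two neighbors differs slightly in the third decimal from the value quoted above, but the Z-score agrees to one decimal place.

```python
import math

def local_z_jwb(n_neighbors, observed_bw, p_b):
    """Free sampling Z(Jwb) for one unit joined to n_neighbors neighbors."""
    p_w = 1.0 - p_b
    k = n_neighbors            # joins between the unit and its neighbors
    m = 0.5 * k * (k - 1)      # only the central unit has more than one join
    expected = 2 * k * p_b * p_w
    variance = 2 * (k + m) * p_b * p_w - 4 * (k + 2 * m) * (p_b * p_w) ** 2
    return (observed_bw - expected) / math.sqrt(variance)

# Probability of a Democratic county, from the 1992 popular vote totals above
p_b = 44_909_806 / (44_909_806 + 39_104_550)

# A county with two neighbors and no unlike (BW) joins
z_two = local_z_jwb(2, 0, p_b)

# Smallest neighborhood of like neighbors that reaches Z <= -1.96
min_neighbors = next(
    n for n in range(1, 20) if local_z_jwb(n, 0, p_b) <= -1.96
)
print(round(z_two, 1), min_neighbors)
```

Under these assumptions, three like neighbors are not enough to reach the -1.96 threshold, while four are.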
When interpreting joint count results, all three Z-score results should be considered in analyzing the data. In addition, care must be used in situations where the number of neighbors becomes small enough to affect the probability of expected joint counts.
Inspecting the county-level LISA election data for differences between urban and rural voting patterns, we see a marked pattern where rural areas often exhibit strong positive autocorrelation whereas urban areas do not. Figures 6-9 show Z(Jwb) scores for some example elections where this pattern is evident.
Given a cursory inspection of the results, one might conclude that rural areas exhibit more strongly positive autocorrelated voting patterns than urban areas. However, closer inspection of the data reveals that urban counties frequently vote differently than suburban counties, which naturally creates areas of lower positive spatial autocorrelation. Indeed, it might also be misleading to assume on this basis that urban and suburban counties exhibit lower positive spatial autocorrelation per se when in fact this may be a matter of a smaller-scale pattern (differences in urban and suburban voting) superseding a larger scale one (regional trends). In any case, when patterns approach the scale of the entities being analyzed, the utility of local joint count appears to rapidly decrease. Indeed, local joint count seems best suited for identifying “regional” patterns relative to the dataset’s spatial entities.

Figure 6

Figure 7

Figure 8

Figure 9
The capability of easily generating local joint count statistics provides a useful tool for researchers performing pattern recognition and statistical analysis tasks. Local joint count is particularly useful for identifying regional hot spots and outliers. However, some care should be taken in calculating and interpreting results, as the calculations are sensitive to relatively small differences in probability/weighting and the topological characteristics of the spatial data being analyzed.
Further development is needed to make the tool truly useful to the researcher. LISA statistics need to be made immediately available to the user to facilitate a “browsing” inspection of the results. Also, support for non-free (randomization) sampling and for simulated sampling distributions is required for full functionality.
The historical presidential election LISA statistics produced by the tool, however, provide a rich resource for further analysis. Some research projects that immediately present themselves include the identification and classification of regions that vote as blocks, as well as correlation of regional spatial patterns with winning parties and losing parties.
Anselin, Luc. 1995. “Local indicators of spatial association: LISA.” Geographical Analysis, 27:93-115.
Bartels, Larry. 2000. “Partisanship and Voting Behavior, 1952-1996.” American Journal of Political Science 44: 35-50.
Cliff, A.D. and J.K. Ord. 1973. Spatial Autocorrelation.
Fotheringham, A.S. 1997. “Trends in quantitative methods I: stressing the local.” Progress in Human Geography 21: 58-96.
Kim, Jeongdai, Euell Elliot, and Ding-Ming Wang. 2003. “A spatial analysis of county-level outcomes in US Presidential elections: 1988-2000.” Electoral Studies 22: 741–761.
Lee, Jay and David W.S. Wong. 2001. Statistical Analysis with ArcView GIS
Noel, Hans. 2003. “The road to red and blue
Ord, J.K. and A. Getis. 1995. “Local spatial autocorrelation statistics: distributional issues and an application.” Geographical Analysis 27: 286-306.
O’Sullivan and David J. Unwin. 2003. Geographic Information Analysis
[1] It should be noted that these results were calculated using free sampling (sampling with replacement) algorithms for the joint count. Non-free sampling (sampling without replacement) mitigates such structural effects on the results, but ignores larger trends (i.e. the popular vote). Simulated sampling distribution (