Table of Contents


Abstract
Introduction
Installation
Problem Statement
Data Description
Methods
Results
Conclusions and Future Research
References




Spatial Autocorrelation:

ArcGIS Joint Count Tool and Analysis

 

Daniel Turner

University of Texas at Dallas

GIS Workshop Summer 2004

7.31.04

 

 

Abstract

 

The measurement of spatial autocorrelation is an important capability of geographic information systems (GIS).  Statistical tools for the analysis of spatial autocorrelation provide an objective basis for evaluating the existence and significance of spatial patterns.  One such measurement of spatial autocorrelation is the joint count.  This project develops a software add-on for ESRI’s ArcGIS 8.3 that generates global and local joint count statistics, and provides a simple analysis of historical US presidential election data as an exercise in generating and using the joint count statistic in spatial autocorrelation analysis.

 


Introduction

 

This project’s focus is the development of a software add-on for ESRI’s ArcGIS 8.3 that can generate global and local joint count statistics for the analysis of spatial autocorrelation.  As a secondary objective and a “test bed” for the software, the project generated US presidential election Local Indicators of Spatial Association (LISA) data at the county level.  A simple analysis of this data is provided, focusing on important aspects of local joint count interpretation.

 

The measurement of spatial autocorrelation, the correlation of a variable with itself through space, is an important capability of geographic information systems.  It combines the key geographic notion that “everything is related to everything else, but near things are more related than distant things” (Tobler, 1970) with a means of evaluating the significance of that relation.  GIS is indispensable for calculating spatial autocorrelation of large datasets, particularly analyses using LISA.

 

Spatial autocorrelation is often measured at the global scale, indicating only whether or not a given spatial dataset is autocorrelated.  Such analysis cannot describe where the autocorrelation occurs in the dataset.  LISA statistics, by contrast, describe the degree to which one areal unit is autocorrelated relative to its neighbors.  LISA can therefore provide a much finer-grained analysis, enabling the identification of patterns that global analysis cannot.

 

The joint count statistic is perhaps the simplest statistical measure of spatial autocorrelation.  Taking areal units with a binary attribute (classically, black and white), joint count analysis counts the number of each of the possible “joints” between neighbors.  The possible joints, in the case of black and white, are BB, WW, and BW/WB.  Given the expected number of each type of joint under a random arrangement, the joint count statistic is expressed as a Z-score indicating the statistical significance of any spatial autocorrelation.
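
The counting step described above can be sketched in a few lines of Python.  This is only an illustration and not the tool's C# code: the function name `count_joins`, the dictionary layout, and the four-unit example grid are all hypothetical.

```python
from collections import Counter

def count_joins(colors, neighbors):
    """Count BB, WW and BW joints, visiting each undirected joint once.

    colors:    {unit_id: "B" or "W"}
    neighbors: {unit_id: [adjacent unit_ids]}  (hypothetical layout)
    """
    counts = Counter({"BB": 0, "WW": 0, "BW": 0})
    seen = set()
    for unit, adjacent in neighbors.items():
        for other in adjacent:
            joint = frozenset((unit, other))
            if unit == other or joint in seen:
                continue  # skip self-references and joints already counted
            seen.add(joint)
            pair = colors[unit] + colors[other]
            # BW and WB are the same kind of joint
            counts["BW" if pair in ("BW", "WB") else pair] += 1
    return dict(counts)

# Hypothetical 2x2 grid (a b / c d) with edge-sharing neighbors only
colors = {"a": "B", "b": "B", "c": "W", "d": "W"}
neighbors = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a", "d"], "d": ["b", "c"]}
joins = count_joins(colors, neighbors)
```

In this small grid the top row is black and the bottom row is white, so there is one BB joint, one WW joint, and two BW joints.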

 

The software will include the following functionality:

 


Installation

 

Installation Prerequisites

·       ArcGIS 8.3

·       Microsoft .NET Runtime Framework 1.1 (download)

Installation

1.      Download UTD Joint Count (download).

2.      Extract to any folder on your hard drive.

3.      Double-click “install.bat” in the installation folder.

4.      Start ArcGIS 8.3.

5.      Go to View->Toolbars->Customize.

6.      Click the “New” button to create a new toolbar.

7.      Click the “Commands” tab.

8.      Under the “Categories” pane, select “UTD”.

9.      Under the “Commands” pane, click and drag “Joint Count” to your new toolbar.

 

Note that the current version of the software does not include the following functionality:

 

 

 


Problem Statement

 

As there is currently no software tool for generating local joint count statistics integrated into ArcGIS (or any other GIS, to my knowledge), this project aims to provide such a tool for researchers.

 

In regard to the analysis portion of the project, there are some issues in the interpretation of joint count results.  I have attempted to highlight some of these issues using the 1968-2000 US presidential election LISA data generated by the tool as an example. 

 

 

 


Data Description

 

 

2001 US county boundaries were downloaded in shapefile format from the US Department of the Interior’s http://www.nationalatlas.gov/.  Metadata for the dataset is here.  Relevant field items include the following:

 

County-level US presidential election data for 1968-2000 was collected from http://www.uselectionatlas.org/ and provided the following items:

 

National-Level/Election Year

 

 

County-Level/Election Year

 

 

An additional field was added and calculated to designate the party of the winning candidate for each county.  This field serves as the nominal binary field on which the joint count is calculated.  Datasets for each election year were then joined by creating a key field composed of county name and state FIPS.
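
A minimal Python sketch of this preparation step follows.  The field names (`county`, `state_fips`, `dem`, `rep`), the key format, and the vote counts are hypothetical, and the sketch assumes a two-party comparison; how ties or counties won by a third-party candidate were handled is not stated here.

```python
def winner_party(dem_votes, rep_votes):
    """Return 'D' or 'R' for the larger of the two major-party counts.

    Assumption: ties fall to 'R' only because some value must be returned.
    """
    return "D" if dem_votes > rep_votes else "R"

def county_key(county_name, state_fips):
    """Build a join key from the county name and the state FIPS code."""
    return f"{str(state_fips).zfill(2)}|{county_name.strip().upper()}"

def join_years(tables):
    """Merge {year: [rows]} into {key: {year: winning party}}."""
    joined = {}
    for year, rows in tables.items():
        for row in rows:
            key = county_key(row["county"], row["state_fips"])
            joined.setdefault(key, {})[year] = winner_party(row["dem"], row["rep"])
    return joined

# Made-up vote counts, for illustration only
tables = {
    1992: [{"county": "Dallas", "state_fips": "48", "dem": 10, "rep": 12}],
    1996: [{"county": "dallas ", "state_fips": 48, "dem": 15, "rep": 11}],
}
joined = join_years(tables)
```

Normalizing case and whitespace in the key guards against trivial mismatches between election years; a key built from county name and state FIPS still depends on county names being spelled consistently across sources.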

 

 


Methods

 

The software was developed in C#, using Microsoft Visual Studio .NET 2003 running on Microsoft Windows 2000.  The target deployment platform is ESRI’s ArcGIS 8.3. 

 

The joint count statistic for a free sampling case is calculated as

 

            Z = (Observed # Joins – Expected # Joins) / Standard Deviation of Expected

 

for Z(Jbb), Z(Jww), and Z(Jwb).  Expected is calculated as

 

 

where k = the total number of joins.  The standard deviations are

 

where

 

 

Note that the equations for the non-free sampling case are different.  Non-free sampling functionality is unimplemented in the tool.
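
For reference, the free-sampling expressions in the form commonly given in the literature (for example, O’Sullivan and Unwin, 2003) are sketched below using this report’s symbols.  That the tool implements exactly these forms is an assumption.  The symbols n (the number of areal units) and k_i (the number of joins of unit i) are introduced here for illustration.

```latex
\begin{aligned}
E(J_{bb}) &= k P_b^{2}, \qquad
E(J_{ww}) = k P_w^{2}, \qquad
E(J_{wb}) = 2 k P_b P_w \\[4pt]
\sigma(J_{bb}) &= \sqrt{k P_b^{2} + 2 m P_b^{3} - (k + 2m) P_b^{4}} \\
\sigma(J_{ww}) &= \sqrt{k P_w^{2} + 2 m P_w^{3} - (k + 2m) P_w^{4}} \\
\sigma(J_{wb}) &= \sqrt{2 (k + m) P_b P_w - 4 (k + 2m) P_b^{2} P_w^{2}} \\[4pt]
m &= \tfrac{1}{2} \sum_{i=1}^{n} k_i (k_i - 1)
\end{aligned}
```

As a quick check on the first standard deviation, a single join (k = 1, m = 0) gives a variance of P_b^2 (1 - P_b^2), which is the variance of a single yes/no trial that succeeds with probability P_b^2.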


Results

 

The compiled 1968-2000 US presidential election shapefile data is available here.  The compiled county-level LISA output is available here.

 

Interpreting Joint Count Statistics

 

Positive spatial autocorrelation indicates the existence of clustering behavior among like phenomena, where near observations are similar (Figure 1).  Negative autocorrelation indicates the existence of dispersion of like phenomena, where near observations are dissimilar.  Random distribution is indicated by Z-scores which are not statistically significant.

 

Figure 1: Types of Spatial Autocorrelation

 

Calculation of joint count statistics produces three results: a Z-score indicating the magnitude of autocorrelation for each of Jbb, Jww, and Jwb.  The Jbb and Jww scores are interpreted as measures of positive spatial autocorrelation for black and white joints respectively.  The Jwb Z-score is interpreted as a measure of negative spatial autocorrelation.  In analysis, the researcher should be mindful of all three statistics when interpreting results.
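
The reading of the three scores can be summarized in a small Python sketch.  The function name `interpret` and its return strings are hypothetical, and the default threshold of 1.96 assumes the two-tailed .05 significance level.

```python
def interpret(statistic, z, critical=1.96):
    """Classify one joint count Z-score.

    statistic: "Jbb", "Jww" or "Jwb"
    Returns "positive", "negative" or "not significant".
    """
    if statistic not in ("Jbb", "Jww", "Jwb"):
        raise ValueError(f"unknown statistic: {statistic}")
    if abs(z) < critical:
        return "not significant"
    like_joints = statistic in ("Jbb", "Jww")
    more_than_expected = z > 0
    # More like joints than expected, or fewer unlike joints than
    # expected, both point to clustering (positive autocorrelation).
    return "positive" if like_joints == more_than_expected else "negative"

examples = [
    interpret("Jbb", 2.5),
    interpret("Jwb", -2.3),
    interpret("Jwb", 2.1),
    interpret("Jww", 1.2),
]
```

Note the sign reversal for Jwb: a significantly negative Z(Jwb) means fewer unlike joints than expected, which indicates clustering rather than dispersion.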

 

In 1992, Clinton defeated Bush, receiving 44,909,806 votes to Bush’s 39,104,550.  In the county-level map showing the winner in each county (Figure 2), it is immediately apparent that there are identifiable regions that went to each of the candidates.  The southernmost tip of Texas, for example, shows a cluster of counties that were won by Clinton.


Figure 2

 


 

Figure 3: Positive Spatial Autocorrelation (Democrat)


 

Figure 4: Positive Spatial Autocorrelation (Republican)


 

Figure 5: Negative Spatial Autocorrelation


 

 

Deriving weights from the popular vote,

 

            Pb = 44,909,806  / (44,909,806 + 39,104,550) = % chance Democratic = .535

            Pw = 1 – Pb = % chance Republican = .465

 

the LISA results for the election are displayed in Figures 3-5.  The LISA Jbb results for the Democrats (Figure 3) are notable because not a single county indicates positive spatial autocorrelation at the .05 level of statistical significance (Z >= 1.96).  The southern tip of Texas, for example, indicates positive Z-scores, but nothing statistically significant.  However, the Z(Jww) Republican results (Figure 4) exhibit large regions of very strong, statistically significant positive autocorrelation. 

 

The apparent imbalance in the Z(Jbb) and Z(Jww) results is largely explained by the weighting.  High Z(Jbb) results are flattened by the weighting: since it was more likely that a given county was won by the Democratic candidate, a cluster of counties won by the Democrats is less likely to be statistically significant.  Conversely, since it was less likely that a given county was won by the Republican candidate, the statistical significance of a Republican cluster of counties is exaggerated.  The results for Z(Jwb) can be used to partially reconcile the Z(Jbb) and Z(Jww) results, however.  Referring to Figure 5, we see statistically significant positive spatial autocorrelation (in this case, indicated by Z-scores <= -1.96, meaning significantly fewer unlike joints than expected) for regions won by Clinton.  Thus, the Z(Jwb) results are useful for rounding out a complete picture of the LISA results.  Note, however, that two of the southernmost counties in Texas still do not show statistically significant Z(Jwb) scores.  In this case, the actual structure of the county polygons and their neighbors plays a role in the calculation.  The southernmost county has only two neighbors.  While all three counties were won by Clinton,

           

            Z(Jwb) = (Actual(Jwb) – Expected(Jwb)) / Standard Deviation(Jwb) =

                                    (0 - .995) / .711 = -1.4

 

Given the weightings, a Democratic county must have at least 4 neighbors (all Democratic) to achieve a Z-score of -1.96 or less.[1]
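
The arithmetic behind this threshold can be reproduced with a short Python sketch.  It assumes the free-sampling formulas as commonly given in the literature and assumes that, for a local statistic, only the joints between the focal county and its k neighbors are counted, so that m = k(k - 1) / 2.  The function name `local_z_jwb` is hypothetical, and the tool's own calculation may differ in detail: under these assumptions the two-neighbor standard deviation comes out near .709 rather than the .711 quoted above, while the Z-score still rounds to -1.4.

```python
import math

def local_z_jwb(k, p_b, observed_bw=0):
    """Z(Jwb) for a focal county with k neighbors under free sampling.

    Assumption: only joints incident on the focal county are counted,
    so m = k * (k - 1) / 2.
    """
    p_w = 1.0 - p_b
    m = k * (k - 1) / 2.0
    expected = 2.0 * k * p_b * p_w
    variance = (2.0 * (k + m) * p_b * p_w
                - 4.0 * (k + 2.0 * m) * (p_b * p_w) ** 2)
    return (observed_bw - expected) / math.sqrt(variance)

# Weight derived from the 1992 popular vote, rounded as in the text (.535)
p_b = round(44909806 / (44909806 + 39104550), 3)

# Southernmost Texas county: two neighbors, no unlike joints observed
z_two = local_z_jwb(2, p_b)

# Smallest all-Democratic neighborhood that reaches Z <= -1.96
min_neighbors = next(k for k in range(1, 20) if local_z_jwb(k, p_b) <= -1.96)
```

With these inputs, three all-Democratic neighbors give a Z-score of about -1.72 and four give about -1.98, which agrees with the threshold of four neighbors stated above.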

 

When interpreting joint count results, all three Z-score results should be considered in analyzing the data.  In addition, care must be used in situations where the number of neighbors becomes small enough to affect the probability of expected joint counts.


Analysis Issues: Urban vs. Rural

 

Inspecting the county-level LISA election data for differences between urban and rural voting patterns, we see a marked pattern where rural areas often exhibit strong positive autocorrelation whereas urban areas do not.  Figures 6-9 show Z(Jwb) scores for some example elections where this pattern is evident.

 

Given a cursory inspection of the results, one might conclude that rural areas exhibit more strongly positively autocorrelated voting patterns than urban areas.  However, closer inspection of the data reveals that urban counties frequently vote differently from suburban counties, which naturally creates areas of lower positive spatial autocorrelation.  Indeed, it might also be misleading to assume on this basis that urban and suburban counties exhibit lower positive spatial autocorrelation per se, when in fact this may be a matter of a smaller-scale pattern (differences in urban and suburban voting) superseding a larger-scale one (regional trends).  In any case, when patterns approach the scale of the entities being analyzed, the utility of local joint count appears to decrease rapidly.  Local joint count seems best suited for identifying “regional” patterns, that is, patterns that are large relative to the dataset’s spatial entities.


Figure 6

 


 

 

Figure 7


 

Figure 8


 

Figure 9


 

Conclusions and Future Research

 

The capability of easily generating local joint count statistics provides a useful tool for researchers performing pattern recognition and statistical analysis tasks.  Local joint count is particularly useful for identifying regional hot spots and outliers.  However, some care should be taken in calculating and interpreting results, as the calculations are sensitive to relatively small differences in probability/weighting and the topological characteristics of the spatial data being analyzed.

 

Further development is needed to make the tool truly useful to the researcher.  LISA statistics need to be made immediately available to the user to facilitate a “browsing” inspection of the results.  In addition, support for non-free (randomization) sampling and for simulated (Monte Carlo) sampling distributions is required for full functionality.

 

The historical presidential election LISA statistics produced by the tool, however, provide a rich resource for further analysis.  Some research projects that immediately present themselves include the identification and classification of regions that vote as blocks, as well as correlation of regional spatial patterns with winning parties and losing parties.


References

 

Anselin, Luc. 1995. “Local indicators of spatial association: LISA.” Geographical Analysis 27: 93-115.

 

Bartels, Larry. 2000. “Partisanship and Voting Behavior, 1952-1996.” American Journal of Political Science 44: 35-50.

 

Cliff, A. D. and J.K. Ord. 1973. Spatial Autocorrelation. London: Pion.

 

Fotheringham, A.S. 1997. “Trends in quantitative methods I: stressing the local.” Progress in Human Geography 21: 58-96.

 

Kim, Jeongdai, Euel Elliott, and Ding-Ming Wang. 2003. “A spatial analysis of county-level outcomes in US Presidential elections: 1988-2000.” Electoral Studies 22: 741-761.

 

Lee, Jay and David W.S. Wong. 2001. Statistical Analysis with ArcView GIS. New York, NY: Wiley.

 

Noel, Hans. 2003. “The road to red and blue America: measuring ideology at the mass level, 1980-1996.” Paper prepared for the 2003 annual meeting of the Midwest Political Science Association, April 3-6, 2003, Chicago, Ill.

 

Ord, J.K. and A. Getis. 1995. “Local spatial autocorrelation statistics: distributional issues and an application.” Geographical Analysis 27: 286-306.

 

O’Sullivan, David and David J. Unwin. 2003. Geographic Information Analysis. Hoboken, NJ: Wiley.

 

 



[1]  It should be noted that these results were calculated using free sampling (sampling with replacement) algorithms for the joint count.  Non-free sampling (sampling without replacement) mitigates such structural effects on the results, but ignores larger trends (e.g. the popular vote).  A simulated sampling distribution (Monte Carlo) can be used to derive probabilities/weightings that incorporate the advantages of both sampling techniques.