This tutorial provides an overview of the various libraries that contain tools for calculating spatial statistics. However, be aware that the focus of this tutorial is on finding the libraries that implement these different statistics, not on defining them or discussing their substantive interpretation. It is strongly recommended that users familiarize themselves with spatial statistical methods from another source before jumping to this tutorial.
The first thing to know about libraries for spatial statistics is that there are lots of them, and there is also a lot of redundancy across libraries. In this tutorial, I will focus on a small subset of these libraries that seem to be the most well-established and comprehensive, but if you find you are missing something, additional resources abound.
spatstat
for Spatial Point PatternsThe best library for studying the statistical properties of point distributions is probably spatstat
, which also comes with a companion textbook, website, and [a set of tutorials](click the “Vignette” links)](https://cran.r-project.org/web/packages/spatstat/). spatstat
has also been around for a long time, has been widely used, and repeatedly updated, suggesting most bugs are likely to have been found and patched (which is not always the case for R packages).
Here are a few commonly used methods:
Kest()
: Ripley’s K testKest.fft()
: Fast Ripley’s K test for large data sets.nndist()
: nearest neighbor distancesdensity.ppp()
: Kernel density plotsspatstat
is not directly compatible with the sp
library, so to use spatstat
we need to do a quick conversion:
library(rgdal)
library(spatstat)
sf <- readOGR("rgis5_data/sfpd_incident_shapefile", "sfpd_incident_2015")
OGR data source with driver: ESRI Shapefile
Source: "rgis5_data/sfpd_incident_shapefile", layer: "sfpd_incident_2015"
with 124016 features
It has 11 fields
# Syntax is: ppp(x.coordinates, y.coordinates, x.range, y.range)
ss.object <- ppp(sf@coords[, "coords.x1"], sf@coords[, "coords.x2"], sf@bbox[1,
], sf@bbox[2, ])
# Do Kest! Note there are lots of versions of kest...
plot(Kest(ss.object, correction = "good"))
spdep
for Spatial Econometricsspatstat
only works with points. If you wish to analyze spatial correlation of polygons, spdep
can be very helpful. A primary author of spdep
is Roger Bivand, who also wrote the sp
library, so unlike spatstats
, spdep
plays very well with sp
objects!
moran
: global moran’s Ilocalmoran
: local moran’s Ispatstat
If install.packages("spatstat")
does not work for you:
Visit the CRAN spatstat site
In the “downloads” section, download the r-release binaries file associated with your operating system.
Set your working directory to wherever you placed the downloaded file.
Run install.packages("spatstat_1.43-0.zip", repos=NULL)
(updating the file name to the most recent version and appropriate suffix)
Hopefully you’re set to go!
We generally creating spatial weighting matrices in two steps:
nb
object)listw
object)An nb
object just records which features are “neighbors” of one another (you either are or are not a neighbor). The function allows users to specify the A listw
is a full, normalized weighting matrix. In most cases, you start by making an nb
object then convert it to a listw
using the nb2listw
command. Details of methods not shown below – like graph distance or k nearest neighbor methods can be found here.
library(spdep)
library(rgdal)
pa <- readOGR("rgis5_data/palo_alto_demographic_shapefile", "palo_alto")
OGR data source with driver: ESRI Shapefile
Source: "rgis5_data/palo_alto_demographic_shapefile", layer: "palo_alto"
with 371 features
It has 5 fields
pa$share.hispanic <- pa$hispanc/(pa$hispanc + pa$White)
plot(pa)
# Make continuity NB (neighbors share edges)
continuity.nb <- poly2nb(pa, queen = FALSE)
# Plot neighbors
plot(continuity.nb, coordinates(pa))
# Convert to weights and summarize
continuity.listw <- nb2listw(continuity.nb)
summary(continuity.listw)
Characteristics of weights list object:
Neighbour list object:
Number of regions: 371
Number of nonzero links: 1944
Percentage nonzero weights: 1.41237
Average number of links: 5.239892
Link number distribution:
3 4 5 6 7 8 9 10 11 16
25 105 114 66 36 10 8 3 3 1
25 least connected regions:
8 35 87 88 89 93 139 147 159 160 171 178 227 251 257 263 296 298 305 312 315 336 342 353 370 with 3 links
1 most connected region:
313 with 16 links
Weights style: W
Weights constants summary:
n nn S0 S1 S2
W 371 137641 371 146.7926 1525.55
# Make continuity NB (neighbors share at least one vertex)
continuity.plus.vertex.nb <- poly2nb(pa, queen = TRUE)
continuity.plus.vertex.listw <- nb2listw(continuity.plus.vertex.nb)
# Make all polygons farther than d1 and closer than d2 neighbors Convert
# polygons to centroids
library(rgeos)
pa.in.utm <- spTransform(pa, CRS("+init=EPSG:32610"))
pa.centroids = gCentroid(pa.in.utm, byid = TRUE)
# Neighbors within 10 km
intermediate <- dnearneigh(pa.centroids, d1 = 0, d2 = 10000)
listw.10km <- nb2listw(intermediate)
spdep
One of the most commonly examined measures of spatial dependence is the Moran’s I, which is easily executed using the weights created above:
moran.test(pa$share.hispanic, continuity.listw)
Moran's I test under randomisation
data: pa$share.hispanic
weights: continuity.listw
Moran I statistic standard deviate = 24.69, p-value < 2.2e-16
alternative hypothesis: greater
sample estimates:
Moran I statistic Expectation Variance
0.798888034 -0.002702703 0.001054051
moran.plot(pa$share.hispanic, continuity.listw)
One can also calculate local Moran’s I statistics for each polygon rather than just global values:
library(maptools)
result <- localmoran(pa$share.hispanic, continuity.listw)
result
Ii E.Ii Var.Ii Z.Ii Pr(z > 0)
0 8.215135e-01 -0.002702703 0.19732293 1.8554633259 3.176505e-02
1 2.516570e-01 -0.002702703 0.16398645 0.6281225207 2.649618e-01
2 4.834256e-02 -0.002702703 0.16398645 0.1260524904 4.498452e-01
3 -1.307524e-02 -0.002702703 0.24732765 -0.0208568490 5.083201e-01
4 -1.111539e-01 -0.002702703 0.16398645 -0.2678121759 6.055781e-01
5 -1.411150e-02 -0.002702703 0.24732765 -0.0229405280 5.091511e-01
6 3.190916e-01 -0.002702703 0.14017468 0.8594953266 1.950336e-01
7 -1.626224e-01 -0.002702703 0.16398645 -0.3949098641 6.535453e-01
8 6.810898e-04 -0.002702703 0.33066885 0.0058844663 4.976525e-01
9 4.769649e-01 -0.002702703 0.19732293 1.0798206295 1.401110e-01
10 5.391738e-01 -0.002702703 0.24732765 1.0895922247 1.379464e-01
11 1.316338e+00 -0.002702703 0.19732293 2.9694052230 1.491884e-03
12 1.011352e+00 -0.002702703 0.24732765 2.0390365941 2.072319e-02
13 -2.181948e-01 -0.002702703 0.16398645 -0.5321418909 7.026861e-01
14 1.824294e+00 -0.002702703 0.24732765 3.6736816706 1.195403e-04
[ reached getOption("max.print") -- omitted 356 rows ]
attr(,"call")
localmoran(x = pa$share.hispanic, listw = continuity.listw)
attr(,"class")
[1] "localmoran" "matrix"
pa <- spCbind(pa, as.data.frame(result))
spplot(pa, "Z.Ii")
You can easily run the various Lagrange tests for spatial correction with one command once you have a weighting matrix:
my.model <- lm(share.hispanic ~ PrCpInc, data = pa)
lm.LMtests(my.model, continuity.listw, test = "all")
Lagrange multiplier diagnostics for spatial dependence
data:
model: lm(formula = share.hispanic ~ PrCpInc, data = pa)
weights: continuity.listw
LMerr = 232, df = 1, p-value < 2.2e-16
Lagrange multiplier diagnostics for spatial dependence
data:
model: lm(formula = share.hispanic ~ PrCpInc, data = pa)
weights: continuity.listw
LMlag = 264.26, df = 1, p-value < 2.2e-16
Lagrange multiplier diagnostics for spatial dependence
data:
model: lm(formula = share.hispanic ~ PrCpInc, data = pa)
weights: continuity.listw
RLMerr = 35.081, df = 1, p-value = 3.163e-09
Lagrange multiplier diagnostics for spatial dependence
data:
model: lm(formula = share.hispanic ~ PrCpInc, data = pa)
weights: continuity.listw
RLMlag = 67.34, df = 1, p-value = 2.22e-16
Lagrange multiplier diagnostics for spatial dependence
data:
model: lm(formula = share.hispanic ~ PrCpInc, data = pa)
weights: continuity.listw
SARMA = 299.34, df = 2, p-value < 2.2e-16
And run the regression you decide you want almost as easily!
# Spatial Auto-Regression Linear Model
my.model.sar <- lagsarlm(share.hispanic ~ PrCpInc, data = pa, continuity.listw)
# Error Auto-Regresive
my.model.ear <- errorsarlm(share.hispanic ~ PrCpInc, data = pa, continuity.listw)
Spatial interpolation is the process of using a set of observations of some attribute at specific points and attempts to interpolate the likely values between those points. There are a number of methods for spatial interpolation, including Inverse Distance Weighting (IDW), kernel density estimators, and Kriging, and many libraries for each.
The main library for this purpose is the gstat
library, the full manual for which you can find here.
Though this author has not worked with theme, there are bayesian spatial libraries available for interested parties!
spBayes
: Bayesian spatial statistical models using MCMC samplingramps
: Bayesian spatial statistical models using RAMPS sampling
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.