There are several specialty packages that will be used in this project due to the specific nature of the analyses. Some of these packages you will need to install while several others you may already have installed.
|cowplot
| dplyr
| geosphere
| ggplot2
| ggExtra
| maps
|
| maptools
| readxl
| rgdal
| rgeos
| sf
| sp
|
|spatialreg
| spdep
| stringr
| tidyr
| viridis
|
To begin, we will install the following:
packages<-c("cowplot", "dplyr", "geosphere", "ggplot2", "ggExtra", "maps", "maptools", "readxl", "rgdal", "rgeos", "sf", "sp", "spatialreg", "spdep", "stringr","tidyr", "viridis")
sapply(packages, require, character.only=T)
You can change require, in the line of script above that begins with
sapply
, to install.packages so that it readssapply(packages, install.packages, character.only=T)
and it will install all of the necessary packages. Then you can revert back to require or library to load them.
This data, collected by Dr. Brooks, examines the effect of numerous socioeconomic factors as they relate to the percentage of children under the age of 18 who are living below the poverty line in the Southern United States. The data was collected from the U.S. Census Bureau, American Community Survey, Bureau of Labor Force Statistics, U.S. Department of Agriculture Economic Research Service, and the County Health Rankings. Independent variables include: rural, urban, manufacturing, agriculture, retail, healthcare, construction, less than high school degree, unemployment, income ratio, teen birth, unmarried, single mother, uninsured, as well as Black and Hispanic race variables. The data is stored as a comma delimited file and can be found in the data folder of this project repository. To load the data we will use the following script to ensure all of the data is appropriately formatted.
se.data <- read.csv("./Data/childpov18_southfull.csv",
colClasses = c("character", "character", "character",
"numeric", "numeric", "numeric",
"numeric", "numeric", "numeric",
"numeric", "numeric", "numeric",
"numeric", "numeric", "numeric",
"numeric", "numeric", "numeric",
"numeric", "numeric", "numeric",
"numeric", "numeric", "numeric",
"numeric", "numeric", "numeric",
"numeric", "numeric", "numeric",
"numeric", "numeric", "numeric",
"numeric", "numeric"))
#rormat variables
names(se.data)[names(se.data)=="X2016.child.poverty"] <- "child.pov.2016"
se.data$FIPS <- str_pad(se.data$FIPS, 5, "left", pad = 0)
In the formatting portion, a “0” was added to each FIPS code that was only 4-digits in length creating a 5-digit code with a preceding 0.
using this primary dataset, we can create subsets for each of the regions represented include the South Atlantic, East South Central, and West South Central states.
#create subsets
satl.data <- subset(se.data, State %in% c("DE", "DC", "FL", "GA", "MD", "NC", "SC", "VA", "WV"))
esc.data <- subset(se.data, State %in% c("AL", "KY", "MS", "TN"))
wsc.data <- subset(se.data, State %in% c("AR", "LA", "OK", "TX"))
Next, we import shapefiles from an ESRI dataset to use as the basemap for the region. I attempted to use a dataset from R however, the number of counties did not match the number of counties in the *.csv file so we resorted to an external data source. These shapefiles can be found in the shapefiles folder of the project repository.
Because this shapefile had extraneous data, we removed several columns and set polygons names as the FIPS codes for each county. Additionall, we created subsets for each of the three regions.
For this analysis we will:
We will try to use straight-forward naming conventions to easily keep track of the data and specific analyses.
In order to determine if there is any underlaying spatial relationships in our data we can run a various tests, however, in order to do this we need to provide a spatial weights matrix. These spatial weights can be based on contiguity (common neighbors) or distance (k-nearest neighbors).
#Create contiguity neighbors
se.neighbors<-poly2nb(se.shape, queen=T, row.names = se.shape$FIPS)
names(se.neighbors) <- names(se.shape$FIPS)
#Create list of neighbors based on contiguity
se.neighbors.list<-nb2listw(se.neighbors, style="W", zero.policy = TRUE,
row.names(names(se.neighbors)))
#Create xy for all counties for distance weights
county.xy<-cbind(se.shape$Longitude,se.shape$Latitude)
colnames(county.xy)<-cbind("x","y")
rownames(county.xy)<-se.shape$FIPS
#Create distance centroid for the k=1 nearest neighbor
county.k1 <-knn2nb(knearneigh(county.xy, k=1, longlat = TRUE))
#Determine max k distance value for nearest neighbor
county.max.k1 <-max(unlist(nbdists(county.k1, county.xy, longlat=TRUE)))
#Calculate neighbors based on max k=1 distance
county.dist.k1 <-dnearneigh(county.xy, d1=0, d2=1 * county.max.k1, longlat = TRUE)
#Create neighbor list for each county based on k=1 distance
county.k1.neighbors <-nb2listw(county.dist.k1, style="W", zero.policy = TRUE)
With both the contiguity and distance weights created, we can now move on with the analyses. To begin with, we will determine if the model variables have any ability to explain child poverty in the southern United States by using an OLS.
To make further analyses easier, we can create an object that contains the equation. Additionally, to make reading the results easier, we will change the scientific notation to five significant digits,
From the OLS we can see not only are there a number of significant variables but the model itself is significant with a r2 value of 0.64. To test for spatial dependency in the residuals we can run a Global Moran’s I. This requires that we include the spatial weights matrix in the analysis.
#Morans test, distance based on contiguity
cont.morans <- lm.morantest(ols, se.neighbors.list)
cont.morans
#Morans test, distance based on distance
dist.morans <- lm.morantest(ols, county.k1.neighbors)
dist.morans
From the analysis above, we can conclude there is some spatial autocorrelation in the residuals when we use the distance model. However, we can also use LaGrange Multipliers to assist with model selection.
From the LM tests for contiguity we can see there is no spatial autocorrelation in the models. So next we can run LM tests for the distance matrix.
Following this analysis, we can see that there is spatial dependencies in the spatial error (6.1062266\times 10^{-15}
) and spatial lag (1.6653345\times 10^{-15}
) models. When comparing the two, the spatial lag distance model has a much smaller p-value and therefore should result in the best model for the data.
Continuing the analysis using the distance lag model, we can examine the results of the new model.
dist.lag.model <- spatialreg::lagsarlm(equation, data=se.data,
county.k1.neighbors)
dist.lag.summary <- summary(dist.lag.model, Nagelkerke = TRUE)
dist.lag.summary
Similar to the OLS, we can see that there are a number of significant variables. Additionally, the model has a pseudo r2 of 0.64546 and a p-value of 1.3578028\times 10^{-13}
. To examine the results of the Distance Lag Model, we can create a bivariate map to examine the modeled values versus the predicted values.
Because there is no built-in function for creating bivariate maps, we will need to create a number of custom object. So a brief description will precede each step below, but specific details can be found in the Spatial Regression exercise created for the Advanced Data Analytics course.
To begin making the map we will need to combine data from the original dataset with results from the model.
dist.lag.data <- cbind.data.frame(se.data$FIPS,
dist.lag.summary$fitted.values,
dist.lag.summary$residual,
se.data$child.pov.2016,
se.data$urban,
se.data$lnretail,
se.data$lnhealthss,
se.data$lnconstruction,
se.data$lnlesshs,
se.data$lnunemployment,
se.data$lnsinglemom,
se.data$lnhispanic,
se.data$lnuninsured,
se.data$lnincome_ratio,
se.data$lnunmarried,
stringsAsFactors = FALSE)
#Renaming columns
colnames(dist.lag.data) <- c("fips","fitted","resid","childpov", "urban","retail", "healthcare","construction","less_hs","unemployed", "single_mom","hispanic","uninsured","income_ratio","unmarried")
Next we need to break the variables we want to map, in this case the actual and predicted values, in to quantiles and rank each value to create a scale.
#quantiles
quantiles_fit <- dist.lag.data %>%
pull(fitted) %>%
quantile(probs = seq(0, 1, length.out = 4), na.rm = TRUE)
quantiles_pov <- dist.lag.data %>%
pull(childpov) %>%
quantile(probs = seq(0, 1, length.out = 4), na.rm = TRUE)
#ranks
fit_rank <- cut(dist.lag.data$fitted,
breaks= quantiles_fit,
labels=c("1", "2", "3"),
na.rm = TRUE,
include.lowest = TRUE)
pov_rank <- cut(dist.lag.data$childpov,
breaks= quantiles_pov,
labels=c("1", "2", "3"),
na.rm = TRUE,
include.lowest = TRUE)
#Join ranks and combined column to dataset
dist.lag.data$fit_score <- as.numeric(fit_rank)
dist.lag.data$pov_score <- as.numeric(pov_rank)
dist.lag.data$fit_pov <- paste(as.numeric(dist.lag.data$fit_score),
"-",
as.numeric(dist.lag.data$pov_score))
We will need to build a custom legend for this map that includes custom labels.
#legend
legend_colors <- tibble(
x = c(3,2,1,3,2,1,3,2,1),
y = c(3,3,3,2,2,2,1,1,1),
z = c("#574249", "#627f8c", "#64acbe", "#985356", "#ad9ea5", "#b0d5df", "#c85a5a", "#e4acac", "#e8e8e8"))
xlabel <- "Modeled,Low \u2192 High"
xlabel <- gsub(",", "\n", xlabel)
ylabel <- "Observed,Low \u2192 High"
ylabel <- gsub(",", "\n", ylabel)
legend <- ggplot(legend_colors, aes(x,y)) +
geom_tile(aes(fill=z)) +
theme_minimal() + theme(legend.position = "none") +
theme(panel.grid.major = element_blank(), panel.grid.minor =element_blank())+ theme(axis.text.x=element_blank(),
axis.ticks.x=element_blank(),
axis.text.y=element_blank(),
axis.ticks.y=element_blank()) +
labs(x = xlabel, y = ylabel) +
scale_fill_identity() +
ggtitle("Legend") +
theme(axis.title.y = element_text(face = "italic", hjust = 0.5, size = 8)) +
theme(axis.title.x = element_text(face = "italic", hjust = 0.5, size = 8)) +
theme(plot.title = element_text(face="bold", hjust = 0.5, size = 10))
We will also need to join the data with the county polygons and attach the color codes for each ranked value.
#attach data to counties
county.data <- southern_counties %>%
left_join(dist.lag.data, by = c("fips" = "fips")) %>% fortify
#attach colors
bivariate_color_scale <- tibble(
"3 - 3" = "#574249",
"2 - 3" = "#627f8c",
"1 - 3" = "#64acbe",
"3 - 2" = "#985356",
"2 - 2" = "#ad9ea5",
"1 - 2" = "#b0d5df",
"3 - 1" = "#c85a5a",
"2 - 1" = "#e4acac",
"1 - 1" = "#e8e8e8") %>%
gather("group", "fill")
county.data <- county.data %>%
left_join(bivariate_color_scale, by = c("fit_pov" = "group"))
We are now ready to create the map. Because we want the map to have certain cartographic qualities we need to obtain additional base layers and create the map.
#The FIPS
fips <- county.fips
fips$fips <- str_pad(fips$fips, 5, "left", pad = 0)
#The Polygons
world <- map_data("world")
states <- map_data("state")
counties <- map_data("county")
#The Join
counties$polyname <- paste(counties$region, counties$subregion, sep = ",")
counties <- counties %>% left_join(fips, by = c("polyname" = "polyname"))
counties <- counties %>% left_join(se.data, by = c("fips" = "FIPS"))
#Subsets
southern_states <- subset(states, region %in%
c("texas", "arkansas", "louisiana", "mississippi",
"alabama", "georgia", "florida", "north carolina", "south carolina", "tennessee", "oklahoma", "kentucky", "west virginia", "virginia", "maryland", "delaware", "district of columbia"))
southern_counties <- subset(counties, region %in%
c("texas", "arkansas", "louisiana", "mississippi", "alabama", "georgia", "florida", "north carolina", "south carolina", "tennessee", "oklahoma", "kentucky", "west virginia", "virginia", "maryland", "delaware", "district of columbia"))
#fit map
fit_pov_map <- ggplot() +
geom_polygon(data = world, aes(x=long,y=lat, group=group), fill = "gray95", color = "gray30") +
geom_polygon(data = states, aes(x=long,y=lat, group=group), fill = "gray", color = "gray30") +
geom_polygon(data = county.data, aes(x=long, y=lat, group=group, fill = fill)) +
geom_polygon(data = southern_counties, aes(x=long,y=lat, group=group), fill = NA, color = "black", size = 0.05) +
geom_polygon(data = southern_states, aes(x=long,y=lat, group=group), fill = NA, color = "white") +
coord_map("conic", lat0 = 30, xlim=c(-106,-77), ylim=c(24.5,40.5)) +
scale_fill_identity() +
theme_grey() + theme(legend.position="bottom") + theme(legend.title.align=0.5) +
theme(panel.background = element_rect(fill = 'deepskyblue'),
panel.grid.major = element_line(colour = NA)) +
labs(x = "Longitude", y = "Latitude", fill = "Child Poverty",
title = "Bivariate Map of Observed vs Modeled Poverty Values") +
theme(plot.title = element_text(face = "bold", hjust = 0.5))
And finally, we can combined the final map and the legend to create the final output.
Table 1 | Southeast | ESC | SATL | WSC | |||||
---|---|---|---|---|---|---|---|---|---|
Neighbors | Model | P Value | r2 | P Value | r2 | P Value | r2 | P Value | r2 |
OLS | - | 2.20E-16 | 0.6397 | 2.20E-16 | 0.6477 | 2.20E-16 | 0.6789 | 2.20E-16 | 0.5708 |
Contiguity | Lag | - | - | - | - | - | - | - | - |
Contiguity | Err | - | - | - | - | - | - | - | - |
Contiguity | Pre-Lag | - | - | - | - | - | - | - | - |
Contiguity | Pre-Err | 0.02186 | 0.55169 | - | - | 0.02580 | 0.74402 | - | - |
K = 1 | Lag | 0.00000 | 0.65721 | - | - | 0.00012 | 0.69546 | 0.00002 | 0.60085 |
K = 1 | Err | 0.00000 | 0.65611 | - | - | 0.00256 | 0.69249 | 0.00001 | 0.60206 |
K = 1 | Pre-Lag | 0.00000 | 0.55732 | - | - | 0.00060 | 0.74697 | 0.01166 | 0.46203 |
K = 1 | Pre-Err | 0.00000 | 0.58516 | - | - | 0.00064 | 0.74692 | 0.00000 | 0.48571 |
K = 2 | Lag | 0.00000 | 0.65269 | 0.02404 | 0.66789 | 0.00045 | 0.69416 | 0.00076 | 0.59529 |
K = 2 | Err | 0.00000 | 0.65232 | - | - | 0.00611 | 0.69167 | 0.00298 | 0.59312 |
K = 2 | Pre-Lag | 0.00005 | 0.55521 | - | - | 0.00141 | 0.74629 | 0.00776 | 0.46286 |
K = 2 | Pre-Err | 0.00000 | 0.58475 | 0.01674 | 0.65211 | 0.00002 | 0.74967 | 0.00000 | 0.48681 |
K = 3 | Lag | 0.00000 | 0.65024 | 0.00062 | 0.67389 | 0.00135 | 0.69311 | 0.00414 | 0.59259 |
K = 3 | Err | 0.00000 | 0.65046 | - | - | - | - | 0.01582 | 0.59051 |
K = 3 | Pre-Lag | 0.00160 | 0.55317 | - | - | 0.01561 | 0.74441 | 0.00984 | 0.46237 |
K = 3 | Pre-Err | 0.00000 | 0.58085 | 0.00310 | 0.65499 | 0.00212 | 0.74596 | 0.00000 | 0.48368 |
K = 4 | Lag | 0.00000 | 0.64993 | 0.00060 | 0.67393 | 0.00681 | 0.69156 | 0.00335 | 0.59293 |
K = 4 | Err | 0.00000 | 0.65036 | - | - | - | - | 0.00686 | 0.59181 |
K = 4 | Pre-Lag | 0.00323 | 0.55277 | - | - | - | - | 0.01272 | 0.46185 |
K = 4 | Pre-Err | 0.00000 | 0.57933 | 0.00271 | 0.65523 | 0.00372 | 0.74552 | 0.00000 | 0.48213 |
K = 5 | Lag | 0.00000 | 0.64978 | 0.00060 | 0.67395 | 0.01313 | 0.69095 | 0.00341 | 0.59290 |
K = 5 | Err | 0.00000 | 0.64983 | 0.02000 | 0.66826 | - | - | 0.00754 | 0.59166 |
K = 5 | Pre-Lag | 0.00284 | 0.55284 | - | - | - | - | 0.00752 | 0.46292 |
K = 5 | Pre-Err | 0.00000 | 0.57890 | 0.00281 | 0.65516 | 0.00544 | 0.74522 | 0.00000 | 0.48345 |
See emailed excel sheet for model comparisons
In a non-spatial OLS model, \(y = X \beta + \varepsilon\), we examine the relationship between one or more independent variables and a dependent variable. With a spatially lagged, or SLX model, we see an addition of a spatial component to the equation that accounts for the average value of neighboring X values.
\(y = X \beta + WX\theta + \varepsilon\), where \(WX\theta\) is the average value of our neighbors independent variable(s)
So the SLX equation examines whether variables that impact our neighbors also impact us. This is consdiered a local model as there is only one-way interaction between our neighbors and us.
In the Spatial Lag model, we lose the lagged X values that were used in the SLX model and instead add a spatially lagged y value.
\(y = \rho W y + X \beta + \varepsilon\), where \(\rho W y\) is the spatially lagged y value of our neighbors
In this global model the dependent variable among our neighbors influences our dependent variable. Therefore there is a feedback loop that occurs where affects on our neighbor(s) y affects our y and our neighbor(s) y variable.
The Spatial Error model does not include lagged dependent or independent variables, but instead includes a function of our unexplained error and that of our neighbors.
\(y = X \beta + u\), \(u = \lambda W u + \varepsilon\), where \(\lambda W u + \varepsilon\) is the function of our unexplained error (residuals) and our neighbors residual values
This is a global model where the higher than expected residual values suggest a missing explanatory variable that is spatially correlated. This would lead to unexplained clusters of spatially correlated values within our neighborhood that were not included as a predictor variable(s).
The Spatial Durbin model includes both lagged y and lagged x values. This model can be simplified into global SLX, Spatial Lag, Spatial Error models, or OLS.
\(y = \rho Wy + X\beta + WX\theta + \varepsilon\)
Because of the inclusion of the lagged y, this model would be used if the relationships in your data are global, meaning that if the y variable is impacted in one region that impact will spill over to every region in the dataset.
This nested model does not contain a lagged y and can be simplified into a local Spatial Error, SLX, or OLS model.
\(y = X\beta + WX\theta + u\), \(u = \lambda Wu + \varepsilon\)
This nested model would be used if the relationships in your data are local, meaning changes in one region will only impact the immediate neighbors of the region and not all of the regions in the dataset.
summary(spatialreg::lagsarlm(formula = equation, data = se.data, listw = county.k1.neighbors, type = "mixed"))
Call:
spatialreg::lagsarlm(formula = equation, data = se.data, listw = county.k1.neighbors,
type = "mixed")
Residuals:
Min 1Q Median 3Q Max
-28.74491 -3.43236 -0.14565 3.48841 23.22000
Type: mixed
Coefficients: (asymptotic standard errors)
Estimate Std. Error z value Pr(>|z|)
(Intercept) -55.878981 11.720483 -4.7676 1.864e-06
rural 0.564021 0.516014 1.0930 0.2743786
urban -1.082678 0.426305 -2.5397 0.0110955
lnmanufacturing -0.100317 0.327628 -0.3062 0.7594579
lnag -0.115679 0.240896 -0.4802 0.6310831
lnretail -2.489002 0.649935 -3.8296 0.0001283
lnhealthss 1.169469 0.992463 1.1783 0.2386572
lnconstruction -1.731440 0.483725 -3.5794 0.0003444
lnlesshs 8.623386 0.919595 9.3774 < 2.2e-16
lnunemployment 8.676517 0.928129 9.3484 < 2.2e-16
lnsinglemom 3.840103 0.509206 7.5414 4.641e-14
lnblack 0.096474 0.185720 0.5195 0.6034406
lnhispanic -0.027515 0.216102 -0.1273 0.8986841
lnuninsured 7.337132 1.628030 4.5068 6.583e-06
lnincome_ratio 7.697120 1.435136 5.3633 8.170e-08
lnteenbirth 0.645879 0.448710 1.4394 0.1500330
lnunmarried 0.766344 0.209124 3.6645 0.0002478
lag.rural 4.021928 1.851745 2.1720 0.0298582
lag.urban 0.869751 1.403582 0.6197 0.5354781
lag.lnmanufacturing -0.293037 0.568966 -0.5150 0.6065290
lag.lnag 0.542286 0.522820 1.0372 0.2996274
lag.lnretail 4.059779 1.897014 2.1401 0.0323476
lag.lnhealthss 2.141309 2.541929 0.8424 0.3995667
lag.lnconstruction -3.151835 1.285967 -2.4509 0.0142481
lag.lnlesshs -5.750119 1.750921 -3.2841 0.0010233
lag.lnunemployment 0.133170 1.924122 0.0692 0.9448218
lag.lnsinglemom 0.101467 1.530051 0.0663 0.9471263
lag.lnblack -0.259104 0.276019 -0.9387 0.3478769
lag.lnhispanic -0.519524 0.404193 -1.2853 0.1986742
lag.lnuninsured -1.696353 2.608677 -0.6503 0.5155158
lag.lnincome_ratio -0.056359 4.332773 -0.0130 0.9896216
lag.lnteenbirth -0.567029 1.356737 -0.4179 0.6759939
lag.lnunmarried 1.171540 0.716882 1.6342 0.1022137
Rho: 0.31035, LR test value: 25.934, p-value: 3.5332e-07
Asymptotic standard error: 0.059539
z-value: 5.2125, p-value: 1.8626e-07
Wald statistic: 27.171, p-value: 1.8626e-07
Log likelihood: -4559.667 for mixed model
ML residual variance (sigma squared): 35.454, (sigma: 5.9544)
Number of observations: 1422
Number of parameters estimated: 35
AIC: 9189.3, (AIC for lm: 9213.3)
LM test for residual autocorrelation
test value: 1.4054, p-value: 0.23582
#sdm.impacts <- summary(spatialreg::impacts(sdm, listw = county.k1.neighbors, R = 100), zstats = TRUE) #[["pzmat"]]
sdm.impacts
Impact measures (mixed, exact):
Direct Indirect Total
rural 0.66395942 5.9857165 6.6496759
urban -1.06996878 0.7612221 -0.3087466
lnmanufacturing -0.10803647 -0.4623317 -0.5703682
lnag -0.10362077 0.7222055 0.6185847
lnretail -2.41072386 4.6883675 2.2776437
lnhealthss 1.22910043 3.5715641 4.8006646
lnconstruction -1.81928674 -5.2615143 -7.0808010
lnlesshs 8.55019174 -4.3839238 4.1662679
lnunemployment 8.74380845 4.0303333 12.7741418
lnsinglemom 3.87089757 1.8444208 5.7153183
lnblack 0.09101720 -0.3268317 -0.2358145
lnhispanic -0.04008918 -0.7531235 -0.7932127
lnuninsured 7.35096071 0.8282308 8.1791915
lnincome_ratio 7.75266027 3.3265250 11.0791853
lnteenbirth 0.63715009 -0.5228172 0.1143329
lnunmarried 0.79990403 2.0100489 2.8099529
========================================================
Simulation results (asymptotic variance matrix):
Direct:
Iterations = 1:100
Thinning interval = 1
Number of chains = 1
Sample size per chain = 100
1. Empirical mean and standard deviation for each variable,
plus standard error of the mean:
Mean SD Naive SE Time-series SE
rural 0.64948 0.4639 0.04639 0.04639
urban -0.96265 0.4102 0.04102 0.04102
lnmanufacturing -0.09933 0.3451 0.03451 0.02922
lnag -0.09812 0.2393 0.02393 0.02886
lnretail -2.47031 0.6360 0.06360 0.06360
lnhealthss 1.23974 0.8486 0.08486 0.08486
lnconstruction -1.82306 0.4442 0.04442 0.04442
lnlesshs 8.76830 0.8725 0.08725 0.07210
lnunemployment 8.75233 0.8030 0.08030 0.08030
lnsinglemom 3.87989 0.4671 0.04671 0.04671
lnblack 0.09768 0.1854 0.01854 0.01854
lnhispanic -0.06661 0.2210 0.02210 0.02210
lnuninsured 7.33728 1.4990 0.14990 0.14990
lnincome_ratio 7.72760 1.5062 0.15062 0.15062
lnteenbirth 0.68319 0.4435 0.04435 0.04435
lnunmarried 0.79675 0.2394 0.02394 0.02394
2. Quantiles for each variable:
2.5% 25% 50% 75% 97.5%
rural -0.1367 0.33139 0.66176 0.97535 1.4530
urban -1.7054 -1.19170 -0.97097 -0.68978 -0.1700
lnmanufacturing -0.6416 -0.30142 -0.10067 0.09197 0.5960
lnag -0.6014 -0.23746 -0.09778 0.04897 0.3992
lnretail -3.7728 -2.82351 -2.50686 -2.15117 -1.2376
lnhealthss -0.1764 0.58742 1.25282 1.82629 2.8271
lnconstruction -2.8331 -2.09735 -1.77510 -1.51917 -1.0677
lnlesshs 7.2955 8.12833 8.75733 9.39947 10.4428
lnunemployment 7.2797 8.23371 8.65382 9.25318 10.5617
lnsinglemom 2.9851 3.56602 3.94105 4.17810 4.8141
lnblack -0.1971 -0.02581 0.07340 0.24773 0.3832
lnhispanic -0.4193 -0.22976 -0.06691 0.06754 0.3800
lnuninsured 4.6677 6.19304 7.19712 8.35393 10.1037
lnincome_ratio 5.0635 6.56363 7.72029 8.74155 10.4752
lnteenbirth -0.1944 0.38922 0.68028 0.98794 1.5683
lnunmarried 0.3517 0.60394 0.77808 0.99755 1.2089
========================================================
Indirect:
Iterations = 1:100
Thinning interval = 1
Number of chains = 1
Sample size per chain = 100
1. Empirical mean and standard deviation for each variable,
plus standard error of the mean:
Mean SD Naive SE Time-series SE
rural 6.2068 2.6319 0.26319 0.26319
urban 0.8052 2.1022 0.21022 0.21022
lnmanufacturing -0.4713 0.8411 0.08411 0.09976
lnag 0.6880 0.7682 0.07682 0.07682
lnretail 4.5704 2.6442 0.26442 0.26442
lnhealthss 3.3521 3.1501 0.31501 0.31501
lnconstruction -5.3530 1.8997 0.18997 0.18997
lnlesshs -4.5057 2.4642 0.24642 0.19280
lnunemployment 3.9855 2.4741 0.24741 0.24741
lnsinglemom 2.0395 2.4055 0.24055 0.24055
lnblack -0.3831 0.3784 0.03784 0.04599
lnhispanic -0.7526 0.5665 0.05665 0.05665
lnuninsured 1.0683 3.0348 0.30348 0.36909
lnincome_ratio 3.2800 5.1012 0.51012 0.37257
lnteenbirth -0.8211 2.0750 0.20750 0.20750
lnunmarried 2.0518 0.9560 0.09560 0.09560
2. Quantiles for each variable:
2.5% 25% 50% 75% 97.5%
rural 1.3514 4.38619 6.2516 7.8218 10.73756
urban -3.5461 -0.36427 0.8559 2.1642 4.86746
lnmanufacturing -2.1445 -0.95913 -0.4645 0.2162 0.86045
lnag -0.7919 0.17087 0.6859 1.2042 2.11141
lnretail -0.6355 2.92212 4.4050 6.0483 9.95712
lnhealthss -3.2086 1.26114 3.5319 5.5064 9.04416
lnconstruction -9.3502 -6.55732 -5.2817 -3.9180 -2.09871
lnlesshs -9.1123 -6.55142 -4.6321 -2.9797 -0.06422
lnunemployment -0.7818 2.26412 4.0210 5.8546 8.47900
lnsinglemom -2.7195 0.43541 2.0316 3.8664 6.32021
lnblack -1.0368 -0.63271 -0.4069 -0.1729 0.41276
lnhispanic -1.7076 -1.14628 -0.7730 -0.3474 0.31501
lnuninsured -4.0538 -1.27874 1.1353 3.3509 6.35988
lnincome_ratio -6.7134 -0.06615 3.7615 6.4310 13.59521
lnteenbirth -4.3921 -2.33611 -0.8759 0.3762 3.80865
lnunmarried 0.2282 1.42876 2.1458 2.6493 3.87333
========================================================
Total:
Iterations = 1:100
Thinning interval = 1
Number of chains = 1
Sample size per chain = 100
1. Empirical mean and standard deviation for each variable,
plus standard error of the mean:
Mean SD Naive SE Time-series SE
rural 6.8563 2.7370 0.27370 0.27370
urban -0.1575 2.1538 0.21538 0.21538
lnmanufacturing -0.5706 0.7843 0.07843 0.09812
lnag 0.5899 0.7589 0.07589 0.07589
lnretail 2.1001 2.7981 0.27981 0.27981
lnhealthss 4.5918 3.2046 0.32046 0.32046
lnconstruction -7.1760 1.9544 0.19544 0.19544
lnlesshs 4.2625 2.3129 0.23129 0.23129
lnunemployment 12.7378 2.3551 0.23551 0.23551
lnsinglemom 5.9193 2.4657 0.24657 0.24657
lnblack -0.2854 0.3451 0.03451 0.04228
lnhispanic -0.8192 0.5035 0.05035 0.05035
lnuninsured 8.4055 2.6749 0.26749 0.34881
lnincome_ratio 11.0076 5.3149 0.53149 0.45919
lnteenbirth -0.1379 2.0980 0.20980 0.20980
lnunmarried 2.8485 1.0450 0.10450 0.10450
2. Quantiles for each variable:
2.5% 25% 50% 75% 97.5%
rural 1.9073 5.02062 6.8467 8.63773 12.06777
urban -4.5082 -1.44904 -0.1430 1.12172 4.01799
lnmanufacturing -2.1277 -1.12884 -0.5284 -0.01554 0.84870
lnag -0.7488 0.08901 0.5647 1.02317 2.09109
lnretail -3.7750 0.04119 2.2318 3.74644 7.66698
lnhealthss -2.1133 2.37093 4.6930 6.54375 10.71868
lnconstruction -10.9513 -8.42534 -7.0871 -5.48226 -3.73231
lnlesshs 0.5423 2.68958 4.2949 5.92573 8.44652
lnunemployment 8.0286 11.15464 12.8600 14.19289 17.02914
lnsinglemom 1.2408 4.16706 5.8662 7.77259 10.76220
lnblack -0.9196 -0.50601 -0.3034 -0.09270 0.45603
lnhispanic -1.6544 -1.19926 -0.7956 -0.50287 0.09474
lnuninsured 3.7940 6.39472 8.4964 10.37160 13.12195
lnincome_ratio 1.5459 7.31830 10.7307 14.45875 21.58740
lnteenbirth -3.4692 -1.67710 -0.1339 1.12701 4.47357
lnunmarried 0.8457 2.27900 3.0022 3.43096 4.87529
========================================================
Simulated standard errors
Direct Indirect Total
rural 0.4639472 2.6318923 2.7369508
urban 0.4101952 2.1021590 2.1538010
lnmanufacturing 0.3451226 0.8411384 0.7842604
lnag 0.2393493 0.7681785 0.7588977
lnretail 0.6359532 2.6441703 2.7981424
lnhealthss 0.8485706 3.1500662 3.2045987
lnconstruction 0.4442460 1.8997017 1.9544416
lnlesshs 0.8725012 2.4642399 2.3129049
lnunemployment 0.8029696 2.4740796 2.3550605
lnsinglemom 0.4670766 2.4055088 2.4656517
lnblack 0.1853917 0.3784499 0.3451149
lnhispanic 0.2209591 0.5665478 0.5035353
lnuninsured 1.4989680 3.0348381 2.6749369
lnincome_ratio 1.5062296 5.1012173 5.3148949
lnteenbirth 0.4435434 2.0749809 2.0979826
lnunmarried 0.2393913 0.9560119 1.0449632
Simulated z-values:
Direct Indirect Total
rural 1.3999011 2.3583126 2.50508884
urban -2.3468037 0.3830251 -0.07311168
lnmanufacturing -0.2878232 -0.5602570 -0.72754909
lnag -0.4099325 0.8955880 0.77725154
lnretail -3.8844141 1.7284783 0.75052841
lnhealthss 1.4609791 1.0641380 1.43289366
lnconstruction -4.1037167 -2.8177923 -3.67164950
lnlesshs 10.0496084 -1.8284537 1.84294100
lnunemployment 10.8999472 1.6109034 5.40870601
lnsinglemom 8.3067452 0.8478292 2.40072294
lnblack 0.5268966 -1.0122571 -0.82698937
lnhispanic -0.3014455 -1.3284586 -1.62698112
lnuninsured 4.8948906 0.3519972 3.14233177
lnincome_ratio 5.1304238 0.6429759 2.07107683
lnteenbirth 1.5403048 -0.3957177 -0.06573683
lnunmarried 3.3282171 2.1461967 2.72596749
Simulated p-values:
Direct Indirect Total
rural 0.16154294 0.0183582 0.01224206
urban 0.01893523 0.7017011 0.94171725
lnmanufacturing 0.77348211 0.5753041 0.46688965
lnag 0.68185550 0.3704729 0.43701038
lnretail 0.00010258 0.0839025 0.45293652
lnhealthss 0.14402118 0.2872662 0.15188823
lnconstruction 4.0657e-05 0.0048355 0.00024099
lnlesshs < 2.22e-16 0.0674815 0.06533763
lnunemployment < 2.22e-16 0.1072008 6.3482e-08
lnsinglemom < 2.22e-16 0.3965331 0.01636272
lnblack 0.59826541 0.3114151 0.40824309
lnhispanic 0.76307477 0.1840266 0.10374111
lnuninsured 9.8360e-07 0.7248403 0.00167608
lnincome_ratio 2.8909e-07 0.5202398 0.03835162
lnteenbirth 0.12348606 0.6923133 0.94758735
lnunmarried 0.00087404 0.0318573 0.00641133
Likelihood ratio for spatial linear models
data:
Likelihood ratio = 53.033, df = 16, p-value = 7.454e-06
sample estimates:
Log likelihood of sdm Log likelihood of dist.lag.model
-4559.667 -4586.183
Likelihood ratio for spatial linear models
data:
Likelihood ratio = 57.605, df = 16, p-value = 1.318e-06
sample estimates:
Log likelihood of sdm Log likelihood of dist.err.model
-4559.667 -4588.470
Likelihood ratio for spatial linear models
data:
Likelihood ratio = 25.934, df = 1, p-value = 3.533e-07
sample estimates:
Log likelihood of sdm Log likelihood of se.SLX.model
-4559.667 -4572.634
mc.morans <- spdep::moran.mc(se.data$child.pov.2016, county.k1.neighbors, nsim = 99)
mc.morans
lc.morans <- spdep::localmoran(se.data$child.pov.2016, county.k1.neighbors)
lc.morans
pov.rate <- scale(se.data$child.pov.2016) %>% as.vector()
pov.lag.rate <- lag.listw(county.k1.neighbors, pov.rate)
lisa.data <- se.data %>%
mutate(quad_sig = case_when(
pov.rate >= 0 & pov.lag.rate >= 0 & lc.morans[, 5] <= 0.05 ~ "high-high",
pov.rate <= 0 & pov.lag.rate <= 0 & lc.morans[, 5] <= 0.05 ~ "low-low",
pov.rate >= 0 & pov.lag.rate <= 0 & lc.morans[, 5] <= 0.05 ~ "high-low",
pov.rate <= 0 & pov.lag.rate >= 0 & lc.morans[, 5] <= 0.05 ~ "low-high",
lc.morans[, 5] > 0.05 ~ "not significant"
))
lisa <- lisa.data %>% fortify()
sdem <- spatialreg::errorsarlm(equation, se.data, county.k1.neighbors, etype = "emixed")
summary(sdem, Nagelkerke = TRUE)
Call:
spatialreg::errorsarlm(formula = equation, data = se.data, listw = county.k1.neighbors,
etype = "emixed")
Residuals:
Min 1Q Median 3Q Max
-28.77276 -3.39742 -0.12792 3.41260 22.20252
Type: error
Coefficients: (asymptotic standard errors)
Estimate Std. Error z value Pr(>|z|)
(Intercept) -75.152030 13.829903 -5.4340 5.510e-08
rural 0.600422 0.519434 1.1559 0.2477161
urban -1.081972 0.425791 -2.5411 0.0110508
lnmanufacturing -0.070711 0.318617 -0.2219 0.8243680
lnag -0.089227 0.237843 -0.3752 0.7075475
lnretail -2.469082 0.654885 -3.7703 0.0001631
lnhealthss 1.147697 0.978141 1.1733 0.2406571
lnconstruction -1.795104 0.480331 -3.7372 0.0001861
lnlesshs 8.387964 0.900908 9.3106 < 2.2e-16
lnunemployment 8.746674 0.909156 9.6206 < 2.2e-16
lnsinglemom 3.899474 0.511380 7.6254 2.442e-14
lnblack 0.103187 0.181294 0.5692 0.5692397
lnhispanic -0.041356 0.212811 -0.1943 0.8459176
lnuninsured 7.549645 1.600011 4.7185 2.376e-06
lnincome_ratio 7.696772 1.430562 5.3802 7.438e-08
lnteenbirth 0.653337 0.446726 1.4625 0.1436042
lnunmarried 0.806709 0.211590 3.8126 0.0001375
lag.rural 4.082169 2.178894 1.8735 0.0609987
lag.urban 0.144238 1.642232 0.0878 0.9300112
lag.lnmanufacturing -0.601746 0.649205 -0.9269 0.3539806
lag.lnag 0.667111 0.611314 1.0913 0.2751527
lag.lnretail 3.711082 2.245880 1.6524 0.0984539
lag.lnhealthss 3.035336 3.053979 0.9939 0.3202736
lag.lnconstruction -4.200404 1.503273 -2.7942 0.0052033
lag.lnlesshs -3.221120 2.020254 -1.5944 0.1108434
lag.lnunemployment 4.109106 2.098553 1.9581 0.0502222
lag.lnsinglemom 1.183827 1.723808 0.6868 0.4922394
lag.lnblack -0.200189 0.313387 -0.6388 0.5229595
lag.lnhispanic -0.936762 0.487250 -1.9225 0.0545367
lag.lnuninsured 0.778286 2.970599 0.2620 0.7933244
lag.lnincome_ratio 0.870534 5.077380 0.1715 0.8638673
lag.lnteenbirth -0.464391 1.642562 -0.2827 0.7773890
lag.lnunmarried 1.665345 0.823122 2.0232 0.0430520
Lambda: 0.3211, LR test value: 24.642, p-value: 6.9033e-07
Asymptotic standard error: 0.05995
z-value: 5.3562, p-value: 8.5e-08
Wald statistic: 28.689, p-value: 8.5e-08
Log likelihood: -4560.313 for error model
ML residual variance (sigma squared): 35.468, (sigma: 5.9555)
Nagelkerke pseudo-R-squared: 0.66946
Number of observations: 1422
Number of parameters estimated: 35
AIC: 9190.6, (AIC for lm: 9213.3)
sdem.impacts <- summary(spatialreg::impacts(sdem, listw = county.k1.neighbors, R = 100), zstats = TRUE)#[["pzmat"]]
sdem.impacts
Impact measures (SDEM, estimable, n):
Direct Indirect Total
rural 0.60042169 4.0821686 4.68259029
urban -1.08197207 0.1442384 -0.93773365
lnmanufacturing -0.07071083 -0.6017458 -0.67245667
lnag -0.08922736 0.6671110 0.57788367
lnretail -2.46908215 3.7110823 1.24200015
lnhealthss 1.14769742 3.0353363 4.18303377
lnconstruction -1.79510377 -4.2004041 -5.99550786
lnlesshs 8.38796378 -3.2211199 5.16684390
lnunemployment 8.74667420 4.1091062 12.85578043
lnsinglemom 3.89947440 1.1838271 5.08330155
lnblack 0.10318714 -0.2001887 -0.09700154
lnhispanic -0.04135553 -0.9367622 -0.97811776
lnuninsured 7.54964542 0.7782856 8.32793097
lnincome_ratio 7.69677202 0.8705342 8.56730619
lnteenbirth 0.65333715 -0.4643905 0.18894662
lnunmarried 0.80670926 1.6653454 2.47205469
========================================================
Standard errors:
Direct Indirect Total
rural 0.5194343 2.1788941 2.2995636
urban 0.4257907 1.6422316 1.7052247
lnmanufacturing 0.3186170 0.6492055 0.6078157
lnag 0.2378433 0.6113144 0.5934771
lnretail 0.6548849 2.2458797 2.4013358
lnhealthss 0.9781407 3.0539787 3.0653405
lnconstruction 0.4803308 1.5032725 1.5587412
lnlesshs 0.9009078 2.0202537 1.8805548
lnunemployment 0.9091565 2.0985528 1.9943592
lnsinglemom 0.5113797 1.7238076 1.8342297
lnblack 0.1812935 0.3133873 0.2805792
lnhispanic 0.2128110 0.4872501 0.4611767
lnuninsured 1.6000106 2.9705990 2.5917085
lnincome_ratio 1.4305617 5.0773802 5.2529753
lnteenbirth 0.4467262 1.6425625 1.6972629
lnunmarried 0.2115904 0.8231223 0.8856625
========================================================
Z-values:
Direct Indirect Total
rural 1.1559146 1.87350486 2.0362952
urban -2.5410890 0.08783074 -0.5499179
lnmanufacturing -0.2219305 -0.92689585 -1.1063496
lnag -0.3751519 1.09127323 0.9737253
lnretail -3.7702535 1.65239588 0.5172122
lnhealthss 1.1733459 0.99389570 1.3646229
lnconstruction -3.7372237 -2.79417340 -3.8463781
lnlesshs 9.3105682 -1.59441358 2.7475104
lnunemployment 9.6206479 1.95806666 6.4460708
lnsinglemom 7.6253991 0.68675134 2.7713550
lnblack 0.5691716 -0.63879010 -0.3457189
lnhispanic -0.1943298 -1.92254894 -2.1209174
lnuninsured 4.7184970 0.26199617 3.2132977
lnincome_ratio 5.3802448 0.17145341 1.6309435
lnteenbirth 1.4625003 -0.28272321 0.1113243
lnunmarried 3.8125980 2.02320529 2.7911926
p-values:
Direct Indirect Total
rural 0.24771609 0.0609987 0.04172072
urban 0.01105078 0.9300112 0.58237567
lnmanufacturing 0.82436797 0.3539806 0.26857524
lnag 0.70754749 0.2751527 0.33019296
lnretail 0.00016308 0.0984539 0.60500805
lnhealthss 0.24065712 0.3202736 0.17237161
lnconstruction 0.00018606 0.0052033 0.00011988
lnlesshs < 2.22e-16 0.1108434 0.00600496
lnunemployment < 2.22e-16 0.0502222 1.1479e-10
lnsinglemom 2.4425e-14 0.4922394 0.00558235
lnblack 0.56923966 0.5229595 0.72955401
lnhispanic 0.84591763 0.0545367 0.03392876
lnuninsured 2.3759e-06 0.7933244 0.00131220
lnincome_ratio 7.4385e-08 0.8638673 0.10290223
lnteenbirth 0.14360417 0.7773890 0.91135918
lnunmarried 0.00013751 0.0430520 0.00525142