To be fair, cartography and spatial analysis is not one of the easiest things to do in R. While nearly every statistical analysis we want to do has several packages that can accomplish the same task, rarely do they rely on drastically different types of data. This is why a lot of spatial analysis is still performed in programs such as ArcGIS or QGIS. However, moving from one program to another often requires reformatting the data which can also be cumbersome. So maintaining the data within a single program to manage, format, analyze, and visualize helps to create a usable, repeatable workflow. Additionally, being able to view the code allows collaborators to see each decision made throughout the analysis. That’s why data analysis and visualization in R or Python is becoming standard practice in many disciplines.
When using R for spatial analysis there are a number
of graphics packages that can be used for everything from basic mapping
to interactive maps. A few of these packages you may already be familiar
with such as plot from base R,
ggplot2, plotly, or leaflet. Many
of these packages have dependencies that help us manage, organize, and
format spatial data. Packages such as sf “Simple Features
for R”, terra “Methods for spatial data analysis with
vector and raster data”, and raster “Geographic Data
Analysis and Modeling” provide access to spatial data and spatial data
formats for a number of other packages. In order to reduce the overall
complexity of this exercise, we will not discuss all of the data formats
or packages that can be used to analyze spatial data here. However, I
will provide links to useful online resources in the exercise README
document.
For this exercise you will use a few common packages for working with spatial data. Once again, many of these packages will require dependencies so installing them may take some time.
| ggspatial | leaflet | OpenStreetMap | raster | sf | terra | terrainr | tidyterra | tidyverse |
As discussed in a previous class, there are various scripts and
packages that can be used for package management. For this I use
pacman which is a “management tool that combines the
functionality of base library related functions into intuitively named
functions.” While my use of the package is limited the following script
is useful when providing a script to colleagues or making it available
on GitHub:
#install.packages('pacman')
pacman::p_load("ggspatial","leaflet","OpenStreetMap","raster","sf","terra","terrainr","tidyterra","tidyverse")
The p_load function in pacman is a wrapper
for the library and require functions from
base R. It will check to see if a package is installed and, if not,
attempt to install it from CRAN. You can view the other functions
available in pacman here.
These packages will provide the minimum functionality required to complete this exercise. As you experiment with various styles of mapping you may require additional packages to be installed at a later time.
Note in 2023: A number of packages used in my typical spatial work flow including
maptools,rgdal,rgeos, andspwere archived in October 2023. You can read the report here. This change impacts several packages such asggsn(used to add some essential map elements),adehabitat(for homerange estimation), andGapAnalysis(to calculate conservation indicators) which have been used for spatial analysis of ecological data. Some packages loaded outside of the script above with the baseinstall.packages()function while others would not. A few of these packages can still be loaded from the archive but each update of R will limit their functionality until they no longer operate. Here is the basic syntax to load archived packages:
require(devtools)
install_version("PACKAGE NAME", version = "PACKAGE VERSION", repos = "http://cran.us.r-project.org")
You can find archived packages at https://cran.r-project.org/src/contrib/Archive. Once you find the correct package, click on the folder to view the different published versions. Using the script above, add the package name and the version to the script and see if it will install. If you receive an error, occassionally there is an archived dependency that also needs loaded which makes this a bandaid and not a solution. Eventually these exercises will be update with the newest functions and packages.
There are a number of very simple maps you can create that use data
included in the packages that were downloaded. For this first example
you are going to create a map that depicts the location of Austin
Peay State University. To begin we need to load some data from
ggplot2 using the map_data command.
state <- map_data("state")
county <- map_data("county")
apsu_point <- data.frame("x" = -87.353069, "y" = 36.533654)
This quick script loaded all of the state and county information for US into the appropriate objects and created a point centered on the APSU campus. However, you can create a better map by filtering the data to focus on Tennessee and Montgomery County.
tn <- county %>%
filter(region=="tennessee")
montco <- county %>%
filter(region=="tennessee") %>%
filter(subregion=="montgomery")
With just these simple lines of code and included datasets you can
use ggplot2 to quickly create a simple locator map for
campus.
ggplot() + geom_polygon(data = state, aes(x=long, y = lat, group = group),
fill = "white", color="black") +
geom_polygon(data = tn, aes(x=long, y = lat, group = group),
fill = "gray", color="black") +
geom_polygon(data = montco, aes(x=long, y = lat, group = group),
fill = "red", color="black") +
geom_point(data = apsu_point, aes(x=x,y=y), color="black") +
coord_fixed(xlim = c(-91, -81), ylim = c(34, 37), ratio = 1.2) +
xlab("Longitude") + ylab("Latitude") + ggtitle("APSU, Montgomery Co., TN")
Keep in mind, that when using ggplot2 the order of the
geoms is important for proper display of the image. Because
you are mapping various layers, the first geom entered will
be the first layer drawn and the last geom will be drawn on
top of all others. Using the various options within ggplot2
you can customize the map based on your preferences.
It’s likely that the map you need to create is slightly more complex
than the one above. Maybe you are interested in using some sort of
imagery as a basemap to add complexity or provide additional information
for a location. In the data folder of this exercise is a *.csv file
called campus_points. This file contains the location of
several points on the main APSU campus. In order to view these files on
aerial imagery of campus you are going to use OpenStreetMap
to obtain a basemap for a closer look at campus.
Note in 2019: As of mid-2018, Google began requiring an API key for use with
get_mapthat allowed for Google imagery to be used as basemaps. Although you can obtain a certain number of maps free per month, I have chosen to avoid that set-up for this example. However, you can still use theggmappackage to obtain imagery, geocode, etc.
Note in 2021: OpenStreeMap requires Java and causes some issues for macOS users. One possible fix is to install Java JDK 11 from https://adoptopenjdk.net/. After installing the software input the following code into the console in RStudio: Sys.setenv(JAVA_HOME=‘/Library/Java/JavaVirtualMachines/temurin-11.jdk/Contents/Home’)
This fix was developed by members of the APSU OIT. Do not try if you are not familiar with installing software or setting system variables. Use at your own risk.
To use OpenStreetMap you need to first obtain upper-left
and lower-right coordinates for the location you are interested in
displaying. This can be done through programs like Google
Earth, Google Maps, or through
min() and max() commands on the coordinate
information in your dataset.
CampusMap <- openmap(c(36.5360,-87.3570),
c(36.5300,-87.3495),
"esri-imagery", zoom=18,
mergeTiles = TRUE)
APSU <- openproj(CampusMap, projection = "+proj=longlat +ellps=WGS84 +units=m +no_defs")
In the script above, the bounding box was identified, the type of map
listed (view ?openmap for additional options), and in the
second portion of the script, the projection for the map was set. Refer
to the lecture for information regarding the projection and
ellipsoid.
Alternatively, with the *.csv file loaded you could use a modified script like:
openmap(c(max(campus_points$y)+0.001,min(campus_points$x)-0.001),c(min(campus_points$y)-0.001,max(campus_points$x)+0.001),type="esri-imagery",zoom=18,mergeTiles = TRUE))
This takes the max() and min()
longitude/latitude values from the dataset and extends the bounding box
just beyond the furthest data points. To add this imagery to the map you
need to use the autoplot function in ggplot2.
This function extends the plotting functions to include some spatial
data classes, especially raster-based data. Otherwise the script for
creating the map is similar to the simple map above.
campus_points <- read.csv("https://raw.githubusercontent.com/chrismgentry/Mapping-Basics/master/Data/campus_points.csv")
autoplot.OpenStreetMap(APSU) +
geom_point(data=campus_points, aes(x = X, y = Y, color=Name), size = 5, alpha = 0.8) +
geom_text(data=campus_points, aes(x = X, y = Y, label=Name), color="black", vjust=-0.60, size=4.5, fontface="bold") +
geom_text(data=campus_points, aes(x = X, y = Y, label=Name), color="white", vjust=-0.60, size=4.0, fontface="bold") +
labs(x="Longtiude", y="Latitude") + theme(legend.position = "none")
With the addition of the aerial photo there is accompanying spatial
information that can be obtained from viewing the map. Similar to the
simple map above, you can customize the map through setting various
themes and aesthetics in ggplot2.
Occasionally you will receive information from a collaborator that is
not a *.csv file. Depending on the individual, that information might be
provided as a *.kml file from Google Earth or in an
ESRI shapefile. Using a simple import command
from the sf package, and some additional manipulation,
these files can also be used with ggplot2.
To read either shapefiles or KML files you can use the following syntax:
st_read(dsn="path to the shapefile",layer="name of the shapefile")
or
st_read("path to kml/name of the file.kml")
With this in mind, you can easily use Google Earth to create your own datasets and import them into R for spatial analysis and mapping. This can be useful when collaborating with individuals or with citizen science projects.
In the data folder of this exercise you will find the
Campus_Points.kml file and a Main_Campus.kml file. The
campus points file includes the same data as the previous *.csv file but
as a *.kml, and the main campus file contains a polygon outline of the
main portion of campus. You can use the script above to create datasets
out of those files. Alternatively, you can use the URL and the
download.file() function to download the *.kml to your
project folder:
download.file('https://raw.githubusercontent.com/chrismgentry/Mapping-Basics/master/Data/Campus_Points.kml', 'Campus_Points.kml')
Working in ggplot2 with SpatialPointsDataFrame,
SpatialPolygonDataFrame, or sf Class files can present some difficulties
due to the structure of the data types. So lets look at the data
structure for an sf object.
campusKML <- st_read("./Data/Campus_Points.kml")
## Reading layer `Campus' from data source
## `C:\Users\gentryc\Google Drive\APSU\Courses\BIOL 5700 Advanced Data Analytics\Exercise_9\Data\Campus_Points.kml'
## using driver `KML'
## Simple feature collection with 23 features and 2 fields
## Geometry type: POINT
## Dimension: XYZ
## Bounding box: xmin: -87.35582 ymin: 36.53173 xmax: -87.35074 ymax: 36.535
## z_range: zmin: 0 zmax: 0
## Geodetic CRS: WGS 84
head(campusKML)
Notice the location data is not stored in a typical dataframe-like
structure with separate columns for x and y but
instead stored in a single column. By typing campusKML$
into the console RStudio will return options for “Name”, “Description”,
and “Geometry”. However, a check of campusKML$geometry will
reveal x, y, and z (elevation) data in a single field bound by
parenthesis.
ggplot2 has a specific geom that was made
for sf that allows for this information to be appropriately
plotted.
ggplot()+geom_sf(data = campusKML, aes(color=Name), size = 5) +
labs(x="Longtiude", y="Latitude") + theme(legend.position = "none")
Although this seems to work perfectly in a simple scatterplot, when
create more complex graphics that include the display of raster objects
it doesn’t work quite so well. Notice what happens when we use a
combination of the two codes from above:
autoplot.OpenStreetMap(APSU) +
geom_sf(data = campusKML, aes(color=Name), size = 5) +
labs(x="Longtiude", y="Latitude") + theme(legend.position = "none")
We now have an error because the use of autoplot has
caused an issue with the syntax of our sf object. It is
here where some knowledge of base R is important. You can directly
access a row, column, or specific record by use of bracketed commands
such as campusKML[1,] which will provide all of the
information in row 1 or campusKML[,1] that will let you
view all of the information in the first column. Therefore, we can alter
the script by providing explicit information on the location of the
x and y data needed for plotting the aesthetics for
ggplot by adding a function from sf called
st_coordinates which is used to retrieve the coordinates of
an sf object in matrix form.
autoplot.OpenStreetMap(APSU) +
geom_sf(data=campusKML, aes(x=st_coordinates(campusKML)[,1], y=st_coordinates(campusKML)[,2], color=Name), size = 5) +
labs(x="Longtiude", y="Latitude") + theme(legend.position = "none")
Now the autoplot and sf objects are able to
be used in the same script. While this wasn’t particularly difficult, it
does add some additional complexity that simple features was created to
avoid.
So there are a couple potential work arounds in order to avoid this
issue. One of them would be converting the sf class data to
a typical dataframe object you are more accustom to working with.
To convert the point kml to a data.frame:
campus_pointsKML <- data.frame(st_drop_geometry(campusKML), st_coordinates(campusKML)[,1:2])
head(campus_pointsKML)
Now look at the object. The data includes columns for “Name”,
“Description,”X”, and “Y”. This allowed us to create a data.frame out of
the coordinate values and data values so now you see a much more
familiar dataset structure. Now we can use a more familiar
ggplot syntax for the script.
autoplot.OpenStreetMap(APSU) +
geom_point(data=campus_pointsKML, aes(x=X, y=Y, color=Name), size = 5) +
geom_text(data=campus_pointsKML, aes(x=X, y=Y, label=Name), color="black", vjust=-0.60, size=4.5, fontface="bold") +
geom_text(data=campus_pointsKML, aes(x=X, y=Y, label=Name), color="white", vjust=-0.60, size=4.0, fontface="bold") +
labs(x="Longtiude", y="Latitude") + theme(legend.position = "none")