To be fair, cartography and spatial analysis is not one of the easiest things to do in R. While nearly every statistical analysis you want to do has several packages that can accomplish the same task, rarely do they rely on drastically different types of data. This is why a lot of spatial analysis is still performed in programs such as ArcGIS or QGIS. However, moving from one program to another often requires reformatting the data which can also be cumbersome. So maintaining the data within a single program to manage, format, analyze, and visualize helps to create a usable, repeatable workflow. That’s why data analysis and visualization in R is becoming standard practice in many disciplines.
When using R for spatial analysis there are a number
of graphics packages that can be used for everything from basic mapping
to interactive maps. A few of these packages you may already be familiar
with such as plot
from base R,
ggplot2
, plotly
, leaflet
, or
tmap
. Many of these packages are dependent on other
packages that help you manage, organize, and format spatial data.
Packages such as sf
“Simple Features for R”,
terra
“Methods for spatial data analysis with vector and
raster data”, and stars
“Tools for manipulating raster and
vector data cubes” provide access to spatial data and spatial data
formats for a number of other packages. In order to reduce the overall
complexity of this exercise, we will not discuss all of the data formats
or packages that can be used to analyze spatial data here. However, I
will provide links to useful online resources in the exercise README
document.
For this exercise you will use a few common packages for working with spatial data. Once again, many of these packages will require dependencies so installing them may take some time.
leaflet | OpenStreetMap | sf | TidyVerse |
As discussed in a previous class, there are various scripts and
packages that can be used for package management. For this I use
pacman
which is a “management tool that combines the
functionality of base library related functions into intuitively named
functions.” While my use of the package is limited the following script
is useful when providing a script to colleagues or making it available
on GitHub:
#install.packages('pacman')
pacman::p_load("leaflet","OpenStreetMap","sf","tidyverse")
The p_load
function in pacman
is a wrapper
for the library
and require
functions from
base R. It will check to see if a package is installed and, if not,
attempt to install it from CRAN. You can view the other functions
available in pacman
here.
These packages will provide the minimum functionality required to complete this exercise. As you experiment with various styles of mapping you may require additional packages to be installed at a later time.
Note in 2023: A number of packages used in my typical spatial work flow including
maptools
,rgdal
,rgeos
, andsp
are being archived by October 2023. You can read the report here. This change impacts several packages such asggsn
(used to add some essential map elements),adehabitat
(for homerange estimation), andGapAnalysis
(to calculate conservation indicators) which have been used for spatial analysis of ecological data. Some packages loaded outside of the script above with the baseinstall.packages()
function while others would not. A few of these packages can still be loaded from the archive but each update of R will limit their functionality until they no longer operate. Here is the basic syntax to load archived packages:
require(devtools)
install_version("PACKAGE NAME", version = "PACKAGE VERSION", repos = "http://cran.us.r-project.org")
You can find archived packages at https://cran.r-project.org/src/contrib/Archive. Once you find the correct package, click on the folder to view the different published versions. Using the script above, add the package name and the version to the script and see if it will install. If you receive an error, occassionally there is an archived dependency that also needs loaded which makes this a bandaid and not a solution. Eventually these exercises will be update with the newest functions and packages.
There are a number of very simple maps you can create that uses data
included in the package downloads. For this first example you are going
to create a map that depicts the location of Austin Peay State
University. To begin we need to load some data from
ggplot2
using the map_data
command.
state <- map_data("state")
county <- map_data("county")
apsu_point <- data.frame("x" = -87.353069, "y" = 36.533654)
This quick script loaded all of the state and county information for US into the appropriate objects and created a point centered on the APSU campus. However, you can create a better map by filtering the data to focus on Tennessee and Montgomery County.
tn <- county %>%
filter(region=="tennessee")
montco <- county %>%
filter(region=="tennessee") %>%
filter(subregion=="montgomery")
With just these simple lines of code and included datasets you can
use ggplot2
to quickly create a simple locator map for
campus.
ggplot() + geom_polygon(data = state, aes(x=long, y = lat, group = group),
fill = "white", color="black") +
geom_polygon(data = tn, aes(x=long, y = lat, group = group),
fill = "gray", color="black") +
geom_polygon(data = montco, aes(x=long, y = lat, group = group),
fill = "red", color="black") +
geom_point(data = apsu_point, aes(x=x,y=y), color="black") +
coord_fixed(xlim = c(-91, -81), ylim = c(34, 37), ratio = 1.2) +
xlab("Longitude") + ylab("Latitude") + ggtitle("APSU, Montgomery Co., TN")
Keep in mind, that when using ggplot2
the order of the
geoms
is important for proper display of the image. Using
the various options within ggplot2
you can customize the
map based on your preferences.
It’s likely that the map you need to create is slightly more complex
than the one above. Maybe you are interested in using some sort of
imagery of basemap to add complexity or provide additional information
for a locations. In the data folder of this exercise is a *.csv file
called campus_points. This file contains the location of
several points on the main campus. In order to view these files on
aerial imagery of campus you are going to use OpenStreetMap
to obtain a basemap for our closer look at campus.
Note in 2019: As of mid-2018, Google began requiring an API key for use with
get_map
that allowed for Google imagery to be used as basemaps. Although you can obtain a certain number of maps free per month, I have chosen to avoid that set-up for this example.
Note in 2021: OpenStreeMap requires Java and causes some issues for macOS users. One possible fix is to install Java JDK 11 from https://adoptopenjdk.net/. After installing the software input the following code into the console in RStudio: Sys.setenv(JAVA_HOME=‘/Library/Java/JavaVirtualMachines/temurin-11.jdk/Contents/Home’)
This fix was developed by members of the APSU OIT. Do not try if you are not familiar with installing software or setting system variables. Use at your own risk.
To use OpenStreetMap
you need to first obtain upper-left
and lower-right coordinates for the location you are interested in
displaying. This can be done through programs like Google
Earth or through min()
and max()
commands using the information in your dataset.
CampusMap <- openmap(c(36.5360,-87.3570),
c(36.5300,-87.3495), type='bing')
APSU <- openproj(CampusMap, projection = "+proj=longlat +ellps=WGS84 +units=m +no_defs")
In the script above, the bounding box was identified, the type of map
listed (view ?openmap
for additional options), and the
projection for the map was set. Refer to the lecture for information
regarding the projection and ellipsoid.
Alternatively, with the *.csv file loaded you could use a modified script like:
openmap(c(max(campus_points$y)+0.001,min(campus_points$x)-0.001),c(min(campus_points$y)-0.001,max(campus_points$x)+0.001),type='bing'))
This takes the max()
and min()
longitude/latitude values from the dataset and extends the bounding box
just beyond the furthest data points. To add this imagery to the map you
need to use the autoplot
function in ggplot2
.
This function extends the plotting functions to include some spatial
data classes, especially raster-based data. Otherwise the script for
creating the map is similar to the simple map above.
campus_points <- read.csv("https://raw.githubusercontent.com/chrismgentry/Mapping-Basics/master/Data/campus_points.csv")
autoplot.OpenStreetMap(APSU) +
geom_point(data=campus_points, aes(x = X, y = Y, color=Name), size = 4, alpha = 0.8) +
geom_text(data=campus_points,aes(x=X,y=Y,label=Name), color="black", vjust=-0.60, size=4.01, fontface="bold") +
geom_text(data=campus_points,aes(x=X,y=Y,label=Name), color="white", vjust=-0.75, fontface="bold") +
labs(x="Longtiude", y="Latitude") + theme(legend.position = "none")
With the addition of the aerial photo there is accompanying spatial
information that can be obtained from viewing the map. Similar to the
simple map above, you can customize the map through setting various
themes and aesthetics in ggplot2
.
Occasionally you will receive information from a collaborator that is
not a *.csv file. Depending on the individual, that information might be
provided as a *.kml file from Google Earth or in an
ESRI shapefile. Using a simple import command
in the sf
package, and some additional manipulation, these
files can also be used with ggplot2
.
To read either shapefiles or KML files you can use the following syntax:
st_read(dsn="path to the shapefile",layer="name of the shapefile")
or
st_read("path to kml/name of the file.kml")
With this in mind, you can easily use Google Earth to create your own datasets and import them into R for spatial analysis and mapping. This can be useful when collaborating with individuals or with citizen science projects.
In the data folder of this exercise you will find the
Campus_Points.kml file and a Main_Campus.kml file. The
campus points file includes the same data as the previous *.csv file but
as a kml, and the main campus file contains a polygon outline of the
main portion of campus. You can use the script above to create datasets
out of those files. Alternatively, you can use the URL and the
download.file()
function to download the *.kml to your
project folder:
download.file('https://raw.githubusercontent.com/chrismgentry/Mapping-Basics/master/Data/Campus_Points.kml', 'Campus_Points.kml')
Working in ggplot2
with SpatialPointsDataFrame,
SpatialPolygonDataFrame, or SF Class files can be difficult due to the
structure of the data types. So to avoid adding additional packages or
detailed explanations for this exercise, you will convert the points and
polygon to a data.frame
and learn how they can be used in
the spatial form as well. Before altering the dataset, examine its
structure after importing the object.
campusOGR <- st_read("./Data/Campus_Points.kml")
## Reading layer `Campus' from data source
## `/Users/gentryc/Google Drive/APSU/Courses/BIOL 5700 Advanced Data Analytics/Exercise_9/Data/Campus_Points.kml'
## using driver `KML'
## Simple feature collection with 23 features and 2 fields
## Geometry type: POINT
## Dimension: XYZ
## Bounding box: xmin: -87.35582 ymin: 36.53173 xmax: -87.35074 ymax: 36.535
## z_range: zmin: 0 zmax: 0
## Geodetic CRS: WGS 84
Notice the location data is not stored in a typical dataframe-like
structure and instead stored in a single column. By typing
campusOGR$
into the console RStudio will return options for
“Name”, “Description”, and “Geometry”. However, a check of
campusOGR$geometry
will reveal x, y, and z (elevation) data
in a single field bound by paraenthesis.
It is here where some knowledge of base R is important. You can
directly access a row, column, or specific record by use of bracketed
commands such as campusOGR[1,]
which will provide all of
the information in row 1 or campusOGR[,1]
that will let you
view all of the information in the first column. This is one of the
reasons you will be converting the sf
class data to a
typical dataframe you are more accustom to working with.
So to convert the point kml to a data.frame:
campus_points <- cbind.data.frame(campusOGR$Name,
st_coordinates(campusOGR)[,1],
st_coordinates(campusOGR)[,2])
campus_points
Now look at the object. The data now includes columns for “name”,
“x”, and “y”. However, they retained a naming convention that’s
difficult to use in ggplot2
and not particularly inline
with data standards.
colnames(campus_points) <- c("Name","X","Y")
head(campus_points, 3)
This allowed us to create a data.frame out of the coordinate values
and data values, removed unnecessary issues, and rename the columns. If
you view(campus_points)
now you will see a much more
familiar dataset.
The data behind a polygon-based kml is a little more complex because one (1) individual polygon is made from no less than three (3) points. Therefore to draw a polygon R needs to be able to determine which point to begin with, which to move to next…, next…, next… , and finally how to return to the original point (usually determined by groups or IDs). This necessitates there being far more location information in the geometry column.
As well as the other slots in the previous SpatialPointsDataFrame. In this example there is only one (1) polygon which encompasses the main campus and no additional data.
In this example there is only one (1) polygon which encompasses the main campus with its associated “name”, “description”, and “geometry”. If there were several polygons, each would have its own row with additional data listed above. You can see how this information can quickly become complex. To convert this individual polygon kml to a dataframe:
outlineOGR <- st_read("./Data/Main_Campus.kml")
## Reading layer `Main_Campus.kml' from data source
## `/Users/gentryc/Google Drive/APSU/Courses/BIOL 5700 Advanced Data Analytics/Exercise_9/Data/Main_Campus.kml'
## using driver `KML'
## Simple feature collection with 1 feature and 2 fields
## Geometry type: POLYGON
## Dimension: XYZ
## Bounding box: xmin: -87.35608 ymin: 36.53093 xmax: -87.34982 ymax: 36.5354
## z_range: zmin: 0 zmax: 0
## Geodetic CRS: WGS 84
outline_poly <- cbind.data.frame(st_coordinates(outlineOGR)[,1],
st_coordinates(outlineOGR)[,2],
st_coordinates(outlineOGR)[,3],
st_coordinates(outlineOGR)[,4])
colnames(outline_poly) <- c("X","Y", "id", "group")
This script created a dataframe out of the coordinate values for the
polygon. Because there is only one polygon the order the values are
drawn is sequential which creates the correct shape for this exercise.
If you have a file with multiple polygons you will need to use the
ID or group variable that is unique for each polygon
to ensure all polygons are drawn correctly. Then in the
ggplot
aesthetics you would have
aes(x=x,y=y,group=group
to inform ggplot to draw each
polygon based on their group information. An example of this can be
found in view(tn)
from the script above. Notice the columns
for lat, long, and group along with other
identifiers.
Now you can use these files to map the data from the kml files similar to the previous map of the csv data.
autoplot.OpenStreetMap(APSU) +
geom_polygon(data=outline_poly, aes(x = X, y = Y), alpha = .5, size = 2, color="white") +
geom_point(data=campus_points, aes(x = X, y = Y, color=Name), size = 4, alpha = 0.8) +
geom_text(data=campus_points,aes(x=X,y=Y,label=Name), color="black", vjust=-0.60, size=4.01, fontface="bold") +
geom_text(data=campus_points,aes(x=X,y=Y,label=Name), color="white", vjust=-0.75, fontface="bold") +
labs(x="Longtiude", y="Latitude") + theme(legend.position = "none")
While the geom_sf
geometry in ggplot could have been
used to plot the information, you cannot use the autoplot function with
many sf
class data because of the syntax. Converting the
data to a typical dataframe provide a more straightforward syntax
withing ggplot2
that is consistent with with aesthetics for
other visualizations.
While there are a number of packages that can help create interactive
maps such as plotly
and mapview
, for this
example you are going to use leaflet
.
Leaflet allows you to create interactive web maps with
the javaScript ‘Leaflet’ library. Because this package already has
handlers for sf
class data, you can use the original data
imported from the kml files to complete this portion although the
example will use the dataframe versions.
In the most simple version, you set the point dataset you want to use
for the map, tell leaflet
the sort of markers you want to
use to identify the points, and then let leaflet
obtain the
OSM tiles for the background image.
leaflet(campus_points) %>%
addTiles() %>%
addMarkers(popup = campus_points$Name,
lng = campus_points$X,
lat = campus_points$Y)
Clicking on the markers will identify the place based on the “name” variable in our data. Because there is additional data from the polygon kml you can add to the map and customize it further.
leaflet(campus_points) %>%
addTiles() %>%
addPolygons(outline_poly,
lng = outline_poly$X,
lat = outline_poly$Y,
fillColor = "grey",
color = "black") %>%
addCircleMarkers(popup = campus_points$Name,
lng = campus_points$X,
lat = campus_points$Y,
weight = 2,
color = "grey",
fillColor = "red",
fillOpacity = 0.7)
Here you customized the map with APSU colors and added the polygon to the background to identify the “main” campus area. You can take this a step further and provide additional information about the campus buildings and polygon by editing the attributes of each dataset.
campus_points$Description <- c("Academic", "Administrative", "Academic",
"Library", "Academic", "Food", "Administrative",
"Academic", "Residential", "Administrative",
"Administrative", "Bookstore", "Academic",
"Academic", "Residential", "Academic", "Academic",
"Academic", "Academic", "Residential", "Residential",
"Administrative", "Academic")
In the script above you provided a description of each building that can be used to identify the purpose of the building when the cursor hovers over the point. This could have been completed in a previous step instead of removing the column.
leaflet(campus_points) %>%
addTiles() %>%
addPolygons(outline_poly,
lng = outline_poly$X,
lat = outline_poly$Y,
fillColor = "grey",
color = "black") %>%
addCircleMarkers(popup = campus_points$Name,
label = campus_points$Description,
lng = campus_points$X,
lat = campus_points$Y,
weight = 2,
color = "grey",
fillColor = "red",
fillOpacity = 0.7)
Notice now that when you move the cursor around there is additional information provided about each dataset. This can be used with numerous variables or to create an attribute table for each point when clicked.
Leaflet has a number of possible backgrounds that can be included as a basemap to your project. For example, if you were interested in elevation you can connect to a web-mapping service to obtain a zoomable terrain layer
leaflet() %>%
setView(lng = -112.956472, lat = 37.251343, zoom = 13) %>%
addWMSTiles("https://basemap.nationalmap.gov/arcgis/services/USGSTopo/MapServer/WmsServer", layers = "0") %>%
addMiniMap(zoomLevelOffset = -4) %>%
addScaleBar()
Additionally, you can incorporate several basemaps and add an interactive selection tool to chose what background to display, set transparencies for multiple backgrounds to be displayed at once, have live updates streamed to your map, or toggle the data off and on.
leaflet() %>%
addTiles(group = "OSM")%>%
addProviderTiles(providers$CartoDB.Positron, group = "CartoDB") %>%
addProviderTiles(providers$Esri.NatGeoWorldMap, group = "NatGeo") %>%
addProviderTiles(providers$Esri.WorldImagery, group = "ESRI") %>%
setView(lng = -87.353069, lat = 36.533654, zoom = 17) %>%
addPolygons(outlineOGR,
lng = outline_poly$X,
lat = outline_poly$Y,
fillColor = "grey",
color = "black") %>%
addCircleMarkers(popup = campus_points$Name,
label = campus_points$Description,
group = "APSU",
lng = campus_points$X,
lat = campus_points$Y,
weight = 2,
color = "grey",
fillColor = "red",
fillOpacity = 0.7) %>%
addLayersControl(
baseGroups = c("OSM", "CartoDB", "NatGeo", "ESRI"),
options = layersControlOptions(collapsed = FALSE),
overlayGroups = "APSU")
This is only a very small fraction of the possible applications for mapping that can be done in R.
Using the skills discussed in this exercise, create two site maps for your thesis project or other dataset of interest. You should create one static and one interactive map, and they can contain as many datasets or features as you feel are necessary to properly display your data. The maps should be added to a GitHub project page created for this exercise. For an added challenge, you can try to fork this repository to give you access to the data files and a jump start on your script.