Mapping in R

To be fair, cartography and spatial analysis are not among the easiest things to do in R. While nearly every statistical analysis you might want to run has several packages that can accomplish the same task, those packages rarely rely on drastically different types of data the way spatial packages do. This is why a lot of spatial analysis is still performed in programs such as ArcGIS or QGIS. However, moving from one program to another often requires reformatting the data, which can be cumbersome, so maintaining the data within a single program to manage, format, analyze, and visualize helps to create a usable, repeatable workflow. That’s why data analysis and visualization in R is becoming standard practice in many disciplines.

Data Types

When using R for spatial analysis there are a number of graphics packages that can be used for everything from basic mapping to interactive maps. You may already be familiar with a few of these, such as plot from base R, ggplot2, plotly, leaflet, or tmap. Many of these packages depend on other packages that help you manage, organize, and format spatial data. Packages such as sf (“Simple Features for R”), terra (“Methods for spatial data analysis with vector and raster data”), and stars (“Tools for manipulating raster and vector data cubes”) provide access to spatial data and spatial data formats for a number of other packages. In order to reduce the overall complexity of this exercise, we will not discuss all of the data formats or packages that can be used to analyze spatial data here. However, I will provide links to useful online resources in the exercise README document.

Packages

For this exercise you will use a few common packages for working with spatial data. Once again, many of these packages will require dependencies so installing them may take some time.

leaflet, OpenStreetMap, sf, tidyverse


As discussed in a previous class, there are various scripts and packages that can be used for package management. For this I use pacman, which is a “management tool that combines the functionality of base library related functions into intuitively named functions.” While my use of the package is limited, the following script is useful when providing a script to colleagues or making it available on GitHub:

#install.packages('pacman')
pacman::p_load("leaflet","OpenStreetMap","sf","tidyverse")

The p_load function in pacman is a wrapper for the library and require functions from base R. It will check to see if a package is installed and, if not, attempt to install it from CRAN. You can view the other functions available in pacman here.

These packages will provide the minimum functionality required to complete this exercise. As you experiment with various styles of mapping you may require additional packages to be installed at a later time.

Note in 2023: A number of packages used in my typical spatial workflow, including maptools, rgdal, rgeos, and sp, are being archived by October 2023. You can read the report here. This change impacts several packages that have been used for spatial analysis of ecological data, such as ggsn (used to add some essential map elements), adehabitat (for home range estimation), and GapAnalysis (to calculate conservation indicators). Some of these packages loaded outside of the script above with the base install.packages() function, while others would not. A few of these packages can still be installed from the archive, but each update of R will limit their functionality until they no longer operate. Here is the basic syntax to install archived packages:

# devtools supplies install_version() for installing a specific archived release
require(devtools)
install_version("PACKAGE NAME", version = "PACKAGE VERSION", repos = "http://cran.us.r-project.org")

You can find archived packages at https://cran.r-project.org/src/contrib/Archive. Once you find the correct package, click on the folder to view the different published versions. Using the script above, add the package name and the version and see if it will install. If you receive an error, occasionally there is an archived dependency that also needs to be installed, which makes this a band-aid rather than a solution. Eventually these exercises will be updated with the newest functions and packages.
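As a concrete, hypothetical illustration, the call below installs an archived copy of ggsn. The version number shown is only an example, so confirm it against the archive folder before running it:

# hypothetical example: install an archived release of ggsn
# (the version shown is illustrative -- confirm it against the CRAN archive)
require(devtools)
install_version("ggsn", version = "0.5.0", repos = "http://cran.us.r-project.org")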

Simple Maps

There are a number of very simple maps you can create using data included in the package downloads. For this first example you are going to create a map that depicts the location of Austin Peay State University. To begin we need to load some data from ggplot2 using the map_data() function.

# map_data() pulls state and county outlines from the maps package via ggplot2
state <- map_data("state")
county <- map_data("county")
# approximate coordinates for the center of the APSU campus
apsu_point <- data.frame("x" = -87.353069, "y" = 36.533654)

This quick script loaded all of the state and county information for the US into the appropriate objects and created a point centered on the APSU campus. However, you can create a better map by filtering the data to focus on Tennessee and Montgomery County.

tn <- county %>% 
  filter(region=="tennessee")

montco <- county %>% 
  filter(region=="tennessee") %>% 
  filter(subregion=="montgomery")

With just these simple lines of code and included datasets you can use ggplot2 to quickly create a simple locator map for campus.

ggplot() + geom_polygon(data = state, aes(x=long, y = lat, group = group),
                        fill = "white", color="black") + 
           geom_polygon(data = tn, aes(x=long, y = lat, group = group),
                        fill = "gray", color="black") + 
           geom_polygon(data = montco, aes(x=long, y = lat, group = group),
                        fill = "red", color="black") + 
           geom_point(data = apsu_point, aes(x=x,y=y), color="black") +
  coord_fixed(xlim = c(-91, -81),  ylim = c(34, 37), ratio = 1.2) + 
  xlab("Longitude") + ylab("Latitude") + ggtitle("APSU, Montgomery Co., TN")

Keep in mind that when using ggplot2 the order of the geoms is important for proper display of the image. Using the various options within ggplot2 you can customize the map based on your preferences.
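For example, a minimal variation on the map above (the theme and fill colors here are only one possibility, not part of the exercise itself) swaps in different fills and a cleaner theme:

# same layers as above, with a different fill palette and a minimal theme
ggplot() +
  geom_polygon(data = state, aes(x = long, y = lat, group = group),
               fill = "gray95", color = "gray40") +
  geom_polygon(data = tn, aes(x = long, y = lat, group = group),
               fill = "lightblue", color = "gray40") +
  geom_polygon(data = montco, aes(x = long, y = lat, group = group),
               fill = "darkred", color = "black") +
  geom_point(data = apsu_point, aes(x = x, y = y), color = "black", size = 2) +
  coord_fixed(xlim = c(-91, -81), ylim = c(34, 37), ratio = 1.2) +
  labs(x = "Longitude", y = "Latitude", title = "APSU, Montgomery Co., TN") +
  theme_minimal()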

Maps with raster images

It’s likely that the map you need to create is slightly more complex than the one above. Maybe you are interested in using some sort of imagery or basemap to add complexity or provide additional information for a location. In the data folder of this exercise is a *.csv file called campus_points. This file contains the location of several points on the main campus. In order to view these points on aerial imagery of campus you are going to use OpenStreetMap to obtain a basemap for our closer look at campus.

Note in 2019: As of mid-2018, Google began requiring an API key for use with get_map, which allows Google imagery to be used as a basemap. Although you can obtain a certain number of free maps per month, I have chosen to avoid that set-up for this example.

Note in 2021: OpenStreetMap requires Java and causes some issues for macOS users. One possible fix is to install Java JDK 11 from https://adoptopenjdk.net/. After installing the software, input the following code into the console in RStudio: Sys.setenv(JAVA_HOME='/Library/Java/JavaVirtualMachines/temurin-11.jdk/Contents/Home')
This fix was developed by members of the APSU OIT. Do not try it if you are not familiar with installing software or setting system variables. Use at your own risk.

To use OpenStreetMap you need to first obtain upper-left and lower-right coordinates for the location you are interested in displaying. This can be done through programs like Google Earth or through min() and max() commands using the information in your dataset.

# upper-left (lat, lon) and lower-right (lat, lon) corners of the campus area
CampusMap <- openmap(c(36.5360,-87.3570),
                     c(36.5300,-87.3495), type='bing')

# re-project the downloaded tiles to longitude/latitude (WGS84)
APSU <- openproj(CampusMap, projection = "+proj=longlat +ellps=WGS84 +units=m +no_defs")

In the script above, the bounding box was identified, the type of map listed (view ?openmap for additional options), and the projection for the map was set. Refer to the lecture for information regarding the projection and ellipsoid.

Alternatively, with the *.csv file loaded you could use a modified script like:

CampusMap <- openmap(c(max(campus_points$Y) + 0.001, min(campus_points$X) - 0.001),
                     c(min(campus_points$Y) - 0.001, max(campus_points$X) + 0.001),
                     type = 'bing')

This takes the max() and min() longitude/latitude values from the dataset and extends the bounding box just beyond the furthest data points. To add this imagery to the map you need to use the autoplot function in ggplot2. This function extends the plotting functions to include some spatial data classes, especially raster-based data. Otherwise the script for creating the map is similar to the simple map above.

campus_points <- read.csv("https://raw.githubusercontent.com/chrismgentry/Mapping-Basics/master/Data/campus_points.csv")
autoplot.OpenStreetMap(APSU) +
  geom_point(data=campus_points, aes(x = X, y = Y, color=Name), size = 4, alpha = 0.8) +
  geom_text(data=campus_points,aes(x=X,y=Y,label=Name), color="black", vjust=-0.60, size=4.01, fontface="bold") +
  geom_text(data=campus_points,aes(x=X,y=Y,label=Name), color="white", vjust=-0.75, fontface="bold") +
  labs(x="Longtiude", y="Latitude") + theme(legend.position = "none")

With the addition of the aerial photo there is accompanying spatial information that can be obtained from viewing the map. Similar to the simple map above, you can customize the map through setting various themes and aesthetics in ggplot2.

Using KMLs or Shapefiles

Occasionally you will receive information from a collaborator that is not a *.csv file. Depending on the individual, that information might be provided as a *.kml file from Google Earth or in an ESRI shapefile. Using a simple import command in the sf package, and some additional manipulation, these files can also be used with ggplot2.

To read either shapefiles or KML files you can use the following syntax:

    st_read(dsn="path to the shapefile",layer="name of the shapefile")

    or

    st_read("path to kml/name of the file.kml")

With this in mind, you can easily use Google Earth to create your own datasets and import them into R for spatial analysis and mapping. This can be useful when collaborating with individuals or with citizen science projects.

In the data folder of this exercise you will find the Campus_Points.kml file and a Main_Campus.kml file. The campus points file includes the same data as the previous *.csv file but as a kml, and the main campus file contains a polygon outline of the main portion of campus. You can use the script above to create datasets out of those files. Alternatively, you can use the URL and the download.file() function to download the *.kml to your project folder:

download.file('https://raw.githubusercontent.com/chrismgentry/Mapping-Basics/master/Data/Campus_Points.kml', 'Campus_Points.kml')
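If you download the file this way, the path passed to st_read() is simply the local copy created above (a minimal sketch with an illustrative object name; the code in the next section reads the same data from the exercise's Data folder instead):

# read the local copy created by download.file() above
campus_kml <- st_read("Campus_Points.kml")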

Working in ggplot2 with SpatialPointsDataFrame, SpatialPolygonsDataFrame, or sf class objects can be difficult due to the structure of the data types. So, to avoid adding additional packages or detailed explanations for this exercise, you will convert the points and polygon to a data.frame and also learn how they can be used in their spatial form. Before altering the dataset, examine its structure after importing the object.

campusOGR <- st_read("./Data/Campus_Points.kml")
## Reading layer `Campus' from data source 
##   `/Users/gentryc/Google Drive/APSU/Courses/BIOL 5700 Advanced Data Analytics/Exercise_9/Data/Campus_Points.kml' 
##   using driver `KML'
## Simple feature collection with 23 features and 2 fields
## Geometry type: POINT
## Dimension:     XYZ
## Bounding box:  xmin: -87.35582 ymin: 36.53173 xmax: -87.35074 ymax: 36.535
## z_range:       zmin: 0 zmax: 0
## Geodetic CRS:  WGS 84

KML Points Data Example

Notice the location data is not stored in a typical dataframe-like structure and is instead stored in a single column. By typing campusOGR$ into the console, RStudio will return options for “Name”, “Description”, and “geometry”. However, a check of campusOGR$geometry will reveal x, y, and z (elevation) data in a single field bound by parentheses.

It is here where some knowledge of base R is important. You can directly access a row, column, or specific record using bracketed commands such as campusOGR[1,], which will provide all of the information in row 1, or campusOGR[,1], which will let you view all of the information in the first column. This is one of the reasons you will be converting the sf class data to the sort of typical dataframe you are more accustomed to working with.
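For example, running the indexing commands described above on the imported object:

# inspect the imported sf object with base R indexing
campusOGR[1, ]                    # all fields for the first feature (row 1)
campusOGR[, 1]                    # the first column (plus the sticky geometry column)
campusOGR$geometry                # the coordinates stored in the geometry column
head(st_coordinates(campusOGR))   # coordinates as a plain numeric matrix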

So to convert the point kml to a data.frame:

# bind the Name field to the X and Y coordinates pulled from the geometry column
campus_points <- cbind.data.frame(campusOGR$Name,
                 st_coordinates(campusOGR)[,1],
                 st_coordinates(campusOGR)[,2])
campus_points

Now look at the object. The data now includes columns for “name”, “x”, and “y”. However, the columns retained a naming convention that’s difficult to use in ggplot2 and not particularly in line with data standards.

colnames(campus_points) <- c("Name","X","Y")
head(campus_points, 3)

This allowed us to create a data.frame out of the coordinate and data values, drop the fields we did not need, and rename the columns. If you View(campus_points) now you will see a much more familiar dataset.
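An equivalent, slightly more idiomatic route (just an alternative sketch using sf helpers, with an illustrative object name, rather than part of the exercise workflow) keeps the attribute you want with st_drop_geometry() and binds on the coordinates in one step:

# alternative: keep the Name attribute and attach X/Y coordinates directly
campus_points_alt <- cbind(st_drop_geometry(campusOGR)["Name"],
                           st_coordinates(campusOGR)[, c("X", "Y")])
head(campus_points_alt, 3)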

The data behind a polygon-based kml is a little more complex because one (1) individual polygon is made from no fewer than three (3) points. Therefore, to draw a polygon R needs to be able to determine which point to begin with, which to move to next…, next…, next…, and finally how to return to the original point (usually determined by groups or IDs). This necessitates there being far more location information in the geometry column.

The polygon object also contains the same fields as the point data above.

KML Polygon Data Example

In this example there is only one (1) polygon which encompasses the main campus with its associated “name”, “description”, and “geometry”. If there were several polygons, each would have its own row with additional data listed above. You can see how this information can quickly become complex. To convert this individual polygon kml to a dataframe:

outlineOGR <- st_read("./Data/Main_Campus.kml")
## Reading layer `Main_Campus.kml' from data source 
##   `/Users/gentryc/Google Drive/APSU/Courses/BIOL 5700 Advanced Data Analytics/Exercise_9/Data/Main_Campus.kml' 
##   using driver `KML'
## Simple feature collection with 1 feature and 2 fields
## Geometry type: POLYGON
## Dimension:     XYZ
## Bounding box:  xmin: -87.35608 ymin: 36.53093 xmax: -87.34982 ymax: 36.5354
## z_range:       zmin: 0 zmax: 0
## Geodetic CRS:  WGS 84
# keep the first four columns returned by st_coordinates() and rename them;
# for this single XYZ polygon, column 3 is the (zero) elevation and column 4
# is the ring index, which serve here as id and group placeholders
outline_poly <- cbind.data.frame(st_coordinates(outlineOGR)[,1],
                                 st_coordinates(outlineOGR)[,2],
                                 st_coordinates(outlineOGR)[,3],
                                 st_coordinates(outlineOGR)[,4])
colnames(outline_poly) <- c("X", "Y", "id", "group")

This script created a dataframe out of the coordinate values for the polygon. Because there is only one polygon, the values are drawn in sequential order, which creates the correct shape for this exercise. If you have a file with multiple polygons you will need to use the ID or group variable that is unique to each polygon to ensure all polygons are drawn correctly. Then in the ggplot aesthetics you would use aes(x = X, y = Y, group = group) to inform ggplot to draw each polygon based on its group information, as in the example below. This structure can also be seen with View(tn) from the script above; notice the columns for lat, long, and group along with other identifiers.
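As a quick illustration using the county data loaded earlier (nothing new here beyond the grouping aesthetic), every Tennessee county is drawn as its own polygon because of group = group:

# each county in tn is a separate polygon, identified by its group value
ggplot(tn, aes(x = long, y = lat, group = group)) +
  geom_polygon(fill = "gray", color = "black") +
  coord_fixed(ratio = 1.2)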

Now you can use these files to map the data from the kml files similar to the previous map of the csv data.

autoplot.OpenStreetMap(APSU) +
  geom_polygon(data=outline_poly, aes(x = X, y = Y), alpha = .5, size = 2, color="white") +
  geom_point(data=campus_points, aes(x = X, y = Y, color=Name), size = 4, alpha = 0.8) +
  geom_text(data=campus_points,aes(x=X,y=Y,label=Name), color="black", vjust=-0.60, size=4.01, fontface="bold") +
  geom_text(data=campus_points,aes(x=X,y=Y,label=Name), color="white", vjust=-0.75, fontface="bold") +
  labs(x="Longtiude", y="Latitude") + theme(legend.position = "none")

While the geom_sf geometry in ggplot2 could have been used to plot this information, you cannot use the autoplot function with many sf class objects because of the syntax. Converting the data to a typical dataframe provides a more straightforward syntax within ggplot2 that is consistent with the aesthetics for other visualizations.

Interactive Maps

While there are a number of packages that can help create interactive maps, such as plotly and mapview, for this example you are going to use leaflet. Leaflet allows you to create interactive web maps with the JavaScript ‘Leaflet’ library. Because this package already has handlers for sf class data, you could use the original data imported from the kml files for this portion, although the example will use the dataframe versions.
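For instance, the imported sf points could be mapped directly (a minimal sketch, not part of the exercise workflow; st_zm() simply drops the unused Z dimension from the KML geometry before passing it to leaflet):

# leaflet can take the sf object directly; st_zm() drops the Z (elevation) values
leaflet(st_zm(campusOGR)) %>%
  addTiles() %>%
  addMarkers(popup = ~Name)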

In the simplest version, you provide the point dataset you want to use for the map, tell leaflet the sort of markers you want to use to identify the points, and then let leaflet obtain the OSM tiles for the background image.

leaflet(campus_points) %>% 
  addTiles() %>% 
  addMarkers(popup = campus_points$Name,
             lng = campus_points$X, 
             lat = campus_points$Y)

Clicking on the markers will identify the place based on the “Name” variable in our data. Because there is additional data from the polygon kml, you can add it to the map and customize it further.

leaflet(campus_points) %>% 
  addTiles() %>%
  addPolygons(data = outline_poly, 
              lng = outline_poly$X, 
              lat = outline_poly$Y,
              fillColor = "grey",
              color = "black") %>%
  addCircleMarkers(popup = campus_points$Name,
                   lng = campus_points$X, 
                   lat = campus_points$Y,
                   weight = 2,
                   color = "grey",
                   fillColor = "red",
                   fillOpacity = 0.7)

Here you customized the map with APSU colors and added the polygon to the background to identify the “main” campus area. You can take this a step further and provide additional information about the campus buildings and polygon by editing the attributes of each dataset.

campus_points$Description <- c("Academic", "Administrative", "Academic", 
                               "Library", "Academic", "Food", "Administrative", 
                               "Academic", "Residential", "Administrative", 
                               "Administrative", "Bookstore", "Academic", 
                               "Academic", "Residential", "Academic", "Academic", 
                               "Academic", "Academic", "Residential", "Residential", 
                               "Administrative", "Academic")

In the script above you provided a description of each building that can be used to identify its purpose when the cursor hovers over the point. This information could have been retained in a previous step instead of removing the column during the conversion.

leaflet(campus_points) %>% 
  addTiles() %>%
  addPolygons(data = outline_poly, 
              lng = outline_poly$X, 
              lat = outline_poly$Y,
              fillColor = "grey",
              color = "black") %>%
  addCircleMarkers(popup = campus_points$Name,
                   label = campus_points$Description,
                   lng = campus_points$X, 
                   lat = campus_points$Y,
                   weight = 2,
                   color = "grey",
                   fillColor = "red",
                   fillOpacity = 0.7)

Notice now that when you move the cursor around, additional information is provided about each point. This can be used with numerous variables or to create an attribute table for each point when clicked, as in the sketch below.
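One common way to do this (a sketch only; the HTML formatting shown is just one option that leaflet popups accept) is to paste several fields together into the popup text:

# combine Name and Description into a single HTML popup for each point
leaflet(campus_points) %>% 
  addTiles() %>%
  addCircleMarkers(lng = campus_points$X,
                   lat = campus_points$Y,
                   popup = paste0("<b>", campus_points$Name, "</b><br>",
                                  campus_points$Description),
                   weight = 2,
                   color = "grey",
                   fillColor = "red",
                   fillOpacity = 0.7)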

Basic Terrain and Elevation

Leaflet has a number of possible backgrounds that can be included as a basemap for your project. For example, if you were interested in elevation, you can connect to a web-mapping service to obtain a zoomable terrain layer:

leaflet() %>% 
  setView(lng = -112.956472, lat = 37.251343, zoom = 13) %>%
  addWMSTiles("https://basemap.nationalmap.gov/arcgis/services/USGSTopo/MapServer/WmsServer", layers = "0") %>%
  addMiniMap(zoomLevelOffset = -4) %>%
  addScaleBar()

Selectable Layers

Additionally, you can incorporate several basemaps and add an interactive selection tool to choose which background to display, set transparencies for multiple backgrounds to be displayed at once, have live updates streamed to your map, or toggle the data off and on.

leaflet() %>% 
  addTiles(group = "OSM")%>%
  addProviderTiles(providers$CartoDB.Positron, group = "CartoDB") %>%
  addProviderTiles(providers$Esri.NatGeoWorldMap, group = "NatGeo") %>%
  addProviderTiles(providers$Esri.WorldImagery, group = "ESRI") %>%
  setView(lng = -87.353069, lat = 36.533654, zoom = 17) %>%
  addPolygons(data = outline_poly, 
              lng = outline_poly$X, 
              lat = outline_poly$Y,
              fillColor = "grey",
              color = "black") %>%
  addCircleMarkers(popup = campus_points$Name,
                   label = campus_points$Description,
                   group = "APSU",
                   lng = campus_points$X, 
                   lat = campus_points$Y,
                   weight = 2,
                   color = "grey",
                   fillColor = "red",
                   fillOpacity = 0.7) %>%
  addLayersControl(
    baseGroups = c("OSM", "CartoDB", "NatGeo", "ESRI"),
    options = layersControlOptions(collapsed = FALSE),
    overlayGroups = "APSU")

This is only a very small fraction of the possible applications for mapping that can be done in R.

Your Turn!

Using the skills discussed in this exercise, create two site maps for your thesis project or other dataset of interest. You should create one static and one interactive map, and they can contain as many datasets or features as you feel are necessary to properly display your data. The maps should be added to a GitHub project page created for this exercise. For an added challenge, you can try to fork this repository to give you access to the data files and a jump start on your script.