The purpose of this exercise is to help familiarize you with simple ways to explore attributes in various datasets. These skills will help you extract new datasets, connect to tabular data, and qualitatively compare different variables.
With the unprecedented growth in Middle Tennessee, the Montgomery County Commission, Stormwater Management, and Health Department are working with the Tennessee Department of Environment and Conservation on an initiative to assess the relationship of brownfield sites to our community and watersheds. Brownfields are locations in communities that pose risks to future land use and development as a result of previous land use practices, particularly commercial and industrial (check out more information on brownfields here: https://www.osha.gov/brownfields/brownfields-qna). They often contain high levels of soil and water contamination, and in some cases pollutants can remain in the ecosystem for decades. Unfortunately, brownfields are often point source locations for ground and surface water contamination. The goal of the initiative is to determine if there are any spatial characteristics of these hazardous locations that have the potential to impact current and future residents of the area. The primary objectives of the initiative are to: a) examine the location of brownfields in the county, b) determine which watersheds would be primarily impacted, and c) ascertain if there is a relationship between brownfield sites and any particular demographics in the county. With these three objectives, the county partners may make data-informed decisions to best support and prioritize programs that keep our community and environment safe.
In this exercise you will:
Software specific directions can be found for each step below. Please submit the answer to the questions and your final map by the due date.
The datasets used in this exercise will be found on the Exercise 5 Github Page, previous exercises such as Exercise 2 and Exercise 3, and also from the Tennessee Geographic Information Council. TN GIS maintains a number of datasets in their collections that are useful for projects involving the state of Tennessee.
As with previous exercises you should begin by launching ArcGIS Pro, creating a new blank template, and creating a folder for this specific exercise. You should now see the typical starting screen that greeted you in all of the previous exercises. While you may already have some of the data for this exercise in previous exercise folders, you will start this lab by downloading a dataset from TN GIS. While they maintain a number of quality collections, you will specifically download the statewide watershed coverage (12-digit Hydrologic Unit Code) for Tennessee. This information can be found at the following link: http://www.tngis.org/water.htm. On that page you will find the link for “Download Watershed Coverage”. Click the link, and using the download button in the upper-right corner, save the tn_wbd zip file to your project folder.
Once you have downloaded the file, navigate to the saved location to unzip the file. Within the unzipped folder you will find three additional folders titled:
These are watershed files at varying levels of detail. For hydrologic units you are looking for one with the largest number of digits to get the largest scale data. So for this exercise you will unzip the tn_12dig_huc dataset.
Finally, with that folder extracted you will find a folder titled hydrologic_units containing a shapefile named wbdhu12_a_tn.shp that will be used in this exercise. This is the polygon file representing the 12-digit hydrologic unit codes for the entire state of Tennessee.
Next, you will need the tornado_data file from Exercise 2 and the census_tracts data from Exercise 3. You have a few options for obtaining this data. You can download the data again (but this time to the new project folder), you can navigate to the Exercise 2 and Exercise 3 project folders, respectively, on your computer and copy the zip files to the Exercise 5 project folder, or you can copy the data over using the catalog pane in ArcGIS Pro. While the first two options are relatively straightforward, it is important to learn how to navigate and use the catalog in ArcGIS.
On the View tab, click the Catalog Pane button to open the Catalog Window Pane on the right side of the screen. On the Project tab, right-click on the Folders option and click “Add Folder Connection”. In the resulting window navigate to the folder you would like to connect to and single-click the folder to select it. You don’t want to double-click into the folder. You should see the name of the folder appear at the bottom of the window and the OK button should be available.
Once you have connected to the additional folders you want to use in conjunction with this project, you can navigate to them within the Folders link in the Catalog Pane. While you could add data directly from the other folders, the best practice might be to copy the data from one project to another. If, for example, you plan to alter the data, then using it directly from the previous folder would alter it there as well. This could cause future issues when returning to that project. For this exercise you can navigate to the Exercise 2 folder, copy the tornado_data file, and paste it in the Exercise 5 folder. This is the safest way to move data such as shapefiles or geodatabases. Because the various data types contain numerous individual files that make up a dataset, the Catalog will copy/move them all correctly. If you tried to move them using File Explorer and missed one of the files associated with that data, it might not work appropriately. So for Exercise 5, you will need to copy the tornado_data and montco_tracts data from Exercises 2 and 3, respectively.
Finally, you will need to download the Brownfields and Demographics data from the Exercise 5 GitHub Data page. Save both in your Exercise 5 project folder and unzip the brownfields.zip file to access the dataset.
Question No. 1: What is the common name of the extracted files? How many are there? What are the various file extensions?
The Library of Congress has a great description of the various extensions here.
As with previous exercises you should begin by launching QGIS, creating a new empty project, and creating a project folder for this specific exercise. You should now see the typical starting screen that greeted you in all of the previous exercises. While you may already have some of the data for this exercise in previous exercise folders, you will start this lab by downloading a dataset from TN GIS. While they maintain a number of quality collections, you will specifically download the statewide watershed coverage (12-digit Hydrologic Unit Code) for Tennessee. This information can be found at the following link: http://www.tngis.org/water.htm. On that page you will find the link for “Download Watershed Coverage”. Click the link, and using the download button in the upper-right corner, save the tn_wbd zip file to your project folder.
Once you have downloaded the file, navigate to the saved location to unzip the file. Within the unzipped folder you will find three additional folders titled:
These are watershed files at varying levels of detail. For hydrologic units you are looking for one with the largest number of digits to get the largest scale data. So for this exercise you will unzip the tn_12dig_huc dataset.
Finally, with that folder extracted you will find a folder titled hydrologic_units containing a shapefile named wbdhu12_a_tn.shp that will be used in this exercise. This is the polygon file representing the 12-digit hydrologic unit codes for the entire state of Tennessee.
Next, you will need the tornado_data file from Exercise 2 and the census_tracts data from Exercise 3. You have a few options for obtaining this data. You can download the data again (but this time to the new project folder), you can navigate to the Exercise 2 and Exercise 3 project folders, respectively, on your computer and copy the zip files to the Exercise 5 project folder, or you can copy the data over using the browser window in QGIS. While the first two options are relatively straightforward, it is important to be confident navigating and using the browser in QGIS.
If you created a “favorites” folder you will most likely navigate within that location; however, if you haven’t created a favorites folder you will search through your drives for the tornado_data file from Exercise 2. Once you locate the file, right/CTRL-click on the file and select Export Layer > To File…. In the resulting window select ESRI Shapefile as the “Format”; for the “File name”, click the browse button, give it a file name, and save it to your Exercise 5 project folder. If you check the “Add Saved File to Map” button and click OK the file will be added to your layers.
Repeat this process for the census_tracts data from Exercise 3. While you could add data directly from the other folders, the best practice might be to export the data from one project to another. If, for example, you plan to alter the data, then using it directly from the previous folder would alter it there as well. This could cause future issues when returning to that project. With these two files added to your layers you only need to download the Brownfields and Demographics data from the Exercise 5 GitHub Data page. Save both in your Exercise 5 project folder and unzip the brownfields.zip file to access the dataset.
Question No. 1: What is the common name of the extracted files? How many are there? What are the various file extensions?
The Library of Congress has a great description of the various extensions here.
Before you begin, you will need to open the Ex5 Colab Notebook and insert tocolab after github in the URL to open it in the Colab Environment. As you have seen before, R requires various packages to complete certain analyses. In this exercise you will be using a large number of packages including: googledrive, tidyverse, ggsn, cowplot, maps, mapproj, raster, rgeos, rgdal, sp, sf, and biscale. Each of these packages also contains various dependencies, so it will take a while to load. In previous exercises you installed and loaded packages individually. This requires two lines of code for each package, so this exercise would begin with twenty-four lines to install and load the necessary packages. Instead, in this exercise you will learn to install and load the packages in a three-line script. The first line lists the packages, the second line installs all packages, and the third line loads them. In later exercises you will learn to use a library management package to check for libraries on your computer, install them if necessary, and load the packages necessary for the project. For this exercise you will use the following script:
packages <- c('googledrive','tidyverse','ggsn','cowplot','maps','mapproj',
              'raster','rgeos','rgdal','sp','sf','biscale')
sapply(packages, install.packages, character.only = TRUE)
sapply(packages, require, character.only = TRUE)
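As a preview of the library-management approach mentioned above, here is a minimal sketch using the pacman package — an assumption for illustration, since the later exercises may introduce a different manager — where p_load checks for each package, installs it if missing, and loads it in a single call:

# Sketch of a library-management approach (pacman is an assumption;
# later exercises may use a different package)
install.packages("pacman")
pacman::p_load(googledrive, tidyverse, ggsn, cowplot, maps, mapproj,
               raster, rgeos, rgdal, sp, sf, biscale)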
As with Exercise 3, the tigris package needs to be loaded separately from the other packages with the following script:
devtools::install_github('walkerke/tigris')
library('tigris')
The datasets needed for this exercise include: census tracts from Exercise 3, watersheds data from TN GIS, and brownfields information and demographics data from this exercise. As with previous exercises, all of the data for this lab can be downloaded directly from either a GitHub Page or from a public website. The data from TN GIS is stored in a Google Drive folder. Most cloud storage platforms have unique data structures that require more detailed download information than a simple *.csv stored on a webpage. So to download a file from Google Drive you will use the googledrive package that was installed in the list of packages above.
To avoid connecting your own Google credentials you will begin by using the drive_deauth function, which suspends authorization credentials. Depending on your use of Google Drive within R you may need to provide your login credentials or an access token. If you navigate to http://www.tngis.org/water.htm you will find the link for “Download Watershed Coverage”.
When you click the link, a new window will open with a Google Drive download page. On this screen you can locate the file ID within the URL.
This ID will be used in the drive_download function to obtain the file.
drive_deauth()
drive_download(as_id("0B9UIdGiB_LXOeVVQNm91bGpvUUE"), overwrite = TRUE)
On the Google Colaboratory page you will see a folder button on the left that opens a new pane on the left of the screen. The “directory” for this location is /content/, which can be directly accessed within Colab.
The script above downloaded the tn_wbd.zip file, which is now located in the content folder. You can use the unzip function to extract the necessary data.
unzip('tn_wbd.zip')
In the folders pane you can see three additional files were extracted. These are watershed files at varying levels of detail. For hydrologic units, the file option with the largest number of digits (e.g. 8-digit HUC vs 12-digit HUC) provides the largest scale data. So for this exercise you will unzip the tn_12dig_huc.zip dataset. In order to help organize the data, you will add an exdir = argument to the script to create a new folder for the data within the content folder. Because this exercise is specific to Colab, the scripts below will differ if you are using a different IDE for R.
unzip('tn_12dig_huc.zip', exdir = "/content/watersheds")
Now you can open the watersheds folder and view the contents. Due to the file structure of the zip file, the uncompressed data now contains the characters hydrologic_units\ in front of the file names. R will not permit these characters within a file or object name, therefore you need to rename the data before you can continue.
To start this process you will create a list of files in the watershed folder that contain the “hydrologic_units” prefix. Since you need to consistently remove the first seventeen characters, you can use the sub function to remove those characters. Alternatively, if you just needed to rename some files, you could add replacement characters between the apostrophes.
names <- list.files(path = "/content/watersheds", pattern = "hydrologic_units")
new_names <- sub('.................','',names)
Because file.rename operates relative to the current working directory, you need to temporarily direct the working directory to the location of the inappropriately named files, rename them, then return to the correct working directory. There is a way around this — file.rename will also accept full file paths — which is sketched after the script below.
setwd("/content/watersheds")
file.rename(names,new_names)
setwd("/content")
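As referenced above, a minimal sketch of the full-path alternative, which renames the files without changing the working directory (assuming the names and new_names vectors created earlier):

# Alternative sketch: rename using full paths so the working
# directory never has to change
file.rename(file.path("/content/watersheds", names),
            file.path("/content/watersheds", new_names))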
If you navigate back to the files pane you can now see the watershed files have been renamed. To return to the content folder you can use the up directory button.
Now you can use the readOGR function from rgdal to read the shapefile into a new object.
watersheds_data <- readOGR("/content/watersheds/wbdhu12_a_tn.shp")
With the watersheds dataset created you need to address the remaining datasets. Using similar steps to previous exercises you will now download and create a brownfields dataset. You will also create a new folder for this data just like in the watershed script above.
download.file('https://github.com/chrismgentry/GIS1-Exercise-5/raw/main/Data/brownfields.zip', 'brownfields.zip')
unzip('brownfields.zip', exdir = "/content/brownfields")
brownfields_data <- readOGR("/content/brownfields/brownfields.shp")
The next dataset to import will be the demographics dataset from the exercise GitHub page. Because the data is a simple *.csv file it can easily be read in with the read.csv function.
demographics <- read.csv('https://raw.githubusercontent.com/chrismgentry/GIS1-Exercise-5/main/Data/demographics.csv')
The final dataset to import is the census tracts for Montgomery County. Using the tigris package like in Exercise 3, Step 1, you can use tracts to obtain the dataset.
montco_tracts <- tracts("TN", county = "Montgomery")
Three of these datasets (brownfields, watersheds, and census tracts) are already in spatial data formats. In order to perform further analysis you need to make sure they have the same coordinate reference system (crs).
crs(watersheds_data)
crs(brownfields_data)
crs(montco_tracts)
You can see they are all in different projections, therefore you need to reproject them into a single crs. For this exercise you will use EPSG:4326, which in R appears as +proj=longlat +datum=WGS84 +no_defs when referenced in the script. Because the brownfields data is already in this crs, you can use it or the EPSG code to correct the other datasets. Remember from Exercise 3 that data from tigris is in a slightly different spatial data structure (sf vs. SpatialPolygonsDataFrame), so the process to reproject that data will vary from the watersheds dataset.
watersheds_data <- spTransform(watersheds_data, crs(brownfields_data))
montco_tracts <- st_transform(montco_tracts, 4326)
You can now check the crs information for each dataset and they should all be in EPSG:4326 (or +proj=longlat +datum=WGS84 +no_defs). With the data created and reprojected to the same crs you can move on to the analysis.
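For example, you can simply re-run the checks from before:

# All three layers should now report EPSG:4326
# (+proj=longlat +datum=WGS84 +no_defs)
crs(watersheds_data)
crs(brownfields_data)
crs(montco_tracts)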
Question No. 1: What is the common name of the extracted files? How many are there? What are the various file extensions?
The Library of Congress has a great description of the various extensions here.
The data collected in the previous section requires additional processing so you can reduce the dataset to only the pertinent information for the analyses. In this step you will use additional geoprocessing techniques and data management tools to link two datasets for further examination.
With the data collected you can now add the brownfields, census tracts, tornado_data, and wbdhu12_a_tn (watersheds) data to your project. Although there are a number of ways of isolating data to make derived datasets (e.g. Select > Lasso in Exercise 4, Step 1), in this exercise you will use another tool from the Geoprocessing Toolbox to complete this task. On the View Tab click on the Geoprocessing Toolbox button to open the Geoprocessing pane on the right side of the screen. By navigating through the tools menus you will find Select under Analysis Tools > Extract. With this tool you will write a simple expression to “select” a small portion of the data you need for further analysis. To do this, double-click the Select tool and in the resulting pane input the following parameters:
*The tornado dataset is only being used to obtain a polygon for Montgomery County for the clip process in the next step.
This will add the new montgomery_county shapefile to your contents. You can now remove the tornado dataset because it will no longer be needed. With the polygon of Montgomery County available you can now use the Clip tool like in Exercise 4, Step One to clip the brownfields and watersheds datasets to reduce them to only those within Montgomery County. If you receive a “Datum conflict” warning, for the purposes of this exercise, you can ignore it and continue with the clip. Recall that the Input Features is the data you want to reduce, the Clip Feature is the data you want it to take the shape of, and the Output Feature Class is what you are naming the new file. Refer back to Exercise 4, Step One for more information about Clip.
With the new clipped datasets you can remove or just uncheck (in case you want to use them in your final map) the full brownfields and watersheds datasets to reduce clutter. You can also now zoom in closer to view only Montgomery County.
In the final step to prepare the data, you are going to connect non-spatial data to the census tract dataset. In Step One you downloaded a file titled demographics.csv. This file contains comma-separated values detailing additional demographic data that you need to append to the census tract data. Although the process is relatively straightforward, there are a number of steps that need to be taken in order to join the data.
First, if you haven’t already, add the demographics.csv file to your table of contents. This can be done from the Catalog Pane or with the “Add Data” button like in previous exercises. Because ArcGIS Pro treats *.csv files as “read only” you need to convert it to a table that can be edited in the software. Now, right-click on the demographics.csv standalone table and go to Data > Export Table. In the resulting window choose the following options:
Before clicking OK, you need to expand the Fields section of the window and click on Tract in the Output Fields column. Then click on the Properties Tab and change the Type field to Text. Then click OK. If you continued without changing the field type, the variable would most likely be treated as a numerical value. If you open the attribute table for any dataset and mouse-over a variable column without clicking, a pop-up window will appear detailing the Type and other parameters of the variable. In the census dataset from the previous exercise, the NAME variable is Type: Text (7). The seven in parentheses means the max number of available characters is seven. So before you export a table it is good practice to make sure the variables match the variables you intend to join or that the variables will be treated in a manner necessary for additional analyses.
The new standalone table should have been added to the Table of Contents. If not you should add it now; the csv table can be removed. Now you can connect the new table to the census tract dataset. Begin by right-clicking on the census data and selecting Joins and Relates > Add Join. In the new Add Join window select the following options (your file names may vary):
For this exercise keep the “Keep All Target Features” button checked, and if you receive a warning about an indexing error with the census data you can ignore it for this exercise. Then click the Validate Join button. This will pop up a new window describing the process of checking the two datasets to see if they can be joined. At the bottom of the dialog you should see a line that says there were 39 joins. Close that message and click OK to run the join.
Finally, open the attribute table for the census tracts and scroll to the far right of the table. If the join worked properly you should see a number of additional fields added to the table.
This will provide all of the data and information you need to visualize the data and make comparisons of the watersheds.
Question No. 2: How many watersheds cover Montgomery County? Although they have been clipped from their original geometry, which watershed is the largest? Which is the smallest?
With the data collected you can now add the brownfields, census tracts, tornado_data, and wbdhu12_a_tn (watersheds) data to your project. Although there are a number of ways of isolating data to make derived datasets (e.g. Select Features > Select Features by Freehand in Exercise 4, Step 1), in this exercise you will use another tool from Vector Selection in the Processing Toolbox to complete this task.
With this tool you will select only a small portion of the data you need for further analysis. To do this, double-click the “Extract by Attribute” tool and in the resulting window input the following parameters (file names may vary):
Remember that in QGIS you have the ability to either create permanent files or temporary layers. Because you will only be using the Montgomery County dataset to clip files later on, you can decide whether to use the browse button to save the file for future use or just create a temporary file.
This will add the new montgomery_county temporary file (or shapefile if saved) to your layers. You can now remove the tornado dataset because it will no longer be needed. You may also consider renaming the new file if necessary. With the polygon of Montgomery County available you can now use the Clip tool like in Exercise 4, Step One to clip the brownfields and watersheds datasets to reduce them to only those within Montgomery County. Recall that the Input layer is the data you want to reduce (e.g. brownfields or watersheds) and the Overlay layer is the data you want it to take the shape of. Refer back to Exercise 4, Step One for more information about Clip. You should go ahead and use the browse button to save these as permanent files. Be sure to use a naming convention that will allow you to recall what the files are later on (e.g. montco_brownfields).
With the new clipped datasets you can remove or just uncheck (in case you want to use them in your final map) the full brownfields and watersheds datasets to reduce clutter. You can also now zoom in closer to view only Montgomery County.
In the final step to prepare the data, you are going to connect non-spatial data to the census tract dataset. In Step One you downloaded a file titled demographics.csv. This file contains comma-separated values detailing additional demographic data that you need to append to the census tract data. Although the process is relatively straightforward, there are a number of steps that need to be taken in order to join the data.
First, you will add the demographics.csv using Layer > Add Layer > Add Delimited Text Layer from the menu bar, by clicking the “Add Delimited Layer” button, or by using the shortcut keys CTRL+Shift+T/CMD+Shift+T.
In the resulting window, use the browse button to find the demographics.csv data in your project folder. Be sure that “No geometry (attribute only table)” is selected and leave the rest of the information at the default. Click Add.
The demographics table should now be added to your layers. To connect it to the population information, right/CTRL-click on the census tract data and select “Properties”. In the Properties Menu, select the Joins tab from the left side menu. At the bottom of the screen click the plus (+) symbol to add a join. In the new window select the following parameters:
Leave the “Cache join layer in memory” checked and scroll down to “Joined Fields”. Click all of the fields and scroll down to the “Custom field name prefix” option and check the box. Remove all of the text in the box and then click OK.
Be sure to click OK on the properties window as well to complete the join. Finally, open the attribute table for the census tracts and scroll to the far right of the table. If the join worked properly you should see a number of additional fields added to the table.
This will provide all of the data and information you need to visualize the data and make comparisons of the watersheds.
Question No. 2: How many watersheds cover Montgomery County? Although they have been clipped from their original geometry, which watershed is the largest? Which is the smallest?
There are some additional processing steps needed on the collected datasets before you can move on to the visualizations. In Exercise 3, Step 2 you used the merge function to combine the census tracts and population data. In this exercise you will repeat those steps to connect the demographics dataset to the census tracts.
census_tracts <- merge(x = montco_tracts, y = demographics, by.x = "NAME", by.y = "Tract", all = TRUE)
By plotting the brownfields or watersheds data using ggplot2, or quickly using plot(x) where x is the name of the dataset, you will likely be able to determine that the data extends far beyond the boundaries of Montgomery County. In the previous exercise the intersect function was used to clip out the counties within the hurricane buffer. In this exercise you will use the crop function to see how it varies from intersect. For this step you will want to crop the brownfields and watersheds datasets by the census tracts to retain only those within the county.
montco_brownfields <- crop(brownfields_data, census_tracts)
montco_watersheds <- crop(watersheds_data, census_tracts)
If you create a ggplot() of the data you will see how the datasets were subset. Essentially, a bounding box (the extent of the dataset from the most upper-left portion to the lower-right) was created for the census tract data and any data within that box was retained. Why might this potentially be problematic in some instances? In what instance might this be the best method?
Before proceeding to the visualization portion you need to create a couple of tables to help answer the questions asked by your community partners. One is how many brownfields occur within each watershed and census tract. To do this you can use the over function from the sp package to create a count of the overlapping data. This is a form of spatial join: over returns, for each point, the polygon it falls within, and tabulating that result gives a count of the points per polygon. More information on over can be found here.
brownfields_per_watershed <- over(montco_brownfields, montco_watersheds)
watershed_table <- as.data.frame(table(brownfields_per_watershed$HUC_12))
watershed_table <- transform(watershed_table, Var1 = as.character(Var1))
colnames(watershed_table) <- c("HUC_12","BF_Count")
watershed_table
In the script above, over created the spatial join. The next line converted the information into a data frame with columns of variables and rows of observations. However, table stores the watershed names (HUC_12) as factors, so the table could not yet be joined to the original dataset, where the numeric watershed name is treated as a character (nominal data). So the transform function was used to convert the names to characters. Finally, colnames was used to rename the columns to names that are more appropriate for the data.
Once the table has been created you will need to connect it to the original data so the information can potentially be included in the visualization. Because some of the watersheds do not contain a brownfield, there will be NA values reported for those observations. So a line is added in the script below to replace the NA values with zeros:
montco_watershed_data <- merge(x = montco_watersheds, y = watershed_table, by.x = "HUC_12", by.y = "HUC_12", all = TRUE)
montco_watershed_data@data$BF_Count[is.na(montco_watershed_data@data$BF_Count)] <- 0
montco_watershed_data@data
While you have seen the merge function before, notice the syntax for the second line. Because the data is a SpatialPolygonsDataFrame, the table that contains the attributes is held in a slot (a container for information in certain file types) called data. So to examine the information you call the object and add @data following the object name. In the example above, the script identifies the BF_Count variable in the watersheds data slot and says “if there is an NA value in that variable within the slot, it should be replaced with a 0.” If you examine the object you will notice there are actually several slots.
Remember that you should always examine new datasets when they are imported into your project. Because the census tract data is in a slightly different format than the watersheds, you will complete a very similar process to the above script, with a slight modification to convert the census tracts to a SpatialPolygonsDataFrame.
census_tracts_spdf <- sf::as_Spatial(census_tracts)
brownfields_per_tract <- over(montco_brownfields, census_tracts_spdf)
census_tract_table <- as.data.frame(table(brownfields_per_tract$NAME))
census_tract_table <- transform(census_tract_table, Var1 = as.character(Var1), Freq = as.numeric(Freq))
colnames(census_tract_table) <- c("Name","BF_Count")
census_tract_table
With the table created you can now attach it to the original census data.
census_tract_dataset <- merge(x = census_tracts, y = census_tract_table, by.x = "NAME", by.y = "Name", all = TRUE)
census_tract_dataset$BF_Count[is.na(census_tract_dataset$BF_Count)] <- 0
str(census_tract_dataset)
While you could have kept the SpatialPolygonsDataFrame version of the census data, it is important to know how to manage different classes of data. So in this exercise the census tract data will continue to be simple features (sf) data while the watershed data will be sp.
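A quick way to confirm which class each object is:

# sf objects report "sf" "data.frame"; sp objects report
# "SpatialPolygonsDataFrame"
class(census_tract_dataset)
class(montco_watershed_data)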
If you examine the brownfields dataset it is also a type of sp data called a SpatialPointsDataFrame. Because the values are points and not polygons you can see the type changes accordingly. If you create a visualization of the data you will find that while crop worked to subset the information based on the bounding rectangle of the census tracts, because of the shape of Montgomery County and the location of the brownfields, one record was retained erroneously. The first record, “The Compost Company”, is located south of the county and therefore needs to be removed from the dataset. To do this you will convert the brownfields dataset to a data frame and remove the first row of data. Because items in a data frame can be identified by row (first value) and column (second value), simply adding [-1,] behind the object name will “subtract” the entire row from the dataset. Since you are already editing the data, it would also make sense to rename the columns to names that match ggplot2 nomenclature.
brownfields_dataset <- as.data.frame(montco_brownfields)
colnames(brownfields_dataset) <- c("Name", "long", "lat", "NA")
brownfields_dataset <- brownfields_dataset[-1,]
brownfields_dataset
Finally, just in case you need a simple outline of Montgomery County for your visualization in the next step, you can create a polygon object of the county with a script similar to the one used to create the states information in Exercise 2, Step 1.
counties <- map_data("county")
tn_counties <- subset(counties, region == "tennessee")
montco <- subset(tn_counties, subregion == "montgomery")
Question No. 2: How many watersheds cover Montgomery County? Although they have been clipped from their original geometry, which watershed is the largest? Which is the smallest? HINT: View acres in the watershed dataset.
In this step you will need to examine the spatial distribution of brownfields within the watersheds of Montgomery County and make some qualitative interpretations of potentially impacted urban areas.
Examine the spatial distribution of the brownfields throughout the county. The clustering should be relatively apparent and might match up with your knowledge of industrial activities in the various areas of Montgomery County. In order to help quantify the number of brownfields in each watershed you can use a Spatial Join to create a count variable for this information. To do this, right-click on the Montgomery County watershed dataset and go to Joins and Relates > Spatial Join.
In the resulting window, select the following parameters (your file names may vary):
You can leave the remaining items blank and click OK.
By examining the attribute table for the new dataset you should see a new variable called Join_Count. This is the number of brownfields that occur within each watershed.
Using the skills you learned in Exercises Two, Three, and Four, you can now make a map that shows Montgomery County, the locations of brownfields, and the watersheds in a graduated color scheme by number of brownfields. Remember to include cartographic elements such as a legend, scale bar, north arrow, etc. In this visualization you may also want to add a different basemap or an inset map that provides additional supporting information.
Question No. 3: Which watershed contains the most brownfields?
Examine the spatial distribution of the brownfields throughout the county. The clustering should be relatively apparent and might match up with your knowledge of industrial activities in the various areas of Montgomery County. In order to help quantify the number of brownfields in each watershed you can use Join attributes by location (Summary) from the Vector General menu in the Processing Toolbox. This will create a count of the total number of brownfields per watershed polygon. In the Join attributes by location (Summary) window select the following options (layer names may vary):
In the Summaries to calculate… field, click the browse button and select only “count”. When selected, click the blue arrow button to return to the previous page. As before, you can leave this as a temporary file* or you can choose to create a permanent file for future use. Now click Run.
*If you choose to make a temporary file you should rename it in the layers with a sensible name.
By examining the attribute table for the new dataset you should see a new variable called Name_count. This is the number of brownfields that occur within each watershed. Unfortunately, some of the cells are populated with “Null” values. You need to remove these in order to create a proper graduated color scheme. To do this you will open the attribute table for the newly created dataset and click the Field calculator button. In the new window uncheck the box for “Create a new field” and check the box for “Update existing field”. In the drop-down menu below the box select Name_count. In the Expression box type the following and click OK:
if("Name_count" is null, 0, "Name_count")
This will replace all of the null values with zero values. Finally, click the Toggle editing mode button and save the changes to the table.
Using the skills you learned in Exercises Two, Three, and Four, you can now make a map that shows Montgomery County, the locations of brownfields, and the watersheds in a graduated color scheme by number of brownfields. Remember to include cartographic elements such as a legend, scale bar, north arrow, etc. In this visualization you may also want to add a different basemap or an inset map that provides additional supporting information.
Question No. 3: Which watershed contains the most brownfields?
Examine the spatial distribution of the brownfields throughout the county. The clustering should be relatively apparent and might match up with your knowledge of industrial activities in the various areas of Montgomery County. Use the skills you developed in Exercises 2, 3, and 4 to complete the scripts below and create the visualizations for the number of brownfields per watershed and the number of brownfields per census tract. Don’t forget to add all of the necessary map elements, and feel free to add any ancillary data from this or previous exercises that enhances your map.
Watershed Map:
#ggplot() +
#geom_polygon(data = DATASET, aes(x=long, y=lat, group=group, fill = SELECT THE APPROPRIATE VARIABLE), color = "gray", size = 0.25, linetype="dashed") +
#geom_polygon(data = montco, aes(x=long, y=lat, group=group), fill = NA, color = "black", size = 1) +
#geom_point(data = DATASET, aes(x=long, y=lat), color = "SELECT A COLOR") +
#coord_fixed() +
#ADD THE NECESSARY ELEMENTS HERE
Census Tract Map:
#ggplot() +
#geom_sf(data = DATASET, aes(fill = SELECT THE APPROPRIATE VARIABLE)) + scale_fill_viridis_c(direction = -1, option = "A") +
#geom_point(data = DATASET, aes(x=long, y=lat), color = "red") +
#coord_sf() +
#ADD THE NECESSARY ELEMENTS HERE
Remember to remove all # comment tags before running the edited scripts.
Question No. 3: Which watershed contains the most brownfields?
After discussing the results of the previous analysis with your colleagues at the County Commission, Stormwater Management, Health Department, and TDEC, they are interested in seeing how the location of brownfields impacts the community. Although the commission districts do not perfectly replicate the census tracts, the County Commissioners and the Health Department want to know if the brownfield sites are directly related to census tracts with large minority populations. They are concerned by a recently published report that states:
“While there is no single way to characterize communities located near our sites, this population is more minority, low income, linguistically isolated, and less likely to have a high school education than the U.S. population as a whole. As a result, these communities may have fewer resources with which to address concerns about their health and environment.”
During these discussions the Health Department would also like to know if the areas with a high number of brownfields have higher populations of children.
Using the skills you learned in this and previous exercises, create a new spatial join between the census tracts and brownfields datasets (for this exercise ignore any datum warning). As with the previous spatial join, you should now have an additional variable labeled “Join_Count” that details the number of brownfields per census tract.
One way you can view two variables at once on a map is to create bivariate symbology.
This creates a grid of colors with an X and Y axis, with corner categories running from low in both variables to high in both variables. The other grid cells represent midpoints in the variables. The variables can be selected in the symbology pane, where “Field 1” is one variable and “Field 2” is the other variable. So to visualize brownfields and one of the population demographics, select one for each field. For “Grid Size” select 3x3. You can select your own color scheme with the drop-down menu. In the fields section below, you can rename the fields (e.g. “join count” means nothing, so give it an appropriate name).
With these settings you should have a symbology that shows if a census tract is high or low in either of the particular variables. Test a number of different variable combinations with the count of brownfields and various demographic categories.
Question No. 4: Which census tract contains the most brownfields?
Using the skills you learned in this and previous exercises, create a new Join attributes by location (Summary) between the census_tracts and brownfields datasets (for this exercise ignore any datum warning). As with the previous spatial join, you should now have an additional variable labeled “Name_count” that details the number of brownfields per census tract.
One way you can view two variables at once on a map is to create bivariate symbology. This creates a grid of colors with an X and Y axis, with corner categories running from low in both variables to high in both variables. The other grid cells represent midpoints in the variables. You can create these bivariate maps in QGIS with some editing of the individual color palettes in your graduated colors map and by using a plug-in to create the legend. In order to create the map you first need to duplicate your layer. This example will use the census tract dataset with the “Name_count” variable for number of brownfields and variables for different demographics such as total population (tot_pop). To duplicate a layer, right/CTRL-click on a layer and select Duplicate Layer. If it helps you to keep the two layers organized, you can amend their layer names to include the variable used for the graduated colors (e.g. tracts_brownfields_count or tracts_brownfields_pop). Otherwise just remember what each one is displaying.
Next, you will need to create a graduated symbology for each layer with three (3) categories and a specific color palette. Because bivariate maps blend two color schemes it is important to have the appropriate color selection. Here is an example of several bivariate color palettes from Joshua Stevens.
This example will use the second color palette. There are a few steps required to match your graduated colors categories to the color palettes on the X and Y axes in the image above:
Finally, click OK on the windows in Steps 3 and 4 and repeat the process for the next color moving progressively up or right on the selected bivariate palette depending on the layer. This process will need to be completed for both of the layers (one with the brownfield count and the other with a demographic variable) you will use in your bivariate map.
Once you have completed this process for each of the layers, return to the properties menu of the first layer and, under the Symbology tab, scroll down to Layer Rendering and expand the section to see the options for Blending mode. Choose multiply for the layer and normal for the feature, and click OK.
With both layers viewable you should now have a bivariate color palette for the data.
Because QGIS is unable to interpret the appropriate legend style from the available data you need to add a plug-in. Click Plugins > Manage and Install Plugins from the menu bar.
Next, search for “Bivariate Legend” in the search bar. Be sure All is selected in the left side column. This should bring up the Bivariate Legend plugin. Click Install Plugin in the lower right of the window to install the legend generator.
With the plugin installed you can return to your project. You will now have a Bivariate Legend button on your toolbar. Click the button to open the legend generator. Select the appropriate Top layer from the drop-down menu and click the box for Reverse colors. Select the appropriate Bottom layer, set the Square width to 48, and choose Multiply for the drop-down menu below. Then click Generate legend.
Finally, click Export legend to image and save it in your project folder with a .png extension. In your layout, use the Add image button and draw the image box while holding the shift key to constrain the box to a perfect square. Next, right-click in the box and go to Item properties to locate the image file. In the properties pane, select Raster Image and navigate to your project folder to add the image.
Remember that this is now simply an image which means you will need to manually create the legend using the image and inserted text. Be sure you remember which data belongs on which axis and think about how you want the reader to interpret the information. Here is an example of one that could be used for the legend created above.
You should experiment with different demographics to see how they relate to the number of brownfields before completing the assignment. You will want to select the variable that you believe shows which demographic would be most impacted by the presence of brownfields in the various tracts.
Question No. 4: Which census tract contains the most brownfields?
As stated above, the County Commissioners and the Health Department want to see the relationship between census tracts with large minority populations and brownfields, and the Health Department would also like to see the relationship between brownfields and juvenile populations. The visualizations you created above simply show the number of brownfields per watershed/census tract. You could create a qualitative visualization where a demographic (either minority class or age class) is used as the fill with brownfield locations as an overlay. However, you can also create a two-variable quantitative map, called a bivariate map, that displays quantitative categories for two different variables. This creates a grid of colors with an X and Y axis, with corner categories running from low in both variables to high in both variables. The other grid cells represent midpoints in the variables. You can create these bivariate maps in R manually or by using the biscale package loaded at the beginning of this exercise. Because bivariate maps blend two color schemes it is important to have the appropriate color selection. Graduated symbology palettes have been created that cover a range of color possibilities for the legend. These include “DkViolet” (used in the example below), “GrPink”, “DkBlue”, “DkCyan”, and “Brown”.
Like the visualization you created for Exercise 3, Step 3, you will rely on syntax from cowplot to overlay the map and legend from biscale. However, currently the only demographic data in this exercise is by race. So before you begin this process you need to take the additional step of connecting the population data from Exercise 3, Step 2. The same script can be used to obtain the dataset:
population <- read.csv('https://raw.githubusercontent.com/chrismgentry/GIS1-Exercise-3/main/Data/mont_co_pop.csv', colClasses=c(Tract="character"))
To connect the data to the census tract dataset complete the following script:
#census_with_pop <- merge(x = census_tract_dataset, y = DATASET, by.x = "VARIABLE", by.y = "VARIABLE", all = TRUE)
If you have questions about completing the script above, review the information from Exercise 3, Step 2 or from similar steps earlier in this exercise.
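If you need a starting point, one possible completion is sketched below — it assumes you are joining the population file’s Tract column to the census NAME variable, as in Exercise 3:

# One possible completion (column names assume the Exercise 3
# population file): join population to the census tract dataset
census_with_pop <- merge(x = census_tract_dataset, y = population,
                         by.x = "NAME", by.y = "Tract", all = TRUE)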
To create the bivariate dataset you need to use the bi_class function from biscale.
bivariate_data <- bi_class(census_with_pop, x = total_pop, y = BF_Count, dim = 3, style = "jenks")
In this script you identify the dataset to be used to create the bi_class data. You identify the x variable, in this example total population, and the y variable, the count of brownfields. Finally you provide the number of dimensions (3) and the style used to calculate the divisions in the data. For this example you will use the Jenks Natural Breaks Classification, identified simply as “jenks”.
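To see how the tracts were classified, you can tabulate the bi_class column that the function added to the dataset:

# Count of census tracts in each of the nine bivariate classes
# (labels take the form "x-y", e.g. "1-1" through "3-3")
table(bivariate_data$bi_class)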
Now you can use the information above to create the visualization based on total population. You will need to create a new code block or alter the one above to identify an x variable appropriate to answer the questions posed by the County Commission and Health Department.
bivariate_map <- ggplot() +
  geom_sf(data = bivariate_data, mapping = aes(fill = bi_class), color = "white", size = 0.1, show.legend = FALSE) +
  geom_point(data = brownfields_dataset, aes(x=long, y=lat), color = "red") +
  bi_scale_fill(pal = "DkViolet", dim = 3) +
  theme_void()
You can use this same example script to create a map based on a different demographic variable (x) as long as it is identified appropriately in the bivariate_data script block above. Next you need to create an object to serve as the legend. Unfortunately, because biscale legends are not linked to the data, you will need to overlay the legend as an “image”. So be careful that any changes made to the data are reflected in the legend as well.
legend <- bi_legend(pal = "DkViolet",
                    dim = 3,
                    xlab = "Total Population",
                    ylab = "No. Brownfields",
                    size = 10)
With the map and legend objects created you can now script the base of the final map using the same process as the last exercise.
final_map <- ggdraw() +
  draw_plot(bivariate_map, 0, 0, 1, 1) +
  draw_plot(legend, 0.7, 0, 0.25, 0.25)
final_map
For this map, total population was simply used to provide an example script. So to finish this exercise you will need to alter the x variable in the bivariate_data script to determine if there is any relationship between brownfields and demographics (race or age) for the County Commission and Health Department.
Question No. 4: Which census tract contains the most brownfields?
In the report you provide to the County Commission, Stormwater Management, Health Department, and TDEC please provide the following information:
Be sure to include the maps you created to support your report. When complete, send a link to your Colab Notebook or Word document with answers to Questions 1-4 and your completed map(s) via email.