In this module, we’ll continue to practice cleaning, joining, and other data wrangling tasks. However, we’ll take a spatial perspective. Rather than joining data by column, we’ll join them using spatial relationships. For example, which point locations (building permits, food pantries, homeless shelters) are within which census tract? What’s the closest transit stop to a school? How can we calculate areas and distances?
We’ll make heavy use of the geopandas library, which provides spatial extensions to pandas. We’ll also explore public transit data, making use of the standard GTFS format.
Learning objectives
By the end of this module, you should be able to:
- Create a GeoDataFrame by adding a geometry column
- Evaluate and choose an appropriate projection for your geometry
- Implement spatial joins using predicates such as intersects and within, as well as nearest-neighbor joins
- Evaluate the success of a spatial join and troubleshoot where necessary
- Compute areas, lengths, and other attributes
- Import and parse transit data in GTFS format
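As a minimal sketch of the first two objectives, here is one way to build a GeoDataFrame from longitude and latitude columns and then reproject it. The pantry names and coordinates are made up, and EPSG:3310 (California Albers, in meters) is only an assumption about what suits Los Angeles-area data; the lecture notebooks may use a different projection.

```python
import pandas as pd
import geopandas as gpd

# Hypothetical point data: names and coordinates are made up for illustration.
df = pd.DataFrame({
    "name": ["Pantry A", "Pantry B"],
    "lon": [-118.25, -118.40],
    "lat": [34.05, 34.02],
})

# Build a geometry column from the longitude/latitude columns.
# EPSG:4326 is the usual CRS for GPS-style lat/lon coordinates.
gdf = gpd.GeoDataFrame(
    df,
    geometry=gpd.points_from_xy(df["lon"], df["lat"]),
    crs="EPSG:4326",
)

# Reproject to a projected CRS before measuring distances or areas.
# EPSG:3310 (California Albers, meters) is an assumption for this example.
gdf_proj = gdf.to_crs("EPSG:3310")
print(gdf_proj.crs.name)
```

Degrees are not a constant distance on the ground, which is why the reprojection step comes before any area or distance calculation.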
There are no assigned readings for this module. People tend not to write about their data wrangling, so there are few good planning-related case studies!
Optional readings
Here’s an example application of the GTFS feeds discussed in the lecture on transit data (Video 4d).
Video 4a: Spatial joins
This lecture examines joins in a spatial context, using the example of food pantries in Los Angeles.
As you watch the video, follow along with the code here.
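For orientation, a point-in-polygon join might look like the sketch below. The tracts and pantries are toy geometries, not the Los Angeles data from the lecture, and the `predicate` argument assumes a reasonably recent geopandas (older releases called it `op`).

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Two hypothetical square "tracts" in a projected CRS (EPSG:3310 assumed).
tracts = gpd.GeoDataFrame(
    {"tract_id": ["T1", "T2"]},
    geometry=[box(0, 0, 10, 10), box(10, 0, 20, 10)],
    crs="EPSG:3310",
)

# Three hypothetical pantries; the last one falls outside both tracts.
pantries = gpd.GeoDataFrame(
    {"pantry": ["A", "B", "C"]},
    geometry=[Point(2, 3), Point(15, 5), Point(50, 50)],
    crs="EPSG:3310",
)

# A left join keeps every pantry; unmatched rows get NaN in the tract columns.
joined = gpd.sjoin(pantries, tracts, how="left", predicate="within")
print(joined[["pantry", "tract_id"]])

# Count pantries per tract (groupby drops the unmatched pantry by default).
counts = joined.groupby("tract_id").size()
print(counts)
```

Using `how="left"` rather than the default inner join makes it easy to see which points failed to match anything.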
Video 4b: Advanced spatial joins
We’ll consider different types of spatial joins, and practice troubleshooting when things go wrong.
As you watch the video, follow along with the code here.
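A few routine checks catch most spatial join problems. The sketch below uses made-up overlapping zones so that one point matches twice and another matches nothing; the column names are hypothetical and the checks are one possible routine, not necessarily the one used in the lecture.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Hypothetical overlapping polygons, so one point can match twice.
zones = gpd.GeoDataFrame(
    {"zone": ["Z1", "Z2"]},
    geometry=[box(0, 0, 10, 10), box(5, 0, 15, 10)],
    crs="EPSG:3310",
)
points = gpd.GeoDataFrame(
    {"site": ["A", "B", "C"]},
    geometry=[Point(2, 2), Point(7, 5), Point(100, 100)],
    crs="EPSG:3310",
)

# Check 1: both layers should share a CRS; reproject if they do not.
if points.crs != zones.crs:
    points = points.to_crs(zones.crs)

joined = gpd.sjoin(points, zones, how="left", predicate="intersects")

# Check 2: rows with no match have a missing value in the right-hand columns.
n_unmatched = joined["zone"].isna().sum()

# Check 3: more output rows than input rows means some points matched twice.
n_extra = len(joined) - len(points)
dupes = joined[joined.index.duplicated(keep=False)]
print(n_unmatched, n_extra)
print(dupes[["site", "zone"]])
```

If every row is unmatched, a CRS mismatch is the usual suspect; if the row count grows, look for overlapping polygons or points sitting exactly on a shared boundary.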
Video 4c: Distances and nearest neighbors
This lecture shows how to find the nearest neighbors of a feature, such as the closest food pantry to a census tract. We’ll also explore distance, length, and other geometric calculations in geopandas.
As you watch the video, follow along with the code here.
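The calculations described above can be sketched as follows, again with toy geometries. This assumes a projected CRS in meters (EPSG:3310 here) and a geopandas version that includes `sjoin_nearest`; the column names `area_m2`, `perimeter_m`, and `dist_m` are hypothetical.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Hypothetical layers in a projected CRS with meter units (EPSG:3310 assumed).
tracts = gpd.GeoDataFrame(
    {"tract_id": ["T1", "T2"]},
    geometry=[box(0, 0, 1000, 1000), box(2000, 0, 3000, 500)],
    crs="EPSG:3310",
)
pantries = gpd.GeoDataFrame(
    {"pantry": ["A", "B"]},
    geometry=[Point(500, 1500), Point(3500, 250)],
    crs="EPSG:3310",
)

# Areas and perimeter lengths are in CRS units (square meters and meters here).
tracts["area_m2"] = tracts.geometry.area
tracts["perimeter_m"] = tracts.geometry.length

# Use tract centroids as the "from" locations, then find the nearest pantry.
centroids = tracts.copy()
centroids["geometry"] = tracts.geometry.centroid
nearest = gpd.sjoin_nearest(
    centroids, pantries, how="left", distance_col="dist_m"
)
print(nearest[["tract_id", "pantry", "dist_m"]])

# Distance from every tract centroid to one specific pantry.
dist_to_a = centroids.distance(pantries.geometry.iloc[0])
```

Measuring from centroids is a simplification; passing the tract polygons themselves to `sjoin_nearest` would measure from each polygon's edge instead.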
Video 4d: Transit data
The General Transit Feed Specification (GTFS) is a common format for sharing transit data such as stops, routes, schedules, and fares. Here we’ll use the partridge library to parse GTFS data files.
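To see what is inside a feed, it helps to know that GTFS is a zip archive of CSV text files. The sketch below deliberately uses plain pandas rather than partridge, and it builds a tiny made-up feed in memory so that it runs without downloading anything; the stops and trips are invented, though the file and column names follow the GTFS specification.

```python
import io
import zipfile
import pandas as pd

# Build a tiny, made-up GTFS feed in memory so the example is self-contained.
stops_txt = (
    "stop_id,stop_name,stop_lat,stop_lon\n"
    "S1,Main St,34.05,-118.25\n"
    "S2,Oak Ave,34.06,-118.30\n"
)
stop_times_txt = (
    "trip_id,arrival_time,departure_time,stop_id,stop_sequence\n"
    "T1,08:00:00,08:00:00,S1,1\n"
    "T1,08:10:00,08:10:00,S2,2\n"
    "T2,09:00:00,09:00:00,S1,1\n"
)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("stops.txt", stops_txt)
    zf.writestr("stop_times.txt", stop_times_txt)
buffer.seek(0)

# Read two of the tables, keeping IDs as strings.
with zipfile.ZipFile(buffer) as zf:
    with zf.open("stops.txt") as f:
        stops = pd.read_csv(f, dtype={"stop_id": str})
    with zf.open("stop_times.txt") as f:
        stop_times = pd.read_csv(f, dtype={"stop_id": str, "trip_id": str})

# Count scheduled stop events per stop and attach the stop names.
visits = stop_times.groupby("stop_id").size().rename("n_visits").reset_index()
summary = stops.merge(visits, on="stop_id", how="left")
print(summary[["stop_name", "n_visits"]])
```

Because `stops.txt` carries `stop_lat` and `stop_lon`, the stops table can be turned into a GeoDataFrame and used in the spatial joins from earlier in this module.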
Please take the quiz below to check your understanding of this module.
This notebook practices the concepts that we’ve developed in the lecture notebooks. We’ll work through it in class.
It’s the Module 4 class activity in your GitHub repository here.
Here are some tips on submitting via GitHub. They are from last year’s course; the URLs are different but the steps are the same.
You have now completed Module 4. Please navigate to the homepage or to the next module by using the green navigation bar at the top.