In this module, we’ll continue to practice cleaning, joining, and other data wrangling tasks, but from a spatial perspective. Rather than joining tables on a shared key column, we’ll join them using spatial relationships. For example, which point locations (building permits, food pantries, homeless shelters) fall within which census tract? What’s the closest transit stop to a school? How can we calculate areas and distances?

We’ll make heavy use of the geopandas library, which provides spatial extensions to pandas. We’ll also explore public transit data, making use of the standard GTFS format.

Learning objectives

By the end of this module, you should be able to:

  1. Create a GeoDataFrame by adding a geometry column 
  2. Evaluate and choose an appropriate projection for your geometry
  3. Implement a spatial join using different predicates, such as intersects, and a nearest-neighbor join
  4. Evaluate the success of a spatial join and troubleshoot where necessary
  5. Compute areas, lengths, and other attributes
  6. Import and parse transit data in GTFS format
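Objectives 1 and 2 can be sketched in a few lines. The example below assumes a pandas DataFrame with hypothetical `lon` and `lat` columns for two made-up food pantry locations. It also assumes UTM zone 11N (EPSG:32611) as a reasonable meter-based projection for Los Angeles; other projections, such as California State Plane, would also work.

```python
import geopandas as gpd
import pandas as pd

# Hypothetical sample data: two food pantries with longitude/latitude columns
pantries = pd.DataFrame({
    "name": ["Pantry A", "Pantry B"],
    "lon": [-118.2437, -118.4452],
    "lat": [34.0522, 34.0689],
})

# Objective 1: build a GeoDataFrame by adding a geometry column of points.
# The coordinates are in degrees, so we declare WGS 84 (EPSG:4326).
gdf = gpd.GeoDataFrame(
    pantries,
    geometry=gpd.points_from_xy(pantries["lon"], pantries["lat"]),
    crs="EPSG:4326",
)

# Objective 2: reproject to a projected CRS measured in meters before
# computing distances or areas. EPSG:32611 (UTM zone 11N) covers Los Angeles;
# this particular choice is an assumption for the example.
gdf_utm = gdf.to_crs("EPSG:32611")

# Straight-line distance between the two pantries, in meters
dist_m = gdf_utm.geometry.iloc[0].distance(gdf_utm.geometry.iloc[1])
print(round(dist_m))
```

Computing the same distance on the EPSG:4326 layer would return a number in degrees, which is not a meaningful length; that is the main reason to reproject first.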

There are no assigned readings for this module. People tend not to write about their data wrangling, so there are few good planning-related case studies! 

Optional readings

Here’s an example application of the GTFS feeds discussed in the lecture on transit data (Module 3).

Liu, Luyu, and Harvey J. Miller. 2021. Measuring risk of missing transfers. Urban Studies 58(15): 3140–3156.

Video 4a: Spatial joins

This lecture examines joins in a spatial context, using the example of food pantries in Los Angeles.

As you watch the video, follow along with the code here.
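A minimal sketch of this kind of join, using toy square “tracts” and made-up pantry points rather than the real Los Angeles data. It assumes geopandas 0.10 or later, where `sjoin` takes a `predicate` argument (older releases called it `op`).

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Hypothetical toy data in a meter-based CRS: two adjacent square "tracts"
tracts = gpd.GeoDataFrame(
    {"tract_id": ["T1", "T2"]},
    geometry=[box(0, 0, 100, 100), box(100, 0, 200, 100)],
    crs="EPSG:32611",
)

# Three hypothetical pantries: two inside T1, one inside T2
pantries = gpd.GeoDataFrame(
    {"pantry": ["A", "B", "C"]},
    geometry=[Point(10, 10), Point(50, 80), Point(150, 50)],
    crs="EPSG:32611",
)

# Attach tract attributes to each pantry that lies within a tract.
# how="left" keeps every pantry, even one that matches no tract.
joined = gpd.sjoin(pantries, tracts, how="left", predicate="within")

# Count pantries per tract, just as you would after an ordinary merge
counts = joined.groupby("tract_id").size()
print(counts)
```

The result behaves like any other (Geo)DataFrame, so the familiar pandas `groupby` and `merge` steps apply once the spatial relationship has been turned into a shared column.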

Video 4b: Advanced spatial joins

We’ll consider different types of spatial joins, and practice troubleshooting when things go wrong.

As you watch the video, follow along with the code here.
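When a join goes wrong, a few quick checks help. The hypothetical sketch below verifies that both layers share a CRS (geopandas only warns on a mismatch rather than stopping), lists points that found no tract, and flags points that matched more than one tract. A point lying exactly on a shared boundary shows why the predicate matters: it is not `within` either tract, but it `intersects` both.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Hypothetical tracts sharing an edge at x = 100
tracts = gpd.GeoDataFrame(
    {"tract_id": ["T1", "T2"]},
    geometry=[box(0, 0, 100, 100), box(100, 0, 200, 100)],
    crs="EPSG:32611",
)

# Hypothetical points: one inside T1, one on the shared edge, one far outside
points = gpd.GeoDataFrame(
    {"pt": ["inside", "edge", "outside"]},
    geometry=[Point(50, 50), Point(100, 50), Point(500, 500)],
    crs="EPSG:32611",
)

# Check 1: both layers must use the same CRS; otherwise reproject with .to_crs()
assert points.crs == tracts.crs, "CRS mismatch: reproject one layer first"

results = {}
for pred in ["within", "intersects"]:
    joined = gpd.sjoin(points, tracts, how="left", predicate=pred)
    # Check 2: unmatched left rows get NaN in the right-hand columns
    unmatched = sorted(joined.loc[joined["tract_id"].isna(), "pt"].tolist())
    # Check 3: a left row that matches several tracts appears more than once
    dupes = joined.loc[joined.index.duplicated(keep=False), "pt"].unique().tolist()
    results[pred] = (unmatched, dupes)
    print(pred, "| unmatched:", unmatched, "| matched more than once:", dupes)
```

Comparing the row count of the joined table with that of the original point layer is a quick first test: more rows means duplicate matches, and NaN values in the joined columns mean points that found no match.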

Video 4c: Distances and nearest neighbors

This lecture shows how to find the nearest neighbors of a feature, such as the closest food pantry to a census tract. We’ll also explore distance, length, and other geometric calculations in geopandas.

As you watch the video, follow along with the code here.
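Nearest-neighbor joins and geometric measurements can be sketched with `sjoin_nearest` (available in geopandas 0.10 and later) and the `.area` and `.length` properties. The data are again toy geometries in a meter-based CRS. Measuring tract-to-pantry distance from the tract centroid is an assumption made here; distance to the tract boundary is another reasonable choice.

```python
import geopandas as gpd
from shapely.geometry import LineString, Point, box

# Toy tracts and pantries in a meter-based CRS (all data hypothetical)
tracts = gpd.GeoDataFrame(
    {"tract_id": ["T1", "T2"]},
    geometry=[box(0, 0, 100, 100), box(100, 0, 200, 100)],
    crs="EPSG:32611",
)
pantries = gpd.GeoDataFrame(
    {"pantry": ["A", "B"]},
    geometry=[Point(-30, 50), Point(260, 50)],
    crs="EPSG:32611",
)

# Represent each tract by its centroid, then find the nearest pantry.
# distance_col stores the distance in CRS units (here, meters).
centroids = tracts.copy()
centroids["geometry"] = tracts.centroid
nearest = gpd.sjoin_nearest(centroids, pantries, how="left", distance_col="dist_m")
print(nearest[["tract_id", "pantry", "dist_m"]])

# Areas and lengths are also reported in CRS units, so use a projected CRS
tracts["area_m2"] = tracts.area  # 10000.0 for each 100 m x 100 m square
route = gpd.GeoSeries([LineString([(0, 0), (300, 400)])], crs="EPSG:32611")
route_length_m = route.length.iloc[0]  # 500.0 (a 3-4-5 right triangle)
print(tracts[["tract_id", "area_m2"]], route_length_m)
```

If two candidates are equally close, `sjoin_nearest` returns both, so the duplicate-row check from the previous lecture applies here too.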

Video 4d: Transit data

The General Transit Feed Specification (GTFS) is a common format for sharing transit data such as stop locations, routes, schedules, and fares. Here we’ll use the partridge library to parse GTFS data files.

As you watch the video, follow along with the code here.
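The lecture parses GTFS with partridge. To show what a feed contains underneath, the sketch below instead reads the raw files with plain pandas: a GTFS feed is a zip archive of comma-separated text files such as `stops.txt` and `stop_times.txt`. The tiny feed is made up and built in memory so the example runs on its own, and the helper `gtfs_time_to_seconds` is a hypothetical name introduced for illustration. GTFS times may exceed 24:00:00 for trips that run past midnight, so they are converted to seconds rather than parsed as clock times.

```python
import io
import zipfile

import geopandas as gpd
import pandas as pd

# Hypothetical mini feed; a real feed also has agency.txt, routes.txt, trips.txt, ...
FEED_FILES = {
    "stops.txt": (
        "stop_id,stop_name,stop_lat,stop_lon\n"
        "S1,Main & 1st,34.0522,-118.2437\n"
        "S2,Main & 5th,34.0480,-118.2500\n"
    ),
    "stop_times.txt": (
        "trip_id,arrival_time,departure_time,stop_id,stop_sequence\n"
        "T100,08:00:00,08:00:00,S1,1\n"
        "T100,08:10:00,08:10:00,S2,2\n"
        "T101,25:05:00,25:05:00,S1,1\n"  # after-midnight service
    ),
}

# Write the files into an in-memory zip, standing in for a downloaded feed
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    for name, text in FEED_FILES.items():
        zf.writestr(name, text)


def gtfs_time_to_seconds(t: str) -> int:
    """Convert an HH:MM:SS GTFS time (hours may be >= 24) to seconds after midnight."""
    h, m, s = (int(part) for part in t.split(":"))
    return h * 3600 + m * 60 + s


# Read IDs as strings so codes with leading zeros are preserved
with zipfile.ZipFile(buf) as zf:
    stops = pd.read_csv(zf.open("stops.txt"), dtype={"stop_id": str})
    stop_times = pd.read_csv(
        zf.open("stop_times.txt"), dtype={"trip_id": str, "stop_id": str}
    )

stop_times["departure_sec"] = stop_times["departure_time"].map(gtfs_time_to_seconds)

# Count distinct trips serving each stop and attach the counts to stop locations
trips_per_stop = (
    stop_times.groupby("stop_id")["trip_id"].nunique().rename("n_trips").reset_index()
)
stops = stops.merge(trips_per_stop, on="stop_id", how="left")

# Turn the stops table into a GeoDataFrame (GTFS coordinates are WGS 84)
stops_gdf = gpd.GeoDataFrame(
    stops,
    geometry=gpd.points_from_xy(stops["stop_lon"], stops["stop_lat"]),
    crs="EPSG:4326",
)
print(stops_gdf[["stop_id", "n_trips"]])
```

Once the stops are a GeoDataFrame, they can be reprojected and spatially joined to census tracts or schools exactly like the pantry points in the earlier lectures.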

Please take the quiz below to check your understanding of this module.

Quiz for currently enrolled UCLA students

Quiz for other learners

This notebook practices the concepts that we’ve developed in the lecture notebooks. We’ll work through it in class.

It’s the Module 4 class activity in your GitHub repository here.

The homework assignment is here. Please submit on GitHub.

Here are some tips on submitting via GitHub. They are from last year’s course; the URLs are different but the steps are the same.

You have now completed Module 4. Please navigate to the homepage or to the next module by using the green navigation bar at the top.