Module 5

Overview

In this module and the next, we’ll look at one common use of machine learning models – classification. For example, which neighborhoods are likely to gentrify? On which parcels are Accessory Dwelling Units (e.g. backyard units) likely to be built? Which polluters are likely to exceed their permitted discharges?

These are examples of supervised machine learning problems. In other words, we know the right answer (the category) for at least a subset of the data, and want to use those known cases to predict the category of the remaining observations.

Machine learning is a very large field, and there are entire courses on the theory and applications. Here, we will give a very high-level overview. We’ll skate over the theoretical underpinnings and focus on implementing the models in Python.

In this module, we’ll walk through data preparation and the process of estimating a common machine learning model: random forests.

Learning objectives

By the end of this module, you should be able to:

Perform more complex joins and other data wrangling operations
Split a dataset into training and testing portions
Estimate a random forests model
Interpret an individual decision tree from a random forests model

Readings

Required readings

These three papers are examples of applied machine learning in the urban and environmental planning context. Pick one to focus on and skim the others. In particular, think about the advantage (if any) of the machine learning algorithms compared to a traditional regression model. The first two are more conventional, while the third is at the cutting edge of using machine learning to make causal inferences (rather than descriptive or predictive analyses).

Reades, Jonathan; Jordan De Souza; and Phil Hubbard. 2019. Understanding urban gentrification through machine learning, Urban Studies 56(5): 922-942.

Hino, M; E Benami; and N Brooks. 2018. Machine learning for environmental monitoring, Nature Sustainability 1: 583-588.

Nachtigall, F; F Wagner; P Berrill; and F Creutzig. 2025. Built environment and travel: Tackling non-linear residential self-selection with double machine learning, Transportation Research Part D: Transport and Environment 140: 104593.

Optional readings

Here’s another applied example of random forests models in planning.

Tribby, Calvin; Harvey Miller; Barbara Brown; Carol Werner; and Ken Smith. 2017. “Analyzing walking route choice through built environments using random forests and discrete choice techniques,” Environment and Planning B: Urban Analytics and City Science 44(6): 1145-1167.

Videos

Video 5a: Data preparation

In this lecture, we’ll practice data wrangling and spatial joins, through preparing a dataset for a machine learning analysis. The example: Accessory Dwelling Units in Los Angeles.

As you watch the video, follow along with the code here.
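
Separately from the course code, here is a minimal sketch of the kind of join-and-label step that data preparation for a classification model usually involves. The tables, column names (parcel_id, lot_sqft, permit_year) and values are all hypothetical and are not taken from the Los Angeles dataset used in the video. It shows an attribute join with pandas; a spatial join follows the same left-join logic but matches on geometry rather than a shared key (for example with geopandas.sjoin).

```python
import pandas as pd

# Hypothetical parcel table: one row per parcel.
parcels = pd.DataFrame({
    "parcel_id": [101, 102, 103, 104],
    "lot_sqft": [5000, 7200, 6100, 4800],
})

# Hypothetical ADU permit table: only parcels with a permit appear,
# and a parcel can appear more than once.
permits = pd.DataFrame({
    "parcel_id": [102, 104, 104],
    "permit_year": [2019, 2020, 2021],
})

# Collapse permits to one row per parcel so the join cannot duplicate parcels.
permit_counts = (
    permits.groupby("parcel_id").size().rename("n_permits").reset_index()
)

# Left join keeps every parcel, including those with no permit.
# validate= raises an error if either side has duplicate keys.
merged = parcels.merge(
    permit_counts, on="parcel_id", how="left", validate="one_to_one"
)

# Parcels with no match get NaN; treat that as zero permits.
merged["n_permits"] = merged["n_permits"].fillna(0).astype(int)

# Binary outcome variable for a classifier: does the parcel have an ADU permit?
merged["has_adu"] = merged["n_permits"] > 0

print(merged)
```

Aggregating before joining is a deliberate choice in this sketch: joining the raw permit table directly would repeat parcel 104 twice and silently inflate the dataset.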

Video 5b: Random forests

We’ll demonstrate how and why to split a dataset into training and testing subsets, and how to estimate and interpret one of the most common machine learning models – random forests.
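
The workflow described above can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the code from the video: the data are synthetic (generated with make_classification), the feature names x0 to x3 are placeholders, and the settings (a 25% test share, 100 trees, a maximum depth of 3) are arbitrary choices made to keep the example small.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import export_text

# Synthetic stand-in data: 500 observations, 4 predictors, a binary outcome.
X, y = make_classification(
    n_samples=500, n_features=4, n_informative=3, n_redundant=1,
    random_state=0,
)
feature_names = ["x0", "x1", "x2", "x3"]  # placeholder names

# Hold out 25% of observations. The model never sees these during fitting,
# so performance on them approximates performance on new data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Fit a random forest: many decision trees, each grown on a bootstrap
# sample of the training data, whose votes are combined.
rf = RandomForestClassifier(n_estimators=100, max_depth=3, random_state=0)
rf.fit(X_train, y_train)

# Share of held-out observations classified correctly.
test_accuracy = rf.score(X_test, y_test)
print(f"Test accuracy: {test_accuracy:.2f}")

# Print the splitting rules of one tree in the forest as plain text.
first_tree_rules = export_text(rf.estimators_[0], feature_names=feature_names)
print(first_tree_rules)
```

Scoring on the held-out portion rather than the training portion is the point of the split: a forest can fit its training data very closely, so training accuracy tends to overstate how well it predicts unseen observations.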