In this module and the next, we’ll look at one common use of machine learning models – classification. For example, which neighborhoods are likely to gentrify? On which parcels are Accessory Dwelling Units (e.g. backyard units) likely to be built? Which polluters are likely to exceed their permitted discharges?

These are examples of supervised machine learning problems. In other words, we know the right answer for at least a subset of the data, and we want to use it to predict which categories the remaining observations fall into.

Machine learning is a very large field, and there are entire courses on the theory and applications. Here, we will give a very high-level overview. We’ll skate over the theoretical underpinnings and focus on implementing the models in Python.

In this module, we’ll walk through data preparation and the process of estimating a common machine learning model: random forests.

Learning objectives

By the end of this module, you should be able to:

  1. Perform more complex joins and other data wrangling operations
  2. Split a dataset into training and testing portions
  3. Estimate a random forests model
  4. Interpret a random forests decision tree

Required readings

The first two papers are examples of applied machine learning in the urban and environmental planning context. Pick one to focus on and skim the other. In particular, think about the advantages (if any) of the machine learning algorithms compared to a traditional regression model.

Reades, Jonathan; Jordan De Souza; and Phil Hubbard. 2019. “Understanding urban gentrification through machine learning,” Urban Studies 56(5): 922-942.

Hino, M; E Benami; and N Brooks. 2018. “Machine learning for environmental monitoring,” Nature Sustainability 1: 583-588.

Optional readings

Here’s another applied example of random forests models in planning.

Tribby, Calvin; Harvey Miller; Barbara Brown; Carol Werner; and Ken Smith. 2017. “Analyzing walking route choice through built environments using random forests and discrete choice techniques,” Environment and Planning B: Urban Analytics and City Science 44(6): 1145-1167.

Video 5a: Data preparation

In this lecture, we’ll practice data wrangling and spatial joins by preparing a dataset for a machine learning analysis. The example: Accessory Dwelling Units in Los Angeles.
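As a rough illustration of the kind of join involved in data preparation, here is a minimal sketch using pandas. It is not the lecture's code: the tables, the column names (parcel_id, lot_sqft, permit_year, has_adu) and the values are all hypothetical, and it shows an attribute join on a shared ID rather than a spatial join, which would typically need a library such as GeoPandas.

```python
import pandas as pd

# Hypothetical parcel table: one row per parcel.
parcels = pd.DataFrame({
    "parcel_id": [101, 102, 103, 104],
    "lot_sqft": [5000, 7200, 6100, 4800],
})

# Hypothetical permit table: only parcels with an ADU permit appear here.
permits = pd.DataFrame({
    "parcel_id": [102, 104],
    "permit_year": [2019, 2021],
})

# A left join keeps every parcel. validate raises an error if parcel_id
# is duplicated on either side; indicator adds a _merge column that
# records where each row was found.
joined = parcels.merge(
    permits, on="parcel_id", how="left", validate="one_to_one", indicator=True
)

# Parcels found in both tables have a permit; this becomes the 0/1 outcome.
joined["has_adu"] = (joined["_merge"] == "both").astype(int)
joined = joined.drop(columns="_merge")
print(joined)
```

Using validate and indicator is one way to catch duplicated keys and unmatched rows before they silently change the size of the dataset.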

As you watch the video, follow along with the code here.

 

Video 5b: Random forests

We’ll demonstrate how and why to split a dataset into training and testing subsets, and how to estimate and interpret one of the most common machine learning models – random forests.
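The workflow described above can be sketched as follows. This is a minimal example that assumes scikit-learn and NumPy, which may differ from what the lecture notebook uses. The data are synthetic, and the feature names (f0 to f3) are placeholders rather than real variables.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (hypothetical): 500 observations, 4 numeric features.
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 4))
# The label depends on the first two features plus noise, so there is signal to learn.
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=500) > 0).astype(int)

# Hold out 25% of rows for testing. stratify keeps the class balance
# similar in the training and testing portions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Fit the forest on the training rows only.
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)

# Accuracy on rows the model did not see during fitting.
test_accuracy = rf.score(X_test, y_test)
print(f"Test accuracy: {test_accuracy:.2f}")

# Relative importance of each feature (the values sum to 1).
for name, importance in zip(["f0", "f1", "f2", "f3"], rf.feature_importances_):
    print(name, round(importance, 3))
```

Scoring on the held-out rows gives a more honest estimate of predictive performance than scoring on the rows used to fit the model, which is the reason for splitting in the first place.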

As you watch the video, follow along with the code here.

 

Please take the quiz below to check your understanding of this module.

Quiz for currently enrolled UCLA students

Quiz for other learners

This notebook practices the concepts that we’ve developed in the lecture notebooks. We’ll work through it in class.

It’s the Module 5 class activity in your GitHub repository here.

Now that we are halfway done with the course, please complete the mid-course check-in survey here. 

You have now completed Module 5. Please navigate to the homepage or to the next module by using the green navigation bar at the top.