Module 6

Overview

In this module, we’ll delve deeper into random forest models. We’ll introduce confusion matrices and other ways to assess their predictive performance: how do their predictions compare to the true values? We’ll also explore how to interpret the results. For example, which variables turn out to be most important, and in which direction do they affect the outcome? Finally, we’ll extend the concepts from random forests to other types of machine learning models, particularly neural networks. The scikit-learn library uses the same syntax for every model (create the model, call .fit() on training data, then call .predict() on new data), so once you are familiar with random forests, other models are relatively simple to use.
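A minimal sketch of that shared syntax, assuming synthetic data from make_classification rather than the course dataset. The helper fit_and_score is hypothetical and written only for this illustration; the point is that it runs unchanged whether it receives a random forest or a logistic regression.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (not the course dataset)
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

def fit_and_score(model):
    """Hypothetical helper: the same fit/predict steps work for any scikit-learn classifier."""
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    return (y_pred == y_test).mean()  # share of correct predictions

results = {
    type(model).__name__: fit_and_score(model)
    for model in [RandomForestClassifier(random_state=0), LogisticRegression(max_iter=1000)]
}
for name, accuracy in results.items():
    print(name, round(accuracy, 3))
```

Swapping in a different model means changing only the line that creates it; the rest of the workflow stays the same.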

Learning Objectives

By the end of this module, you should be able to:

Critically assess the predictive accuracy of a machine learning model
Interpret the results of a random forest model using feature importances and partial dependence plots
Standardize data
Apply random forest concepts to other types of machine learning models, such as neural networks

Readings

Required Readings

These three articles consider some of the ethical challenges with predictive modeling. Pick two to read and skim the third. Think about whether these problems are inherent to machine learning, and/or how they might be mitigated.

Sankin, Aaron, Dhruv Mehrotra, Surya Mattu, and Annie Gilbertson. 2021. “Crime Prediction Software Promised to Be Free of Biases. New Data Shows It Perpetuates Them.” The Markup, December 2, 2021.

The Economist. 2021. “Demographic Skews in Training Data Create Algorithmic Errors.” The Economist, June 5, 2021. [PDF in case you have trouble with the paywall.]

Sisson, Patrick. 2024. “For Tenants, AI-Powered Screening Can Be a New Barrier to Housing.” Bloomberg CityLab, September 11, 2024.

Videos

Video 6a: Assessing performance

In the last module, we learned how to estimate a random forest model. But we skimmed over how to assess its predictive performance. Here, we investigate confusion matrices and other ways to evaluate a model’s predictions.

As you watch the video, follow along with the code here.
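A short sketch of a confusion matrix and the metrics derived from it, using scikit-learn’s confusion_matrix, accuracy_score, and classification_report. This is not the lecture code: it assumes synthetic, deliberately imbalanced data (about 80% of cases in class 0) from make_classification so that the limits of plain accuracy are visible.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced data: roughly 80% class 0, 20% class 1
X, y = make_classification(n_samples=1000, n_features=10, n_informative=4,
                           weights=[0.8, 0.2], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
y_pred = rf.predict(X_test)

# Rows are the true class (0, 1); columns are the predicted class (0, 1)
cm = confusion_matrix(y_test, y_pred)
tn, fp, fn, tp = cm.ravel()
print(cm)

acc = accuracy_score(y_test, y_pred)          # (tn + tp) / all cases
precision = tp / (tp + fp)                    # of predicted 1s, share that are truly 1
recall = tp / (tp + fn)                       # of true 1s, share the model catches
baseline = max(y_test.mean(), 1 - y_test.mean())  # accuracy from always guessing the majority class
print(f"accuracy {acc:.3f}, baseline {baseline:.3f}, precision {precision:.3f}, recall {recall:.3f}")
print(classification_report(y_test, y_pred))
```

With imbalanced data, compare accuracy against the majority-class baseline: a model can score around 80% here while missing most of the rarer class, which the recall value exposes.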

Video 6b: Interpreting results

A machine learning model can make use of dozens or even thousands of predictors. But which are the most important, and in which direction does each variable push the predicted outcome? For example, does a higher land value make a parcel more or less likely to have an ADU (accessory dwelling unit)? This lecture shows how to use feature importances and partial dependence plots to interpret a model’s results.

As you watch the video, follow along with the code here.
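A minimal sketch of both tools on synthetic parcel data. The columns land_value, lot_size, and noise are hypothetical and invented for illustration, not taken from the course dataset. Feature importances come from the fitted model’s feature_importances_ attribute (impurity-based in scikit-learn’s random forests). The partial dependence function, partial_dependence_1d, is a hypothetical hand-written helper: it sets one variable to each grid value for every row and averages the predicted probabilities. scikit-learn also offers sklearn.inspection.PartialDependenceDisplay.from_estimator to plot the same quantity.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical, synthetic parcel data (not the course dataset)
rng = np.random.default_rng(0)
n = 1000
X = pd.DataFrame({
    "land_value": rng.normal(size=n),
    "lot_size": rng.normal(size=n),
    "noise": rng.normal(size=n),  # unrelated to the outcome by construction
})
# In this toy data, higher land_value (strongly) and lot_size (weakly) raise the odds of an ADU
logit = 2.0 * X["land_value"] + 1.0 * X["lot_size"]
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Impurity-based importances: one non-negative value per column, summing to 1
importances = pd.Series(rf.feature_importances_, index=X.columns).sort_values(ascending=False)
print(importances)

def partial_dependence_1d(model, X, feature, grid):
    """Hypothetical helper: mean predicted P(class 1) with `feature` fixed at each grid value."""
    averages = []
    for value in grid:
        X_mod = X.copy()
        X_mod[feature] = value  # every row gets the same value for this feature
        averages.append(model.predict_proba(X_mod)[:, 1].mean())
    return np.array(averages)

grid = np.linspace(-2, 2, 9)
pd_land = partial_dependence_1d(rf, X, "land_value", grid)
print(np.round(pd_land, 3))
```

Importances say how much a variable matters but not in which direction; the partial dependence values supply the direction, rising with land_value here because the toy data were built that way.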

Video 6c: Neural networks and logistic regression

Random forests are just one type of machine learning model. This lecture explores two more: neural networks and logistic regression. It also demonstrates how to standardize data, an important step for a neural network (which is sensitive to the scale of its inputs) and also, as we’ll see in the next module, for cluster analysis.
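A sketch of standardization with scikit-learn’s StandardScaler, assuming synthetic data whose columns are deliberately stretched onto very different scales. Each column is rescaled as z = (x − mean) / std, with the mean and standard deviation learned from the training set only. The example then wraps the scaler and an MLPClassifier (scikit-learn’s basic neural network) in a pipeline; the hidden layer size and iteration count are illustrative choices, not recommendations from the lecture.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=1000, n_features=6, random_state=0)
# Stretch columns onto very different scales, as raw data (e.g., dollars vs. counts) often are
X = X * np.array([1, 10, 100, 1_000, 10_000, 100_000])
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Learn each column's mean and std from the training data, then rescale
scaler = StandardScaler().fit(X_train)
X_train_std = scaler.transform(X_train)
X_test_std = scaler.transform(X_test)  # reuse the training mean and std; never refit on test data
print(X_train_std.mean(axis=0).round(3))  # about 0 for every column
print(X_train_std.std(axis=0).round(3))   # about 1 for every column

# The same result computed directly from the formula z = (x - mean) / std
manual = (X_train - X_train.mean(axis=0)) / X_train.std(axis=0)

# A pipeline applies the same scaling automatically at fit time and at predict time
nn = make_pipeline(StandardScaler(),
                   MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0))
nn.fit(X_train, y_train)
print("test accuracy:", round(nn.score(X_test, y_test), 3))
```

Putting the scaler inside the pipeline avoids a common mistake: forgetting to transform new data with the training statistics before calling predict.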