In this module, we’ll look at clustering algorithms. Clustering is an exploratory technique to identify sensible groupings in a dataset – types of transit agency or neighborhood, perhaps. Clustering the data in this way can help identify regularities, and suggest policies that are appropriate for different types of city or agency. Or clustering can identify a peer group against which to benchmark (say) affordable housing construction costs or transit reliability.

Learning objectives

By the end of this module, you should be able to:

  1. Implement a k-means cluster analysis using the scikit-learn library
  2. Evaluate the appropriate number of clusters, using graphical tools such as radar plots
  3. Interpret the meaning of a cluster analysis
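Radar plots are one way to judge how many clusters to keep. Two numeric checks are often used alongside them, and they are not necessarily the ones used in the lecture notebooks:

- the within-cluster sum of squares, or "inertia", which always falls as k grows, so you look for an "elbow" where it stops dropping sharply;
- the silhouette score, where higher is better.

The sketch below computes both for a range of k using scikit-learn's `KMeans` and `silhouette_score`. It uses synthetic data from `make_blobs` with four built-in groups. The helper name `evaluate_k` is hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def evaluate_k(X, k_values, random_state=0):
    """Hypothetical helper: return {k: (inertia, silhouette)} for each candidate k."""
    # Standardize so each variable contributes comparably to Euclidean distances
    X_scaled = StandardScaler().fit_transform(X)
    results = {}
    for k in k_values:
        km = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        labels = km.fit_predict(X_scaled)
        # inertia_: within-cluster sum of squared distances (falls as k grows)
        # silhouette: mean score in [-1, 1]; higher means tighter, better-separated clusters
        results[k] = (km.inertia_, silhouette_score(X_scaled, labels))
    return results


# Synthetic data: four well-separated groups of 75 points each (illustration only)
X, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [10, 0], [0, 10], [10, 10]],
    cluster_std=0.5,
    random_state=42,
)

scores = evaluate_k(X, range(2, 8))
for k, (inertia, sil) in scores.items():
    print(f"k={k}: inertia={inertia:.1f}, silhouette={sil:.3f}")

# Candidate k with the highest silhouette score
best_k = max(scores, key=lambda k: scores[k][1])
print("highest silhouette at k =", best_k)
```

On real data the peak is rarely this clear. Whether the resulting clusters can be interpreted sensibly matters as much as any score.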

Required readings

Here’s an empirical example of cluster analysis. Think about the extent to which the clustering adds understanding or explanatory power. How useful is it compared to simple cross-tabulations, scatter plots, and other descriptive analysis tools?

Bohorquez, John J., Anthony Dvarskas and Ellen K. Pikitch. 2019. Categorizing global MPAs: A cluster analysis approach. Marine Policy 108: 103663.

Optional readings

Here’s an example from my own work. Focus on Section 3.2; if you want more technical detail, see Section F of the Supporting Information.

Barrington-Leigh, Christopher and Adam Millard-Ball. 2019. A global assessment of street-network sprawl. PLoS ONE 14(11): e0223078.

Video 7a: Introduction to clustering

This lecture introduces the purposes and potential uses of clustering, and other exploratory data analysis techniques such as pairplots. It then demonstrates how to implement one of the most common clustering algorithms, k-means, using the scikit-learn library.
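Here is a minimal k-means sketch with scikit-learn. It assumes a pandas DataFrame with one row per transit agency. The column names (`riders_per_capita`, `pct_rail`, `cost_per_trip`) are hypothetical, the values are randomly generated, and the lecture notebook's own dataset and variables may differ.

The variables are standardized first. K-means assigns observations using Euclidean distance, so a variable measured in large units would otherwise dominate.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical agency-level data: two randomly generated groups of 30 agencies
rng = np.random.default_rng(0)
n = 60
df = pd.DataFrame({
    "riders_per_capita": np.concatenate([rng.normal(20, 3, n // 2), rng.normal(80, 10, n // 2)]),
    "pct_rail": np.concatenate([rng.normal(5, 2, n // 2), rng.normal(60, 8, n // 2)]),
    "cost_per_trip": np.concatenate([rng.normal(9, 1.5, n // 2), rng.normal(4, 1, n // 2)]),
})
features = ["riders_per_capita", "pct_rail", "cost_per_trip"]

# Standardize so no variable dominates the distances just because of its units
X = StandardScaler().fit_transform(df[features])

# n_init runs several random starts and keeps the lowest-inertia solution
km = KMeans(n_clusters=2, n_init=10, random_state=0)
df["cluster"] = km.fit_predict(X)

# Cluster profiles: mean of each original (unscaled) variable, by cluster
profile = df.groupby("cluster")[features].mean()
print(profile)
print(df["cluster"].value_counts())
```

The `profile` table of cluster means is usually the starting point for interpreting what each cluster represents. `random_state` is fixed only so the result is reproducible. The cluster numbers themselves are arbitrary labels.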

As you watch the video, follow along with the code here.

 

Video 7b: Visualizing clusters

We’ll explore how to visualize and interpret clusters, particularly through radar plots, and continue to develop our mapping capabilities.
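One way to build a radar plot is with matplotlib's polar axes: each variable gets one spoke, and each cluster is drawn as one closed polygon. This is a sketch, not the notebook's code. The `radar_plot` helper and the profile values are hypothetical.

The values are assumed to be cluster means of variables already rescaled to the 0 to 1 range (for example with scikit-learn's `MinMaxScaler`), so that all spokes share a common, non-negative scale.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def radar_plot(profile):
    """Hypothetical helper: rows of `profile` are clusters, columns are variables."""
    variables = list(profile.columns)
    # One spoke per variable, evenly spaced around the circle
    angles = np.linspace(0, 2 * np.pi, len(variables), endpoint=False).tolist()
    angles += angles[:1]  # repeat the first angle to close each polygon
    fig, ax = plt.subplots(subplot_kw={"projection": "polar"})
    for cluster, row in profile.iterrows():
        values = row.tolist()
        values += values[:1]  # close the polygon back at the first variable
        ax.plot(angles, values, label=f"Cluster {cluster}")
        ax.fill(angles, values, alpha=0.1)
    ax.set_xticks(angles[:-1])
    ax.set_xticklabels(variables)
    ax.set_ylim(0, 1)  # values assumed to be rescaled to [0, 1]
    ax.legend(loc="upper right", bbox_to_anchor=(1.3, 1.1))
    return fig, ax


# Hypothetical cluster means of 0-to-1 rescaled variables (illustration only)
profile = pd.DataFrame(
    {
        "density": [0.8, 0.3, 0.1],
        "pct_rail": [0.9, 0.2, 0.05],
        "cost_per_trip": [0.2, 0.5, 0.7],
        "pct_zero_car": [0.85, 0.3, 0.1],
    },
    index=[0, 1, 2],
)
fig, ax = radar_plot(profile)
# fig.savefig("radar.png", bbox_inches="tight") would write the figure to disk
```

With many clusters, the overlapping polygons can become hard to read. In that case, one small radar per cluster may work better.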

As you watch the video, follow along with the code here.

 

Video 7c: Spatial clusters

Cluster analysis can also be used to identify spatial clusters, such as neighborhoods with large numbers of accessory dwelling units (ADUs). In this lecture, we’ll adapt k-means cluster analysis to the spatial context.
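Here is a minimal sketch of k-means on point locations. It assumes hypothetical accessory dwelling unit (ADU) permit locations, randomly generated around three hotspots. The coordinates are assumed to be in a projected coordinate system measured in meters.

K-means uses straight-line Euclidean distance, so latitude/longitude data would normally be projected first (for example with GeoPandas' `to_crs`). Because x and y share the same units, no standardization is applied here.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical ADU permit locations (projected meters), randomly generated
rng = np.random.default_rng(1)
hotspots = np.array([[2000, 3000], [8000, 1500], [5000, 9000]])
points = np.vstack([rng.normal(center, 400, size=(50, 2)) for center in hotspots])
permits = pd.DataFrame(points, columns=["x", "y"])

# k-means on raw coordinates: x and y are already in the same units
km = KMeans(n_clusters=3, n_init=10, random_state=0)
permits["cluster"] = km.fit_predict(permits[["x", "y"]])

# Cluster centers approximate the locations of the permit hotspots
centers = pd.DataFrame(km.cluster_centers_, columns=["x", "y"])
print(centers.round(0))
print(permits["cluster"].value_counts())
```

`n_clusters` still has to be chosen by the analyst. K-means also forces every point into some cluster, including isolated permits far from any hotspot.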

As you watch the video, follow along with the code here.

 

Please take the quiz below to check your understanding of this module.

Quiz for currently enrolled UCLA students

Quiz for other learners

This notebook lets you practice the concepts that we’ve developed in the lecture notebooks. We’ll work through it in class.

It’s the Module 7 class activity in your GitHub repository here.
