This week we’ll focus on how to scrape web pages.
We’ll use examples from the City of Seattle permits database and from Craigslist.
As before, the course videos provide the basic concepts and background. As you watch them, follow along in the notebook (which is in your GitHub repository). Pause the video to explore the objects, try different things, and experiment. Don’t worry if you don’t get every last detail – we’ll use class time to practice and introduce more examples.
Learning objectives
By the end of this module, you should be able to:
- Evaluate when web scraping is needed and when a simpler solution will do
- Design a scraping approach that considers the structure of a web page
- Implement a web scraper for a given page
- Critically analyze ethical, legal, and representational concerns (e.g. who is excluded) around web scraping
Required readings
(Here is an update about some of the legal questions raised in the Boeing and Waddell paper.)
Think about the following questions as you read the paper:
- What are the biases in the Craigslist housing data? Are they more or less severe than in other housing market data? How should planners handle this?
- What ethical or legal concerns arise in scraping Craigslist data? Does this change if you are a planner in city government rather than an outside researcher?
- What other questions might you be able to explore through scraping Craigslist (or similar websites)?
Optional readings
Video 2a: Scraping permits
We’ll use the BeautifulSoup library to scrape web pages using the example of land use permits in Seattle.
As you watch the video, follow along with the code here.
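Before diving into the video, it may help to see the basic pattern BeautifulSoup uses: parse the HTML, then search it for tags and attributes. Here is a minimal sketch that parses an HTML string standing in for a permits page; the markup and class names (`permit-id`, `address`) are hypothetical, and the real Seattle pages have their own structure.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML snippet standing in for a page of permit records.
html = """
<table>
  <tr><td class="permit-id">3041234</td><td class="address">123 Main St</td></tr>
  <tr><td class="permit-id">3045678</td><td class="address">456 Pine St</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Loop over each table row and pull out the fields we care about.
permits = []
for row in soup.find_all("tr"):
    permit_id = row.find("td", class_="permit-id").text
    address = row.find("td", class_="address").text
    permits.append({"id": permit_id, "address": address})

print(permits)
```

In a real scraper you would fetch the page first (for example with the `requests` library) and pass the response text to `BeautifulSoup`; parsing from a string here keeps the sketch self-contained.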
Video 2b: Parsing text
Sometimes, web scraping returns a mass of unstructured text. We’ll take a first look at parsing text to extract critical information.
As you watch the video, follow along with the code here.
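As a taste of what the video covers, regular expressions are one common way to pull structured fields out of a blob of scraped text. The permit description below is made up for illustration; only the extraction pattern matters.

```python
import re

# Hypothetical permit description, as might come back from a scrape.
text = ("Land use application to allow a 4-story building with "
        "20 apartment units and parking for 15 vehicles.")

# Search for a number immediately preceding the phrase we care about.
match = re.search(r"(\d+)\s+apartment units", text)
n_units = int(match.group(1)) if match else None

print(n_units)  # 20
```

Guarding with `if match else None` matters in practice: many scraped descriptions won’t contain the phrase at all, and `match.group(1)` on a failed search would raise an error.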
Video 2c: Scraping Craigslist
This lecture uses the example of Craigslist housing posts to explore scraping more complex web pages.
As you watch the video, follow along with the code here.
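More complex pages mean more nesting: each post is a container element holding several fields, so the scraper first finds the containers and then searches within each one. The sketch below uses markup loosely modeled on a Craigslist results page; the class names (`result-row`, `result-title`, `result-price`) are placeholders, and the real site’s structure differs and changes over time.

```python
from bs4 import BeautifulSoup

# Hypothetical listing markup; real Craigslist pages are more elaborate.
html = """
<ul>
  <li class="result-row">
    <a class="result-title">Sunny 1BR near park</a>
    <span class="result-price">$1,500</span>
  </li>
  <li class="result-row">
    <a class="result-title">Studio downtown</a>
    <span class="result-price">$1,200</span>
  </li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

# First find each post's container, then search within it for each field.
posts = []
for row in soup.find_all("li", class_="result-row"):
    title = row.find("a", class_="result-title").text
    price = row.find("span", class_="result-price").text
    posts.append((title, price))

print(posts)
```

Searching within each container (rather than across the whole page) keeps titles and prices correctly paired even when some posts are missing a field.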
Video 2d: Parsing Craigslist
We’ll continue to practice scraping web pages and parsing text, and also show how to handle errors (“exceptions”) gracefully.
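The core idea behind handling exceptions in a scraper is that one malformed post shouldn’t crash the whole run. A minimal sketch, using made-up dictionaries in place of scraped rows:

```python
# Hypothetical scraped records; the second one is missing a price,
# as often happens with real posts.
rows = [{"title": "Sunny 1BR", "price": "$1,500"},
        {"title": "Studio downtown"}]

prices = []
for row in rows:
    try:
        # Strip the currency symbol and comma before converting to int.
        prices.append(int(row["price"].replace("$", "").replace(",", "")))
    except KeyError:
        # Record the gap and keep going rather than stopping the scrape.
        prices.append(None)

print(prices)  # [1500, None]
```

Catching the specific exception (`KeyError`) rather than a bare `except` is good practice: unexpected errors still surface instead of being silently swallowed.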
Please take the quiz below to check your understanding of this module.
This notebook practices the concepts that we’ve developed in the lecture notebooks. We’ll work through it in class.
It’s the Module 2 class activity in your GitHub repository here.
Here are some tips on submitting via GitHub. They are from last year’s course; the URLs are different but the steps are the same.
You have now completed Module 2. Please navigate to the homepage or to the next module by using the green navigation bar at the top.