This week we’ll focus on how to scrape web pages.
We’ll use examples from the City of Seattle permits database and from Craigslist.
As before, the course videos will provide the basic concepts and background. As you watch them, follow along in the notebook (which is in your GitHub repository). Pause the video to explore the objects, try different things, and experiment. Don’t worry if you don’t get every last detail – we’ll use the class time to practice and introduce more examples.
By the end of this module, you should be able to:
- Evaluate when web scraping is needed, and when a simpler solution will do
- Design a scraping approach that considers the structure of a web page
- Implement a web scraper for a given page
- Critically analyze ethical, legal, and representational concerns (e.g., who is excluded) around web scraping
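Two of these objectives, designing a scraper around a page's structure and checking whether scraping a page is appropriate, can be illustrated in a few lines of Python. This is a minimal sketch, not the course notebook's code: it uses only the standard library (`html.parser` and `urllib.robotparser`), which may differ from the libraries the notebook uses. The HTML snippet, the `result` and `price` class names, the `ListingParser` and `allowed_by_robots` names, and the example.com URLs are all hypothetical stand-ins for a real listings page.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Hypothetical HTML standing in for a listings page. Real pages
# (e.g., Craigslist or the Seattle permits site) use different markup.
SAMPLE_HTML = """
<ul class="results">
  <li class="result"><a href="/post/1">Bike for sale</a><span class="price">$120</span></li>
  <li class="result"><a href="/post/2">Desk, solid wood</a><span class="price">$45</span></li>
</ul>
"""


class ListingParser(HTMLParser):
    """Collect title, link, and price for each <li class="result"> element."""

    def __init__(self):
        super().__init__()
        self.listings = []
        self._current = None  # dict for the listing currently being parsed
        self._field = None    # text field we are inside ("title"/"price"), if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "li" and "result" in classes:
            # A new listing starts: the page structure tells us where.
            self._current = {"title": "", "link": None, "price": ""}
        elif self._current is not None and tag == "a":
            self._current["link"] = attrs.get("href")
            self._field = "title"
        elif self._current is not None and tag == "span" and "price" in classes:
            self._field = "price"

    def handle_endtag(self, tag):
        if tag in ("a", "span"):
            self._field = None
        elif tag == "li" and self._current is not None:
            self.listings.append(self._current)
            self._current = None

    def handle_data(self, data):
        # Only keep text that falls inside a field we care about.
        if self._current is not None and self._field:
            self._current[self._field] += data.strip()


def allowed_by_robots(robots_lines, url, user_agent="*"):
    """Return True if the given robots.txt lines permit fetching url."""
    rp = RobotFileParser()
    rp.parse(robots_lines)
    return rp.can_fetch(user_agent, url)


if __name__ == "__main__":
    parser = ListingParser()
    parser.feed(SAMPLE_HTML)
    for item in parser.listings:
        print(item)

    robots = ["User-agent: *", "Disallow: /private/"]
    print(allowed_by_robots(robots, "https://example.com/post/1"))
    print(allowed_by_robots(robots, "https://example.com/private/x"))
```

With a real page, you would first fetch the HTML (and the site's robots.txt), inspect the actual markup in your browser's developer tools, and adjust the tag and class names to match. A robots.txt check is only one part of the ethical and legal picture; a site's terms of service also matter.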
Access the homework assignment here. Please submit on GitHub.
Here are some tips on submitting via GitHub. They are from last year’s course; the URLs are different but the steps are the same.
You have now completed Module 2.