In this module, we’ll focus on two specific analysis techniques for natural language. Topic modeling tells you what a text is about: it identifies and classifies the latent topics in a body of text. Sentiment analysis identifies the tone of a text, for example whether it expresses positive or negative sentiment. These techniques fall under the broad umbrella of natural language processing (NLP).

Learning objectives

By the end of this module, you should be able to:

  1. Estimate and interpret a topic model that identifies underlying themes in a text
  2. Evaluate how to adjust model parameters in the search for a meaningful topic model
  3. Implement a scraper to obtain data from social media sites such as Reddit
  4. Critically interpret the findings from a sentiment analysis

Required readings

These two papers provide further examples of topic modeling and/or sentiment analysis. Choose one of them to read in depth, and skim the other. What does the paper do, and to what extent do you find the analysis helpful in understanding political communications and/or attitudes to transit? 

Han, Albert Tonghoon, Lucie Laurian and Jim Dewald. 2021. Plans Versus Political Priorities: Lessons From Municipal Election Candidates’ Social Media Communications. Journal of the American Planning Association, 87(2): 211-227.

Schweitzer, Lisa. 2014. Planning and Social Media: A Case Study of Public Transit and Stigma on Twitter. Journal of the American Planning Association, 80(3): 218-238.

Optional readings

Grimmer, Justin, Margaret E. Roberts and Brandon M. Stewart. 2022. Text as Data: A New Framework for Machine Learning and the Social Sciences. Princeton University Press.
This is a more theoretically grounded treatment of text analysis. I recommend reading it if you use any of these techniques in your project, or at least skimming it, particularly Chapter 5 (Bag of Words) and Chapter 13 (Topic Models). A copy is on order at the UCLA Library.

Steinert-Threlkeld, Zachary. 2018. Twitter as Data. Cambridge: Cambridge University Press.
If you are interested in Twitter data, this is a great reference, especially Chapters 3 and 5.

Armstrong, John, Anna Nisi and Adam Millard-Ball. A disciplinary divide in the framing of urbanization’s environmental impacts. Conservation Science and Practice, 4(3): e624.
An example from my own work. Note that we found standard sentiment analysis tools did not work for our purposes, so we had to develop our own.

Marti, Pablo, Leticia Serrano-Estrada and Almudena Nolasco-Cirugeda. 2019. Social Media data: Challenges, opportunities and limitations in urban studies. Computers, Environment and Urban Systems, 74: 161-174.

Jiang, Zhiqiu and Andrew Mondschein. 2021. Analyzing Parking Sentiment and its Relationship to Parking Supply and the Built Environment Using Online Reviews. Journal of Big Data Analytics in Transportation, 3: 61-79.

Video 9a: Topic modeling

Topic models try to identify the underlying themes in a piece of text. This video demonstrates how to estimate a topic model using one of the most common algorithms, Latent Dirichlet Allocation (LDA), and how to adjust the model parameters in the search for a meaningful topic model.

As you watch the video, follow along with the code here.

Video 9b: Working with Reddit

Reddit is often a useful source of data for identifying people’s attitudes. In this lecture, we’ll examine how to obtain Reddit data via its API, using the PRAW (Python Reddit API Wrapper) library.

As you watch the video, follow along with the code here.
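The sketch below shows one way to turn PRAW search results into a pandas DataFrame for later analysis. It assumes you have installed praw, registered an app with Reddit to obtain a client ID and secret, and stored them in the hypothetical environment variables REDDIT_CLIENT_ID and REDDIT_CLIENT_SECRET. The user agent, subreddit and search query are placeholders, not the ones used in the lecture.

```python
import os

import pandas as pd

COLUMNS = ["id", "title", "text", "score", "num_comments", "created"]


def connect_reddit(user_agent="python:transit-attitudes-demo:v0.1 (by u/your_username)"):
    """Create a read-only PRAW client from credentials in environment variables."""
    import praw  # imported here so the rest of this module works without praw installed

    return praw.Reddit(
        client_id=os.environ["REDDIT_CLIENT_ID"],          # hypothetical variable name
        client_secret=os.environ["REDDIT_CLIENT_SECRET"],  # hypothetical variable name
        user_agent=user_agent,                              # placeholder; use your own username
    )


def submissions_to_frame(submissions):
    """Collect key attributes of PRAW Submission objects into a DataFrame."""
    rows = [
        {
            "id": s.id,
            "title": s.title,
            "text": s.selftext,  # empty string for link posts
            "score": s.score,
            "num_comments": s.num_comments,
            "created": pd.to_datetime(s.created_utc, unit="s"),  # Unix seconds -> UTC timestamp
        }
        for s in submissions
    ]
    return pd.DataFrame(rows, columns=COLUMNS)


# Example usage (needs network access and valid credentials):
# reddit = connect_reddit()
# results = reddit.subreddit("LosAngeles").search("Metro bus", limit=50)
# posts = submissions_to_frame(results)
```

Keeping the API call separate from the conversion step means you can save the DataFrame once and rerun your analysis without hitting the API again. To collect comments rather than posts, PRAW's submission.comments.replace_more(limit=0) followed by submission.comments.list() flattens a thread's comment tree into a list of Comment objects, each with a body attribute.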

Video 9c: Sentiment analysis

Sentiment analysis provides a measure of how positive or negative a piece of text is. For example, it can distinguish between “I had a wonderful bus ride today” and “this bus was slow and filthy.” This video introduces sentiment analysis using the example of Reddit data. It also discusses how to create figures that are composed of multiple subplots.

As you watch the video, follow along with the code here.
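To illustrate the basic idea behind lexicon-based sentiment scoring, and how to lay out several panels with plt.subplots, here is a toy sketch. The word list and example comments are made up for illustration and are not a published sentiment dictionary; real tools such as VADER use far larger, validated lexicons and also handle negation, intensifiers and punctuation, which this sketch ignores.

```python
import re

import matplotlib

matplotlib.use("Agg")  # off-screen rendering for scripts; omit this line in a notebook
import matplotlib.pyplot as plt

# Hypothetical toy lexicon (word -> polarity); not a published sentiment dictionary.
LEXICON = {
    "wonderful": 1.0, "great": 1.0, "clean": 0.5, "fast": 0.5,
    "slow": -0.5, "late": -0.5, "crowded": -0.5, "filthy": -1.0, "dangerous": -1.0,
}


def sentiment_score(text):
    """Average polarity of lexicon words found in text; 0.0 if none match."""
    words = re.findall(r"[a-z']+", text.lower())
    matches = [LEXICON[w] for w in words if w in LEXICON]
    return sum(matches) / len(matches) if matches else 0.0


# Made-up comments grouped by mode; each group gets its own subplot.
comments = {
    "Bus": ["I had a wonderful bus ride today", "this bus was slow and filthy", "the bus was late again"],
    "Rail": ["the train was fast and clean", "great ride, but crowded", "the station felt dangerous at night"],
}

fig, axes = plt.subplots(1, 2, figsize=(8, 3), sharey=True)  # 1 row, 2 columns of axes
for ax, (mode, texts) in zip(axes, comments.items()):
    scores = [sentiment_score(t) for t in texts]
    ax.bar(range(len(scores)), scores)
    ax.axhline(0, color="grey", linewidth=0.8)  # neutral reference line
    ax.set_title(mode)
    ax.set_xlabel("Comment index")
axes[0].set_ylabel("Sentiment score")  # y-axis is shared, so label only the first panel
fig.tight_layout()
# fig.savefig("sentiment_by_mode.png")  # save to disk when running as a script
```

Here "I had a wonderful bus ride today" scores 1.0 and "this bus was slow and filthy" scores -0.75. Note that this lexicon would also score "not wonderful" as positive, one reason to interpret sentiment results critically.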

Please take the quiz below to check your understanding of this module.

Quiz for currently enrolled UCLA students

Quiz for other learners

Class practice

This notebook practices the concepts that we’ve developed in the lecture notebooks. We’ll work through it in class.

It’s the Module 9 class activity in your GitHub repository here.

You have now completed Module 9. Please navigate to the homepage or to the next module by using the green navigation bar at the top.