In this module, we’ll focus on two specific techniques for analyzing natural language. Topic modeling tells you what a text is about – it identifies and classifies the latent topics in a body of text. Sentiment analysis identifies the tone, for example whether a text expresses positive or negative sentiment. Both techniques fall under the broad umbrella of natural language processing, or NLP.
Learning objectives
By the end of this module, you should be able to:
- Estimate and interpret a topic model that identifies underlying themes in a text
- Evaluate how to adjust model parameters in the search for a meaningful topic model
- Implement a scraper to obtain data from social media sites such as Reddit
- Critically interpret the findings from a sentiment analysis
Required readings
These two papers provide further examples of topic modeling and/or sentiment analysis. Choose one of them to read in depth, and skim the other. What does the paper do, and to what extent do you find the analysis helpful in understanding political communications and/or attitudes to transit?
Optional readings
Grimmer, Justin; Margaret E. Roberts; Brandon M. Stewart. 2022. Text as Data: A New Framework for Machine Learning and the Social Sciences. Princeton University Press.
This is a more theoretically grounded treatment of text analysis. I recommend it if you use any of these techniques in your projects; it is also worth skimming, particularly Ch. 5 (Bag of Words) and Ch. 13 (Topic Models). A copy is on order at the UCLA Library.
Steinert-Threlkeld, Zachary. 2018. Twitter as Data. Cambridge: Cambridge University Press.
If you are interested in Twitter data, this is a great reference, especially Chapters 3 and 5.
Armstrong, John, Anna Nisi, and Adam Millard-Ball. A disciplinary divide in the framing of urbanization’s environmental impacts. Conservation Science & Practice 4(3): e624.
An example from my own work. Note that standard sentiment analysis tools did not work for us, so we had to develop our own.
Video 9a: Topic modeling
Topic models try to identify the underlying themes in a piece of text. This video demonstrates how to estimate a topic model using one of the most common algorithms, Latent Dirichlet Allocation (LDA), and how to adjust the model parameters in the search for a meaningful topic model.
As you watch the video, follow along with the code here.
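To make the moving parts of LDA concrete, here is a minimal from-scratch sketch that uses only the Python standard library. It is not the code from the video, and in practice you would normally rely on a library implementation. The toy documents, the function names `fit_lda` and `topic_shares`, and the default parameter values are all assumptions made for illustration. The sketch uses collapsed Gibbs sampling, one common way to estimate an LDA model.

```python
import random
from collections import Counter


def fit_lda(docs, n_topics=2, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Estimate LDA by collapsed Gibbs sampling on tokenized documents.

    alpha: prior on topics per document (smaller = fewer topics per document)
    beta:  prior on words per topic (smaller = fewer distinct words per topic)
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * n_topics for _ in docs]          # tokens in doc d assigned to topic k
    topic_word = [Counter() for _ in range(n_topics)]   # times word w is assigned to topic k
    topic_total = [0] * n_topics                        # total tokens assigned to topic k

    # Start from a random topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_d = [rng.randrange(n_topics) for _ in doc]
        for w, k in zip(doc, z_d):
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts...
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # ...then resample its topic given all the other assignments.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta) / (topic_total[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return doc_topic, topic_word


def topic_shares(doc_topic, alpha=0.1):
    """Convert per-document topic counts into smoothed proportions."""
    shares = []
    for counts in doc_topic:
        total = sum(counts) + alpha * len(counts)
        shares.append([(c + alpha) / total for c in counts])
    return shares


# Toy corpus (made up for illustration), already lower-cased and tokenized.
docs = [
    "bus late bus crowded driver".split(),
    "bus route driver late".split(),
    "parking meter parking permit street".split(),
    "street parking permit fine".split(),
]

doc_topic, topic_word = fit_lda(docs, n_topics=2, seed=0)
for k, counts in enumerate(topic_word):
    top_words = [w for w, c in counts.most_common(4) if c > 0]
    print(f"Topic {k}: {top_words}")
for d, shares in enumerate(topic_shares(doc_topic)):
    print(f"Doc {d}: {[round(s, 2) for s in shares]}")
```

Try changing `n_topics`, `alpha`, `beta`, or `seed` and re-running. The topics are unlabeled word lists, so judging whether a particular setting produces a meaningful model is up to you.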
Video 9b: Working with Reddit
Reddit is often a useful source of data for identifying people’s attitudes. In this lecture, we’ll examine how to obtain Reddit data via their API, using the PRAW library.
As you watch the video, follow along with the code here.
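As a rough sketch of what a PRAW-based scraper can look like (not the notebook from the video), the code below collects basic fields from a subreddit's "hot" listing. It assumes you have registered an app with Reddit to obtain credentials. The environment variable names, the user agent string, the helper names `make_client` and `fetch_posts`, and the subreddit name in the usage comment are all placeholders chosen for illustration.

```python
import os


def make_client():
    """Create a read-only Reddit client from credentials in environment variables."""
    # Imported here so that fetch_posts() can be used (or tested) without PRAW installed.
    import praw  # third-party: pip install praw

    return praw.Reddit(
        client_id=os.environ["REDDIT_CLIENT_ID"],          # placeholder variable name
        client_secret=os.environ["REDDIT_CLIENT_SECRET"],  # placeholder variable name
        user_agent="module9-demo by u/your_username",      # placeholder; describe your own script
    )


def fetch_posts(reddit, subreddit_name, limit=10):
    """Return a list of dicts, one per submission in the subreddit's 'hot' listing."""
    rows = []
    for submission in reddit.subreddit(subreddit_name).hot(limit=limit):
        rows.append(
            {
                "id": submission.id,
                "title": submission.title,
                "score": submission.score,
                "num_comments": submission.num_comments,
            }
        )
    return rows


# Example usage (needs network access and valid credentials, so it is left commented out):
# reddit = make_client()
# posts = fetch_posts(reddit, "transit", limit=25)
# print(posts[0]["title"])
```

Returning plain dictionaries keeps the result easy to load into a pandas DataFrame for the sentiment analysis later in the module.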
Video 9c: Sentiment analysis
Sentiment analysis provides a measure of how positive or negative a piece of text is. For example, it can distinguish between “I had a wonderful bus ride today” and “this bus was slow and filthy.” This video introduces sentiment analysis using the example of Reddit data. It also discusses how to create figures that are composed of multiple subplots.
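To see why sentiment scores need critical interpretation, here is a toy lexicon-based scorer, using only the standard library. It is a deliberately simplified sketch, not the tool used in the video: the six-word lexicon and its scores are made up for illustration, whereas real tools rely on much larger lexicons or trained models.

```python
import re

# Tiny made-up lexicon: word -> score between -1 (negative) and +1 (positive).
LEXICON = {
    "wonderful": 1.0,
    "great": 1.0,
    "clean": 0.5,
    "slow": -0.5,
    "late": -0.5,
    "filthy": -1.0,
}


def sentiment_score(text, lexicon=LEXICON):
    """Mean score of the words in `text` found in the lexicon; 0.0 if none are found."""
    tokens = re.findall(r"[a-z']+", text.lower())
    scores = [lexicon[t] for t in tokens if t in lexicon]
    return sum(scores) / len(scores) if scores else 0.0


examples = [
    "I had a wonderful bus ride today",
    "this bus was slow and filthy",
    "the bus was not wonderful",  # negation: word order is ignored, so this is misread
]
for text in examples:
    print(f"{sentiment_score(text):+.2f}  {text}")
```

The first two sentences come out positive and negative as expected, but the third is also scored as positive because the scorer ignores the word "not". Sarcasm and domain-specific vocabulary cause similar problems, so it is worth reading a sample of the scored text before trusting the numbers.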
Please take the quiz below to check your understanding of this module.
Class practice
This notebook practices the concepts that we’ve developed in the lecture notebooks. We’ll work through it in class.
It’s the Module 9 class activity in your GitHub repository here.
You have now completed Module 9.