Text as Data: quantitative text analysis with R

The workshop will take place on July 13th at 10:00 AM CEST on Zoom.

Course Details

This is an introductory course on social science text analysis. I will assume no prior knowledge of text analysis but will require competence in R for the practical exercises. Our four hours will be divided by topic, and each topic will have approximately 30 minutes of lecture and 30 minutes of practical work. The practical materials will be provided shortly before the class.

The aim of the workshop is to showcase a basic toolkit of model types available to social and data scientists for analyzing text as data. We will focus on evaluation and applications and say little, if anything, about the computational challenges of fitting these models, their more recent extensions, or the acquisition of text data. Students interested in these topics are encouraged to take a dedicated machine learning or web scraping course.


Dr. Will Lowe

William Lowe is Senior Research Scientist at the Hertie School. His research spans legislative politics as well as political economy, and most recently public policy, focusing on the causal inference behind estimates of racial bias in policing. Methodologically he is interested in statistical models of text and in causal inference. He joins the Hertie School from Princeton University where he was Senior Research Specialist and a Lecturer in the Department of Politics. He has a PhD in Cognitive Science from the University of Edinburgh, a Bachelor of Arts in Philosophy from the University of Warwick, and has previously held postdoctoral positions at Harvard University, the University of Nottingham, and the MZES.


Lecture: Introduction to text as data.

Practical: Introduction to the quanteda package(s)

Lecture: Document classification and evaluation.

Practical: Classifying documents and evaluating classifiers

Short Break

Lecture: Topic models and dictionaries.

Practical: Fitting and interpreting topic models

Lecture: Scaling and other spatial models.

Practical: Fitting and interpreting scaling models

Session Ends

Content Licensing

All workshop materials and recordings are licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 2.0 license. You are free to share (copy and redistribute the material in any medium or format) and to adapt (remix, transform, and build upon the material). However, you must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use. You may not use the material for commercial purposes. If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.

Event Recording