Text as data: Quantitative text analysis for political science and public policy

In the study of politics, text of one kind or another is often essential to measuring important underlying concepts, e.g. policy sentiment, issue frames, or ideological positions. Text also poses special challenges, both for machines and for researchers, who cannot read everything their research designs require and must therefore use quantitative tools to operationalise and extend their own understanding. This course is about the statistical and computational tools that political scientists and policy analysts have found useful for treating text 'as data'.

The course combines theoretical and practical work, with regular exercises that usually mix conceptual questions and R programming. The first part of the course familiarises students with the data side of analysing text as data. This part is somewhat more intensive than in other courses, whose data sources often need less cleaning and arranging before they can be analysed. The second part introduces a small group of concepts, largely in the form of summary statistics, including keyness, concordances, word embeddings and simple dictionary-based content analysis. The third part shows how these are combined into useful analytical tools such as topic models, sentiment analysis and text scaling models.

This course is for second-year MIA and MPP students only.