Introduction

MACS 30405: Exploring Cultural Space
University of Chicago

What is cultural space?

Source: Inglehart, R., & Baker, W. E. (2000). Modernization, cultural change, and the persistence of traditional values. American Sociological Review, 65(1), 19-51.

Source: Bourdieu, P. (1984). Distinction. London: Routledge.

Source: Kozlowski, A. C., Taddy, M., & Evans, J. A. (2019). The geometry of culture: Analyzing the meanings of class through word embeddings. American Sociological Review, 84(5), 905-949.

Some preliminary sketches

  • The units of analysis of a cultural space could be
    • subjects engaged in cultural production and/or consumption, e.g. person/society/country, OR
    • objects being produced, e.g. artwork/idea/belief/word
  • These subjects/objects share some common dimensions, e.g. left vs. right, liberal vs. conservative, highbrow vs. lowbrow, etc.

Why do we need a space to describe culture?

  • Parsimony (fewer dimensions than things being described)
  • More importantly, in culture, subjects and objects share the same meaning structure.

Methods covered in this class

  1. Principal Component Analysis (PCA) and Factor Analysis (FA)
  2. Correspondence Analysis (CA) and Multiple Correspondence Analysis (MCA)
  3. Basics of Natural Language Processing (NLP) (only selected topics related to this class will be covered)
  4. Word-embedding models (Word2Vec in particular)
  5. (if time permits) Latent Class Analysis (LCA), Multidimensional Scaling (MDS), Procrustes Analysis (PA), extensions of neural-embedding models (Anything2Vec), etc.

Pedagogical goals

  • Develop intuitive understandings of how the spatial models work
  • Apply the models to empirical data
  • Make meaningful interpretations, which help generate insights about the social world.
  • The more you understand about the maths behind the models, the better. However, you won’t be tested in this respect.

This class is not a general-purpose NLP class.

  • Things that will not be covered in the class:
    • topic models
    • cluster analysis
    • supervised machine learning

Classes that would complement the materials covered in this class

  • Classes on multivariate statistics: STAT 32900/32940/32950
  • Classes on computational content analysis or natural language processing: MACS 60000, LING 38610/38620

Now, let’s talk about the syllabus.

Check-ins

Mohr et al. (2020) Why measure culture?

  • Is culture measurable?
  • Cultural study is a study of meaning. Culture is real and something that can be empirically quantified.
  • What makes culture hard to measure?
    • Culture is always subject to interpretation.
    • Every measure only imperfectly captures part of reality.
  • The authors advocate for a “multipronged” approach that complements different levels of formal measures with interpretation.

Inglehart & Baker (2000). Modernization, cultural change, and the persistence of traditional values.

  • Major argument: Modernization brings in two cultural dimensions:
    • A traditional/secular dimension brought by the industrial revolution
    • A survival/self-expression dimension brought by the post-industrial transformation

Culture seems predictable. But why?

Culture as a system

Weber, Max. The Protestant Ethic and the Spirit of Capitalism.

  • Central argument: Capitalism was first developed in Western Europe because there was an elective affinity between the Calvinist concept of Calling and the capitalist mode of production.

Weber, Max. “Religious Rejections of the World and Their Directions.”

  • Every social sphere has a tendency to rationalize itself and become internally coherent.
  • The outcome is the separation of social spheres, with each sphere having its own space and own time and striving for self-consistency.
  • However, different social spaces would inevitably come into conflict with each other. Example: universal brotherliness in religion vs. economic rationality.

Weber, Max. “Religious Rejections of the World and Their Directions.”

  • Problems demand solutions. But the law (or self-consistency) of each sphere determines that there are only a few ways out.
  • The end product: the Protestant Reformation.
  • Main takeaways: Each social sphere has its own space and own laws. Therefore, cultural change is predictable.

Geertz, Clifford. “Ideology as a Cultural System.”

  • Geertz argues against the thesis that ideology should be studied only as a symptom of something else (such as material interests or psychological strains).
  • Rather, ideology can be studied as “systems of interacting symbols” (p. 207).
    • Think about the example of kneeling in American football mentioned at the beginning of Mohr et al. (2020). Why is the same act interpreted very differently by different people?
    • Is it a symbolic act? What is a symbol?
  • The interworking of symbols is a social process “not in the head,” but in that public world where “people talk together, name things, make assertions, and to a degree understand each other” (p. 213).

Break

Warm-ups

Conceiving data as a matrix

Conventional way of presenting some data

                     Variable 1   ...   Variable \(p\)
Observation 1
...
Observation \(n\)

Dimension: \(n \times p\) (usually \(n\) is far greater than \(p\)).

Example 1: survey responses

                     Sex   Age   Attitude on Abortion
Individual 1         0     25    3
...
Individual \(n\)     1     48    7

Example 2: word frequency counts

                     a     the   power
Document 1           928   824   8
...
Document \(n\)       451   552   5
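
As a minimal sketch of the matrix view, the two survey rows shown above can be stored as a NumPy array. The rows between Individual 1 and Individual \(n\) are not shown in the slides and are left out here rather than invented. The column names are illustrative.

```python
import numpy as np

# Columns follow the order of the survey table: sex, age, attitude on abortion.
columns = ["sex", "age", "attitude_on_abortion"]

# Only the two rows printed in the table are included; the rows in between
# are omitted in the source and are not reconstructed here.
X = np.array([
    [0, 25, 3],   # Individual 1
    [1, 48, 7],   # Individual n
])

n, p = X.shape                       # observations (rows), variables (columns)
age = X[:, columns.index("age")]     # one column = one variable
first_person = X[0, :]               # one row = one observation

print("n =", n, "p =", p)
print("age column:", age)
print("first row:", first_person)
```

Reading a column gives one variable across all observations; reading a row gives one observation across all variables. The spatial models in this class work on exactly this kind of array.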

Variable types:

  • continuous (scale): can take any real value
  • binary: 0/1
  • ordinal: e.g. educational degree
  • categorical: e.g. favorite music genre

Statistics of continuous variables

  • Mean: \(\bar{x} = \frac{1}{n}\sum_{i=1}^{n}x_i\)
  • Variance: \(\text{var}(x) = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2\)
  • Standard deviation: \(s_x = \sqrt{\text{var}(x)}\)
  • Covariance: \(\text{cov}(x, y) = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})\)
  • Pearson product-moment correlation: \(r_{xy} =\frac{\sum ^n _{i=1}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum ^n _{i=1}(x_i - \bar{x})^2} \sqrt{\sum ^n _{i=1}(y_i - \bar{y})^2}}\)
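
The formulas above can be checked by hand on a small example. The sketch below uses made-up numbers (an assumption for illustration only) and the \(\frac{1}{n}\) convention from the slides. Note that NumPy's `np.var` and `np.std` also divide by \(n\) by default, whereas `np.cov` divides by \(n - 1\) unless `bias=True` is passed.

```python
import numpy as np

# Hypothetical data, for illustration only.
x = np.array([2.0, 4.0, 4.0, 6.0])
y = np.array([1.0, 3.0, 2.0, 6.0])
n = len(x)

# Mean
mean_x = x.sum() / n
mean_y = y.sum() / n

# Variance and standard deviation (1/n convention, as in the formulas above)
var_x = ((x - mean_x) ** 2).sum() / n
var_y = ((y - mean_y) ** 2).sum() / n
sd_x = np.sqrt(var_x)
sd_y = np.sqrt(var_y)

# Covariance (1/n convention)
cov_xy = ((x - mean_x) * (y - mean_y)).sum() / n

# Pearson correlation, written exactly as the ratio of sums
r_xy = ((x - mean_x) * (y - mean_y)).sum() / (
    np.sqrt(((x - mean_x) ** 2).sum()) * np.sqrt(((y - mean_y) ** 2).sum())
)

print("mean:", mean_x, mean_y)
print("variance:", var_x, var_y)
print("covariance:", cov_xy)
print("correlation:", r_xy)
```

The correlation equals the covariance divided by the product of the two standard deviations, which is why it does not change when a variable is rescaled.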

Sample PCA

  1. Given variables \(\vec{x_1}, \vec{x_2}, \dots, \vec{x_p}\), find a linear combination of the variables that has maximal variance, subject to the condition that the coefficients of the linear combination form a unit vector. More precisely, we want to find a vector \(\vec{a_1} = [a_{11}, \dots, a_{1p}]\) such that the new variable \(a_{11}\vec{x_1} + a_{12}\vec{x_2} + \dots + a_{1p}\vec{x_p}\) has maximal variance subject to the condition that \(\|\vec{a_1}\|_2 = 1\). \(\vec{a_1}\) is called the first principal component.
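
To make the definition concrete, here is a brute-force sketch for the case \(p = 2\). Every unit vector in the plane can be written as \((\cos\theta, \sin\theta)\), so we can simply try many angles and keep the one whose linear combination has the largest variance. The data are simulated and the grid search is only for intuition; it is not how PCA is computed in practice.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: two correlated variables, n = 500 observations.
x1 = rng.normal(size=500)
x2 = 0.8 * x1 + 0.6 * rng.normal(size=500)
X = np.column_stack([x1, x2])
X = X - X.mean(axis=0)                      # center each variable

# Candidate unit vectors (cos t, sin t) on a 0.1-degree grid over [0, pi].
angles = np.linspace(0.0, np.pi, 1801)
A = np.column_stack([np.cos(angles), np.sin(angles)])   # shape (1801, 2)

# Column j of X @ A.T is the new variable a1*x1 + a2*x2 for candidate j;
# take the variance of each new variable.
variances = (X @ A.T).var(axis=0)

best = variances.argmax()
a_first = A[best]

print("unit vector with maximal variance:", a_first)
print("variance of the new variable:", variances[best])
print("variance of x1 alone:", X[:, 0].var())
print("variance of x2 alone:", X[:, 1].var())
```

The winning combination has at least as much variance as either original variable, because the vectors \((1, 0)\) and \((0, 1)\) are themselves among the candidates.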

Sample PCA - geometric interpretation

  2. Then, we want to find a second unit coefficient vector \(\vec{a_2}\) (the second principal component) that does the same, except that the newly formed variable \(a_{21}\vec{x_1} + a_{22}\vec{x_2} + \dots + a_{2p}\vec{x_p}\) should be uncorrelated with the first new variable.

  3. Following the same logic, the \(k\)th principal component is the unit coefficient vector \(\vec{a_k}\) that maximizes the variance of the newly formed variable \(a_{k1}\vec{x_1} + a_{k2}\vec{x_2} + \dots + a_{kp}\vec{x_p}\), subject to the condition that this variable is uncorrelated with all previously formed variables.

  4. The procedure can be repeated until \(k\) reaches \(\min(n, p)\).
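
The defining properties of the full set of components can be checked numerically. The sketch below obtains all coefficient vectors at once from an eigenvalue decomposition of the covariance matrix, using `numpy.linalg.eigh`, and then verifies that the new variables are mutually uncorrelated and ordered by decreasing variance. The data and the mixing matrix are simulated assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: n = 200 observations on p = 3 correlated variables.
Z = rng.normal(size=(200, 3))
mix = np.array([[1.0, 0.5, 0.2],
                [0.0, 1.0, 0.3],
                [0.0, 0.0, 0.5]])
X = Z @ mix
X = X - X.mean(axis=0)                  # center each variable

S = (X.T @ X) / X.shape[0]              # p x p covariance matrix (1/n convention)

# eigh returns eigenvalues in ascending order; reorder so that the
# largest-variance direction comes first.
eigvals, eigvecs = np.linalg.eigh(S)
order = np.argsort(eigvals)[::-1]
eigvals = eigvals[order]
components = eigvecs[:, order].T        # row k holds the coefficients a_k

scores = X @ components.T               # column k is the k-th new variable
score_cov = (scores.T @ scores) / X.shape[0]

print(np.round(score_cov, 6))           # off-diagonal entries are numerically zero
print(eigvals)                          # variances of the new variables, decreasing
```

The covariance matrix of the new variables is diagonal, which is the "uncorrelated with all previous variables" condition, and its diagonal entries are the variances captured by each component.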

  • The variance of a variable is not scale-free: \(\text{var}(a\vec{x}) = a^2\,\text{var}(\vec{x})\), so a variable measured in larger units has more variance and can dominate the first components. Maximal variance on the raw scale may therefore not be what we want. To address this problem, we often standardize the variables by transforming \(x\) into \(x' = \frac{x - \bar{x}}{s_x}\).
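
A short sketch of the scaling issue, using a simulated variable (an assumption for illustration): the same heights expressed in meters and in centimeters have variances that differ by a factor of \(100^2\), but after standardization the two versions are identical.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical variable: height in meters for 1,000 people.
height_m = rng.normal(loc=1.70, scale=0.10, size=1000)
height_cm = 100 * height_m              # same information, different unit

# var(a * x) = a**2 * var(x): the ratio below is 100**2.
print("variance in m: ", height_m.var())
print("variance in cm:", height_cm.var())
print("ratio:", height_cm.var() / height_m.var())


def standardize(x):
    """Return (x - mean) / standard deviation, using the 1/n convention."""
    return (x - x.mean()) / x.std()


z_m = standardize(height_m)
z_cm = standardize(height_cm)

# After standardization the unit of measurement no longer matters.
print("max difference after standardizing:", np.abs(z_m - z_cm).max())
print("mean and variance of z:", z_m.mean(), z_m.var())
```

Running PCA on standardized variables is equivalent to working with the correlation matrix instead of the covariance matrix.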

Solutions

  • PCA has a closed-form solution based on eigenvalue decomposition (EVD). You don’t need to understand EVD in order to use PCA, but it helps to know how it works.
  • Explanations on blackboard.
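
For reference, here is one standard way to see why EVD solves the problem; it is a sketch and not necessarily the version presented in class. The symbols \(X\) and \(S\) are notation introduced here: \(X\) is the \(n \times p\) matrix whose columns are the centered variables \(\vec{x_1}, \dots, \vec{x_p}\), and \(S = \frac{1}{n}X^{\top}X\) is their covariance matrix.

```latex
% Assumed notation: X is the n x p matrix of centered variables,
% S = (1/n) X^T X is the covariance matrix, a is a coefficient vector.
\begin{align*}
\text{var}(X\vec{a}) &= \frac{1}{n}\,\vec{a}^{\top} X^{\top} X \vec{a} = \vec{a}^{\top} S \vec{a} \\
\mathcal{L}(\vec{a}, \lambda) &= \vec{a}^{\top} S \vec{a} - \lambda\,(\vec{a}^{\top}\vec{a} - 1) \\
\frac{\partial \mathcal{L}}{\partial \vec{a}} &= 2 S \vec{a} - 2 \lambda \vec{a} = 0
  \;\Longrightarrow\; S \vec{a} = \lambda \vec{a} \\
\text{var}(X\vec{a}) &= \vec{a}^{\top} S \vec{a} = \lambda\, \vec{a}^{\top}\vec{a} = \lambda
\end{align*}
```

Any solution of the constrained maximization is therefore an eigenvector of \(S\), and the variance it achieves is the corresponding eigenvalue, so the first principal component is the eigenvector with the largest eigenvalue. For later components, \(\text{cov}(X\vec{a_1}, X\vec{a_k}) = \vec{a_k}^{\top} S \vec{a_1} = \lambda_1 \vec{a_k}^{\top}\vec{a_1}\), so (when \(\lambda_1 > 0\)) being uncorrelated with the first new variable is the same as being orthogonal to \(\vec{a_1}\). Because the eigenvectors of a symmetric matrix can be chosen to be mutually orthogonal, the \(k\)th principal component is the eigenvector with the \(k\)th largest eigenvalue.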

Bring your laptop next week.