Week 3

MACS 30405: Exploring Cultural Space
University of Chicago

Announcement

  • Readings for week 4 will be released by the end of today.

Plan for coding exercises

  • Exercise 1 (PCA): released on March 29; due on April 18
  • Exercise 2 (FA/CA): to be released on April 12; due on April 18 (for FA) and/or April 25 (for CA)
  • Exercise 3 (Text analysis): to be released on April 17; due on May 2
  • Exercise 4 (Word embedding): to be released on May 1; due on May 9
  • Exercise 5 (Aligning vector spaces): to be released on May 8; due on May 16

Recap

McPherson (2004)

  • Blau space: a multi-dimensional space spanned by various sociodemographic variables
  • Benefit: gives relative positions
  • Core argument: cultural variations (dimensions) are the products of the sorting of different people into different spatially proximate positions (as a consequence of modernization)

Problems

  • Causality: what mechanisms sorted people into different positions in the first place? What is the dynamic? (Mark)

  • Necessity: are direct social interactions necessary for producing common cultural patterns?

  • Social determinism

    • No agency
    • Explanations are contingent solely on social positions (Aida); culture has no autonomy.
  • The Blau space is also multi-polar and not very parsimonious.

    • Lack of dimensions (Athmika)

Bourdieu

Three dimensions of social hierarchy:

  • Economic capital
  • Cultural capital
  • Social capital

Why are cultural tastes associated with education/class?

  • The association manifests in cultural tastes that are not necessarily learned in school.
  • The academic system classifies people and makes them recognize their distinction/legitimacy.
  • Art separates forms from functions and creates boundaries. (When everybody can go to the Louvre to see classical art, the elite class goes for the avant-garde.)

Culture as autonomous fields

  • Artworks answer and reinforce one another (43).
  • Artistic competence is derived from “perceiving and deciphering” styles (44).
  • Manners and cultivation create symbolic boundaries and classes (59).
  • Learning institutions rationalize the sense of beauty (60).

Habitus

  • Dispositions generated from classifiable conditions associated with long establishment in the cultural field
  • Structuring structure and structured structure that produces a system of differences
  • The system is nevertheless conditioned on some structure of fundamental oppositions (such as rich vs. poor) (167).

Homology between the spaces


Figure: e.g., the casserole is associated with the working class.

Cultural capital

  • People derive profit from the rarity of their positions in the space
  • Producers compete in the field to make distinct products
  • Consumers choose products based on their positions in the space.

Correspondence analysis

Problems

  • What does the correspondence analysis really say?
  • What is the difference between the field and the space?
  • Is the Bourdieusian space French specific?
  • Still socially deterministic?
  • What are the possibilities of change?

Let’s talk about measurement.

Discussion

  • What are possible pitfalls of using surveys to study culture?
  • What can we social scientists do, given the limitations of measurement?

Break

Factor Analysis

Recap: PCA as a dimension-reduction technique

  • uses linear combinations of variables to find dimensions that explain the maximal variance.
  • can be applied to any matrix
  • aims to find a lower-dimensional representation of the original data and does not rely on any assumption about how the data is generated (a short code sketch follows)
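
A minimal sketch of this in Python, assuming scikit-learn is available; the data matrix here is synthetic and only meant to illustrate the mechanics:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))               # any n x p data matrix works
X = (X - X.mean(axis=0)) / X.std(axis=0)    # standardize the variables first

pca = PCA(n_components=2)
scores = pca.fit_transform(X)               # coordinates on the first two PCs

print(pca.explained_variance_ratio_)        # share of variance each PC explains
print(pca.components_)                      # loadings: the linear combinations used
```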

Factor analysis

  • assumes that the data is generated from a few hidden dimensions and seeks to recover the dimensions
  • Mathematical formulation: \[ x_1 = \lambda_{11}f_1 + \lambda_{12}f_2 + ... + \lambda_{1m}f_m + e_1 \\ x_2 = \lambda_{21}f_1 + \lambda_{22}f_2 + ... + \lambda_{2m}f_m + e_2 \\ \vdots\\ x_p = \lambda_{p1}f_1 + \lambda_{p2}f_2 + ... + \lambda_{pm}f_m + e_p \\ m < p \]
  • The error terms are idiosyncratic and are uncorrelated among themselves and with any factor.
  • The factors themselves can be uncorrelated (orthogonal) or correlated (oblique); a simulation sketch of this model follows.
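
A sketch of the generative assumption, assuming scikit-learn; the sample size, loadings, and noise level are arbitrary choices for illustration:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
n, p, m = 500, 6, 2

Lam = rng.normal(size=(p, m))              # "true" loadings lambda_ij
f = rng.normal(size=(n, m))                # latent factors f_1, ..., f_m
e = rng.normal(scale=0.3, size=(n, p))     # idiosyncratic, uncorrelated errors
X = f @ Lam.T + e                          # observed variables x_1, ..., x_p

fa = FactorAnalysis(n_components=m).fit(X)
print(fa.components_.T)                    # estimated loadings (p x m), identified only up to rotation
```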

Factor analysis via PCA approximation

  1. Apply PCA to the dataset
  2. Keep the first \(m\) principal components scaled by the square roots of the eigenvalues and treat the rest as idiosyncratic.
  3. Rotate the PC factors according to some criterion. (A numpy sketch of the first two steps follows; the rotation is covered next.)
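
A numpy sketch of steps 1–2, assuming the data matrix X has already been standardized:

```python
import numpy as np

def pca_factor_loadings(X, m):
    """Approximate factor loadings: eigenvectors of the correlation
    matrix scaled by the square roots of their eigenvalues."""
    R = np.corrcoef(X, rowvar=False)              # p x p correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)          # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]             # reorder to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    loadings = eigvecs[:, :m] * np.sqrt(eigvals[:m])  # keep first m, rescale
    return loadings                               # remaining components are treated as idiosyncratic
```

For example, loadings = pca_factor_loadings(X, m=2) returns the unrotated loading matrix that is then passed to the rotation step.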

Varimax rotation (orthogonal)

  • Goal: make most of the rotated factor loadings either close to zero or far from zero (simple structure), while keeping the factors orthogonal; a numpy sketch of the criterion follows.
  • find factor loadings \(b_{ij}\)s that maximize \[ \left(\frac{1}{p}\sum_{j=1}^k \sum_{i=1}^p b^4_{ij} - \sum_{j=1}^k \left(\frac{1}{p}\sum_{i=1}^p b^2_{ij}\right)^2\right). \]
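
A direct numpy translation of this criterion; the two loading matrices below are made-up examples, with the “simple structure” one expected to score higher:

```python
import numpy as np

def varimax_criterion(B):
    """Varimax objective for a p x k matrix of factor loadings B."""
    p = B.shape[0]
    return (B**4).sum() / p - ((B**2).mean(axis=0) ** 2).sum()

simple  = np.array([[0.9, 0.0], [0.8, 0.1], [0.1, 0.9], [0.0, 0.8]])
diffuse = np.array([[0.6, 0.6], [0.6, 0.5], [0.5, 0.6], [0.5, 0.5]])
print(varimax_criterion(simple) > varimax_criterion(diffuse))  # True
```

In practice the maximization over orthogonal rotations is done iteratively; recent versions of scikit-learn's FactorAnalysis accept rotation='varimax', so the rotation rarely needs to be hand-coded.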

Geometric interpretation

Figure: Jolliffe (2002, 155).

Oblique rotation

Figure: Jolliffe (2002, 156).

Comparison with PCA

  • FA rests on an assumption about how the data is generated.
  • The \(m\) factors still explain the same proportion of total variance as the first \(m\) principal components do.
  • However, the first factor(s) no longer explain maximal variation.
  • The factors may or may not be orthogonal.

PCA vs. FA

Correspondence Analysis

When dealing with categorical variables

                  Occupation   Favorite movie genre
Individual 1      Student      Comedy
Individual \(n\)  Engineer     Sci-Fi

You can present the bivariate association in a crosstab (a pandas example follows the table).

           Comedy   Sci-Fi
Student    5        8
Engineer   7        7
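
With pandas, the same crosstab can be produced directly from individual-level data; the data frame below is a made-up illustration matching the counts above:

```python
import pandas as pd

df = pd.DataFrame({
    "occupation": ["Student"] * 13 + ["Engineer"] * 14,
    "genre": ["Comedy"] * 5 + ["Sci-Fi"] * 8 + ["Comedy"] * 7 + ["Sci-Fi"] * 7,
})
print(pd.crosstab(df["occupation"], df["genre"]))
```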

Correspondence analysis

  • Goal: given a count table \(\mathbf{X} = [x_{ij}]\), to find row-score and column-score vectors \(\mathbf{r} \in \mathbb{R}^n\) and \(\mathbf{c} \in \mathbb{R}^p\), such that

\[\begin{equation} \begin{aligned} r_i &\propto \sum_{j=1}^p\ c_j \frac{x_{ij}}{x_{i\cdot}}, \\ \text{and } c_j &\propto \sum_{i=1}^n\ r_i \frac{x_{ij}}{x_{\cdot j}}, \end{aligned} \end{equation}\] where \(x_{i\cdot}\) and \(x_{\cdot j}\) are the sums of the \(i\)th row and the \(j\)th column, respectively.

Correspondence analysis

  • The row scores and column scores in a correspondence analysis are reciprocally determined.
  • The problem can be solved via a generalized singular value decomposition applied to \(\mathbf{X}\) (see the numpy sketch after this list).
  • In CA, the total amount of variability is called inertia and is calculated as \(\frac{\chi^2}{N}\).
  • Like in PCA, the eigenvalues correspond to the amount of inertia explained by the corresponding dimensions. All dimensions are orthogonal to each other, and the first dimension explains the maximal inertia.
  • As in PCA, the solutions can be presented in a bi-plot.
  • CA can also be conveniently applied to supplementary columns and rows.
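
A compact numpy sketch of CA via the SVD of the standardized residuals, applied to the small crosstab from above; this is one standard formulation (principal coordinates), not the only possible scaling:

```python
import numpy as np

def correspondence_analysis(X):
    """Correspondence analysis of an n x p count table X."""
    X = np.asarray(X, dtype=float)
    P = X / X.sum()                               # correspondence matrix
    r, c = P.sum(axis=1), P.sum(axis=0)           # row and column masses
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))  # standardized residuals
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    inertia = sv**2                               # inertia explained per dimension
    row_coords = (U * sv) / np.sqrt(r)[:, None]     # principal row coordinates
    col_coords = (Vt.T * sv) / np.sqrt(c)[:, None]  # principal column coordinates
    return row_coords, col_coords, inertia

X = np.array([[5, 8],    # Student
              [7, 7]])   # Engineer
rows, cols, inertia = correspondence_analysis(X)
print(inertia.sum())     # total inertia = chi-square statistic / N
```

Plotting the row and column coordinates on the same axes gives the biplot.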

Multiple correspondence analysis

  • is an extension of CA that applies to multiple categorical variables.
  • MCA starts by turning individual-level data into an indicator matrix:

Indicator matrix

                  Student   Engineer   Comedy   Sci-Fi
Individual 1      1         0          1        0
Individual \(n\)  0         1          0        1

Multiple correspondence analysis

  • Then, it applies CA to the indicator matrix \(\mathbf{X}\) or the Burt matrix \(\mathbf{X}^\top\mathbf{X}\)
  • One caveat: because of the dummy coding, the total amount of inertia is inflated, and the eigenvalues need to be corrected. (A pandas sketch of the indicator and Burt matrices follows.)
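
A pandas sketch of the first step, using the two illustrative variables from the tables above; the resulting indicator matrix can then be passed to CA (e.g., the function sketched earlier):

```python
import pandas as pd

df = pd.DataFrame({
    "occupation": ["Student", "Engineer"],
    "genre": ["Comedy", "Sci-Fi"],
})
Z = pd.get_dummies(df).astype(int)   # indicator matrix: one column per category
burt = Z.T @ Z                       # Burt matrix: pairwise crosstabs of all variables
print(Z)
print(burt)
# Applying CA to Z (or to the Burt matrix) yields the MCA solution;
# remember that the resulting eigenvalues are inflated by the dummy coding.
```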