Fundamentals of (Digital) Survey Design

MACS 30000: Perspectives on Computational Analysis
University of Chicago

Asking Questions 1.0

Why ask questions?

  • Because we can
  • Measure internal states (e.g., attitudes, beliefs, emotions) that cannot be observed from behavior alone

Total Survey Error

  • Census vs. Sample
  • Sources of error
    • Representation errors
    • Measurement errors

Relationship to Bias and Variance

\[
\begin{aligned}
\text{Total Survey Error} ={}& \text{Representation Errors (Bias)} \\
&+ \text{Measurement Errors (Variance)}
\end{aligned}
\]

Representation Errors

  • When the survey does not accurately represent the population of interest
  • Example: 1936 Literary Digest poll (10 million ballots sent and 2.4 million returned)
    • Target population vs. Frame population
    • Sample population vs. Respondents
      • Non-response bias

Sampling Methods

Introducing sampling error as another source of representation error

Probability Sampling

  • Simple random sampling
  • Oversampling and undersampling
  • Post-stratification and weighting (see the sketch after this list)
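A minimal sketch of post-stratification, assuming pandas; the group labels, outcomes, and census shares below are all invented for illustration:

```python
import pandas as pd

# Hypothetical respondents: a group label and a binary outcome each.
sample = pd.DataFrame({
    "age_group": ["18-29", "18-29", "30-64", "30-64", "65+"],
    "supports":  [1, 0, 1, 1, 0],
})

# Known population share of each group (e.g., from a census; invented here).
pop_share = {"18-29": 0.20, "30-64": 0.55, "65+": 0.25}

# Estimate the outcome within each group...
cell_means = sample.groupby("age_group")["supports"].mean()

# ...then re-weight each group mean by its share of the population.
estimate = sum(cell_means[g] * share for g, share in pop_share.items())
print(f"Post-stratified estimate: {estimate:.3f}")
```

Groups over-represented among respondents are weighted down and under-represented groups are weighted up, which is what corrects the skew.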

Non-probability samples

  • Homogeneous-response-propensities-within-group assumption
  • Response-propensity assumption
    • But: potential issues with data sparsity

Multilevel regression and post-stratification (MRP, e.g. Wang et al. 2015)

  1. Collect your sample (probability or non-probability)
  2. Divide the sample into many groups (cells) based on known characteristics
  3. Estimate a multilevel regression model
  4. Average over each group (cell) to calculate the estimate of interest
  5. Post-stratify the results based on each group’s prevalence in the known population

Source: Bit by Bit Figure 3.8 (based on Wang et al. 2015, Figures 2 and 3)
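To make the five steps concrete, here is a heavily simplified sketch in which an ordinary logit (via statsmodels) stands in for the multilevel model of step 3; every variable, coefficient, and census count below is invented:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 500

# Steps 1-2: a (possibly non-probability) sample, divided into cells
# defined by known characteristics.
df = pd.DataFrame({
    "state": rng.choice(["A", "B", "C"], size=n),
    "age": rng.choice(["young", "old"], size=n),
})
p = np.where(df["age"] == "old", 0.7, 0.4)  # invented response pattern
df["y"] = rng.binomial(1, p)

# Step 3: regression on the cell-defining covariates (a plain logit
# stands in for the multilevel model here).
model = smf.logit("y ~ C(state) + C(age)", data=df).fit(disp=0)

# Step 4: predict the outcome for every cell.
cells = pd.DataFrame(
    [(s, a) for s in ["A", "B", "C"] for a in ["young", "old"]],
    columns=["state", "age"],
)
cells["pred"] = model.predict(cells)

# Step 5: post-stratify using (hypothetical) census counts per cell.
cells["pop_count"] = [120, 200, 300, 150, 80, 150]
mrp_estimate = np.average(cells["pred"], weights=cells["pop_count"])
print(f"MRP-style estimate: {mrp_estimate:.3f}")
```

In a real MRP analysis, step 3 uses a multilevel model so that sparse cells borrow strength from the rest of the data, which is what addresses the data-sparsity issue noted above.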

Representation Errors

Source: Bit By Bit Figure 3.2

Key Takeaways

  1. Having a large number of respondents will often decrease the variance of estimates, but it does not necessarily decrease the bias (see the simulation below)
  2. Researchers need to account for how their survey data were collected when making estimates: how might the population of respondents be skewed?
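A quick simulation of takeaway 1, with invented numbers: every survey below draws from a respondent pool that skews toward supporters, so a larger n shrinks the spread of the estimate but leaves the bias untouched.

```python
import numpy as np

rng = np.random.default_rng(0)
true_mean = 0.50        # actual population support (invented)
respondent_mean = 0.60  # respondent pool skews toward supporters

for n in [100, 10_000, 1_000_000]:
    # 200 replicate surveys, each drawing n respondents from the skewed pool
    estimates = rng.binomial(n, respondent_mean, size=200) / n
    print(f"n={n:>9,}  sd={estimates.std():.4f}  "
          f"bias={estimates.mean() - true_mean:+.3f}")
```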

Measurement Errors

Answers we receive and the inferences we make can depend on exactly how we ask a question.

Wording 1: “If you could choose between the following two approaches, which do you think is the better penalty for murder: the death penalty or life imprisonment, with absolutely no possibility of parole?”

Wording 2: “Are you in favor of the death penalty for a person convicted of murder?”

Measurement Errors

TIL that only 34% of American Millennials think that “America is the greatest country in the world” while that number is at almost 70% for Baby Boomers. (from r/todayilearned)

Sometimes measurement error is itself an interesting social behavior

“Global warming” or “climate change”? Whether the planet is warming depends on question wording (Schuldt et al. 2011):

Republicans were less likely to endorse that the phenomenon is real when it was referred to as “global warming” (44.0%) rather than “climate change” (60.2%), whereas Democrats were unaffected by question wording (86.9% vs. 86.4%)

Avoiding Measurement Error

  • Copy questions from high-quality surveys (e.g. GSS, ANES)
  • Run survey experiments on some of your questions (e.g., randomize alternative wordings)
  • Pilot test your questions (pre-testing)

Mental Exercise: Failure of US presidential election polls (2016)

Discussion Groups for Week 4

Week 4 Discussion Groups
0 Zhuojun / Yue / Tian
1 Andy / Ertong / Max
2 Anny / Abbey / Kuang
3 Daniela / Jiazheng / Yuhan
4 Huanrui / Pritam / Kexin
5 Lorena / Agnes / Tianle
6 Emma / Cosmo / Thomas

Survey 2.0: New Ways of Asking Questions

Typical Survey Modes

  • Interviewer-Administered: In-person or phone interview
  • Question types
    • Closed-ended questions
    • Open-ended questions

Computer-Administered Surveys

Benefits

  • Reduce costs
  • Reduce social desirability bias
  • Eliminate interviewer effects
  • Give respondents more flexibility in when and where they respond

Drawbacks

  • Cannot clarify confusing questions
  • Lose rapport with respondent
  • Interviewer can’t help maintain respondent engagement

Wiki surveys

  • Hybrid of open and closed questions: respondents vote on pairwise comparisons and can contribute new items for others to vote on (e.g., allourideas.org)

Figure 3.9 in Bit by Bit: Results from a survey experiment (Schuman and Presser 1979)

Gamification

Figure 3.11 in Bit by Bit: Interface from Friendsense study (Goel et al. 2010)

Ecological Momentary Assessment

  • Take long surveys, chop them up, and sprinkle the pieces into participants’ everyday lives
  • Major benefits of EMA
    • Collection of data in real-world environments
    • Assessments that focus on individuals’ current or very recent state or behavior
    • Assessments that may be event-based, time-based, or randomly prompted (see the scheduling sketch after this list)
    • Completion of multiple assessments over time
  • Cornwell and Cagney (2017) – We will discuss it on Thursday.
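A minimal sketch of the randomly prompted flavor, using only the Python standard library; the waking window and number of prompts are invented, and a real EMA platform would also handle delivery, reminders, and compliance tracking:

```python
import random
from datetime import datetime, time, timedelta

def random_prompt_times(n_prompts=4, start=time(9, 0), end=time(21, 0), seed=None):
    """Draw n_prompts random survey-prompt times in today's waking window."""
    rng = random.Random(seed)
    today = datetime.now().date()
    window_start = datetime.combine(today, start)
    window_seconds = int((datetime.combine(today, end) - window_start).total_seconds())
    offsets = sorted(rng.randrange(window_seconds) for _ in range(n_prompts))
    return [window_start + timedelta(seconds=s) for s in offsets]

# e.g., four short assessments scattered across one day
for prompt in random_prompt_times(seed=1):
    print(prompt.strftime("%H:%M"))
```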

Linking Surveys to Other Data

Amplified asking

  • Using digital traces to extract more value from survey data
  • Feature engineering + machine learning (see the sketch after this list)
  • Predicting poverty and wealth from mobile phone metadata (Blumenstock et al. 2015)
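A hedged sketch of the amplified-asking workflow using scikit-learn: fit a model on the small surveyed subsample, then impute the survey outcome for everyone who has digital-trace features. All data below are simulated; nothing reproduces the actual features or model of Blumenstock et al. (2015).

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Simulated phone-metadata features for 10,000 people (e.g., call volume,
# contact count, nighttime-call share) and a wealth outcome tied to them.
X = rng.normal(size=(10_000, 3))
wealth = X @ np.array([0.6, 0.3, 0.1]) + rng.normal(scale=0.5, size=10_000)

# Pretend only 1,000 people answered the survey (the labeled subsample).
X_surveyed, X_rest, y_surveyed, y_rest = train_test_split(
    X, wealth, train_size=1_000, random_state=0
)

# Learn the survey outcome from digital-trace features...
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_surveyed, y_surveyed)

# ...then "amplify" by imputing it for the unsurveyed majority.
predicted = model.predict(X_rest)
print(f"Predicted mean wealth for the unsurveyed: {predicted.mean():.3f}")
```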

Discussion: Amplified Asking in Policymaking

Blumenstock, Cadamuro, and On (2015) use call detail records (CDRs) from mobile phones to predict poverty and wealth in Rwanda. Other studies have used CDRs to predict aggregate unemployment rates. Do you think CDRs and other measurements generated through amplified asking techniques should replace traditional surveys, complement them, or not be used at all for government policymaking? What evidence would convince you that CDRs can completely replace traditional measures?

Enriched asking

  • Combine digital trace data with survey data at the individual (1:1) level
  • Record linkage \(\rightarrow\) messy in practice (toy sketch below)
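A toy record-linkage sketch using difflib from the Python standard library; the names, fields, and similarity cutoff are made up, and real linkage is far messier (missing keys, duplicates, privacy constraints):

```python
import difflib
import pandas as pd

# Made-up survey responses and digital-trace records sharing a noisy name key.
survey = pd.DataFrame({"name": ["Jon Smith", "Ana Lopez"],
                       "income": [40_000, 55_000]})
traces = pd.DataFrame({"name": ["John Smith", "Anna Lopez"],
                       "n_posts": [120, 87]})

def best_match(name, candidates, cutoff=0.8):
    """Return the closest candidate above the similarity cutoff, else None."""
    hits = difflib.get_close_matches(name, candidates, n=1, cutoff=cutoff)
    return hits[0] if hits else None

# Fuzzy 1:1 join: each survey row gets its best-matching trace record.
survey["matched_name"] = survey["name"].apply(
    lambda n: best_match(n, traces["name"].tolist()))
linked = survey.merge(traces, left_on="matched_name", right_on="name",
                      suffixes=("_survey", "_trace"))
print(linked[["name_survey", "income", "n_posts"]])
```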

Discussion: Predicting traits from Facebook likes (Kosinski et al. 2013)

  • Role of enriched asking (e.g. why can’t we just use observational data alone?)
  • Source of data
  • Ethical concerns