Digital Experimental Design

MACS 30000: Perspectives on Computational Analysis
University of Chicago

Assignment 1

Format

  • Submit your assignment via Gradescope
    • (Optional) Complete the test assignment by Friday
  • Stay within the specified word-count range
  • Cite properly and include a bibliography

Advice

  • Make sure to answer each question with sufficient explanations
  • Try to employ concepts from Bit by Bit
  • Do not dismiss strengths too quickly
  • Think twice (especially about weaknesses) before writing
  • Start early and bring your questions to next class

Experiments

Topics this week

  • Digital Experimental Design
  • Running Digital Experiments

When to use experiments

  • Useful for causal inference: estimating the causal effect of a treatment
  • Rules out confounders or alternative explanations…
    • … when designed intelligently:
      randomized controlled trial \(>\) “perturb and observe”

Importance of control group

  • Random assignment of the treatment ensures that, on average, the only thing that differs between the two groups is the treatment (see the sketch below)
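
A minimal simulation sketch of this logic (all numbers and variable names hypothetical): under random assignment, a background trait is balanced across groups on average, so a simple difference in means recovers the true treatment effect.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 100_000

    # A background trait (potential confounder) and coin-flip assignment.
    age = rng.normal(40, 10, n)
    treated = rng.integers(0, 2, n).astype(bool)

    # The outcome depends on the trait plus a true treatment effect of 2.0.
    outcome = 0.5 * age + 2.0 * treated + rng.normal(0, 1, n)

    # Randomization balances the trait across groups (difference near 0),
    # so the raw difference in mean outcomes lands near 2.0.
    print(age[treated].mean() - age[~treated].mean())
    print(outcome[treated].mean() - outcome[~treated].mean())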

Basic experimental designs

Source: Bit by Bit Figure 4.5

When experiments are not appropriate

Smith and Pell (2003) presented a satirical meta-analysis of studies on the effectiveness of parachutes. They concluded:

As with many interventions intended to prevent ill health, the effectiveness of parachutes has not been subjected to rigorous evaluation by using randomised controlled trials. Advocates of evidence based medicine have criticised the adoption of interventions evaluated by using only observational data. We think that everyone might benefit if the most radical protagonists of evidence based medicine organised and participated in a double blind, randomised, placebo controlled, crossover trial of the parachute.

Major features of experiments

Two dimensions of experiments

Source: Bit by Bit Figure 4.1

Validity

  • The extent to which the results of a particular experiment support some more general conclusion
    • Statistical conclusion validity
    • Construct validity
    • Internal validity
    • External validity

Importance of validity in experiments

  • Traditional experiments were most concerned with internal validity
    • Not generally concerned with external validity
    • WEIRD data (Western, Educated, Industrialized, Rich, Democratic)
  • … but digital experiments typically have larger, more diverse pools of participants and increased capacity for assessing external validity

Heterogeneity of treatment effects

  • Assessing differential effects of the treatment on sub-groups in the study
  • Made possible in digital experiments by larger sample sizes and the low variable cost of adding more participants (see the sketch below)
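
A toy pandas sketch (data, column names, and subgroups all hypothetical) of the difference between one average treatment effect and subgroup-specific effects:

    import pandas as pd

    # Hypothetical experiment data: one row per participant.
    df = pd.DataFrame({
        "treated":  [1, 0, 1, 0, 1, 0, 1, 0],
        "subgroup": ["renter", "renter", "owner", "owner",
                     "renter", "renter", "owner", "owner"],
        "usage":    [9.1, 10.0, 7.5, 9.8, 9.3, 10.2, 7.2, 9.6],
    })

    def diff_in_means(g):
        # Treated minus control mean outcome within a set of rows.
        return (g.loc[g["treated"] == 1, "usage"].mean()
                - g.loc[g["treated"] == 0, "usage"].mean())

    print(diff_in_means(df))                            # average effect
    print(df.groupby("subgroup").apply(diff_in_means))  # effect by subgroup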

Power consumption study

Source: Bit by Bit Figure 4.6 (adapted from Allcott 2011)

Power consumption study

Source: Bit by Bit Figure 4.8 (adapted from Allcott 2011)

Mechanisms

Source: Bit by Bit Figure 4.10

Mechanisms

  • Mechanisms tell us why or how a treatment caused an effect
  • As with prediction, it isn't necessarily enough to know that \(X\) causes \(Y\); we want to know why
  • Digital experiments allow us to test the process or mediating variables (see the toy sketch below)
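
A toy simulation (purely illustrative; real mediation analysis is statistically much harder) of a treatment whose effect runs entirely through a mediator:

    import numpy as np

    rng = np.random.default_rng(1)
    n = 50_000

    # The treatment moves a mediator (say, appliance upgrades), and the
    # mediator moves the outcome; no direct path in this toy model.
    treated = rng.integers(0, 2, n).astype(bool)
    mediator = 0.4 * treated + rng.normal(0, 1, n)
    outcome = -1.5 * mediator + rng.normal(0, 1, n)

    # Total effect of treatment on the outcome: about 0.4 * -1.5 = -0.6.
    print(outcome[treated].mean() - outcome[~treated].mean())

    # Effect of treatment on the mediator: evidence about the pathway.
    print(mediator[treated].mean() - mediator[~treated].mean())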

Detecting Mechanisms: Collecting Process Data

How did Home Energy Reports cause people to lower their electricity usage?

  • Linking data from a power company rebate program: records of consumer appliance upgrades (more energy-efficient or not?); a hypothetical linkage sketch follows
    • Allcott and Rogers 2014: more people who received Home Energy Reports upgraded their appliances
    • These upgrades account for only about 2% of the decrease in energy use in treated households
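
A sketch of that kind of record linkage (tables, IDs, and column names invented for illustration):

    import pandas as pd

    # Experiment roster and rebate-program records share a household ID.
    households = pd.DataFrame({
        "household_id": [1, 2, 3, 4],
        "treated":      [True, True, False, False],
    })
    rebates = pd.DataFrame({
        "household_id": [1, 3],
        "upgraded":     [True, True],
    })

    # A left join keeps every experimental household, matched or not.
    linked = households.merge(rebates, on="household_id", how="left")
    linked["upgraded"] = linked["upgraded"].eq(True)  # no record -> False

    # Compare appliance-upgrade rates by treatment status.
    print(linked.groupby("treated")["upgraded"].mean())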

Revisiting the Home Energy Report

Source: Bit by Bit Figure 4.6 (adapted from Allcott 2011)

Detecting Mechanisms: Running Additional Experiments

Test four conditions to assess the role of tips alone in lowering water usage (Ferraro, Miranda, and Price 2011):

  • a group that received tips on saving water
  • a group that received tips on saving water plus a moral appeal to save water
  • a group that received tips on saving water plus a moral appeal to save water plus information about their water use relative to their peers
  • a control group

Detecting Mechanisms: Running Additional Experiments

Source: Bit by Bit Figure 4.11 (Adapted from Ferraro, Miranda, and Price 2011)

Better: Full Factorial Design

  Treatment   Characteristics
  1           Control
  2           Tips
  3           Appeal
  4           Peer Information
  5           Tips + Appeal
  6           Tips + Peer Information
  7           Appeal + Peer Information
  8           Tips + Appeal + Peer Information
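
These are the \(2^3 = 8\) on/off combinations of the three message components; a quick sketch of enumerating them (the ordering here is arbitrary, not the table's):

    from itertools import product

    # All 2^3 combinations of the three message components.
    factors = ["Tips", "Appeal", "Peer Information"]
    for flags in product([False, True], repeat=3):
        on = [f for f, used in zip(factors, flags) if used]
        print(" + ".join(on) if on else "Control")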

Voter mobilization on Facebook (Bond et al. 2012)

In groups, evaluate the following:

  • Where does this experiment fall on the Digital-Analog continuum?
  • Validity: Do you have any concerns about the four types of validity?
  • Heterogeneity of effects:
    • How does the digital experiment allow researchers to assess different levels of effect?
    • What do we learn from this aspect that we would not know if we just learned the average treatment effect?
  • Causal mechanism:
    • What causes the increase in reported and validated voting?
    • How does the study identify the causal mechanism? Or does it?
    • How can this knowledge be useful to researchers?
  • Ethics of running such a study

How do I run a digital experiment?

Partner with the powerful

Partnerships with big organizations

  • Advantages: reduced costs, increased scale
  • Disadvantages: limited control over the kinds of participants, treatments, and outcomes you can use
  • Examples: the Home Energy Report studies from last class (partnering with utility companies), Bond et al. 2012 (partnering with Facebook)

Build your own experiment

Creating your own digital “lab”

Costly, but can run exactly the experiment you want to run.

How do you recruit participants, though?

Digital ads

Source: Salganik 2007, Figure 2.12

Amazon Mechanical Turk (MTurk)

Source: mturk.com

Benefits of MTurk for experiments

  • Larger potential subject pool
  • Can integrate with external survey platforms to easily run survey experiments
  • Cheaper than traditional subject recruitment
  • 4000+ MTurk academic studies and counting!

Threats to Validity?

  • External Validity
  • Internal Validity

Alternatives beyond just MTurk

  • CloudResearch
  • Prolific
  • Qualtrics
  • Dynata
  • and more…

Build a product

MovieLens

Use existing environments

On Thursday

  • van de Rijt et al. 2014: Success-breeds-success experiments
  • Bail et al. 2018: political polarization on social media

Best practices in running your own experiment

  • Run multiple experiments that reinforce one another
  • Aim for zero variable cost
  • Build ethics into your design: replace, refine, reduce
  • See Bit by Bit 4.6.2 for more detail

Also on Thursday

  • Compare and contrast Salganik et al. (2006) and van de Rijt et al. (2014)
  • How do these studies fall on the lab/field continuum?
  • What are the relative strengths/weaknesses of the lab design vs. field design in terms of
    • statistical validity
    • construct validity
    • internal validity
    • external validity

Discussion Groups for Week 5

0 Daniela / Tian / Max
1 Pritam / Tianle / Yue
2 Emma / Anny / Zhuojun
3 Jiazheng / Cosmo / Yuhan
4 Andy / Abbey / Ertong
5 Huanrui / Kuang / Kexin
6 Lorena / Agnes / Thomas