Research Proposal

MACS 30000: Perspectives on Computational Analysis
University of Chicago

Announcements

  • All Assignment 1 grades are finalized.
  • Assignment 2 revision due on Wed Nov 29
  • Peer reviews are assigned for Thursday’s discussion
    • Please submit written feedback on Canvas by 11:59 pm on Wed and
    • read your peers’ comments before coming to class.
    • 15 mins in class on Thursday for clarifying your feedback
    • followed by presentations & Q&A

Participation

  • You will receive
    • a check (10/10) if you submit your peer reviews on time and speak up at least once in class this week
    • a check plus (12/10) if you submit your peer reviews and present your proposal draft in class on Thursday
    • a check minus (8/10) if you only submit your peer reviews
    • a zero (0/10) if you fail to participate at all

Final Proposal

  • Due on Fri Nov 8
  • All submissions will be graded by our graders, with no revisions
  • You will receive feedback from me if you submit early by Wed Nov 6
  • Sample proposals from last year are on Canvas
  • Remote OHs next week

Mass collaboration

Major types

  • Human computation (week 8)
  • Open call (today)
  • Distributed data collection (today)

Open Calls

  • Pose a problem asking for specific, measurable solutions from other people
  • Offer a reward/incentive for participation
  • Compare and evaluate the solutions using a consistent and measurable metric
  • Generate broad participation from a wide range of researchers

Netflix prize

  • Need to predict what movies customers would enjoy
  • Internal research plateaus
  • Release an anonymized dataset of 100 million movie ratings to predict 3 million held-out ratings
  • Anyone who could create an algorithm that improved the existing model by 10% or better would win 1 million dollars
  • Clear and unbiased evaluation criteria
  • Solicited over 40,000 solutions
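
A minimal sketch of how an open call can score submissions against a fixed baseline. The Netflix Prize ranked entries by root mean squared error (RMSE) on the held-out ratings; the slide does not name the metric, so treat that as added context. The ratings below are made-up toy numbers, and the function names are hypothetical.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating sequences."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def percent_improvement(baseline_rmse, candidate_rmse):
    """Relative reduction in RMSE versus the baseline, in percent."""
    return 100.0 * (baseline_rmse - candidate_rmse) / baseline_rmse


# Toy held-out ratings and two sets of predictions (illustrative only).
held_out = [4, 3, 5, 2, 1]
baseline_pred = [3, 3, 4, 3, 2]
candidate_pred = [3.5, 3, 4.5, 2.5, 1.5]

baseline_score = rmse(baseline_pred, held_out)
candidate_score = rmse(candidate_pred, held_out)
improvement = percent_improvement(baseline_score, candidate_score)

# The prize threshold from the slide: at least a 10% improvement.
meets_threshold = improvement >= 10.0
print(f"baseline={baseline_score:.4f} candidate={candidate_score:.4f} "
      f"improvement={improvement:.1f}% meets_threshold={meets_threshold}")
```

Because every team is scored by the same function on the same held-out data, the comparison stays consistent no matter who submits or what method they use.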

Discussion

  • The best predictive models in the Netflix Prize open call were hybrids of multiple models (ensemble methods). What characteristic of one model relative to other models made it improve the overall prediction when blended with the other models?
  • In your opinion, what kind of tasks are better suited for open call contests? What kind of tasks are not?
  • What are the benefits to the researchers proposing the problem?
  • What are the benefits to the participants proposing the solutions?
  • Are open calls better tailored to questions of prediction or questions of explanation? How might we utilize open calls to tackle explanations?

Distributed Data Collection

  • Enlist volunteers as data collectors
  • Enlist work on a scale otherwise impossible

OpenStreetMap

Source: OpenStreetMap

Designing your own mass collaboration

  • Motivate participants
  • Leverage heterogeneity
  • Focus attention
  • Be ethical
  • What happens if nobody comes?

Crowdsourcing as a Social Movement

Research design

Literature review: Identifying sources

  • What constitutes an academic source?
    • Published journal article
    • Scholarly book/chapter
    • Working paper
  • What is not an appropriate source?
    • Policy report (maybe)
    • Blog post
    • Wikipedia

How to find sources

  • Google Scholar

  • Library research guides

  • Articles in annual review journals, e.g. Annual Review of Sociology/Political Science/Psychology

  • Handbooks & annotated bibliographies

  • Skim the works cited of a relevant paper

  • Use a citation index to find other papers that cite this work

Evaluating source quality

  • Is the publisher reputable?
  • Is it peer-reviewed?
  • Is it current?
  • What is the citation count?

Manage your bibliography

  • Record the bibliographical information for sources you think are noteworthy
  • Far better to do this up front than to wait until you write the paper; otherwise you will forget some of your sources
  • Citation management

BibTeX

  • Store bibliographic information in plain-text .bib files
  • Easily incorporate citations into \(\LaTeX\) and R Markdown documents
  • I think it works for Microsoft Word as well?
  • Automatically generates your works cited page
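
To make the plain-text format concrete, here is a small sketch that renders one BibTeX entry from a Python dictionary. The helper `format_bibtex` and the cite key `salganik2018` are hypothetical names chosen for illustration; in practice a citation manager or Google Scholar's export produces these entries for you.

```python
def format_bibtex(entry_type, cite_key, fields):
    """Render a single BibTeX entry as text suitable for a .bib file."""
    lines = [f"@{entry_type}{{{cite_key},"]
    for name, value in fields.items():
        # Each field is written as name = {value}, one per line.
        lines.append(f"  {name} = {{{value}}},")
    lines.append("}")
    return "\n".join(lines)


# Illustrative entry; the cite key is an arbitrary label you choose.
entry = format_bibtex(
    "book",
    "salganik2018",
    {
        "author": "Salganik, Matthew J.",
        "title": "Bit by Bit: Social Research in the Digital Age",
        "publisher": "Princeton University Press",
        "year": "2018",
    },
)
print(entry)
```

Once the entry is saved in a `.bib` file, the key is what you cite: `\cite{salganik2018}` in LaTeX, or `[@salganik2018]` in R Markdown.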

A note on citation style

  • I don’t care which style you use, just be consistent
  • Using BibTeX or other citation managers ensures consistency in formatting
  • If you still have trouble understanding how to integrate your sources into your writing (e.g. when to cite, how to paraphrase), read chapter 14 in Booth or ask me.

Developing a research design

  • Is your question a what/how/why question?
  • If it is purely a descriptive/what question
    • choose an observational design
    • design a survey
  • If you want heterogeneity/mechanisms
    • go big
  • If you want to test a causal claim
    • come up with a quasi-experimental design
    • or conduct a real experiment

Where do you get data?

  • Does the data in principle already exist?
  • If yes, is it analog or digital?
    • If analog, how do you turn it into digital?
      • Machine-coding
      • Human computation
    • If digital, how hard is it to get it?

Where do you get data?

  • Does the data in principle already exist?
  • If no, how would you create it?
    • Survey
    • Mass collaboration
    • Simulation
  • Does the data allow you to answer your question?
    • If no, could you change your question?
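
The questions on these two slides form a small decision tree. One way to sketch it, assuming yes/no answers to each question (the function name and return strings are hypothetical, and the final check on whether the data can answer your question is left to you):

```python
def suggest_data_strategy(data_exists, is_digital=None):
    """Map answers to the slide's questions onto candidate next steps."""
    if not data_exists:
        # The data must be created.
        return ["survey", "mass collaboration", "simulation"]
    if is_digital is None:
        raise ValueError("is_digital is required when the data already exists")
    if is_digital:
        # Already digital: the open question is access.
        return ["assess how hard the data is to obtain"]
    # Analog data has to be digitized first.
    return ["machine coding", "human computation"]


print(suggest_data_strategy(data_exists=True, is_digital=False))
print(suggest_data_strategy(data_exists=False))
```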