Research Proposal

MACS 30000: Perspectives on Computational Analysis
University of Chicago

Announcements

All Assignment 1 grades are finalized.
Assignment 2 revision due on Wed Nov 29
Peer reviews are assigned for Thursday’s discussion
- Please submit written feedback on Canvas by 11:59 pm on Wed
- Read your peers’ comments before coming to class
- 15 mins in class on Thursday for clarifying your feedback
- Followed by presentations and Q&A

Participation

You will receive
- a check (10/10) if you submit your peer reviews on time and speak up at least once in class this week
- a check plus (12/10) if you submit your peer reviews and present your proposal draft in class on Thursday
- a check minus (8/10) if you only submit your peer reviews
- a zero (0/10) if you fail to participate at all

Final Proposal

Due on Fri Nov 8
All submissions will be graded by our graders; no revisions
You will receive feedback from me if you submit early, by Wed Nov 6
Sample proposals from last year are on Canvas
Remote OHs next week

Mass collaboration

Major types

Human computation (week 8)
Open call (today)
Distributed data collection (today)

Open Calls

Pose a problem asking for specific, measurable solutions from other people
Offer a reward/incentive for participation
Compare and evaluate the solutions using a consistent and measurable metric
Generate broad participation from a wide range of researchers

Netflix prize

Need to predict what movies customers would enjoy
Internal research had plateaued
Released an anonymized dataset of 100 million movie ratings; participants predicted 3 million held-out ratings
Anyone whose algorithm improved on the existing model’s predictions by 10% or more would win 1 million dollars
Clear and unbiased evaluation criteria
Solicited over 40,000 solutions
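
To make the "consistent and measurable metric" concrete, here is a minimal sketch of how a 10% improvement could be checked. It assumes accuracy is scored by root mean squared error (RMSE) on the held-out ratings, which is the metric the Netflix Prize used; the function names and the toy ratings are hypothetical, chosen only for illustration.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences."""
    if len(predicted) != len(actual):
        raise ValueError("sequences must have the same length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

def relative_improvement(baseline_rmse, candidate_rmse):
    """Fraction by which the candidate lowers the baseline's error."""
    return (baseline_rmse - candidate_rmse) / baseline_rmse

def wins_prize(baseline_rmse, candidate_rmse, threshold=0.10):
    """True if the candidate beats the baseline by at least the threshold."""
    return relative_improvement(baseline_rmse, candidate_rmse) >= threshold

# Toy held-out ratings on a 1-5 scale (made up for this sketch).
actual = [4, 3, 5, 2, 4]
baseline = [3, 3, 4, 3, 4]        # stand-in for the existing model
candidate = [4, 3, 4.5, 2.5, 4]   # stand-in for a submitted solution

baseline_score = rmse(baseline, actual)
candidate_score = rmse(candidate, actual)
print(round(relative_improvement(baseline_score, candidate_score), 3))
print(wins_prize(baseline_score, candidate_score))
```

Because every submission is scored by the same function on the same held-out data, the organizers never have to judge how a solution works, only how well it predicts.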

Discussion

The best predictive models in the Netflix Prize open call were hybrids of multiple models (ensemble methods). What characteristic of one model relative to other models made it improve the overall prediction when blended with the other models?
In your opinion, what kind of tasks are better suited for open call contests? What kind of tasks are not?
What are the benefits to the researchers proposing the problem?
What are the benefits to the participants proposing the solutions?
Are open calls better tailored to questions of prediction or questions of explanation? How might we utilize open calls to tackle explanations?
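
As a starting point for the first question, here is one possible answer sketched in code, not the definitive one: a model tends to help a blend when its errors are weakly correlated with the errors of the other models, even if it is no more accurate on its own. The equal-weight average, the RMSE scoring, and the toy ratings below are assumptions made for this illustration; the actual Netflix Prize blends were more elaborate.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(
        sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)
    )

def blend(*models):
    """Equal-weight average of several models' predictions, item by item."""
    return [sum(preds) / len(preds) for preds in zip(*models)]

# Toy ratings, made up for this sketch.
actual = [3, 4, 2, 3]
model_a = [4, 3, 3, 2]  # errors: +1, -1, +1, -1
model_b = [4, 5, 1, 2]  # errors: +1, +1, -1, -1 (uncorrelated with model_a)
model_c = [4, 3, 3, 2]  # makes exactly the same errors as model_a

diverse_blend = blend(model_a, model_b)
redundant_blend = blend(model_a, model_c)

print(rmse(model_a, actual), rmse(model_b, actual))
print(rmse(diverse_blend, actual))    # lower than either model alone
print(rmse(redundant_blend, actual))  # no better than model_a
```

All three models are equally accurate individually. Blending the two with uncorrelated errors cancels some mistakes, while blending two models that err in the same places gains nothing.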

Distributed Data Collection

Enlist volunteers as data collectors
Enlist work on a scale otherwise impossible

OpenStreetMap

Source: OpenStreetMap

Designing your own mass collaboration

Motivate participants
Leverage heterogeneity
Focus attention
Be ethical
What happens if nobody comes?

Motivate participants - what are potential incentives?
- Money
- Fun
- Community
- Helping science
- Competition
Leverage heterogeneity
- Don’t just automatically throw out low-skill contributors
- Some people will participate much more than others - there is no need to artificially limit participation
- Implement redundancy and quality checks to ensure the results are valid
Focus attention
- Limit the scope of what the contributor is doing
  - Streetscore - which of these two images looks safer?
  - Netflix prize - optimize our predictive accuracy for this set of movie ratings
- Citizen scientists may not be highly trained in conducting science, but give them clear instructions and a discrete task to complete and they can be impactful
Be ethical
- What is fair compensation?
- How do we avoid abusing or mistreating workers?
- Open calls - is it fair for thousands of worker-hours to be spent on this challenge and only a very small number of workers to receive compensation?
What happens if nobody comes?
- No guarantees people will want to participate in your project
- Some incentives are just not enough to attract enough people
- Pilot tests run the project on a small scale to assess its viability before a full launch
- Be prepared for failure
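
The "redundancy and quality checks" point under leveraging heterogeneity can be sketched as a majority vote over repeated judgments of the same item. This is a minimal illustration, not a prescribed method: the function name, the thresholds (`min_votes`, `min_agreement`), and the Streetscore-style votes are all hypothetical.

```python
from collections import Counter

def aggregate_votes(votes, min_votes=3, min_agreement=0.6):
    """Return the majority label, or None if the item needs more review.

    An item is accepted only if enough contributors judged it (redundancy)
    and a large enough share of them agree (quality check).
    """
    if len(votes) < min_votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    if count / len(votes) < min_agreement:
        return None
    return label

# Made-up answers to "which of these two images looks safer?"
responses = {
    "pair_1": ["left", "left", "right"],
    "pair_2": ["left", "right"],
    "pair_3": ["left", "right", "right", "left"],
}

results = {pair: aggregate_votes(votes) for pair, votes in responses.items()}
print(results)
```

Items that come back as `None` can be sent to more contributors instead of being discarded, which keeps low-skill contributions useful without letting any single judgment decide the result.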

Research design

Literature review: Identifying sources

What constitutes an academic source?
- Published journal article
- Scholarly book/chapter
- Working paper
What is not an appropriate source?
- Policy report (maybe)
- Blog post
- Wikipedia

How to find sources

Google Scholar
Library research guides
Articles in annual review journals, e.g. Annual Review of Sociology/Political Science/Psychology
Handbooks and annotated bibliographies
Skim the works cited of a relevant paper
Use a citation index to find other papers that cite this work

Evaluating source quality

Is the publisher reputable?
Is it peer-reviewed?
Is it current?
What is the citation count?

Manage your bibliography

Record the bibliographical information for sources you think are noteworthy
Far better to do this up front than to wait until you write the paper - you will forget about some of your sources
Use citation management software

BibTeX

Store bibliographic information in plain-text .bib files
Easily incorporate citations into \(\LaTeX\) and R Markdown documents
I think it works for Microsoft Word as well?
Automatically generates your works cited page
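
To show what a plain-text `.bib` entry looks like, here is a small sketch that renders one from a Python dictionary. The helper `to_bibtex` and the citation key `salganik2018` are hypothetical names chosen for this example; the entry describes a real book but is used purely as an illustration, not as a reference from these slides.

```python
def to_bibtex(entry_type, key, fields):
    """Render one BibTeX entry as plain text."""
    lines = [f"@{entry_type}{{{key},"]
    for name, value in fields.items():
        lines.append(f"  {name} = {{{value}}},")
    lines.append("}")
    return "\n".join(lines)

entry = to_bibtex(
    "book",
    "salganik2018",  # citation key: any short, unique label you choose
    {
        "author": "Salganik, Matthew J.",
        "title": "Bit by Bit: Social Research in the Digital Age",
        "publisher": "Princeton University Press",
        "year": "2018",
    },
)
print(entry)
```

Once the entry is saved in a `.bib` file, it can be cited by its key, for example with `\cite{salganik2018}` in LaTeX or `[@salganik2018]` in R Markdown, and the works cited page is built from the keys actually used.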

A note on citation style

I don’t care which style you use, just be consistent
Using BibTeX or other citation managers ensures consistency in formatting
If you still have trouble integrating your sources into your writing (e.g. when to cite, how to paraphrase), read chapter 14 in Booth or ask me.

Developing a research design

Is your question a what/how/why question?
If it is purely a descriptive/what question
- choose an observational design
- design a survey
If you want heterogeneity/mechanisms
- go big
If you want to test a causal claim
- come up with a quasi-experimental design
- or conduct a real experiment

Where do you get data?

Does the data in principle already exist?
If yes, is it analog or digital?
- If analog, how do you turn it into digital?
  - Machine-coding
  - Human computation
- If digital, how hard is it to get it?

Where do you get data?

Does the data in principle already exist?
If no, how would you create it?
- Survey
- Mass collaboration
- Simulation
Does the data allow you to answer your question?
- If no, could you change your question?
