Running Experiments with Amazon Mechanical Turk


I’ll start by saying that I think Amazon Mechanical Turk (MTurk) and online labor markets offer nothing less than a revolution in experimental psychology. By now I’ve conducted over a hundred experiments on MTurk and have come to consider it one of the most important tools available to me. Together with Qualtrics (see previous posts with tips – 1, 2, 3), MTurk is a very powerful tool for quick and inexpensive data collection.

You don’t have to take my word for it; take it from those who know something. High-profile articles have been popping up in journals across all domains, reaching the same conclusion I have – MTurk is an important tool. The following examples were chosen from psychology, management, economics, and even biology:


Social Psychology

From Buhrmester, Kwang, & Gosling (2011, Perspectives on Psychological Science) – Amazon’s Mechanical Turk: A New Source of Inexpensive, Yet High-Quality, Data?:

Findings indicate that: (a) MTurk participants are slightly more representative of the U.S. population than are standard Internet samples and are significantly  more diverse than typical American college samples; (b) participation is affected by  compensation rate and task length but participants can still be recruited rapidly and  inexpensively; (c) realistic compensation rates do not affect data quality; and (d) the data  obtained are at least as reliable as those obtained via traditional methods.

From Paolacci and Chandler (2014, Current Directions in Psychological Science) – Inside the Turk: Understanding Mechanical Turk as a Participant Pool:

Mechanical Turk (MTurk), an online labor market created by Amazon, has recently become popular among social scientists as a source of survey and experimental data. The workers who populate this market have been assessed on dimensions that are universally relevant to understanding whether, why, and when they should be recruited as research participants. We discuss the characteristics of MTurk as a participant pool for psychology and other social sciences, highlighting the traits of the MTurk samples, why people become MTurk workers and research participants, and how data quality on MTurk compares to that from other pools and depends on controllable and uncontrollable factors.


Clinical Psychology

From Shapiro, Chandler, & Mueller (2013, Clinical Psychological Science) – Using Mechanical Turk to Study Clinical Populations:

Although participants with psychiatric symptoms, specific risk factors, or rare demographic characteristics can be difficult to identify and recruit for participation in research, participants with these characteristics are crucial for research in the social,  behavioral, and clinical sciences. Online research in general and crowdsourcing software in particular may offer a solution. […]  Findings suggest that crowdsourcing software offers several advantages for clinical research while providing insight into potential problems, such as  misrepresentation, that researchers should address when collecting data online.



Economics

From Horton, Rand, & Zeckhauser (2010, Experimental Economics) – The Online Laboratory: Conducting Experiments in a Real Labor Market:

We argue that online experiments can be just as valid— both internally and externally—as laboratory and field experiments, while requiring far less money and time to design and to conduct. In this paper, we first describe the benefits of conducting experiments in online labor markets; we then use one such market to replicate three classic experiments and confirm their results. We confirm that subjects (1) reverse decisions in response to how a decision-problem is framed, (2) have pro-social preferences (value payoffs to others positively), and (3) respond to priming by altering their choices.



From Paolacci, Chandler, & Ipeirotis (2010, Judgment and Decision Making) – Running Experiments on Amazon Mechanical Turk:

Although Mechanical Turk has recently become popular among social scientists as a source of experimental data, doubts may linger about the quality of data provided by subjects recruited from online labor markets. We address these potential concerns by presenting new demographic data about the Mechanical Turk subject population, reviewing the strengths of Mechanical Turk relative to other online and offline methods of recruiting subjects, and comparing the magnitude of effects obtained using Mechanical Turk and traditional subject pools. We further discuss some additional benefits such as the possibility of longitudinal, cross cultural and prescreening designs, and offer some advice on how to best manage a common subject pool.



Biology

From Rand (2011, Journal of Theoretical Biology) – The promise of Mechanical Turk: How online labor markets can help theorists run behavioral experiments:

I review numerous replication studies indicating that AMT data is reliable. I also present two new experiments on the reliability of self-reported demographics. In the first, I use IP address logging to verify AMT subjects’ self-reported country of residence, and find that 97% of responses are accurate. In the second, I compare the consistency of a range of demographic variables reported by the same subjects across two different studies, and find between 81% and 98% agreement, depending on the variable. Finally, I discuss limitations of AMT and point out potential pitfalls.


[Update March 1st, 2016: The APS Observer has a great summary article on MTurk – Under the Hood of Mechanical Turk]


Watch this great overview lecture about using Amazon Mechanical Turk for academic research (Gabriele Paolacci: The challenges of crowdsourcing data collection in the social sciences):

Gabriele Paolacci: The challenges of crowdsourcing data collection in the social sciences


Other articles


Before we begin, I think this article is a MUST-read for anyone thinking of using MTurk for academic research: The Internet’s hidden science factory

From that article, I strongly recommend watching the following video about the life of one MTurker:

Also see this PBS coverage:




Lessons learned (some of these are rather old, so I strongly advise revisiting them):

  1. You need to verify that participants read and understood your survey, and that they didn’t click answers at random. For that I do the following:
    1. After each scenario, I run a short quiz to test their understanding.
    2. Every part includes a check. A manipulation should always be tested, preferably with more than a single manipulation check.
    3. Add a timer to each page and include a check in your stats syntax to test whether participants answered too fast.
    4. Include a funneling section: ask participants what the survey was about and require a minimum answer length in characters. Go over the answers to see who is putting in noise. Of course, if you included a manipulation, also test for suspicion: ask what they thought the purpose was, or whether they can see any connection between the manipulation and your measured DV.
  2. It goes without saying that you should test your survey before releasing it into the wild. But a very important point is to set email triggers and verify that the answers you get are what they should be. It has happened a few times that I discovered something wrong within the first ten participants, so I stopped the batch, corrected the mistake, and restarted everything.
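The checks above can be automated once you export your data. Here is a minimal sketch of a post-hoc screening function; the row keys (`page_seconds`, `quiz_correct`, `manip_check`, `funnel_text`) and the 5-second and 20-character thresholds are my own assumptions for illustration, so rename and tune them to match your actual export:

```python
# Hypothetical post-collection screening sketch.
# All column names and thresholds are assumptions; adapt to your export.

MIN_SECONDS_PER_PAGE = 5   # assumed "too fast" threshold per page
MIN_FUNNEL_CHARS = 20      # assumed minimum funneling-answer length

def flag_participant(row):
    """Return a list of data-quality flags for one participant's row."""
    flags = []
    if any(t < MIN_SECONDS_PER_PAGE for t in row["page_seconds"]):
        flags.append("too_fast")               # answered some page too quickly
    if not row["quiz_correct"]:
        flags.append("failed_quiz")            # missed the comprehension quiz
    if not row["manip_check"]:
        flags.append("failed_manipulation_check")
    if len(row["funnel_text"].strip()) < MIN_FUNNEL_CHARS:
        flags.append("short_funnel_answer")    # low-effort funneling response
    return flags

# Example usage with two fake participants:
rows = [
    {"page_seconds": [12, 9, 15], "quiz_correct": True,
     "manip_check": True, "funnel_text": "It was about framing of risky choices."},
    {"page_seconds": [2, 3, 1], "quiz_correct": False,
     "manip_check": True, "funnel_text": "idk"},
]
for row in rows:
    print(flag_participant(row))
```

The first participant comes back clean; the second is flagged on speed, the quiz, and the funneling answer. Whether you exclude flagged participants or just inspect them is a judgment call you should pre-register.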

[UPDATE 2013/02/05 my answer to a discussion about this]

  1. The questionnaire should show participants you’re a serious researcher. Meaning:
    1. Two or three comprehension-quiz questions about the scenarios, which participants have to get right to proceed, to make sure they understood the scenario or the task.
    2. Decoy questions that go in opposite directions, randomized into the scales (ones I often use: “the color of the grass is blue”, “in the same week, Tuesday comes after Monday”, “rich people have less money than poor people”, etc.).
    3. Randomized question and choice order in each section.
    4. A funneling section.
    5. A timer on all questions, to check how much time participants spent on each page and when they clicked on things.
  2. Between-subjects manipulations are better than a simple survey, since different participants see different conditions, which reduces the chance of participants simply sharing answers.
  3. There’s no escape from going over the answers in detail: checking answer timing, checking for duplicates, and reading the funneling section.
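The decoy and duplicate checks above can also be scripted. This is a sketch under assumed conventions: decoys are rated on a 1–7 agreement scale, the item names and the “correct pole” per decoy are hypothetical, and the 3/5 cutoffs for wrong-side answers are my own choice:

```python
# Hypothetical decoy scoring and duplicate detection.
# Item names, poles, and cutoffs are assumptions; adapt to your scales.

DECOYS = {
    "grass_is_blue": "disagree",      # "the color of the grass is blue"
    "tuesday_after_monday": "agree",  # "Tuesday comes after Monday"
}

def decoy_failures(answers):
    """Count decoy items (1-7 scale) answered on the wrong side."""
    failures = 0
    for item, pole in DECOYS.items():
        value = answers[item]
        if pole == "disagree" and value > 3:   # should have disagreed
            failures += 1
        elif pole == "agree" and value < 5:    # should have agreed
            failures += 1
    return failures

def duplicate_worker_ids(worker_ids):
    """Return the set of worker IDs that appear more than once."""
    seen, dupes = set(), set()
    for wid in worker_ids:
        if wid in seen:
            dupes.add(wid)   # already seen -> duplicate submission
        seen.add(wid)
    return dupes
```

For example, `decoy_failures({"grass_is_blue": 6, "tuesday_after_monday": 2})` counts two failures, and `duplicate_worker_ids` over your WorkerId column surfaces repeat submissions worth a manual look.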

[end of UPDATE]


For problems with running MTurkers, read:


For the technical details on how to set things up, read the following:


There’s also a very helpful blog I strongly recommend visiting – Experimental Turk, which describes itself as “a blog on social science experiments on Amazon Mechanical Turk”. It hasn’t been updated in a while, but there is still valuable info there.


Tools:


Survey collection:

Multiple player games:


Further readings:


Alternatives to MTurk:

Got any other MTurk tips? Have you had any experience running experiments on MTurk? Do share.

Shaun
7 years ago

To your comment above regarding pay where you said: “One should be careful with money as an incentive for answering questionnaires on MTurk. I’ve actually found that 5 cents a questionnaire may at times yield higher quality results than a 2 dollar reward since it reduces the chance that people merely participate for the money. People still participate for 2-5 cents, and that couldn’t be just for the money in it.” What is your measure for “higher quality data”? Are you speaking in terms of statistically significant differences, or just trends in the data or something else? I ask this…

Gilad Feldman
7 years ago
Reply to  Shaun

Thanks for the comments. These are important questions and I understand your concerns, but you’re raising a few very different issues. As for ‘high quality data’. As I point out above, I include attention checks and decoy questions throughout my studies, as well as quiz exams to make sure participants understood the scenario/task at hand. Higher quality data means less errors, less failing attention checks, and overall better responding to open-questions and tasks. You’ll notice another post in this blog about honesty, and that’s another factor that often comes into play. In this post I report the bottom line. I…



orly aviv
5 years ago

I’m looking for help connecting my WordPress website – a brain research study – to Amazon Mechanical Turk. Can you make any recommendations?

5 years ago
Reply to  Gilad Feldman

Hi Gilad, my research study is long, for 3 different groups. I built a WordPress website. I would like to link the website to MTurk.

