Clarifications regarding Heyman and Ariely (2004) replications and extensions and Expression of Concern by Psychological Science


On July 23, 2021, Psychological Science issued an “Expression of Concern: Effort for Payment: A Tale of Two Markets” regarding the article by Heyman and Ariely (2004). The statement describes the role our team played in detecting the issues that led to it, so this post is meant to provide the full background from our side, as the Expression of Concern does not tell the entire story. Our analysis of the article was part of a much larger effort to replicate Heyman and Ariely (2004), and for the most part we were able to replicate the phenomenon, with two samples and different designs, with one adjustment that relates to our finding an effect where none was expected.


Please see our preprint: Rewarding more is better for soliciting help, yet more so for cash than for goods: Revisiting and reframing the Tale of Two Markets with replications and extensions of Heyman and Ariely (2004)


The Expression of Concern reads:

This statement is an Expression of Concern regarding the article “Effort for Payment: A Tale of Two Markets” (Heyman & Ariely, 2004) published in Psychological Science. This Expression of Concern is prompted by some uncertainty regarding the values of statistical tests reported in the article and the analytic approach taken to the data. The corresponding author of the article and coauthor of this statement, Dan Ariely, attempted to locate the original data in an effort to resolve the ambiguities but was unsuccessful. Because the ambiguities cannot be resolved, we decided to issue an Expression of Concern about the confidence that can be held in the results reported in the article.


And regarding our role they wrote:

The ambiguity originally was brought to the attention of the Editor in Chief by Gilad Feldman, Hirotaka Imada, Wan Fei Chan, Yuk Ki Ng, Lee Hing Man, Mei Sze Wong, and Bo Ley Cheng. These researchers ran the article through statcheck (Epskamp & Nuijten, 2018), an R package that is designed to detect statistical errors (much like spell check and grammar check in Word). The program searches articles for statistical results, recalculates the values, and compares the reported and recalculated values to determine whether they match. Statcheck is now required for all articles published in Psychological Science. However, it was not available at the time of submission and acceptance of Heyman and Ariely’s article. The statcheck analysis conducted by Feldman and colleagues produced some discrepancies between the statistical values reported in the article and those determined by the recalculation. Gilad Feldman notified the editor of these discrepancies.
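For readers unfamiliar with what statcheck does under the hood, the core idea is simple: extract reported test statistics and p-values from the text, recompute the p-value from the statistic, and compare the two after accounting for rounding. Here is a minimal Python sketch of that idea (statcheck itself is an R package and handles t, F, r, χ², and z tests; this illustrative example covers only the z case, and the function names are my own, not statcheck's):

```python
import math
import re

def recompute_p_from_z(z: float) -> float:
    """Two-tailed p-value for a z statistic, via the standard normal CDF."""
    return math.erfc(abs(z) / math.sqrt(2))

def check_reported(reported: str) -> dict:
    """Parse a result like 'z = 2.05, p = .04' and flag mismatches.

    Reported p-values are usually rounded to two decimals, so we
    compare against the rounded recomputed value.
    """
    m = re.search(r"z\s*=\s*(-?\d+\.?\d*),\s*p\s*=\s*(\.\d+)", reported)
    z, p_reported = float(m.group(1)), float(m.group(2))
    p_recomputed = recompute_p_from_z(z)
    consistent = round(p_recomputed, 2) == p_reported
    return {"z": z, "p_reported": p_reported,
            "p_recomputed": p_recomputed, "consistent": consistent}

print(check_reported("z = 2.05, p = .04"))  # recomputed p ≈ .04: consistent
print(check_reported("z = 2.05, p = .01"))  # flagged as inconsistent
```

The real tool is considerably more careful (it handles one-tailed tests, different rounding conventions, and multiple test families), but the compare-reported-against-recomputed logic is the heart of it.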



I am glad to see Psychological Science follow up with this “Expression of Concern”, and I appreciate Dan Ariely taking part in this process and serving as the corresponding author for this update.



In the last three years at the University of Hong Kong I have been running a project of mass replications and extensions of classic findings in judgment and decision making and in social psychology. Students in my courses and those working with me on their guided theses, from 2nd-year undergraduates to taught master's programs, work on reproducing and replicating impactful articles in my research domains. In the last year we have also begun doing meta-science “research assessments”, in which students examine published articles and replications of that work on a variety of factors (see our template); we ran this on both our own projects (examples) and some of the mass-collaboration replication projects (examples).

The aim is to demonstrate the importance of mass mobilizing our students, as early as undergraduate, in revisiting and reassessing our published literature, so we can build trustworthy credible foundations in our field.


So far we have completed about 100 replications and extensions, with 68% of the replications successful, 13% mixed/inconclusive, and 19% failed. It is important to note that even in the mixed/inconclusive and failed replications, the students have been able to identify issues in the original and, through their analyses and extensions, suggest a way forward for improving future research. We submit all our replications for journal peer review, inviting early-career researchers to help us validate and expand on the students' work and lead the project through the long journal peer-review process. In 2020 we published 12 of these student and early-career team projects; about 15 more are currently under review, and about a dozen more are in advanced stages pending submission. Our team currently includes about 50 early-career researchers, and over 300 students have taken part in these projects. I'll return to why these numbers are important in my final note below.


We have been doing this with many articles in our field as part of an effort to replicate the classics. I choose the targets myself, selecting work I consider important and influential in my domains of interest, as a tribute to that work and in directions I think are promising. Targets are never chosen because of any concerns I had; quite the contrary.


A summary of our project, where I explain what we have done so far, can be found at:


Heyman and Ariely (2004)

I assigned this target article to two teams of two 4th-year undergraduate students taking my course, working independently, with two planned data collections: one with US Americans on MTurk and one with British participants on Prolific Academic. We also mixed it up a bit: in the second replication we adjusted the original design from a between-subjects to a within-subjects design.


I feel it is important to begin by saying that, for the most part, we successfully replicated the findings of Heyman and Ariely (2004) Study 1, twice, with different samples and different designs. The main difference in our results was that in one analysis we were able to “detect” an effect where none was expected, which we thought required a slight reframing of the theory/phenomenon. This difference was probably due to our samples being much better powered than the original. During that process we identified some issues with the reporting of the statistics in the original article, yet proceeded with the replications regardless, with some adjustments for the uncertainty (larger samples).

We note that the errors identified using statcheck had already been automatically posted to PubPeer in 2016. We were by no means the first to notice these inconsistencies; many articles in the field flagged by statcheck can be found on PubPeer yet have received no attention and have not been handled by the original authors or journal editors.


We wrote up the replications and submitted them to Psychological Science, which, given the pottery barn rule, we thought was the right venue, especially since we had identified issues in the reporting of the original article that we felt should be addressed alongside extensive replications of this work.

Unfortunately, our submission of the completed replications was desk rejected by Psychological Science, since the journal only accepts replications in the format of Registered Reports (peer review conducted first, on the preregistration plan), which we cannot do given that our replications are conducted as part of a one-semester undergraduate course with a strict schedule.


A copy of our preprint is available on: Rewarding more is better for soliciting help, yet more so for cash than for goods: Revisiting and reframing the Tale of Two Markets with replications and extensions of Heyman and Ariely (2004)

The supplementary materials include our analysis of the target article.


The editor of Psychological Science, Prof. Patricia J. Bauer, who handled our manuscript, was concerned by the inconsistencies we identified and followed up with the original authors. She later contacted me to inform us that the journal had decided to issue an Expression of Concern for the article.


About our process

As an initial step, students conduct an analysis of the original article and a power analysis to determine the required sample size for the replications. For the power analysis, the students use whatever information is available in the original article to calculate the effect size from the reported statistics. Where that information is not provided, they use indirect methods. For example, for this article we analyzed the figure using WebPlotDigitizer to estimate the plotted values and back-calculated the effect size from those estimates.
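For illustration, the back-calculation and sample-size step can be sketched as follows. This is a minimal Python example, not the students' actual procedure: the numbers are made up, the effect size is Cohen's d from (digitized) group means and SDs, and the sample-size formula is the standard normal-approximation for a two-sample t-test:

```python
import math

def cohens_d(mean1, sd1, n1, mean2, sd2, n2):
    """Cohen's d for two independent groups, using the pooled SD.

    The means/SDs could come from the article's text or, as in our
    case, from values estimated off a figure with WebPlotDigitizer.
    """
    pooled_sd = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2)
                          / (n1 + n2 - 2))
    return (mean1 - mean2) / pooled_sd

def n_per_group(d, z_alpha=1.96, z_beta=0.8416):
    """Approximate per-group n for a two-sample t-test.

    Normal-approximation formula: n = 2 * ((z_alpha + z_beta) / d)^2.
    Defaults: alpha = .05 two-tailed (z = 1.96), power = .80 (z = 0.8416).
    """
    return math.ceil(2 * ((z_alpha + z_beta) / d) ** 2)

# Hypothetical digitized values: two groups of 50, means 5 vs. 4, SD 2.
d = cohens_d(5.0, 2.0, 50, 4.0, 2.0, 50)   # d = 0.5
print(d, n_per_group(d))                    # 63 per group for 80% power
```

In practice one would use a dedicated tool (e.g., G*Power or the `pwr` package in R), which uses the exact noncentral t distribution rather than this approximation; the sketch only shows why a trustworthy effect size estimate is the critical input.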


The students had some difficulties with Heyman and Ariely (2004), and when they discussed these challenges with me we soon realized that there were probably errors in the reporting of the statistics, or that something about the way they were calculated was not well specified in the article. Running the manuscript through tools like statcheck confirmed some of these issues. We were therefore not always able to calculate the effect size needed for our power analysis, which is why we aimed for very large samples in the two replications (US MTurk: N = 2,203; UK Prolific: N = 999).


My takeaway message

Students and early-career researchers are the key to the “credibility revolution”, and they are our most underappreciated and underutilized stakeholders. We have shown that undergraduates, as early as their 2nd year, can help us reexamine our literature and conduct high-quality replications and extensions that meet the standards of the best journals in our field. Students repeatedly describe this experience as meaningful, and they are enthusiastic about being part of an actual hands-on scientific process that contributes back to the community.


Some have called for paid “red teams” or paid journal peer review as solutions for assessing our findings, yet in my experience and humble view, the fastest and most impactful path to real change is working together with our students and early-career researchers. In our project we developed many resources, templates, examples, and teaching materials that allow any teacher or researcher to do the same in their courses, all openly available on my website (see “Get involved: Resources”). The team and I would be delighted to help anyone start a similar initiative at their university or department. We invite researchers to let us know how we can support them in their open-science journey.




Other relevant links:


Followups I noticed:


Feel free to comment if you have any questions and/or concerns.
