The following post is a review and critique of:
Wong, K. F. E., & Kwong, J. Y. Y. (2007). Effects of rater goals on rating patterns: Evidence from an experimental field study. Journal of Applied Psychology, 92, 577-585.
This article builds on the theoretical suggestion by Murphy et al. (2004) to show that raters with different goals give different ratings. It offers the following contributions: (1) investigating multiple ratings by one rater by looking at mean rating and discriminability, (2) reinforcing the claim that goals affect ratings through experimental manipulation rather than correlational evidence, and (3) clarifying the causal relationship between rater goals and performance.
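To make the two outcome measures concrete, here is a minimal Python sketch. It assumes that discriminability can be approximated by the standard deviation of one rater's ratings across ratees; the article's exact operationalization may differ. The function name `rating_pattern` and the ratings themselves are hypothetical, made up for illustration, and are not data from the study.

```python
from statistics import mean, stdev


def rating_pattern(ratings):
    """Summarize one rater's ratings of several ratees.

    Returns (mean rating, discriminability), where discriminability is
    ASSUMED here to be the sample standard deviation across ratees.
    """
    return mean(ratings), stdev(ratings)


# Hypothetical ratings on a 1-7 scale from one rater to four group-mates.
no_goal = [3, 5, 4, 6]   # rating freely
harmony = [6, 6, 5, 6]   # same rater, illustrative "harmony" goal

for label, ratings in (("no goal", no_goal), ("harmony", harmony)):
    avg, disc = rating_pattern(ratings)
    print(f"{label}: mean={avg:.2f}, discriminability={disc:.2f}")
```

In this made-up example the second set of ratings has a higher mean and a smaller spread, which is the kind of pattern one would describe as giving higher scores while discriminating less.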
I find the method very interesting, yet somewhat puzzling with regard to the meaning that can be derived from such an empirical design. The design is clearly between subjects, where every subject is directed to rate his group-mates according to all goal groups identified by the researchers. This leads to a number of questions:
- Does manipulating goals really show what a person would do if he followed that goal, or does the rater try to guess what he is being instructed to do and act accordingly? I have serious doubts about the first. Asking a person to (imagine how he would) rate according to a goal he does not hold is a bit like asking a Western “Individualism” expat in Hong Kong to try to think from a local “Collectivism” point of view. The result will most likely follow stereotypes and be heavily biased toward them, rather than really say something about how a collectivist society would see things. That said, a growing group of researchers suggests this is at least partly possible (different cultural values can be triggered simply by manipulating the language used), but I’m still a bit skeptical, to the extent that I believe the person needs to at least have a representation of those goals or values within his cognitive reach.
- Are goals so easily manipulated that they can be changed by repeated manipulation and a quick, simple instruction to switch goals? Even though goals are not very stable, they should be stable to some extent, at least over the short term in which the manipulation here was performed.
- There is almost a paradox in the way the goal groups are defined and checked, which is my biggest concern regarding the method. Goals are defined according to the authors’ perception and theory of how the raters will understand and interpret each goal, and the raters are then asked to rate accordingly. The results show that raters sometimes do and sometimes don’t follow the authors’ intent, suggesting that the authors and raters might have different ideas about what the goals mean. It would have been easier to instruct them directly on how to rate: rather than telling raters to rate by “Harmony”, the authors might as well have said “Please discriminate less and give higher scores”, and surely the effect would have been stronger, but of course that’s not what we’re aiming at. Assessing raters’ perception of goal groups might have been relevant if the article were trying to identify goal groups and determine their universality, but the main point of the article is to say something about how a person’s goals affect their ratings, and I’m afraid I have a hard time seeing that here.
The authors’ “practical implication”, the suggestion to clearly tell people how they should rate, is important for minimizing goal discrepancy, but in a way this is exactly what the method did within the goal groups. This point is indirectly raised but insufficiently addressed at the end of the article (page 584, third point).
- To get better insight into the process, and perhaps to answer some of the concerns from the previous point, it might have been relevant to first let the raters rate as they wish, then let them rate according to all goals, and finally ask each rater which of those goal-based ratings is closest to his original rating.
- Murphy et al., and perhaps this study as well, were also trying to say something about managers’ appraisal of their subordinates. Murphy’s study tested students rating their professors, and this study tested students rating classmates. Perhaps the results could have been generalized a bit further if there had also been a class in which each group was instructed to select a “group leader”, which would at least partly simulate the actual organizational context.
- It’s interesting to try to understand the big differences between Time 1 and Time 2 for all goal groups. Looking at the design, I suspect there might have been a bias, assuming I understand it correctly, in the sense that at Time 1 the subjects were introduced to the rating system for the first time. Another difference is that at Time 2 the raters (who are also ratees) had already seen the feedback from Time 1, a week after it was given. Thus, the Time 2 ratings were better thought out by the raters and might have been influenced by a wish to reciprocate. Adding a Time 0 shortly after the beginning of the semester, which could then be removed from the analysis, might have helped address that.
With all that said, this article definitely presents an interesting, useful, very practical and relatively easy empirical method, which makes a significant contribution by extending Murphy’s idea that goals influence ratings to multiple ratees and a between-subjects design. It also suggests some possible new directions for other research topics I wasn’t sure how to approach empirically.