Interrater Reliability

Many behavioral measures involve significant judgment on the part of an observer or a rater. Interrater reliability is the extent to which different observers are consistent in their judgments.

For example, if you were interested in measuring college students’ social skills, you could make video recordings of them as they interacted with another student whom they were meeting for the first time. Then you could have two or more observers watch the videos and rate each student’s level of social skills. To the extent that each participant does in fact have some level of social skills that can be detected by an attentive observer, different observers’ ratings should be highly correlated with each other. If they were not, then those ratings could not be an accurate representation of participants’ social skills. Interrater reliability is often assessed using Cronbach’s α when the judgments are quantitative or an analogous statistic called Cohen’s κ (the Greek letter kappa) when they are categorical.
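
To make the categorical case concrete, here is a minimal Python sketch of how Cohen’s κ could be computed for two raters. The ratings, the rater names, and the `cohens_kappa` helper are hypothetical illustrations added here, not part of the original text, and the labels stand in for judgments such as “high” or “low” social skills.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters' categorical judgments of the same items."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)

    # Observed agreement: proportion of items the two raters label identically.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Agreement expected by chance, from each rater's marginal label frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(
        (counts_a[label] / n) * (counts_b[label] / n)
        for label in set(rater_a) | set(rater_b)
    )

    # Kappa expresses observed agreement corrected for chance agreement.
    return (observed - expected) / (1 - expected)

# Hypothetical ratings of ten videotaped participants by two observers.
rater_1 = ["high", "high", "low", "high", "low", "low", "high", "low", "high", "high"]
rater_2 = ["high", "low", "low", "high", "low", "low", "high", "low", "high", "low"]

print(f"Cohen's kappa: {cohens_kappa(rater_1, rater_2):.2f}")
```

In this made-up example the raters agree on 8 of 10 participants, but some of that agreement would occur by chance, so κ comes out lower (about .62) than the raw agreement of .80.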