Friday, July 12, 2013

Reliability and validity

With all the chatter on the Zimmerman trial and the exclusion of the State's voice "experts," I wanted to take a moment to look at reliability and validity from a science standpoint. To do this, I'll enlist the help of one of my favorite authors on the subject, William M.K. Trochim, Professor in the Department of Policy Analysis and Management at Cornell University.

"Reliability has to do with the quality of measurement. In its everyday sense, reliability is the "consistency" or "repeatability" of your measures. Before we can define reliability precisely we have to lay the groundwork. First, you have to learn about the foundation of reliability, the true score theory of measurement. Along with that, you need to understand the different types of measurement error because errors in measures play a key role in degrading reliability. With this foundation, you can consider the basic theory of reliability, including a precise definition of reliability. There you will find out that we cannot calculate reliability -- we can only estimate it. Because of this, there are a variety of different types of reliability that each have multiple ways to estimate reliability for that type. In the end, it's important to integrate the idea of reliability with the other major criteria for the quality of measurement -- validity -- and develop an understanding of the relationships between reliability and validity in measurement.

We often think of reliability and validity as separate ideas but, in fact, they're related to each other. Here, I want to show you two ways you can think about their relationship.

One of my favorite metaphors for the relationship between reliability and validity is that of the target. Think of the center of the target as the concept that you are trying to measure. Imagine that for each person you are measuring, you are taking a shot at the target. If you measure the concept perfectly for a person, you are hitting the center of the target. If you don't, you are missing the center. The more you are off for that person, the further you are from the center.

The figure above shows four possible situations. In the first one, you are hitting the target consistently, but you are missing the center of the target. That is, you are consistently and systematically measuring the wrong value for all respondents. This measure is reliable, but not valid (that is, it's consistent but wrong). The second shows hits that are randomly spread across the target. You seldom hit the center of the target but, on average, you are getting the right answer for the group (but not very well for individuals). In this case, you get a valid group estimate, but you are inconsistent. Here, you can clearly see that reliability is directly related to the variability of your measure. The third scenario shows a case where your hits are spread across the target and you are consistently missing the center. Your measure in this case is neither reliable nor valid. Finally, we see the "Robin Hood" scenario -- you consistently hit the center of the target. Your measure is both reliable and valid (I bet you never thought of Robin Hood in those terms before)."
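
For readers who think better in numbers than in targets, here is a rough sketch of the four scenarios as a small simulation. It is my own illustration, not something from Trochim's text. It assumes each measurement is the true value plus a fixed systematic offset (which hurts validity) plus random noise (which hurts reliability). The names simulate and summarize, the true score of 50, and the offset and noise sizes are all made up for the example.

```python
import random
import statistics

def simulate(true_score, bias, spread, n=1000, seed=0):
    """Hypothetical model: measurement = true score + systematic bias + random noise."""
    rng = random.Random(seed)
    return [true_score + bias + rng.gauss(0, spread) for _ in range(n)]

def summarize(measurements, true_score):
    """Report how far the average lands from the center and how scattered the shots are."""
    mean = statistics.fmean(measurements)
    return {
        "bias": mean - true_score,                  # far from zero -> not valid
        "spread": statistics.stdev(measurements),   # large -> not reliable
    }

TRUE_SCORE = 50.0  # assumed "center of the target"

# Illustrative settings only; the sizes of bias and spread are arbitrary.
scenarios = {
    "reliable, not valid":            {"bias": 10.0, "spread": 1.0},
    "valid on average, not reliable": {"bias": 0.0,  "spread": 10.0},
    "neither reliable nor valid":     {"bias": 10.0, "spread": 10.0},
    "reliable and valid":             {"bias": 0.0,  "spread": 1.0},
}

results = {
    name: summarize(simulate(TRUE_SCORE, **params), TRUE_SCORE)
    for name, params in scenarios.items()
}

for name, r in results.items():
    print(f"{name:32s} bias={r['bias']:+6.2f}  spread={r['spread']:5.2f}")
```

In this toy model, a bias near zero corresponds to hitting the center on average, and a small spread corresponds to a tight cluster of shots. A real measurement process will rarely split this cleanly into one offset and one noise term.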

So, yes, I can hear you now ... but you're talking about social research, like the stuff that you're doing for your PhD. Ok, sure. But think about it for a second. You call yourself an image analyst or a video analyst. Those fields have specific domains. Those domains involve measurements of one type or another. So, I ask you, are we really talking about two different things? Or, does the scientific method work across a number of academic and professional disciplines? I would argue that it does.
