Friday, December 15, 2017

Sample sizes and determinations of match

It's been a busy fall season, traveling the country and training a whole bunch of folks. Over lunch, the group I was with asked me about a case that's been in the news and wondered if we'd be discussing how to conduct a comparison of headlight spread patterns. That led us down quite the rabbit hole ...

Comparative analysis assumes a "known" and compares it to an "unknown." It's important to consider time & temporality - that one can only TEST in the present - in the "now." For the future / past, one can only PREDICT what happened "then." Testing and Prediction have their own rules.

Take the testing of an image / video of a headlight spread pattern. One attempts to compare the "known" vs. a range of possible "unknowns." Our lunch group mentioned a case where the examiner, trying to make a determination, tested the vehicle in evidence plus about a dozen trucks in front of the CCTV system that generated the evidentiary video. According to the report, the examiner did in fact declare a match.

The question really isn't the appropriateness of the visual comparison. The question is the appropriateness of the sample size such that the results can be useful / trusted. How did the examiner determine an appropriate sample size? Is a dozen trucks appropriate?

Individual headlight direction can be adjusted. Headlights come in pairs. Thus, there are two variables that are not on/off. In the world of statistics, they're continuous variables. You're testing two continuous variables against a population of continuous variables to determine uniqueness. Is this possible in real life? What's the appropriate sample size for such a test?

I use a tool called G*Power to calculate sample size. Just about every PhD student does. It's free and quite easy to use once you learn to speak its language. Most, like me, learn its language in graduate-level stats classes.

For example, if you've determined that an F-Test of Variance - Test of Equality is the appropriate statistical test needed to conduct your experiment, then select that test using G*Power.



Press the Calculate button, and G*Power calculates the appropriate sample size. In this case, the appropriate sample size is 266. There's a huge difference between 266 and a dozen. You can plot the results to track the increase in sample size relative to Power. If you want greater Power (a better chance that your test detects a real difference), you need a larger sample size.
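
For readers without G*Power handy, here is a rough sketch in Python of the same kind of calculation for a two-sample test of equal variances. It is not G*Power's method: it swaps in a normal approximation on the log of the variance ratio, and the inputs (a variance ratio of 1.5, alpha of 0.05, the power levels listed) are assumptions chosen for illustration, since the settings behind the 266 figure aren't given here. Don't expect it to reproduce that number; it only shows how the required sample size climbs as Power goes up.

```python
import math
from statistics import NormalDist


def n_per_group_variance_ratio(ratio, alpha=0.05, power=0.80, two_sided=True):
    """Approximate per-group sample size needed to detect a variance ratio.

    Uses the normal approximation ln(s1^2 / s2^2) ~ Normal with variance
    4 / (n - 1) for two equal-sized samples. This is an approximation, not
    the exact F-distribution calculation a tool like G*Power performs.
    """
    if ratio <= 0 or ratio == 1:
        raise ValueError("ratio must be positive and different from 1")
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2) if two_sided else z(1 - alpha)
    z_beta = z(power)
    n = 1 + 4 * (z_alpha + z_beta) ** 2 / math.log(ratio) ** 2
    return math.ceil(n)


if __name__ == "__main__":
    # Hypothetical inputs: variance ratio 1.5, alpha 0.05, two-sided.
    for target_power in (0.70, 0.80, 0.90, 0.95):
        n = n_per_group_variance_ratio(1.5, power=target_power)
        print(f"power {target_power:.2f}: about {n} per group")
```

Even under these made-up settings, the required sample lands in the hundreds, not the teens. Change the assumed ratio or the Power and the number moves, which is exactly why the report needs to state the inputs.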

The examiner's report should include a section about how the sample size was determined and why the test used to calculate it was appropriate. It should include graphics, such as a plot of sample size against Power, to illustrate the results.


It's vitally important that an examiner who conducts a comparative exam and declares a "match" understands the necessary science behind that conclusion. "Match" usually does not mean "a Nissan Sentra." That's not helpful given the quantity of Nissan Sentras in a given region. "Match" means "this specific Nissan Sentra." Isn't the standard, "Of all the Nissan Sentras made in that model year, whithersoever dispersed around the globe, it's only this particular one and no other"?

What about the test? Did you choose the appropriate test?

What if, on the other hand, you determined that the appropriate test is one from G*Power's t-test family, like the Wilcoxon signed-rank test? Then the sample size would be different. With that test, the appropriate sample size would be 47. That's still not a dozen.
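
To see why the choice of test changes the answer, here is a rough Python sketch for a one-sample signed-rank design. Again, this is an approximation and not G*Power's exact calculation: it sizes an equivalent t-type test with a normal approximation, then inflates the result by the asymptotic relative efficiency of the signed-rank test under normally distributed data (3/pi, about 0.955). The effect size of 0.5, alpha of 0.05, and Power of 0.80 are assumptions for illustration; the inputs behind the 47 aren't given here, so the output is not meant to match it.

```python
import math
from statistics import NormalDist


def n_signed_rank(effect_size, alpha=0.05, power=0.80, two_sided=True,
                  are=3 / math.pi):
    """Approximate sample size for a one-sample Wilcoxon signed-rank test.

    Step 1: normal-approximation sample size for the matching t-type test.
    Step 2: divide by the asymptotic relative efficiency (ARE) to account
    for the rank-based test being slightly less efficient on normal data.
    Pass are=1 to get the t-type figure alone.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2) if two_sided else z(1 - alpha)
    n_t = ((z_alpha + z(power)) / effect_size) ** 2
    return math.ceil(n_t / are)


if __name__ == "__main__":
    # Hypothetical inputs: standardized effect size 0.5, alpha 0.05.
    print("signed-rank:", n_signed_rank(0.5))
    print("t-type only:", n_signed_rank(0.5, are=1))
```

Same question, different test, different required sample size. Neither figure comes out anywhere near a dozen under these assumed inputs.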


What happens if you like the T-test and opposing counsel's examiner likes the F-test? What happens when two examiners disagree? Do you have the education, training, and confidence to defend your choice and your results in a Daubert hearing?

Perhaps you've been trained in the basics of conducting a comparative examination. But have you been trained / educated in the science of conducting experiments? Do you know how to choose the appropriate tests for your questions? Do you know how to structure your experiment? Do you know how to calculate the appropriate sample size for your tests?

To wrap up, when concluding that a particular vehicle can't be any other because you've compared the headlight spread pattern in a video to several vehicles of the same model / year, it's vitally important to justify the sample size of comparators. You must choose the appropriate test and calculate the sample size based on that test. ASTM E2825-12's requirement that one must produce a report such that another similarly trained / equipped person can reproduce your work means that you must include your notes on the calculation of the sample size. If you haven't done this, you're just guessing and hoping for the best.
