
Wednesday, August 7, 2019

Where'd you get the 10?

When the "Search for Images on the Web" functionality was introduced into Amped's Authenticate some time ago, I asked a simple question of the development team, "where'd you get the 10?"


Amped SRL prides itself on operationalizing peer-reviewed published papers in image science. I assumed, wrongly it seems, that there was some science behind the UI's default setting for how many pictures you'd like to find. There isn't.

What I found out, at the time, is that the "Stop After Finding At Least N Pictures. N is" field defaults to 10 for no particular reason whatsoever. My own opinion is that it's set to 10 because the developers are engineers and 10 - a one and a zero - looks nice. There is no foundation in science or statistics for 10 as a valid value for that field. If you accept the default as presented, you're creating a convenience sample that will give you more chances of being wrong than being right. Here's why.

The basic question being tested in the developer's example, with help from the "Search Images From Same Camera Model" dialog, is "match / no match." Does the evidence item match a representative sample of images from the same make / model of camera? You would perform this check when your evidence item's JPEG quantization table (QT) is not found in your tool's internal database (a known limitation of any software that depends upon an internal database). Another way of framing "match / no match" is as a generic binomial test.
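
To make that framing concrete, here's a minimal sketch in Python (SciPy assumed; the counts are hypothetical and not drawn from any real case) of what the binomial test looks like once a sample has been gathered:

# Hypothetical illustration of "match / no match" as an exact binomial test.
# Suppose 45 of 59 sampled images from the same camera model agree with the
# evidence item's JPEG quantization tables; test that agreement against
# chance (p = 0.5), two-tailed.
from scipy.stats import binomtest

result = binomtest(k=45, n=59, p=0.5, alternative='two-sided')
print(result.pvalue)  # reject "chance agreement" at α = 0.01 if this is below 0.01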

Here's what the sample size calculation looks like for a generic binomial test performed in the criminal justice context (as opposed to the university / research context).

Analysis:   A priori: Compute required sample size
Input:      Tail(s)                  = Two
            Proportion p2            = 0.8
            α err prob               = 0.01
            Power (1-β err prob)     = 0.99
            Proportion p1            = 0.5
Output:     Lower critical N         = 19.0000000
            Upper critical N         = 40.0000000
            Total sample size        = 59
            Actual power             = 0.9912792
            Actual α                 = 0.0086415

A two-tailed test does a better job of limiting both Type I and Type II errors than a one-tailed test. The α and β error probabilities are set as low as is practical - one chance in 100 each. This protocol yields a recommended sample size of 59. Not 10.


At around 20 samples, you still have more chances of being wrong than being right. At 10 samples, you have about 8 chances in 10 of being wrong.
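
Those figures are easy to check for yourself. Here's a minimal sketch in Python (SciPy assumed) that builds the two-tailed rejection region for the exact binomial test - splitting α evenly across the tails, which is one common convention - and reports the achieved α and power for a given sample size:

# Achieved α and power of a two-tailed exact binomial test of
# H0: p = 0.5 against H1: p = 0.8 for a given sample size n.
from scipy.stats import binom

def binomial_power(n, p0=0.5, p1=0.8, alpha=0.01):
    # largest lower critical value with tail probability <= alpha/2 under H0
    lo = int(binom.ppf(alpha / 2, n, p0))
    while lo >= 0 and binom.cdf(lo, n, p0) > alpha / 2:
        lo -= 1
    # smallest upper critical value with tail probability <= alpha/2 under H0
    hi = int(binom.ppf(1 - alpha / 2, n, p0)) + 1
    while hi <= n and binom.sf(hi - 1, n, p0) > alpha / 2:
        hi += 1
    achieved_alpha = binom.cdf(lo, n, p0) + binom.sf(hi - 1, n, p0)
    power = binom.cdf(lo, n, p1) + binom.sf(hi - 1, n, p1)
    return lo, hi, achieved_alpha, power

for n in (10, 20, 59):
    print(n, binomial_power(n))

With this convention, n = 59 reproduces the critical values (19 and 40), achieved α (about 0.0086), and power (about 0.99) shown above, while n = 10 and n = 20 come out at roughly 0.1-0.2 and 0.4 power respectively (the exact figures depend on how the discrete tails are allocated) - in either case, far more likely to miss a real difference than to find it.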

The "match / no match" scenario is quite different than an attempt to establish "ground truth," or what the camera "should be producing" when it creates an image. For these tests, a "performance model" is required and is generated by samples created by the device in question. Each camera will present it's own issues and the results of your test of the camera in question can't be applied to other cameras of the same make / model. You'll need to know a bit about the signal path - from light coming in to the resulting image's storage - to be able to determine the correct value for the "predictors" variable. In my last experiment, that value was 17 ... 17 different parts / processes that could possibly be in error. In that case, the sample size calculation was as shown below.

t tests - Linear multiple regression: Fixed model, single regression coefficient

Analysis:   A priori: Compute required sample size
Input:      Tail(s)                       = Two
            Effect size f²                = 0.15
            α err prob                    = 0.01
            Power (1-β err prob)          = 0.99
            Number of predictors          = 17
Output:     Noncentrality parameter δ     = 4.9598387
            Critical t                    = 2.6099227
            Df                            = 146
            Total sample size             = 164
            Actual power                  = 0.9900250

In this case, I needed to generate 164 valid files for my sample set - not 10. The plot is shown below.


With only 10 samples for such a test, you would have 9 chances in 10 of being wrong.
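
The regression calculation can be reproduced in the same way with the noncentral t distribution. Here's a minimal sketch in Python (SciPy assumed), using the standard relationship δ = √(f² · N) for a test of a single coefficient in a fixed-effects multiple regression:

# A priori sample size for a two-tailed t test of a single regression
# coefficient: effect size f² = 0.15, α = 0.01, power = 0.99, 17 predictors.
from scipy.stats import t as t_dist, nct

def power_single_coefficient(n, f2=0.15, predictors=17, alpha=0.01):
    df = n - predictors - 1                    # error degrees of freedom
    delta = (f2 * n) ** 0.5                    # noncentrality parameter
    t_crit = t_dist.ppf(1 - alpha / 2, df)     # two-tailed critical t
    # power = P(|T| > t_crit) when T follows the noncentral t with parameter delta
    return nct.sf(t_crit, df, delta) + nct.cdf(-t_crit, df, delta)

def required_n(f2=0.15, predictors=17, alpha=0.01, target=0.99):
    n = predictors + 3                         # smallest n with usable error df
    while power_single_coefficient(n, f2, predictors, alpha) < target:
        n += 1
    return n

print(required_n())  # should land at, or very near, the 164 reported above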

Another scenario where the "official guidance" on sample sizes is quite far off is with PRNU. The official guidance caps the number of reference images used in building a reference pattern at 50: "(the number of images is limited to 50, which proved to be enough – we'll also discuss this in a future post)." But from the source documentation (link), the authors recommend more than that: "Obviously, the larger the number of images Np, the more we suppress random noise components and the impact of the scene. Based on our experiments, we recommend using Np > 50" (link). The authors of the source document actually used 320 samples per camera in proving out their theories - not 10 or 50. Other researchers examining PRNU have used even larger sample sets (link) (link). These three links weren't chosen at random simply to reinforce my point; they are the references found in the processing report generated by Amped's Authenticate - further illustrating the point about validation of tools and methodologies. (Many thanks to Dr. Fridrich for maintaining a massive list of source documentation.)
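
If you haven't worked with PRNU directly, the reason Np matters is straightforward: the reference pattern is an average of per-image noise residuals, and averaging more residuals suppresses everything that isn't the sensor's fixed pattern. Here's a minimal sketch of that averaging step in Python (NumPy and SciPy assumed; the Gaussian filter is a stand-in for the wavelet denoiser used in the source documentation):

import numpy as np
from scipy.ndimage import gaussian_filter

def prnu_reference(images, sigma=1.0):
    # images: iterable of 2-D grayscale arrays (ideally flat, well-lit frames)
    # from the single camera whose reference pattern is being built.
    residuals = []
    for img in images:
        img = np.asarray(img, dtype=np.float64)
        # noise residual W = I - denoise(I); the denoiser here is a stand-in
        residuals.append(img - gaussian_filter(img, sigma))
    # averaging Np residuals suppresses random noise roughly as 1/sqrt(Np),
    # which is why the source authors recommend Np > 50 rather than capping at 50
    return np.mean(residuals, axis=0)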


But, as my old college football coach used to say, "the wrong way will work some of the time, the right way will work all of the time." Simply choosing a convenient number of samples may not be a problem in a justice system that has a 95% plea rate. But, as recent news points out, if you get caught out for bad methodology, all of your past work gets re-opened. Don't let this happen to you. Use sound methods. Get educated on the foundations of your work.

Remember, tools like Amped SRL's Authenticate don't render an opinion as to a file's authenticity - you do. Your opinion must be backed by a sound methodology, compliance with standards, and the fundamentals of science.

Correcting the lack of understanding of this vital topic was on the 2009 NAS Report's list of recommendations (link).

"The issues covered during the committee’s hearings and deliberations included:
  • (a) the fundamentals of the scientific method as applied to forensic practice—hypothesis generation and testing, falsifiability and replication, and peer review of scientific publications;
  • (b) the assessment of forensic methods and technologies—the collection and analysis of forensic data; accuracy and error rates of forensic analyses; sources of potential bias and human error in interpretation by forensic experts; and proficiency testing of forensic experts;
  • (c) infrastructure and needs for basic research and technology assessment in forensic science
  • (d) current training and education in forensic science; ..." pg. 3

Ten years later, vendors are still ignoring the NAS' recommendations, often providing incorrect information to customers.

The scientific method forms the foundation of all of the forensic science training offerings that I've created over the years. Error rates - where they appear in the work, and how to calculate and control for them - are illustrated in all of our forensic science courses.

If you'd like to move beyond "push-button forensics," I hope to see you in class soon.
