Monday, January 14, 2019

Test your report's readability

One of the concepts that I tend to repeat when training folks in the forensic sciences is that our work products should be targeted at the last mechanical device that will display or project them, as well as at the combined perceptual abilities of the Trier of Fact. Working this way, there will be no surprises when it comes time to present your work.

The same is true for your reports. Your reports will make sense to you; you wrote them. They'll make sense to your quality control staff (your reviewers), as they tend to exist in the same culture and climate as you. But will they make sense to the Trier of Fact - without you there to explain them?

There is functionality within our toolset to help with this question - how readable is my report? If you're using MS Word to draft your reports, it's actually quite easy to set this up.

  • Click the File tab, and then click Options.
  • Click Proofing.
  • Under When correcting spelling and grammar in Word, make sure the Check grammar with spelling check box is selected.
  • Select Show readability statistics.



After you enable this feature, open a file that you want to check, and check the spelling by pressing F7 or going to Review > Spelling & Grammar. When Word finishes checking the spelling and grammar, it displays information about the reading level of the document.


Each readability test bases its rating on the average number of syllables per word and words per sentence. The following sections explain how each test scores your file's readability.

Flesch Reading Ease test (references)

Originally developed by Rudolf Flesch in the 1940s, this test rates text on a 100-point scale. The higher the score, the easier it is to understand the document. For most forensic science processing / analysis reports, you want the score to be between 55 and 70. Given that we'll have to use standard scientific terminology, it will be difficult to achieve readability scores higher than 70.

The formula for the Flesch Reading Ease score is:

206.835 – (1.015 x ASL) – (84.6 x ASW)

where:

ASL = average sentence length (the number of words divided by the number of sentences)

ASW = average number of syllables per word (the number of syllables divided by the number of words)

Flesch-Kincaid Grade Level test

Developed for the US Navy in 1975, this test rates text on a U.S. school grade level. For example, a score of 8.0 means that an eighth grader can understand the document. For your reports, aim for a score of approximately 7.0 to 10.0. It will prove difficult to bring these values down, as noted above, due to our use of scientific language, which gets averaged into the total score.

The formula for the Flesch-Kincaid Grade Level score is:

(.39 x ASL) + (11.8 x ASW) – 15.59

where:

ASL = average sentence length (the number of words divided by the number of sentences)

ASW = average number of syllables per word (the number of syllables divided by the number of words)
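Both tests use the same two inputs - ASL and ASW - so it's easy to compute them yourself. Here's a minimal sketch in Python that estimates both scores; the syllable counter (it just counts vowel groups) is a rough stand-in for the dictionary-based counting that Word and other tools use, so expect slightly different numbers.

import re

def count_syllables(word):
    # Rough estimate: count groups of consecutive vowels (minimum of one).
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def readability(text):
    # Returns (Flesch Reading Ease, Flesch-Kincaid Grade Level).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    asl = len(words) / len(sentences)   # average sentence length
    asw = syllables / len(words)        # average syllables per word
    ease = 206.835 - (1.015 * asl) - (84.6 * asw)
    grade = (0.39 * asl) + (11.8 * asw) - 15.59
    return ease, grade

ease, grade = readability("The video was recovered from the DVR. "
                          "The recording was authenticated and analyzed.")
print("Reading Ease: %.1f  Grade Level: %.1f" % (ease, grade))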


The Readability Statistics shown above are from a raw authentication report - before editing and before the insertion of the plain-English explanations of each of the processes.

Given that about 95% of cases plead out and never see the inside of a courtroom, it's vitally important that your reports be readable. Roughly 95% of your reports will be read, interpreted, and acted upon without you being present to help the reader understand what you said / meant. With this simple tool, built into many word processing applications, you can verify that your reports are readable - and know at what grade level they read.

If you're using Google Docs, you'll need to run your report through another app or web site. Readability Statistics were removed some time ago.

Enjoy.

Tuesday, December 18, 2018

Sample Sizes and Speed Calculations, oh my!

There's been a lot of talk lately about using the footage from a DVR to determine the speed of an object depicted on the video. In my classes on the topic, I explain how to set up the experiment to validate the results of your tests. In this post, I want to present a few snapshots of the steps in the validation.

It's been well documented that DVRs are not Swiss chronographs; they're mostly a cheap box of random parts. The on-screen time stamps have been shown to be "estimates" and "approximations" of time - not entirely reliable. It's also well documented that the files' metadata contains hints about time. Let's take a look at a few scenarios.

Test 1: Geovision 


The question in this case was: is this particular evidence item reliable in its generation of frames, such that the frame rate metadata could be used for a speed calculation?

A sample size calculation was performed to determine how many tests would be needed to build a model of the DVR's performance. In this way, we'd know whether the evidence item was typical of the DVR's performance or a one-time error.


Analysis:  A priori: Compute required sample size
Input:     Tail(s)                  = One
           Proportion p2            = 0.8
           α err prob               = 0.05
           Power (1-β err prob)     = 0.99
           Proportion p1            = 0.5
Output:    Lower critical N         = 24.0000000
           Upper critical N         = 24.0000000
           Total sample size        = 37
           Actual power             = 0.9907227
           Actual α                 = 0.0494359

The calculation determined that a total of 37 tests (sample size = 37) would be needed for our generic binomial test (works correctly / doesn't work correctly). At the other end of the graph, with a sample size of less than 10, a coin flip would have been more accurate.
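For those who want to sanity-check that number without G*Power, here's a minimal sketch of the same a priori calculation using the normal approximation (Cohen's h for the difference between proportions), assuming scipy is installed. G*Power's exact binomial routine lands on 37; the approximation comes out slightly higher.

from math import asin, ceil, sqrt
from scipy.stats import norm

p1, p2 = 0.5, 0.8          # null (coin-flip) proportion vs. expected proportion
alpha, power = 0.05, 0.99  # one-tailed test

h = 2 * asin(sqrt(p2)) - 2 * asin(sqrt(p1))   # Cohen's h effect size
n = ceil(((norm.ppf(1 - alpha) + norm.ppf(power)) / h) ** 2)
print("Effect size h = %.3f, required sample size ~ %d" % (h, n))   # ~ 39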

The Frame Analysis shown above, generated by Amped FIVE, seems to indicate that the I Frame generation is fairly regular and perhaps the P Frames are duplicates of the previous I Frame. You'd only get this information from an analysis of the metadata - plus a hash of each frame. Nothing viewed on-screen, watching the video play, would give you this specific information.
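As a rough illustration of the hashing idea - not Amped FIVE's implementation - here's a minimal Python sketch that uses OpenCV to decode each frame and hashlib to fingerprint it; identical hashes flag frames that are pixel-for-pixel duplicates of their predecessor. The file name is hypothetical.

import hashlib
import cv2

cap = cv2.VideoCapture("evidence_ch3.avi")   # hypothetical evidence file
previous, index = None, 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    digest = hashlib.md5(frame.tobytes()).hexdigest()
    if digest == previous:
        print("Frame %d is a pixel-for-pixel duplicate of frame %d" % (index, index - 1))
    previous = digest
    index += 1
cap.release()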

It's important to note that this one video represents one channel in a 16-channel DVR. The DVR features motion / alarm activation of its recordings. It took a bit of digging to find out how many camera streams were active (motion / alarm) during the time the evidence item was being recorded.

With the information in hand, we created an experiment to reproduce the original recording conditions. 

But first, a couple of important definitions are needed.

Observational Data: data collected from an observational study, in which the researcher observes subjects under different circumstances over which he or she has no control.

Experimental Data: data collected from an experimental study, in which the researcher controls the conditions under which the measurements are taken.

Our study in this case was "experimental." We were able to control which of the DVR's channels were actively recording - when, and for how long.

With the 37 tests conducted and the data recorded, it was determined that the average recording rate within the DVR - for the channel that recorded the evidence item - was effectively seven seconds per frame. Essentially, the DVR was so overwhelmed with data that it could not process all of the incoming signal effectively. It did the best that it could, but in its "error state," the I Frames were copied to fill the data container. Some I Frames were even duplicates of previous I Frames. This was likely due to a rule about the fps needed to create the output media - the system's native container format was .avi.

Test 2: Dahua 

In a second test, a "generic" black box DVR was tested. The majority of the parts could be traced to Dahua (China). The 16-camera system outputs a native file with a .264 extension.

The "forensic applications," FIVE included, are all based on FFMPEG for the "conversion" of these types of files. After conversion, the report of the processing indicated that the fps of the output video was 25fps. BUT, this was recorded in the US. 

Is 25fps the correct rate?
Is 25fps an "error state?"
If 25fps isn't the "correct rate," what should it be?

In this case, the frame rate information in the original container was "non-standard." As such, FFMPEG had no way of identifying what "it should be" and thus defaulted to 25fps - the output container needs to know its playback rate. Why 25fps? FFMPEG's origins are French - where the playback rate (PAL) is 25fps.
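You can see this for yourself by asking ffprobe (part of the FFmpeg suite) what the container declares. A minimal sketch, assuming ffprobe is on your path; the file name is hypothetical.

import json
import subprocess

cmd = ["ffprobe", "-v", "error", "-select_streams", "v:0",
       "-show_entries", "stream=codec_name,r_frame_rate,avg_frame_rate",
       "-of", "json", "evidence.264"]
result = subprocess.run(cmd, capture_output=True, text=True)
print(json.loads(result.stdout)["streams"][0])
# A raw .264 stream with no usable rate information will typically come back
# as 25/1 - FFmpeg's fallback - not as the DVR's actual recording rate.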

In this case, we didn't have an on-screen timestamp to begin our work with. Thus, we needed to conduct an experiment to attempt to calculate an effective frame rate for this particular DVR. This begins with a sample size calculation: how many tests do we need to build the model of the DVR's performance?


t tests - Linear multiple regression: Fixed model, single regression coefficient

Analysis: A priori: Compute required sample size 
Input: Tail(s)                       = One
Effect size f²                = 0.15
α err prob                    = 0.05
Power (1-β err prob)          = 0.99
Number of predictors          = 2
Output: Noncentrality parameter δ     = 4.0062451
Critical t                    = 1.6596374
Df                            = 104
Total sample size             = 107
Actual power                  = 0.9902320

In order to control for all of the points of data entry into the system (16 channels) as well as data choke points (chips and busses), our sample size has increased quite significantly. It's not a simple yes/no test as in Test 1 (above).

A multiple linear regression attempts to model the relationship between two or more explanatory variables and a response variable by fitting a linear equation to observed data. Essentially, how do the channels, busses, and chips (independent / control variables) influence the resulting data container (dependent variable)?
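As an illustration only - the variable names and numbers below are hypothetical, not the case data - here's a minimal sketch of fitting that kind of model with ordinary least squares in Python:

import numpy as np

# Each row of X is one test run: [active channels, motion/alarm events observed]
X = np.array([[1, 0], [4, 2], [8, 5], [16, 9], [16, 14]], dtype=float)
# y is the measured response: frames actually written per second on the test channel
y = np.array([12.5, 6.8, 3.1, 1.6, 1.1])

X_design = np.column_stack([np.ones(len(X)), X])      # add an intercept column
coef, *_ = np.linalg.lstsq(X_design, y, rcond=None)   # ordinary least squares fit
print("intercept, channel effect, motion effect:", coef)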

The tests were run and the data assembled. It was found that the effective frame rate of the channel that recorded our evidence was 1.3526fps. If you had simply accepted the 25fps reported by FFMPEG, the display of the video would have been inaccurate. Using 25fps for the speed calculation would also have yielded inaccurate results. Having the effective frame rate, plus the validation of the results, gives everyone in the process reason to trust your results.

It certainly helps that my tool of choice in working with the video data, Amped FIVE, contains the tools necessary to analyse the data. I can hash each frame (hash is now part of the Inspector Tools in FIVE). I can analyse the metadata (see above). Plus, I can adjust the playback rate precisely (Change Frame Rate filter).


These examples illustrate the distinct difference between what one "knows" and what one can "prove." We can "know" what the data stream tells us via the various frame analysis tools that are available. We can "prove" the validity of our results by conducting the appropriate tests, utilizing the appropriate number of tests (sample size).

If you are interested in knowing more about this topic, or if you need help on a case, or if you'd like to learn how to do these tests yourself, you can find out more by clicking here.

Tuesday, October 9, 2018

Peer Review?

What is peer-review? What is the role of the peer-reviewer? In reviewing academic / scientific research, is peer-review different than reviewing an academic / scientific work-product?

Let's examine the questions. But first, some definitions.

What is peer-review? As a noun, peer-review is an evaluation of scientific, academic, or professional work by others working in the same field. As a verb, peer-review is the action of subjecting (someone or something) to a peer review.

According to the John Jay College of Criminal Justice [source]: "In academic publishing, the goal of peer review is to assess the quality of articles submitted for publication in a scholarly journal. Before an article is deemed appropriate to be published in a peer-reviewed journal, it must undergo the following process:
  • The author of the article must submit it to the journal editor who forwards the article to experts in the field. Because the reviewers specialize in the same scholarly area as the author, they are considered the author’s peers (hence “peer review”).
  • These impartial reviewers are charged with carefully evaluating the quality of the submitted manuscript.
  • The peer reviewers check the manuscript for accuracy and assess the validity of the research methodology and procedures.
  • If appropriate, they suggest revisions. If they find the article lacking in scholarly validity and rigor, they reject it.
Because a peer-reviewed journal should not publish articles that fail to meet the standards established for a given discipline, peer-reviewed articles that are accepted for publication should exemplify the best research practices in a field." Here's a nice explanation of the difference between Scholarly (peer-reviewed) vs Popular articles (link to video).

Thus, peer-review is an evaluation of the accuracy and validity of a paper / article submitted for publication in a scholarly journal performed by an impartial group of anonymous others working in the same field.

How does one perform a review?

How does one check for accuracy and validity? Start with structure. As our field is a science, forensic science, is the paper divided into sections with headings such as those listed below?
  • Introduction
  • Literature review
  • Theory or Background
  • Subjects
  • Methods
  • Results
  • Discussion
  • Conclusion
Does the article have footnotes or citations of other sources? This question asks: is it easy for the reader / reviewer to move between the topic and the reference? Does the article have a bibliography or list of references at the end? This question speaks to the organization of references. Each journal will have its own standards for how references are handled. Does the item under review follow the journal / publication's standard (APA, ACS, Harvard, etc.)? If you are reviewing, are you familiar with the formatting rules used by the publication? Are the author's credentials listed? Is the author a subject matter expert? Do their credentials check out?

In reviewing for accuracy, are the referenced sources available to you (the reviewer)? If yes, great - do the references support or refute the author's work? If not, they likely won't be available to the eventual reader either. You may wish to suggest that the author find a source that is more generally available to the audience. An example of this can be seen with authors who cite an obscure source, unavailable to the reader, in an appeal to authority (a logical fallacy) to support a novel idea - knowing that the reviewer / reader will have no way of verifying (checking the accuracy of) the assertion.

Reviewing for accuracy is much more than simply checking the author's math. Many of the problems with papers come from the author's mistakes / misuse of referenced materials.

How does a journal or periodical choose peer-reviewers?

The answer to this question is both simple and complex. The short, simple answer is ... no one knows. Which leads to the complex answer: it depends. It depends upon the topic, the availability of qualified subject matter experts, the publication schedule vs. the reviewers' schedules, etc. The answers to these questions help you evaluate the quality of the publication.

In evaluating the quality of the publication, are you (the reader) able to access the peer-review criteria? Does the journal publish a list of its peer-reviewers by topic? In other words, how do you (the reader / consumer of the information) know if a paper has been evaluated by qualified subject matter experts?

Are there peer-reviewed journals or publications that cover the forensic science domain of forensic multimedia analysis?

Yes. But they are few and far between.

The IAI publishes the Journal of Forensic Identification (JFI) at $205/yr. The JFI deals primarily with issues around latent prints, having published only six articles touching on multimedia evidence since 2010 (mostly case reports). It's available to members of the IAI and to subscribers. If you're an academic and / or have access to the EBSCO Criminal Justice database service or the National Criminal Justice Reference Service, you can find it there. In all, it's not a popular place to find cutting-edge articles in our discipline.

The American Academy of Forensic Sciences publishes the Journal of Forensic Sciences (JFS) at $354/year. As with the JFI, there are few articles of value to the forensic multimedia analyst. Also like the JFI, if you're not a subscriber or an academic, you'll not be able to access the few editions and articles that are relevant to our discipline.

There's also the Digital Forensic Research Workshop (DFRWS). It doesn't publish a journal; it holds workshops around the world. Some of its papers can be found on its web site.

In all, there are a lot of problems with the above listed journals in terms of the validity of the articles presented as "research." Two important criteria for judging the adequacy of research are internal validity, the study’s success at establishing causality, and external validity, the study’s generalizability to other settings and other times. For the case studies, no attempt at internal validity is generally offered. The authors simply list a series of events taken in their work. Because of this, the external validity is non-existent. The results of their work can only be applied to their work.

In examining the work found in the above referenced sources, a question emerges: is there a difference between academic / scientific journals and the journals available to forensic multimedia analysts?

The answer, unfortunately, is yes. The work found in the JFI can largely be considered case reports. Case reports do not advance science, they report on an analyst's work on a single case. The analyst might have figured out how to do something, but those results are confined to that specific set of circumstances. Nowhere in the JFI, for example, does another analyst take someone's case notes and attempt an experiment in an attempt to validate the method, or create an experiment that builds out the case information such that the results can be applied to a greater population of cases.

Where do we go from here?

This is a topic that I've been exploring in some depth lately. I work with a research organization that seems like it may want to explore the creation of a journal that publishes novel work, at a greater frequency, and with results available to the greater community without fee. Stay tuned on that ...

Sunday, August 12, 2018

Scientists as Stoics?

I have made no secret of my academic pursuits. I have been an educator in the classic liberal arts and sciences for some time now. I am of the firm belief that the classics inform every aspect of adult life. I'm one of the few out there that believes that people should not hyper-specialize in their educational pursuits, but should have a broad knowledge set. Save the specialization for doctoral / post-doctoral work.

I have also made no secret of my athletic pursuits. The Russian martial art of Sambo has within it a provision of rank that factors in not only your competition appearances and wins, but also your reach as a coach. How many people have you assisted on their path to success? The Russian martial art of Systema grounds one in an ethical foundation that effortlessly considers the consequences of action / non-action in everything. This mindfulness becomes a part of taking every breath. To achieve its goals, Systema seeks to remove the practitioner from the attention of a potential threat, rather than boastfully seeking out every violent encounter.

My love of the many martial systems that I have studied and trained in informs my work as a forensic scientist, as does my love of the classics and the pursuit of knowledge.

It's with this in mind that I share this post with you today. I've spent a lot of time traveling this summer. I've been criss-crossing the country spreading the good news of science. I've also been stuck in airports and on the tarmac enduring endless delays. Thankfully, I have a Kindle and can engage in one of my other favorite pursuits, reading.



I came across William Ferraiolo, PhD, and his book via a friend on social media. As someone who teaches and lectures on philosophy, religion, and politics, I'm always looking for fresh insight on the classics. Meditations on Self-Discipline and Failure is just that.

It was quite refreshing to read this book, especially in light of the current social media driven culture. Everyone on LinkedIn is an "expert." Everyone on Instagram is a cultural influencer. Everyone on Facebook is having a great time eating every meal at some amazing destination. Real life, I'm afraid, isn't at all like that. I think that so many of the problems that our western culture is facing are due in large part to a loss of our connection with our history. Without a grounding in the classics, without the ability to utilize logic and reason, judging one's own life against what one sees on YouTube will not end well. Sadly, so many seek solace in a bottle or a pill when their life doesn't measure up to what they see on the screen. Tragically, many willingly choose to end their life for similarly trivial reasons. As long as one draws breath, there's always a chance of turning things around for the better. Nothing is ever truly hopeless.

I share this tragic fact with my forensic science students: all of the people whom I have known, and who willingly chose to end their lives, have been employed in the forensic sciences. Six of them. That's six too many. I share it with them ahead of informing them of the many ways that they can mitigate the vicarious trauma associated with working in this world - ways that don't include a nightly bottle of Gin.

The totality of my life informs my work in the forensic sciences. My knowledge and absorption of stoicism guides my work, reporting style, and testimonial delivery. It also helps me deal with the vile filth and foul of the criminal world. It's not about me, it's about the case, the evidence, and the facts. The case is not about me, and I do not make it so. I do not personalize the case. I do not absorb it - "I worked the ... case." I am a practitioner assisting the Trier of Fact, nothing more. It's about the results, grounded in science and the scientific method. I think others in the sciences would benefit from this approach.

All this being said, I believe Dr. Ferraiolo's Meditations on Self-Discipline and Failure: Stoic Exercise for Mental Fitness, to be a worthwhile read. Here's a quote that fits nicely within this discussion, as well as serving as commentary on recent events.

Do not become overly enamored with yourself, your abilities, or your paltry status. You are, in the grand scheme of things, a trifling, ephemeral phenomenon of little consequence. You are slightly smarter than an ape or a dolphin. If there is a Creator who has endowed you with any special status, recognize that this is a gift and not an accomplishment in which you may rightfully take pride. No one earns birth as a member of the reasoning species, or any privileges pertaining thereto. If the matter is entirely propitious, you have still less warrant for a swollen ego. Note your good fortune, but do not claim to be intrinsically good, due to a chance concatenation of molecules. Set about the business of trying to understand your place in this vast cosmos, your duties as a human being, and a method and practice leading to enlightenment—or the closest approximation you can manage. (p. 23)

On science as science, not consensus or mob rule:

Do not be swayed by the mere opinion of the masses or the majority. The truth is not determined by plebiscite. (p. 46)

On earning respect:

Do not pretend to respect other persons either more or less than you actually do respect them. You owe no one a pretense of deference, and you owe everyone the deference that they have, by your own lights, earned. You should have nothing to do with sham collegiality or faux civility. Some persons are worthy of your contempt, and their behavior, as well as other outward indications of their character, is sufficient grounds for reasonable (though not perfectly reliable) assessment of their merit. If anyone demands that you “try to get along” with any person that you do not respect, then you have grounds for reconsidering your relations with the former individual (the one issuing the demand). Do not allow yourself to be pressed, bullied, or cajoled into relations that strike you as unhealthy or pointless. (p. 9)

The book is simultaneously easily digested and incredibly disturbing. If one's goal is self-improvement, the improvement of the self will always be a painful slog. No one likes to examine one's own shortcomings and failures. But it is a very necessary pursuit. You'll end up the better for it. This book can serve as a guide to get you started down that vital path of making one's life worth living.

Every scientist should be a stoic. I believe stoicism to be an essential characteristic and a necessary defense against error and falsehood. Perhaps you don't agree. Perhaps you don't understand what I mean. If you'd like to know more, start with this book. You'll be glad that you did.


Wednesday, July 4, 2018

How would you know?

Like many in law enforcement, I have degrees in Organizational Leadership. This is a solid degree choice for anyone aspiring to leadership in their organization, public or private. The difference between a "management" degree, like an MBA, and a "leadership" degree like mine (BOL / MOL) is quite simple actually. Managers correct things that have gone wrong. Leaders help things go right in the first place. I happen to have received my degrees (BOL and MOL) from a 130+ year old brick-and-mortar business school. Earning a business degree from a long-established business school leaves you with an incredible foundation in business principles. So what? What does that have to do with Forensic Multimedia Analysis?

Here's the "so what" answer. Let's examine the business of DVR manufacturing from the standpoint of determining the DVR's purpose and if it fulfills its purpose. Attempting to identify purpose / fit for purpose of the parts in the recording chain is one of the elements of the Content Triage step in the processing workflow. Why did the device produce a recording of five white pixels in the area where you were expecting to see a license plate? Understanding purpose helps answer these "why" questions.

What is the purpose of a generic Chinese 4 channel DVR? The answer is not what you think.

For our test, we'll examine a generic Chinese 4 channel DVR, the kind found at any convenience store around the US. It captured a video of a crime and now you want to use its footage to answer questions about the events of that day. Can you trust it?

Take a DVR sold on Amazon or any big box retailer. There's the retail price, and there's the mark-up along the way to the retailer.


When you drill down through the distribution chain to the manufacturer, you find out something quite amazing, like this from Alibaba.com.


The average wholesale price of a 4 channel DVR made in China is $30 / unit. Units with more camera channels aren't much more. Units without megapixel recording capability are a bit less. This price is offered with the manufacturer's profit built in. Given that the wholesale price includes a minimum of 100% markup over cost, and that there are labor and fixed costs involved, the average Chinese DVR is simply a $7 box of parts. The composition of that box of parts is entirely dependent upon what's in the supply chain on the day the manufacturing order was placed. That day's run may feature encoding chips from multiple manufacturers, as an example. The manufacturer does not know which unit has chips from which particular manufacturer - and doesn't care, as long as it "works."

What's the purpose of this DVR? The purpose has nothing to do with recording your event. The purpose is to make about $15 in profit for the manufacturer whilst spending about $15 on parts, labor, and overhead. Check again for 4 channel DVRs on Alibaba.com. There are more than 2,500 different manufacturers in China offering a variety of specs within this space ... all making money with their $7 box of parts.

Let's say the $7 of parts at your crime scene recorded your event at 4CIF. You are asked to make some determination that involves time. You'll want to know if you can trust your $7 box of parts to accurately record time. How would you know?

One of the more popular DVR brands out west is Samsung. But Samsung doesn't exist as such anymore. Samsung Techwin (Samsung's CCTV business unit) was sold to Hanwha Group a few years ago and is now sold as Hanwha Techwin (Samsung Techwin) in the US. Where does Hanwha get its $7 worth of parts within the supply chain? China, for the most part. China can make DVR parts a lot cheaper than its Korean counterparts can.

Here are the specs from a Hanwha Techwin HRD-440.


This model, recording at 4CIF, for example, can record UP TO 120fps across all of its channels. UP TO means its maximum potential recording rate. It does not mean its ACTUAL recording rate at the time of the event in question. The "up to" language is placed there to protect the manufacturer of this $7 box of parts against performance claims. If it were a Swiss chronometer, it wouldn't need the disclaiming language. But it's not a Swiss chronometer - it's a $7 box of parts.

What does the recording performance of the channel in question in the specific evidentiary DVR look like when it alone is under load (maximum potential recording rate)? What about the recording performance of the channel in question (at max) when the other channels move in and out of their own maximum potential recording rate? What happens within the system when all channels are at the max? Remember also that systems like these allow for non-event recording to happen at lower resolutions than event recording (alarm / motion). How does the system respond when a channel or all channels are switching resolutions up / down? How does what's happening internally compare with the files that are output to .avi or .sec files? How do these compare to data that's retrieved and processed via direct acquisition of the hard drive?

How would you know? You would build a performance model. That's something that you learn in all the stats / quant classes that you take along the way to earning a PhD. I earned my PhD in Education.

Why a PhD in Education, you might ask. Three reasons. For one, there are no PhDs in Forensic Multimedia Analysis. The second reason, and the subject of my dissertation, deals with the environment on campus and in the classroom that causes such a great number of otherwise well-qualified people to arrive on campus and then suddenly and voluntarily quit (withdraw). The results of my research can be applied to help colleges configure their classes and their curriculum, as well as to train professors to accommodate a diverse range of students - including mature adults with a wealth of knowledge who arrive in class with fully formed and sincerely held opinions. The third reason has to do with a charity that I founded a few years ago to help bring STEM educational help to an underserved community and population of learners in the mountain communities of northern Los Angeles / southern Kern counties in California.

Imagine that you've been told by your chain of command that you must have a certain level of education to promote at your agency. That's what happened to me. I was minding my own business with an AS in Political Science that I cobbled together after my college football career, such as it was, crashed and burned after injury. I later found myself in police service when these new rules were instituted. But, thankfully, our local Sheriff had approached the local schools, promising butts in seats if they'd only reduce their tuition. So I finished my Bachelor's degree at an esteemed B-school for $7k and stayed there for an MOL for only $9k. The PhD path wasn't cheap, but it was significantly cheaper than it would have been without the Sheriff's Office's help. As to why I chose to go all the way to a PhD, that was the level of education necessary to make more pensionable money had I decided to switch from being a technician - making more than half-again my salary in overtime (which isn't pensionable, sadly) - to management. But, I digress. Back to work, Jim.

Sparing you the lecture on time and temporality here, the basic tenet of experimental science is that you can only measure "now." If you want to know what happened / will happen, you need to build a model. Meteorologists build a model of future environmental patterns to forecast the weather for next week. They don't measure next week's weather properties today. The same holds true across the sciences. Moneyball was a quant's attempt to model behavior in order to achieve a future advantage in sports.

When modeling performance, it's important to use valid tools and to control for all variables (as best as possible). At a minimum, it's important to know how your tools are working and how to not only interpret the results produced but to spot issues of concern within the results.

As an example, pretty much everyone in this space is familiar with FFMPEG and its various parts. Let's say that you use the command line version to analyze the stream and container of the .avi file from our example DVR (it's all you have to work with). It's an NTSC DVR, and the results from your analysis tool indicate a frame rate (fps) of 25. Is this correct? Would you necessarily expect 25fps from an NTSC DVR? Is this FFMPEG's default when there's no fps information in the file (it's a European tool, after all)? Does total frames / time = 25fps? If yes, you're fine. If not, what do you do? You test.
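Here's a minimal sketch of that "total frames / time" check, again using ffprobe; -count_frames forces an actual decode-and-count rather than trusting the header. The file name is hypothetical, and the check assumes the stream reports a duration.

import json
import subprocess

cmd = ["ffprobe", "-v", "error", "-count_frames", "-select_streams", "v:0",
       "-show_entries", "stream=nb_read_frames,duration,r_frame_rate",
       "-of", "json", "export.avi"]
stream = json.loads(subprocess.run(cmd, capture_output=True, text=True).stdout)["streams"][0]
effective_fps = int(stream["nb_read_frames"]) / float(stream["duration"])
print("declared: %s   effective: %.2f fps" % (stream["r_frame_rate"], effective_fps))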

Is your single evidentiary file (sample size = 1) sufficient to generalize the performance of your $7 box of parts? Of course not. In order to know how many samples are needed to generalize the results across the population of files from this specific DVR, you need to test - to build a performance model. How many unique tests will gain you the appropriate number of samples from which to build your model? Well, that depends on the question, the variables, and the analyst's tolerance for error ...