Vendor reps note that Super Resolution works at the "sub-pixel" level, and people's eyes roll. If the pixel is the smallest unit of measure, a single picture element, how can there be a "sub-pixel?" That's a very good question. Let's take a look at the answer.
From the report in Amped SRL's FIVE: The Super Resolution filter applies a sub-pixel registration to all the frames of a video, then merges the motion corrected frames together, along with a deblurring filtering. If a Selection is set, then the selected area will be optimized.
Ok. What is sub-pixel registration?
First, let's look at how the authors of Super-Resolution Without Explicit Subpixel Motion Estimation set up the premise: "The coefficients of this series are estimated by solving a local weighted least-squares problem, where the weights are a function of the 3-D space-time orientation in the neighborhood. As this framework is fundamentally based upon the comparison of neighboring pixels in both space and time, it implicitly contains information about the local motion of the pixels across time, therefore rendering unnecessary an explicit computation of motions of modest size. The proposed approach not only significantly widens the applicability of super-resolution methods to a broad variety of video sequences containing complex motions, but also yields improved overall performance." That's quite a mouthful.
Here's the breakdown.
The first thing we must understand is the pixel neighborhood. A pixel's neighborhood is the collection of pixels that surround it. In the common 8-connected sense, every pixel that touches one of a given pixel's edges or corners is its neighbor.
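If it helps to see the neighborhood idea in code, here's a minimal sketch in Python with NumPy (my own illustration, not anything from FIVE) that collects the 8-connected neighbors of a pixel - every pixel that shares an edge or a corner with it:

```python
import numpy as np

def neighborhood_8(image, row, col):
    """Return the 8-connected neighborhood of a pixel: every pixel
    that shares an edge or a corner with (row, col)."""
    h, w = image.shape[:2]
    neighbors = []
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue  # skip the center pixel itself
            r, c = row + dr, col + dc
            if 0 <= r < h and 0 <= c < w:  # stay inside the image
                neighbors.append(((r, c), image[r, c]))
    return neighbors

# Tiny example: a 5x5 grayscale "image" of ramp values
img = np.arange(25, dtype=np.uint8).reshape(5, 5)
print(neighborhood_8(img, 2, 2))  # the 8 pixels surrounding the center
```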
Next, we must understand what registration means. Image registration is the process of aligning two or more images of the same scene. One image is designated as the reference (also called the fixed image), and geometric transformations are applied to the other images so that they align with it.
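To make "sub-pixel registration" concrete, here's a small illustration using phase correlation from scikit-image. This is my choice of tool for demonstration only - it is not how Amped implements the filter. It estimates a translation to a fraction of a pixel and then applies the correction:

```python
import numpy as np
from scipy.ndimage import shift as nd_shift
from skimage.registration import phase_cross_correlation

rng = np.random.default_rng(0)
reference = rng.random((128, 128))          # stand-in for the fixed frame
moving = nd_shift(reference, (0.6, -1.3))   # a frame displaced by a fraction of a pixel

# upsample_factor=20 estimates the offset to 1/20th of a pixel
offset, error, _ = phase_cross_correlation(reference, moving, upsample_factor=20)
# 'offset' is the sub-pixel correction needed to re-align 'moving' with
# 'reference' (roughly the opposite of the displacement applied above)
aligned = nd_shift(moving, offset)
```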
Let's put it together. A static pixel (P) in a single image is easy to understand. But what about video? Across the frames of a video, that pixel represents a location in 3D space-time (two spatial dimensions plus time), and its space-time orientation changes as time progresses. We want to line up (register) that pixel across the multiple frames. Super Resolution thus tracks implicit information about the motion of the pixel across space-time, and corrects for that motion. The result of the process is a single higher-resolution image.
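One classic way to see how sub-pixel registration of many frames can yield a single higher-resolution result is the textbook shift-and-add approach: register each frame to a reference with sub-pixel accuracy, place the frames on a finer grid, undo the motion, and average. To be clear, this is an illustrative sketch only, not the algorithm inside FIVE (the paper quoted above deliberately avoids explicit motion estimation):

```python
import numpy as np
from scipy.ndimage import shift as nd_shift, zoom
from skimage.registration import phase_cross_correlation

def shift_and_add(frames, scale=2, upsample_factor=20):
    """Textbook shift-and-add super resolution (illustrative only):
    register each frame to the first with sub-pixel accuracy,
    upsample onto a finer grid, undo the motion, then average."""
    reference = frames[0]
    out_shape = (reference.shape[0] * scale, reference.shape[1] * scale)
    accumulator = np.zeros(out_shape, dtype=np.float64)
    for frame in frames:
        # sub-pixel offset of this frame relative to the reference
        offset, _, _ = phase_cross_correlation(
            reference, frame, upsample_factor=upsample_factor)
        upsampled = zoom(frame.astype(np.float64), scale, order=1)       # finer grid
        accumulator += nd_shift(upsampled, np.asarray(offset) * scale)   # undo the motion
    return accumulator / len(frames)   # merge the motion-corrected frames

# usage (hypothetical): frames = list of 2-D grayscale arrays from consecutive video frames
# hi_res = shift_and_add(frames, scale=2)
```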
The practical implications are these:
- Frame Averaging works well when the object of interest doesn't move. The frames are averaged; what differs across frames (noise) is suppressed, and what stays the same remains. (See the sketch after this list.)
- To help with a Frame Averaging exercise, we can use a perspective registration process to align the item of interest - a license plate for example - across frames. This works well when the item has moved to an entirely new location, like in low frame rate video.
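Here's roughly what those two bullets look like in code - a sketch in Python with NumPy and OpenCV, where the plate corners are assumed to be marked by the analyst in each frame (placeholder inputs of my own, not a FIVE feature):

```python
import numpy as np
import cv2

def average_frames(frames):
    """Plain frame averaging: what's consistent across frames survives,
    frame-to-frame differences (noise) are suppressed."""
    stack = np.stack([f.astype(np.float64) for f in frames])
    return stack.mean(axis=0)

def align_then_average(frames, plate_corners_per_frame, reference_corners, out_size):
    """Perspective registration before averaging: warp the item of interest
    (e.g. a license plate) from each frame onto a common geometry, then average.
    'plate_corners_per_frame' holds the four corners marked in each frame."""
    warped = []
    for frame, corners in zip(frames, plate_corners_per_frame):
        H, _ = cv2.findHomography(np.float32(corners), np.float32(reference_corners))
        warped.append(cv2.warpPerspective(frame, H, out_size).astype(np.float64))
    return np.stack(warped).mean(axis=0)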
But, when the motion is subtle, super resolution is a better choice.
Here's an example. The park service was investigating a vandalism and poaching incident. There's a video that they believe was taken in the area of the incident. Within the video, there's a sign in the background that contains location information (text) that's blurred by the motion of the shaking, hand-held camera. There's enough motion to eliminate Frame Averaging as a processing choice. There's not enough motion to use a perspective registration function to align the sign correctly. Super resolution is the best choice to correct for the motion blur and register the pixels that make up the text of the sign.
In this case, super resolution was indeed the best choice. The sign's information was revealed and the location was determined.
And now the potential pitfalls ...
- Brand new pixels and pixel neighborhoods are created in this process.
- A brand new piece of evidence (demonstrative) is created in this process.
Whenever you perform a perspective registration, your geometric transform necessarily creates new pixels and neighborhoods. In FIVE, during the process of using the filters, the creation is "virtual" in that it all happens in CPU and RAM. These new pixels and neighborhoods only take permanent form when you write the results of your processing out as a new file.
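A quick way to see that any interpolating geometric transform manufactures new pixel values is to warp a simple two-valued image and look at what comes out. A small rotation is used here just for brevity; a perspective warp behaves the same way:

```python
import numpy as np
import cv2

# An 8x8 "image" containing only two distinct values
src = np.array([[0] * 4 + [255] * 4] * 8, dtype=np.uint8)

# Rotate it a few degrees; the warp must interpolate between source pixels
M = cv2.getRotationMatrix2D((3.5, 3.5), 5, 1.0)
dst = cv2.warpAffine(src, M, (8, 8), flags=cv2.INTER_LINEAR)

print(np.unique(src))  # [  0 255]  - only the original values
print(np.unique(dst))  # now includes in-between values created by interpolation
```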
That brand new piece of evidence - the results written out - is a demonstrative that you've just created. You must explain its relationship to the actual evidence files and how it came to be. Indeed, you've just added a new file to the case. This fact should be disclosed in your report.
With the reports in FIVE, there is a plain-English statement about the process, lifted from the many academic papers from which Amped SRL derives its filters. Sure, when you're asked about the process performed, you can likely just read the report's description. But what if the Trier of Fact wants to know more? How confident are you that you can explain super resolution?
Consider super resolution's main use - license plate enhancement. Your derivative file is a demonstrative in support of one side's theory of the case. Your derivative is illustrative of your opinion. Did you use the tool correctly? Are the results accurate? Is seeing believing? Given the ultra-low resolutions we're usually dealing with, a slight shift in pixels can make a big difference in rendering alphanumeric characters. This is part of the reason Amped SRL likes to use European license plates in their classes and PR - they're easy to fix. Not so in the US.
Advice like that shown above is the value of independence. A manufacturer's rep can really only show you the features. I'll show you not only how a tool works, but how to use it in different contexts, why it's sometimes inappropriate to use, and how to frame its use during testimony. If you're interested in diving deep into the discipline of video forensics, I invite you to an upcoming course. See our offerings on our web site.
Have a great day, my friends.