Super Resolution FAQ by PhotoAcute.com
I stumbled upon a great FAQ about super-resolution on the website of a commercial product called PhotoAcute. It explains things well, and I have added some questions and answers of my own to round out the information. Have a look…
Q: What is super-resolution?
A: Super-resolution is a technique to enhance the resolution of an imaging system. In this FAQ we will refer to the particular type of super-resolution which can improve resolution of digital imaging systems beyond their sensor and optics limits.
Q: So, is it real?
A: It sounds like science fiction, but there are solid physical concepts behind the process. To be sure, there are limits to what you can achieve with super-resolution processing, and they depend on numerous factors (see “What levels of increased resolution are realistic?” for an in-depth discussion of the limits).
Q: Why does it work?
A: Super-resolution (SR) is a class of techniques that enhance the resolution of an imaging system. Some SR techniques break the diffraction limit of the system, while others improve on the resolution of the digital imaging sensor.
There are both single-frame and multiple-frame variants of SR. Multiple-frame SR uses the sub-pixel shifts between multiple low-resolution images of the same scene. It creates an improved-resolution image by fusing information from all the low-resolution images, and the resulting higher-resolution image is a better description of the scene. Single-frame SR methods attempt to magnify the image without introducing blur. These methods use other parts of the low-resolution image, or other unrelated images, to guess what the high-resolution image should look like. Algorithms can also be divided by their domain: frequency or space domain. Originally super-resolution methods worked well only on grayscale images, but researchers have found methods to adapt them to color camera images.[wikipedia ref] Recently the use of super-resolution for 3D data has also been shown.[wikipedia ref]
There are two key components in every digital imaging system: the sensor and the lens. There are two different types of image degradation introduced by these two components individually:
- Optical blur.
- Limit on the highest spatial frequency the given sensor can record.
Optical blur is simply a reduction in the amplitude of the high spatial-frequency components of the image. If blur were the only degradation, it would be possible to reconstruct a perfect, high-resolution image by applying an inverse (sharpening) filter. Unfortunately, the blur is followed by degradation caused by the sensor, so simple sharpening is not going to work. The key to super-resolution is the presence of so-called aliased components in the sensor output. They arise because the sensor is constructed from a finite number of discrete pixels. They are spatial-frequency components higher than the sensor can handle, and normally they should not be present in the sensor output. Fortunately, due to imperfect anti-aliasing filters in the imaging system (or the complete lack of them) and due to a fill-factor lower than 100% (the fill-factor is the percentage of each sensor pixel's area that is sensitive to light), the aliased components remain in the image. Even the best anti-aliasing filter can only lower these components by some amount but cannot eliminate them completely. Aliased components are typically unwanted in a normal image, since they might manifest themselves as moiré or other artefacts. Another, photography-specific reason why super-resolution works is that real sensors use Color Filter Arrays (CFAs). A CFA can record only a single color at each pixel location. This lowers the upper spatial frequency that can be recorded by the sensor even more. But having multiple, slightly shifted images makes it possible to reconstruct full color at each pixel site.
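The fill-factor argument can be made concrete with a small calculation. The sketch below assumes a one-dimensional pixel whose light-sensitive area is a uniform (box) aperture, for which the retained contrast follows the well-known sinc curve. The function name `pixel_aperture_response` and the example values are illustrative assumptions, not figures from any particular camera.

```python
import math

def pixel_aperture_response(freq, fill):
    """Contrast retained by a box-shaped pixel aperture (1D simplification).

    freq: spatial frequency in cycles per pixel pitch
    fill: light-sensitive width as a fraction of the pixel pitch (0 < fill <= 1)
    """
    x = math.pi * freq * fill
    return 1.0 if x == 0 else abs(math.sin(x) / x)

# 0.8 cycles/pixel lies above the 0.5 cycles/pixel sensor limit, so whatever
# contrast survives at this frequency ends up in the output as an aliased component.
full = pixel_aperture_response(0.8, 1.0)   # aperture spans the whole pitch
half = pixel_aperture_response(0.8, 0.5)   # aperture spans half the pitch
print(f"fill 100%: {full:.2f}, fill 50%: {half:.2f}")
```

Under these assumptions the narrower aperture keeps noticeably more contrast above the sensor limit (about 0.76 versus 0.23), which is exactly the material super-resolution feeds on.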
Q: What is Aliasing?
A: When a digital image is viewed, a reconstruction—also known as an interpolation—is performed by a display or printer device, and by the eyes and the brain. If the resolution is too low, the reconstructed image will differ from the original image, and an alias is seen. An example of spatial aliasing is the Moiré pattern one can observe in a poorly pixelized image of a brick wall. Techniques that avoid such poor pixelizations are called anti-aliasing. Aliasing can be caused either by the sampling stage or the reconstruction stage; these may be distinguished by calling sampling aliasing prealiasing and reconstruction aliasing postaliasing.[Wikipedia ref]
Temporal aliasing is a major concern in the sampling of video and audio signals. Music, for instance, may contain high-frequency components that are inaudible to humans. If a piece of music is sampled at 32000 samples per second (sps), any frequency components above 16000 Hz (the Nyquist frequency) will cause aliasing when the music is reproduced by a digital to analog converter (DAC). To prevent that, it is customary to remove components above the Nyquist frequency (with an anti-aliasing filter) prior to sampling. But any realistic filter or DAC will also affect (attenuate) the components just below the Nyquist frequency. Therefore, it is also customary to choose a higher Nyquist frequency by sampling faster.
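To see the folding in numbers, here is a minimal sketch for a single pure tone sampled with no anti-aliasing filter. The helper `aliased_frequency` and the 20000 Hz test tone are illustrative choices; only the 32000 sps rate comes from the example above.

```python
import math

def aliased_frequency(freq_hz, sample_rate_hz):
    """Apparent frequency of a pure tone after sampling without an anti-aliasing filter."""
    nearest_multiple = round(freq_hz / sample_rate_hz) * sample_rate_hz
    return abs(freq_hz - nearest_multiple)

SAMPLE_RATE = 32000  # samples per second, as in the example above
tone = 20000         # Hz, above the 16000 Hz Nyquist frequency
alias = aliased_frequency(tone, SAMPLE_RATE)

# The sampled values of the tone and of its alias are indistinguishable.
original = [math.cos(2 * math.pi * tone * n / SAMPLE_RATE) for n in range(64)]
folded = [math.cos(2 * math.pi * alias * n / SAMPLE_RATE) for n in range(64)]
max_difference = max(abs(a - b) for a, b in zip(original, folded))
print(alias, max_difference)
```

The 20000 Hz tone comes out as 12000 Hz, and the two sample sequences agree to within floating-point rounding, so nothing downstream of the sampler can tell them apart.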
In video or cinematography, temporal aliasing results from the limited frame rate, and causes the wagon-wheel effect, whereby a spoked wheel appears to rotate too slowly or even backwards. Aliasing has changed its apparent frequency of rotation. A reversal of direction can be described as a negative frequency. Temporal aliasing frequencies in video and cinematography are determined by the frame rate of the camera, but the relative intensity of the aliased frequencies is determined by the shutter timing (exposure time) or the use of a temporal aliasing reduction filter during filming.[Wikipedia ref]
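The wagon-wheel effect follows from the same folding rule, keeping the sign this time. The sketch below assumes a wheel with identical, evenly spaced spokes and an idealized camera with instantaneous exposure; the function name and the numbers are made up for illustration.

```python
def apparent_rotation(rev_per_s, spokes, frames_per_s):
    """Apparent rotation rate (rev/s) of a spoked wheel on film; negative means backwards.

    A wheel with identical spokes looks the same after 1/spokes of a turn, so the
    pattern repeats at spokes * rev_per_s hertz, and that is what the frame rate samples.
    """
    pattern_hz = spokes * rev_per_s
    aliased_hz = pattern_hz - round(pattern_hz / frames_per_s) * frames_per_s
    return aliased_hz / spokes

slow = apparent_rotation(0.5, 8, 24)   # 4 Hz pattern, below the 12 Hz limit: seen correctly
fast = apparent_rotation(2.9, 8, 24)   # 23.2 Hz pattern folds to a slow backward drift
print(slow, fast)
```

With these numbers the slow wheel is reproduced at its true 0.5 rev/s, while the fast one appears to creep backwards at roughly 0.1 rev/s.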
Like the video camera, most sampling schemes are periodic; that is they have a characteristic sampling frequency in time or in space. Digital cameras provide a certain number of samples (pixels) per degree or per radian, or samples per mm in the focal plane of the camera. Audio signals are sampled (digitized) with an analog-to-digital converter, which produces a constant number of samples per second. Some of the most dramatic and subtle examples of aliasing occur when the signal being sampled also has periodic content.
Anti-aliasing means removing signal components that have a higher frequency than the recording (or sampling) device can properly resolve. This removal is done before (re)sampling at a lower resolution. When sampling is performed without removing this part of the signal, it causes undesirable artifacts.
Q: Aliasing components? Do they really exist?
A: This is a long one. Let us model an “ideal” camera – with ideal lens (no blur, no distortions) and a sensor completely covered by an array of pixels. Every pixel registers a signal proportional to the amount of light it received.
How would such a camera image a target of black-and-white lines if the width of a line were exactly the same as the dimension of a pixel? The image will be quite different depending on whether the lines fall exactly on the pixels or between them:
Luckily, real scenes usually do not have exactly the same structure as the sensor has. To make our model more realistic, we will tilt the lines – so if in some part of the picture the edges of the lines match the edges of the pixels in the sensor, they will not match in other parts. This is how the tilted lines will be imaged by our ideal camera:
The contrast between black and white lines differs from 100% of the original contrast to none. Looks strange already, doesn’t it?
What happens if we try to image line pairs of higher frequency? See the pictures below: the lines are visible, but they have different directions, and, moreover, thicker width – that is, lower frequency than in the original!
This is caused by so-called aliasing. The sensor, which is not able to image a pattern of frequency higher than 0.5 cycles/pixel, delivers not only lower contrast but completely wrong pictures. If the scene being imaged has a regular pattern, the artifacts are known as Moiré pattern.
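The experiment described above can be reproduced in one dimension with a few lines of code. This is a sketch under simplifying assumptions: a single row of pixels, a perfect lens, 100% fill-factor, and pixel integration approximated by averaging evenly spaced sub-samples. The names `image_lines` and `contrast` are illustrative.

```python
import math

def image_lines(line_width, shift, pixels=10, subsamples=200):
    """Image a black-and-white line target with an ideal 1D sensor.

    line_width and shift are in pixel units. Each pixel averages the light
    falling on it, approximated by `subsamples` evenly spaced points.
    """
    def target(x):
        # White (1.0) on even-numbered lines, black (0.0) on odd-numbered ones.
        return 1.0 if math.floor((x - shift) / line_width) % 2 == 0 else 0.0

    return [
        sum(target(k + (j + 0.5) / subsamples) for j in range(subsamples)) / subsamples
        for k in range(pixels)
    ]

def contrast(values):
    return max(values) - min(values)

aligned = image_lines(line_width=1.0, shift=0.0)     # lines fall exactly on pixels
between = image_lines(line_width=1.0, shift=0.5)     # lines straddle pixel borders
too_fine = image_lines(line_width=0.625, shift=0.0)  # 0.8 cycles/pixel target

print("aligned contrast:", contrast(aligned))
print("between contrast:", contrast(between))
print("too fine:", [round(v, 2) for v in too_fine])
```

The aligned target keeps full contrast and the half-pixel-shifted one collapses to uniform grey. The 0.8 cycles/pixel target produces a low-contrast pattern that repeats every 5 pixels, that is, 0.2 cycles/pixel: a coarser pattern than the one actually in front of the camera.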
Digital cameras usually have anti-aliasing filters in front of the sensors. Such filters prevent the appearance of aliasing artifacts, simply blurring high-frequency patterns. With the ideal anti-aliasing filter, the patterns shown above would have been imaged as a completely uniform grey field. Fortunately for us, no ideal anti-aliasing filter exists and in a real camera the aliased components are just attenuated to some degree.
Q: How does it work?
A: The first step is to accurately align individual low-resolution images with sub-pixel precision.
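As an illustration of what sub-pixel alignment can look like, the sketch below estimates a fractional shift between two one-dimensional frames from the phase of a single Fourier coefficient (the Fourier shift theorem). It is a toy under stated assumptions, not the method used by any particular product: the scene is smooth, periodic, and noise-free, the motion is a pure translation, and all names (`dft_bin`, `estimate_shift`, `scene`) are hypothetical.

```python
import cmath
import math

def dft_bin(samples, m):
    """Single coefficient of the discrete Fourier transform."""
    n = len(samples)
    return sum(s * cmath.exp(-2j * math.pi * m * k / n) for k, s in enumerate(samples))

def estimate_shift(reference, shifted):
    """Estimate how far `shifted` is displaced relative to `reference`, in pixels.

    Uses the Fourier shift theorem: a displacement of d pixels multiplies the
    m-th DFT coefficient by exp(-2j*pi*m*d/n). Only the lowest non-zero
    frequency (m = 1) is used here, so |d| must be below n/2.
    """
    n = len(reference)
    phase = cmath.phase(dft_bin(shifted, 1) / dft_bin(reference, 1))
    return -phase * n / (2 * math.pi)

def scene(x, n=32):
    # Hypothetical smooth, periodic test scene.
    return math.cos(2 * math.pi * x / n) + 0.5 * math.sin(4 * math.pi * x / n)

frame_a = [scene(k) for k in range(32)]
frame_b = [scene(k - 0.3) for k in range(32)]   # same scene, moved by 0.3 pixel
print(round(estimate_shift(frame_a, frame_b), 6))
```

In this idealized setting the 0.3-pixel displacement is recovered essentially exactly. Real registration has to cope with two dimensions, rotation, noise, and the aliasing itself, which is why it becomes a source of error in practice.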
After the images are aligned, a number of techniques are possible, both iterative and non-iterative, complex or simple, slow or fast. What is common in all of the techniques is that information encapsulated in the aliased components is used to recover spatial frequencies beyond sensor resolution and a de-blurring is used to reverse degradation caused by the optical system.
Of course, the real reconstruction process is much more complex due to the presence of at least the following phenomena:
- Sensor noise. The noise itself degrades the image quality, but most importantly it reduces the ability to recover and separate aliased components that are low in amplitude and typically buried under noise floor.
- Uncertainty in real registration offsets of individual images. Since the precise camera position and orientation in space is not known during super-resolution processing, it has to be estimated from the low resolution scenes themselves, which introduces errors.
- Diffraction limit. It is said that the optical system has fundamental limits on resolution where two close subjects cannot be resolved one from another. There are methods that allow breaking this limit as well under certain assumptions (see Wikipedia).
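A minimal one-dimensional sketch shows how aliased information from shifted frames can be put back where it belongs. It deliberately sidesteps the complications listed above: the two frames are offset by exactly half a pixel, the offset is known, there is no noise or blur, and pixels are treated as point samples. Fusing by simple interleaving is the most basic reconstruction imaginable and is not meant to represent any real product's algorithm; all names are illustrative.

```python
import cmath
import math

SCENE_FREQ = 0.8  # cycles/pixel, above the 0.5 limit of a single frame

def capture(offset, count=20):
    """Point-sample the scene at x = offset + k (pixel integration is ignored)."""
    return [math.cos(2 * math.pi * SCENE_FREQ * (offset + k)) for k in range(count)]

def dominant_frequency(samples, spacing):
    """Strongest frequency (cycles/pixel) found by a plain DFT of the samples."""
    n = len(samples)
    magnitudes = [
        abs(sum(s * cmath.exp(-2j * math.pi * m * k / n) for k, s in enumerate(samples)))
        for m in range(n // 2 + 1)
    ]
    return magnitudes.index(max(magnitudes)) / (n * spacing)

frame_a = capture(0.0)
frame_b = capture(0.5)  # same scene, camera moved by half a pixel

# Fuse by interleaving: the combined samples sit on a grid twice as fine.
fused = [value for pair in zip(frame_a, frame_b) for value in pair]

single = dominant_frequency(frame_a, spacing=1.0)
combined = dominant_frequency(fused, spacing=0.5)
print(single, combined)
```

A single frame reports the aliased 0.2 cycles/pixel, while the fused pair recovers the true 0.8 cycles/pixel. With arbitrary, estimated shifts and noisy data, the interleaving step has to be replaced by a proper reconstruction, which is where the different techniques diverge.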
Q: What levels of increased resolution are realistic?
A: It is highly variable, depending on the optical system, the exposure conditions, and what post-processing is applied. As a rule of thumb, you can expect an increase of 2x in effective resolution from an average real-life system (see MTF measurements) using our methods. We’ve seen increases of up to 4x in some cases. You can get even higher results under controlled laboratory conditions, but that’s only of theoretical interest.
Q: What kind of source material is suitable for super-resolution processing?
A: Here are some rules:
- The less post-processing, the better. Avoid sharpening, for example.
- Heavy compression is particularly bad. It will destroy all the aliasing components.
- Heavily compressed video that relies on inter-frame prediction is also very bad. The only positive outcome from applying super-resolution to heavily compressed material is that it will decrease artifacts from the compression itself and might suppress noise, but don’t hope for a real increase in resolution.
- 16-bit images are better than 8-bit because they preserve low-amplitude effects.
- RAW images are good for super-resolution, not only because they lack any post-processing but also because they give information about the exact layout of the Color Filter Array.
- Blur in the images is no good.
- Sensor noise is also no good, but surprisingly it has significantly less effect than blur. So if you have to choose between blur from an unsteady hand and noise from high ISO, choose noise.
- The optimal count of images to achieve reasonable super-resolution is 8. Do not expect any improvements with more than 16 input images.
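The bit-depth rule in the list above can be illustrated with a toy quantization experiment. The sketch assumes a hypothetical noise-free signal, a mid-grey level with a ripple of 0.1% of full scale; in real images, sensor noise acts as dither, so the loss at 8 bits is less absolute than shown here.

```python
import math

def quantize(value, bits):
    """Round a value in [0, 1] to the nearest level of a `bits`-deep integer scale."""
    levels = 2 ** bits - 1
    return round(value * levels)

# Hypothetical noise-free signal: mid-grey plus a ripple of 0.1% of full scale.
signal = [0.4 + 0.001 * math.cos(2 * math.pi * k / 16) for k in range(16)]

levels_8bit = {quantize(v, 8) for v in signal}
levels_16bit = {quantize(v, 16) for v in signal}
print(len(levels_8bit), len(levels_16bit))
```

At 8 bits every sample lands on the same level, so the ripple is gone; at 16 bits it spans several distinct levels and remains available for reconstruction.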
Q: What kinds of resolution processing are available today?
A: There are two major classes of super-resolution:
- reconstruction-based super-resolution
- recognition-based super-resolution
Recognition-based super-resolution tries to detect or identify certain pre-configured patterns in the low-resolution data. It has a limited application area (e.g. forensic face detection).
It can be dependent or independent of a particular imaging system. The image-system-dependent method has the advantage of taking into account all the characteristics of a particular system and thus producing better results.
Super-resolution methods can also be divided by source/output type:
- Single-image – in this case we’re talking about deblurring, and there is no real resolution increase.
- Multiple still images in, single image out – used in photography
- Video-sequence super-resolution – a wide variety of methods were recently brought into existence due to the growing popularity of HDTV. Most of them are not based on real super-resolution methods and are as simple as edge enhancement.
For a comparison of various methods, please refer to Superresolution comparison paper.
Q: Any suggestions for more scientific reading?
A: There are lots of good papers available on the internet; here are just two of them to start:
One of the first papers on super-resolution which seemed to inspire some of the modern methods:
Michal Irani and Shmuel Peleg, “Super Resolution From Image Sequences”, ICPR, 2:115–120, June 1990.
A paper from Microsoft Research that attempts to estimate the practical limits of super-resolution. The scope of this paper is limited to a particular subclass of linear-only, reconstruction-based super-resolution algorithms. In any case, the obtained bounds do correlate well with the practical results (top limit is ~5x under ideal conditions, ~2x in real life).
Zhouchen Lin and Heung-Yeung Shum, “Fundamental Limits of Reconstruction-Based Superresolution Algorithms under Local Translation”
Q: What are the specific strengths and weaknesses of the super-resolution method used in PhotoAcute products?
A: The main properties are:
- Excellent performance under noisy conditions (see example)
- Insensitivity to mismatches in radiometric alignment (input images with differences in exposure are acceptable)
- Very fast (see superresolution algorithms comparison paper)
Another property that can be considered a weakness in applications where the imaging system is unknown is that to obtain optimal performance the algorithm is tuned for a particular imaging system (individual profiles are used for each sensor/lens combination). See this example though.