Electronics: Real-time signal conditioning (1995 – 2000)
Around the turn of the millennium, TNO-FEL worked on real-time contrast enhancement using existing and new algorithms. The designs are used both in military applications and in the field of Safety and Security. An improved version of the contrast-enhancement technique described in [1] has been implemented in the Real-Time Video Processor (RTVP). This system also implements grey-scale stretching of the image within an area of interest. The RTVP was successfully demonstrated in several operational situations, including security, surveillance and main battle tank applications. An implementation of the grey-scale stretching algorithm has also been realised as a single ASIC (VISTA01) in a 0.7 µm CMOS technology. The MOVE microprocessor architecture is applied to combine the advantages of dedicated hardware with the flexibility of programmable processing in a single ASIC.
Grey-scale stretching is not very effective when the video signal already has high intrinsic contrast. For instance, when scanning a dark forest area, the sky above can be relatively bright, so the amplitude of the video signal already spans the whole range from “black” to “white”. New, more advanced contrast-enhancement algorithms that perform stretching on a local basis have been developed for these situations [2]. A prototype board with this advanced contrast-enhancement algorithm, called the LACE (Local Adaptive Contrast Enhancement) processor, became available in the first quarter of 2000.
The Local Adaptive Contrast Enhancement (LACE) processor is intended for real-time video contrast enhancement. The main application areas are surveillance (security), reconnaissance and other applications where images are recorded without control of the lighting conditions. It is not intended for public television applications. The objective of ‘contrast improvement’ is, in this context, to create an ‘improved’ image Y from an original image X, in such a way that a human observer is better able to recognise details in the image. Note that this is different from trying to create a more ‘natural-looking’ or ‘pleasurable’ image. The image is assumed to have originated from a natural scene (thus recorded with a camera without control of the lighting conditions), rather than being a synthetic or studio-recorded image. The images can be either visible-light or infrared images.
The main reason why original images may contain insufficient contrast is a mismatch between the dynamic range of the ‘real-world’ image and the dynamic range of the observer (and/or the display used). The dynamic range of the image can be either too large or too small for the observer, for example:
- During fog or heavy rain in the daytime, everything looks grey. The human eye cannot distinguish the subtle luminance differences on distant objects, even though the average luminance level of the scene is sufficient for accurate observation. This is a situation in which the dynamic range of the image needs to be expanded to match the human eye: the lowest luminance levels in the image should be made more ‘dark’, and the highest levels more ‘bright’.
- On a sunny day, the luminance of bright objects may be too high for the human eye. A display device will in any case not be able to reproduce these high intensities. This is a situation in which the dynamic range of the image needs to be compressed to match the eye (or the display device).
The matching of the dynamic range can be done by tuning the gain and offset (contrast and brightness) of the camera. In the first case, a camera with sufficient gain should be used. In the second case, the gain of the camera should be turned down (possibly also by fitting an optical filter). If the camera (including its digitiser) has sufficient dynamic range at its digital output, this tuning of the gain and offset can also be done in the digital domain. This is called ‘global contrast stretching’ or ‘histogram stretching’ and is implemented in the RTVP and the VISTA01 chip.
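As an illustration, the sketch below shows global contrast stretching in the digital domain, along the lines described above. The percentile clipping, the 12-bit input range and all names are illustrative assumptions, not the actual RTVP or VISTA01 parameters.

```python
import numpy as np

def global_contrast_stretch(image, out_max=255, clip_percent=1.0):
    """Stretch the grey-scale range of a frame to the full output range.

    A small percentage of pixels is clipped at both ends so that a few
    outliers do not dominate the gain/offset choice (an assumption; the
    real hardware parameters are not documented here).
    """
    lo = np.percentile(image, clip_percent)           # 'black' reference level
    hi = np.percentile(image, 100.0 - clip_percent)   # 'white' reference level
    gain = out_max / max(hi - lo, 1e-9)               # contrast
    stretched = (image.astype(np.float64) - lo) * gain  # offset, then gain
    return np.clip(stretched, 0, out_max).astype(np.uint8)

# Example: a low-contrast 12-bit frame mapped to an 8-bit display range.
frame = np.random.randint(1800, 2300, size=(288, 720), dtype=np.uint16)
display = global_contrast_stretch(frame)
```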
Although tuning the gain and offset may be sufficient in many situations, it will give unsatisfactory results in cases like the following:
- During fog at dusk or dawn, a bright light illuminates a small part of the scene of interest. When tuning the gain and offset to this bright part, the rest of the scene turns black. Conversely, when tuning to the dark part of the scene, the bright part clips at the maximum white level of the display device. A similar situation exists for a fire with a lot of smoke.
- On a sunny day, some objects of interest are in bright sunlight, others are ‘hidden’ in small patches of shade. When tuning the gain and offset, the same problem as with the fog exists.
The problem in these two examples is the same: to see the bright and the dark parts of the scene at the same time, you need a relatively low gain, so you cannot see the details in either the bright or the dark parts. To see the contrast within the bright parts, you need more gain, but you then have to shift the average luminance level down, which ‘blacks out’ the darker parts of the scene; to see the contrast within the dark parts, you likewise need more gain and must shift the level up, which clips the lighter parts. Note that manual intervention is necessary to decide which part of the picture is of interest (the whole, the dark part or the bright part).
The problems above explain the need for contrast-improvement algorithms that are more advanced than simply tuning the gain and offset (contrast and brightness). One such algorithm is implemented in the LACE (Local Adaptive Contrast Enhancement) processor. With LACE, gain and offset values are calculated for each pixel individually. These values are based on the statistics of the surrounding pixels, where pixels nearby have more influence than pixels further away. Doing this in real time for images of 720×576 pixels at 25 Hz (or 720×288 at 50 Hz) requires an enormous amount of processing. Therefore, TNO-FEL implemented LACE in dedicated hardware; a prototype was produced in the first quarter of 2000.
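A minimal software sketch of this idea is given below: per-pixel gain and offset are derived from a Gaussian-weighted local mean and local spread, so that nearby pixels weigh more than distant ones. The window size, target spread, gain limit and 8-bit output are illustrative assumptions, not the actual LACE design parameters (the hardware works on 12-bit data).

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def lace_sketch(image, sigma=16.0, target_spread=50.0, max_gain=4.0):
    """Local adaptive contrast enhancement (illustrative sketch).

    Each pixel gets its own gain and offset, based on the local mean and
    local standard deviation computed with a Gaussian window.
    """
    x = image.astype(np.float64)
    local_mean = gaussian_filter(x, sigma)
    local_var = gaussian_filter(x * x, sigma) - local_mean ** 2
    local_std = np.sqrt(np.maximum(local_var, 1e-9))
    gain = np.minimum(target_spread / local_std, max_gain)  # limit noise blow-up
    y = 128.0 + gain * (x - local_mean)   # per-pixel offset = local mean
    return np.clip(y, 0, 255).astype(np.uint8)
```

Limiting the gain in flat regions is one common way to keep the local stretching from amplifying sensor noise; how the actual LACE hardware handles this is not documented here.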
Some results of the LACE processing on extremely poor images are shown below. The first four images were captured with an ordinary CCD camera and an 8-bit frame grabber. Because 8 bits are not enough for these kinds of images (the LACE hardware uses 12 bits), some of the results show quantisation effects. To demonstrate the effect on 12-bit images, a synthetic fifth image is added. Note that this is a single image showing the same boat four times under different lighting conditions. Also note that only the first image could benefit from (global) contrast stretching; all other images contain “black” as well as “white” areas, and local adaptive processing is the only way to realise considerable contrast improvement over the whole image. All images were processed without any user intervention (thus without adapting parameters for a particular image and without defining areas of interest).
The Full Block Matching Algorithm (FBMA) is a brute-force algorithm for determining motion vectors for blocks of pixels. It is used in the Block Motion Estimator (BME) developed by TNO-FEL. The basic operation is as follows:
An image X is divided into a rectangular grid of blocks of, for example, 8×8 pixels. For each of these blocks, a matching block of pixels is searched for in the next image (Y), resulting in a motion vector for each block.
The motion vector for a particular block in image X is determined by first comparing all pixels of that block to the pixels in the same position in image Y, and calculating an error measure indicating the “amount of difference”. This can be the RMS (Root Mean Square) of the difference, or the MAD (Mean Absolute Difference). This error measure corresponds to the (0,0) motion vector (no displacement). Next, the same block of pixels from X is compared with the pixels in Y shifted by one pixel to the right. This results in an error measure for the (0,1) motion vector (a horizontal displacement of one pixel). This process is repeated for all vectors in the search range. The search range can be, for example, from -8 to +7 pixels horizontally and the same vertically, giving a total of 16×16 = 256 vectors. From all these vectors, the one with the lowest error measure is selected as the best-matching motion vector. Refer to the figure below.
The figure shows a block of pixels in images X and Y and a 3D plot of the error measure as a function of the different vectors “vn” and “vm” in the search range. The 3D plot shows the case where the (0,0) vector gives the best match (smallest error). The error measure shown here is the SAD (Summed Absolute Difference).
The process for one block, as just described, is repeated for all blocks in the image. This involves a great deal of computation, so real-time operation requires dedicated hardware, such as the BME.
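A minimal software sketch of the FBMA for a single block is shown below. The SAD error measure, the 8×8 block size and the -8 to +7 search range follow the description above; the function and variable names are illustrative.

```python
import numpy as np

def match_block(x, y, top, left, block=8, search=8):
    """Full block matching for one block (illustrative sketch).

    Returns the motion vector (dv, dh), each component in the range
    [-search, search-1], whose block in image Y gives the smallest SAD
    against the block at (top, left) in image X.
    """
    ref = x[top:top + block, left:left + block].astype(np.int32)
    best_vec, best_sad = (0, 0), None
    for dv in range(-search, search):        # vertical displacement
        for dh in range(-search, search):    # horizontal displacement
            r, c = top + dv, left + dh
            if r < 0 or c < 0 or r + block > y.shape[0] or c + block > y.shape[1]:
                continue                     # candidate falls outside image Y
            cand = y[r:r + block, c:c + block].astype(np.int32)
            sad = np.abs(ref - cand).sum()   # summed absolute difference
            if best_sad is None or sad < best_sad:
                best_vec, best_sad = (dv, dh), sad
    return best_vec, best_sad
```

Repeating this call for every block in the grid yields one motion vector per block; the brute-force inner loop is exactly what makes dedicated hardware necessary at video rates.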
The Block Motion Estimator (BME) performs the Full Block Matching Algorithm (FBMA) on real-time video input. It is intended as a first processing step in motion detection for surveillance and other systems for the detection of moving objects. It can also be used for other image-processing applications, such as motion-compensated temporal noise reduction.
In addition to calculating the motion vectors for blocks of pixels, the BME also calculates a so-called “confidence level” for each block. The confidence level indicates how much the matching error of the motion vector differs from the average matching error of all vectors for a particular block. Thus, a high confidence level indicates that the selected motion vector yields a much better match than the other vectors in the search range. Note that this measure is more useful for discriminating true motion from other changes in image brightness than a simple “goodness of fit” such as the matching error (MAD or RMS, refer to the FBMA algorithm) itself. The output from the BME is a stream of block motion vectors and their associated confidence levels. This data stream can be fed to a PC or workstation for further analysis. The hardware implementation of the BME was demonstrated in the first quarter of 1999.
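A sketch of such a confidence measure is shown below. The exact definition used in the BME is not documented here, so the normalisation by the average matching error is an assumption.

```python
def confidence_level(best_sad, all_sads):
    """Confidence of a selected motion vector (illustrative sketch).

    Compares the matching error of the selected vector with the average
    matching error over all candidate vectors; a value near 1 means the
    selected vector matches much better than the other candidates, while
    a value near 0 means it is barely better than average (e.g. in flat
    image areas or under global brightness changes).
    """
    avg = sum(all_sads) / len(all_sads)
    return 1.0 - best_sad / avg if avg > 0 else 0.0
```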
The Global Motion Estimator (GME) determines the global displacement of an image with respect to the preceding image. It operates on real-time video input coming from a non-stationary camera. The GME is applicable in combination with the Block Motion Estimator (BME) and for motion compensation. In the first case, it can inform the BME of the global displacement of the image, allowing the BME to determine the block motion vectors relative to the global vector. In the second case, the global vector from the GME is used to eliminate or smooth out unwanted or irregular image displacements from the non-stationary camera (due to, for example, vibration of the camera).
The GME uses an algorithm similar to the Full Block Matching Algorithm (FBMA) with only one block: the whole image. The amount of processing needed for this has been greatly reduced by many modifications, without affecting the quality of the result (in many cases the result is even improved). In addition to calculating the global motion vector, the GME also calculates a confidence level, similar to the confidence levels of the BME.
The output from the GME is a single vector plus confidence level per frame.
The hardware implementation of the GME was demonstrated in mid-1999.
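As an illustration of the motion-compensation use case, the sketch below removes the irregular part of the camera motion by subtracting a smoothed version of the accumulated global vectors. The smoothing constant, the wrap-around shift and all names are assumptions for illustration only.

```python
import numpy as np

def stabilise(frames, global_vectors, alpha=0.9):
    """Smooth out irregular camera motion (illustrative sketch).

    Accumulates the per-frame global motion vectors into a camera path,
    low-pass filters that path, and shifts each frame by the difference
    so that only the smooth (intended) camera motion remains.
    """
    path = np.cumsum(np.asarray(global_vectors, dtype=float), axis=0)
    smooth = np.copy(path)
    for i in range(1, len(path)):            # exponential smoothing of the path
        smooth[i] = alpha * smooth[i - 1] + (1 - alpha) * path[i]
    out = []
    for frame, p, s in zip(frames, path, smooth):
        dv, dh = np.round(s - p).astype(int)  # correction shift for this frame
        # np.roll wraps around at the borders; a real system would crop instead.
        out.append(np.roll(frame, (dv, dh), axis=(0, 1)))
    return out
```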
In 1995, TNO-FEL started a study on the non-uniformity correction of thermal imagers, in particular imagers based on two-dimensional focal-plane-array detectors. The studied method is called Scene-based thermal referencing (SBTR), since it uses scene information and does not need calibration devices such as uniform reference plates. An algorithm was developed that gives excellent correction of the non-uniformity of two in-house thermal imagers for the 3 to 5 µm wavelength band. Following these results, the algorithm was transferred to dedicated hardware: a PCB with FPGAs. This board was successfully demonstrated in the first quarter of 1999, operating in two different camera demonstrators. One camera system is based on a PtSi focal-plane array; the other on CMT.
The principle of Scene-based thermal referencing (SBTR) is the compensation of the fixed-pattern noise by comparing many frames that have different positions relative to the scene. By comparing consecutive images, the non-uniformity and the scene can be distinguished. These various frame positions can be caused by a deliberate motion of the camera or by accidental pointing instability, e.g. in the case of a handheld camera.
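A heavily simplified sketch of this principle is shown below. It assumes an offset-only non-uniformity model and known global shifts between frames (e.g. from a global motion estimator); the averaging scheme is an assumption, and the actual SBTR algorithm is more sophisticated.

```python
import numpy as np

def estimate_fpn(frames, shifts):
    """Estimate fixed-pattern offsets from shifted frames (sketch).

    Model: frame_i = scene shifted by shifts[i] + fixed pattern.
    Aligning the frames on the scene and averaging smears the fixed
    pattern out (it lands on a different scene position in every frame);
    the residual between each raw frame and the re-shifted scene estimate
    is then averaged into a per-pixel offset map.
    """
    aligned = [np.roll(f.astype(float), (-dv, -dh), axis=(0, 1))
               for f, (dv, dh) in zip(frames, shifts)]
    scene = np.mean(aligned, axis=0)          # scene estimate, pattern smeared out
    residuals = [f - np.roll(scene, (dv, dh), axis=(0, 1))
                 for f, (dv, dh) in zip(frames, shifts)]
    # np.roll wraps around at the borders; a real system would mask the edges.
    return np.mean(residuals, axis=0)         # per-pixel offset estimate

# Corrected frame: raw frame minus the estimated fixed-pattern offsets.
```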
Video contrast-enhancement ASIC
By the end of the 1990s, modern black-and-white CCD cameras could distinguish up to 4000 grey levels. When looking at a monitor, the human eye is limited to only about 80 grey levels. Images appear grey and show poor contrast in bad weather conditions like rain or fog, or when the image suffers from backlight. A real-time contrast-enhancement ASIC has been developed by TNO-FEL to overcome this dynamic-range mismatch between the CCD camera and the human eye. The real-time contrast-enhancement ASIC, called “VISTA01”, enhances the contrast of a video frame. The enhancement can be applied globally or within a user-definable area of interest (AOI). The contrast is evaluated in a grey-value histogram of each video frame. An application-specific MOVE microprocessor reads the contrast from the histogram and calculates the parameters for the high-speed video path. The implemented algorithm can be tuned in software (using an RS232 data link) to optimise performance in special situations or when specific cameras are used.
Features of the video contrast-enhancement ASIC were:
- PAL/NTSC video input/output
- 12-bit video path
- 16-bit MOVE application-specific microprocessor
- Histogram evaluation
- Tunable contrast-enhancement algorithms
- User-definable Area Of Interest (AOI)
- RS232 compatible communication port
The functionality of the contrast-enhancement algorithm was completely embedded in a single ASIC. Some external hardware is necessary for video-signal restoration. In the signal path, A/D and D/A converters are used for conversion to the digital domain and back. The ASIC consists of five functional blocks: a video synchronisation separator to generate the time base, an AOI timing generator, a histogram builder, a 12-bit video path and a MOVE microprocessor.
References
[1] F.P.P. de Vries, “Automatic, adaptive, brightness independent contrast enhancement”, Signal Processing 21 (1990), pp. 169–182, Elsevier.
[2] K. Schutte, “Multi-Scale Adaptive Gain Control of IR Images”, Infrared Technology and Applications XXIII, Proc. SPIE Vol. 3061, AeroSense, April 1997.