Filtering Based ROI Coding Using Dynamic Range Compression and Updating Source Picture Filter

Region of interest (ROI) coding, which compresses the background region more strongly than the foreground region, is a valuable technology in restricted communication environments with limited bandwidth and/or storage capacity. This paper proposes a filtering scheme that reduces the amount of bits by filtering the background region, in order to support efficient ROI coding on ordinary devices that are not equipped with ROI coding functionality. Specifically, a dynamic range compression (DRC) filter and an updating source picture (USP) filter are added to the low-pass filter used in the conventional filtering scheme. The DRC filter compresses the dynamic range of all pixels by down-scaling. The USP filter measures the distance between the current and previous pictures on a block basis, and overwrites a block of the current picture with the co-located previous block when the distance is smaller than a predetermined threshold. Experimental results show that the proposed filtering scheme reduces the bit-rate by roughly half compared to the conventional low-pass filter when the PSNR-Y of the background region is kept at 27 dB. Compared to uniform quality coding, which does not use the filters, the proposed filtering scheme reduces the bit-rate by 58 % in one case. These results promise to cut the cost of communication and storage.


Introduction
Video surveillance systems have rapidly become popular amid growing concern about security. Image sensor development is also active, and omnidirectional network cameras have been released. As the number of network cameras increases, the cost of transmitting and recording data often becomes a problem, because video data is generally much larger than audio data. When the communication environment is restricted, for example when bandwidth is limited and/or storage capacity is insufficient, measures such as shortening the recording time, decreasing the frame rate, narrowing the view angle, or decreasing the resolution must be taken. However, these are not ideal solutions because they impair security performance. To address this problem, region of interest (ROI) coding has attracted attention [1] [2].
ROI coding is a technology that ensures picture quality in important regions under limited bandwidth by allocating more bits to the ROI than to the background region [3]. The idea was developed mainly to improve subjective picture quality for video telephony and video conferencing. In recent years, research combining ROI coding with saliency and with new video coding standards has been active [4]∼[10].
In some video surveillance systems, it is sufficient for a security officer to identify who is in the field of view. For instance, face regions must be transmitted at fine quality so that individuals can be identified, while in other regions a rough picture quality that is just enough to understand the monitored location is acceptable. In this situation, ROI coding can reduce the amount of bits in the background region and thereby reduce the bit-rate [1]. Computer vision techniques such as human and face detection have recently reached a practical level [11]∼[13], and commercialization is also in progress [14]. We expect that combining computer vision techniques with ROI coding will resolve the cost problem of video surveillance systems.
The simplest method for ROI coding is the quantization parameter (QP) control scheme, which uses a built-in function for quality control [5]∼[7] [15]∼[18]. For example, H.264/MPEG-4 AVC [19], the mainstream coding standard in video surveillance systems, has syntax to control the quality of each macroblock (MB). ROI coding is realized by using this syntax to assign a lower QP, which means a narrower quantization step, to the ROI than to the background region. This scheme has the advantage that the coding efficiency of the background region is high, but it restricts the choice of video coding standard and implementation to those that have a built-in function for controlling the quality of an MB [20]. In video surveillance systems, a hardware accelerator is often used for video encoding because encoding is a computationally heavy process. However, ordinary encoders may not support this built-in function, in which case the scheme is not feasible. In addition, the scheme has the disadvantage that the range of the trade-off between picture quality and bit-rate is limited by the video coding standard in use. In the case of H.264/MPEG-4 AVC, the QP is defined to take values from 0 to 51. Other studies realize ROI coding by utilizing built-in functions of codecs or by altering the codec, such as controlling the number of candidate coding modes [21], suppressing high-frequency components [2], and making use of the ROI scalability [9] [22] of the scalable video coding extensions of the standards [23] [24], but these technologies have the same restrictions as above.
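To make the link between QP and quantization step concrete, the following minimal sketch uses the step sizes commonly tabulated for H.264/MPEG-4 AVC, in which the step doubles for every increase of 6 in QP. The function name `qstep` and the table constant are illustrative names introduced here, not part of the paper.

```python
# Commonly tabulated H.264 quantization step sizes for QP 0..5.
# For larger QP the step doubles every 6 QP values.
_QSTEP_BASE = (0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125)


def qstep(qp):
    """Return the quantization step size for an H.264 QP in [0, 51]."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in the range 0..51")
    return _QSTEP_BASE[qp % 6] * 2 ** (qp // 6)


if __name__ == "__main__":
    for qp in (0, 26, 32, 51):
        print(qp, qstep(qp))
```

Under this table, raising the background QP by 6 relative to the ROI doubles the background quantization step, and the hard limit of QP 51 is what bounds the trade-off range mentioned above.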
Another method for ROI coding is the filtering scheme, which applies image processing to the background region of a picture before encoding in order to reduce the amount of bits [1] [4] [15] [20] [25]. For instance, a low-pass filter such as the Gaussian filter can reduce the amount of bits in the background region. In general, a video encoder reduces the amount of bits by removing spatial and temporal redundancy. The low-pass filter is therefore expected to reduce the amount of bits in regions where it increases redundancy, in other words where it reduces the energy of the AC components. A related study transmits only the ROI and reconstructs the background region from already transmitted pictures in post-processing [26], although it has the drawback that post-processing after decoding is essential. An advantage of the filtering scheme is that it can be used even when the video coding implementation in use does not support ROI coding, because the filters are independent of the codec. It also eliminates the need for costly adaptation to new coding standards and implementations. Moreover, the range of the trade-off between picture quality and bit-rate is not limited. This paper considers these features to be important, and discusses the filtering scheme in the following.
The filtering scheme has the advantages described above, but its coding efficiency is low compared to the QP control scheme. This is because filter design for efficiently reducing the amount of bits of the background region under state-of-the-art standards has not been seriously discussed, since H.264/MPEG-4 AVC already supports syntax for quality control of an MB. For instance, when a low-pass filter is used as a pre-filter before encoding, high-frequency components are suppressed, which does reduce the amount of bits. In the QP control scheme, however, the amount of bits of the background region is reduced in two ways: by controlling the degree of entropy reduction in the quantization process, and by raising the possibility that skip mode is selected in the prediction process, because both the quantization step and the criterion for coding mode selection depend on the QP. Specifically, the ratio of skip-mode MBs increases in the background region. In a filtering scheme, an identical QP is assumed to be assigned to all regions, and therefore the bit reduction from these two controls cannot be obtained. The absence of control over the skip mode selection criterion has an especially profound effect, because skip mode contributes greatly to coding efficiency when there is little motion. Since these differences have not been sufficiently addressed, the coding efficiency of a filtering scheme in the background region is lower than that of the QP control scheme.
This paper proposes a new filtering scheme that consists of a dynamic range compression (DRC) filter and an updating source picture (USP) filter in addition to a conventional low-pass filter. The DRC filter compresses the dynamic range of all pixels by down-scaling. It reproduces the function of the QP control scheme that varies the amount of bits by changing the dynamic range, which is similar to the entropy, of the source signal. Down-scaled pixels are restored to the original scale by a post-filter after decoding, so that the dynamic range does not differ from region to region when recorded video is watched. The USP filter measures the distance between the current and previous pictures on a block basis, and overwrites a block of the current picture to induce skip mode when that distance is smaller than a predetermined threshold. This filter reproduces the function by which skip mode is likely to be selected in the background region. The proposed filtering scheme reduces the amount of bits of the background region efficiently through the combination of these filters. If a filtering scheme can reduce the amount of bits sufficiently, the applicable range of ROI coding is expected to increase dramatically, since a filtering scheme can be introduced on ordinary devices that are not equipped with ROI coding functionality.
This paper is organized as follows. Section 2 describes the proposed system for ROI coding using the proposed filtering scheme. Section 3 explains the details of the proposed filters, namely the DRC filter and the USP filter, which are the main contribution of this paper. Section 4 presents the performance of the proposed filtering scheme. Finally, Sec. 5 concludes this paper.

The proposed system for ROI coding
The proposed system filters the background region of a source picture based on ROI information detected by an object detection engine, and encodes the filtered picture. Figure 1 shows an overview of the proposed system. The details are given in the following paragraphs.
The object detection engine in Fig. 1 searches the input picture for object regions and outputs the rectangular coordinates of those regions. In this paper, faces and humans are detected as objects. Several object detection methods have been proposed, as described previously. In this paper, a detection method using the HOG feature [11] is employed for human detection, and a detection method using the Haar-like feature [12] [13] is employed for face detection. The object detection engine outputs the coordinates of rectangular areas containing a human or a face with pixel precision. In contrast, the encoder divides a source picture into MBs and encodes each MB. When a boundary between the ROI and the background region lies inside an MB, the energy of the AC components actually increases because the boundary causes a sharp edge. To address this problem, it has been proposed to prevent an edge due to a region boundary inside an MB by introducing multiple levels of background quality and changing the quality smoothly [20]. In this paper, the system instead adjusts the pixel coordinates of the ROI to MB boundaries, as shown in Fig. 1, because a sharp edge due to a region boundary has the positive side effect of attracting the viewer's attention. Figure 2 illustrates the adjustment process. This process is formulated below.
where $x_{L,T}$ and $y_{L,T}$ are the coordinates of the top-left pixel of the ROI, and $x_{R,B}$ and $y_{R,B}$ are those of the bottom-right pixel. $X_{L,T}$ and $Y_{L,T}$ are the coordinates of the top-left pixel of the adjusted ROI, and $X_{R,B}$ and $Y_{R,B}$ are those of the bottom-right pixel. $N$ is the size of an MB. In this paper, $N$ is set to 16 because H.264/MPEG-4 AVC is assumed, as described later.
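The adjustment can be sketched as follows. This is only one plausible form, assuming that the ROI is expanded outward so that no MB contains both ROI and background pixels, and that both corners are inclusive pixel coordinates. The function name `align_roi_to_mb` and the floor/ceiling convention are assumptions of this sketch and may differ from the formulation in the paper.

```python
def align_roi_to_mb(x_lt, y_lt, x_rb, y_rb, n=16):
    """Expand a pixel-accurate ROI rectangle outward to the n x n MB grid.

    (x_lt, y_lt) is the top-left pixel and (x_rb, y_rb) the bottom-right
    pixel of the detected ROI, both inclusive. The returned rectangle
    covers whole macroblocks only.
    """
    X_lt = (x_lt // n) * n            # first pixel of the MB containing x_lt
    Y_lt = (y_lt // n) * n
    X_rb = (x_rb // n + 1) * n - 1    # last pixel of the MB containing x_rb
    Y_rb = (y_rb // n + 1) * n - 1
    return X_lt, Y_lt, X_rb, Y_rb


if __name__ == "__main__":
    print(align_roi_to_mb(20, 35, 50, 70))
```

A practical implementation would additionally clip the adjusted rectangle to the picture size.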
The adjusted coordinates are passed to the pre-filter. The pre-filter reduces the amount of bits of regions belonging to MBs determined to be background. The design of the pre-filter is described in detail in the following section.
Finally, the filtered picture, in which the amount of bits in the background region has been reduced, is input to the video encoder. Although the filtering scheme places no particular limitation on the video encoder, an H.264/MPEG-4 AVC Baseline Profile encoder, which is currently the mainstream standard in the surveillance camera market, is assumed in this paper. On the decoder side, the H.264/MPEG-4 AVC decoder receives and decodes the transmitted stream. The decoded picture is then input to the post-filter. The design of the post-filter is described in detail in the following section. The output of the post-filter is the output of the proposed decoder system.

Proposed filtering scheme
This section describes the details of the pre-filter and post-filter, which are the core of the proposed filtering scheme. First, the structure of the pre-filter is shown in Fig. 3. The pre-filter is composed of a low-pass filter, the DRC filter, and the USP filter. The purpose and behavior of each module are described in the following subsections.

Low-pass filter
A low-pass filter has conventionally been used in filtering schemes [20] [22]. This filter reduces the amount of bits in the background region by suppressing high-frequency components. In this paper, a general Gaussian filter is used as the low-pass filter. The Gaussian filter used is defined as follows. Here, $\sigma$ is the parameter that controls the spread (standard deviation) of the Gaussian filter, and its value is obtained from the kernel size $k$.
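As an illustration, a Gaussian kernel parameterized only by the kernel size $k$ can be built as in the sketch below. The rule used to derive $\sigma$ from $k$ is the default documented for OpenCV's Gaussian kernels; that the paper uses this particular rule is an assumption, and the function names are introduced here for illustration.

```python
import math


def sigma_from_kernel_size(k):
    """Derive sigma from an odd kernel size k.

    Assumption: OpenCV's documented default rule,
    sigma = 0.3 * ((k - 1) * 0.5 - 1) + 0.8.
    """
    return 0.3 * ((k - 1) * 0.5 - 1) + 0.8


def gaussian_kernel(k):
    """Return a normalized k x k Gaussian kernel as a list of rows."""
    sigma = sigma_from_kernel_size(k)
    center = (k - 1) / 2.0
    # 1-D Gaussian samples; the 2-D kernel is their outer product.
    g1 = [math.exp(-((i - center) ** 2) / (2.0 * sigma ** 2)) for i in range(k)]
    total = sum(g1) ** 2              # sum of the outer product
    return [[gy * gx / total for gx in g1] for gy in g1]


if __name__ == "__main__":
    for row in gaussian_kernel(5):
        print(" ".join(f"{w:.4f}" for w in row))
```

A larger $k$ gives a larger $\sigma$ and therefore stronger suppression of high-frequency components in the background region.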

DRC filter
The QP control scheme varies the amount of bits by changing the dynamic range, which is similar to the entropy, of the source signal. In particular, the quantization step of the background region is larger than that of the ROI. The DRC filter replicates this capability in a filtering scheme, in which the quantization step is assumed to be the same for the ROI and the background region. This filter compresses the dynamic range by down-scaling all pixels in the pixel domain. Compressing the dynamic range lowers the entropy of the picture signal, and therefore the amount of bits is also reduced. Figure 4 shows the behavior of the DRC filter. First, 128 is subtracted from the pixel value $p_i$ so that the center of the scaling process is 128. The result is then down-scaled by a predetermined scaling factor $R_e$ and rounded to an integer. $R_e$ ranges from 0 to 1.0. Finally, the compressed value $p_o$ is produced by adding 128 to the integer value. With this filter, the background region becomes dim because the dynamic range of the region is narrow.
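The three steps above amount to $p_o = \mathrm{round}((p_i - 128) \cdot R_e) + 128$. A minimal sketch is given below. The rounding mode is not specified in the text, so round-half-up is assumed here, and the function names are illustrative.

```python
import math


def drc_pixel(p_i, r_e, center=128):
    """Compress one 8-bit sample toward `center` by the scaling factor r_e.

    Assumption: rounding is round-half-up, implemented as floor(x + 0.5).
    """
    scaled = (p_i - center) * r_e
    return math.floor(scaled + 0.5) + center


def drc_block(block, r_e):
    """Apply the DRC filter to a block given as a list of rows of samples."""
    return [[drc_pixel(p, r_e) for p in row] for row in block]


if __name__ == "__main__":
    # With R_e = 0.5 the full range [0, 255] maps to [64, 192].
    print([drc_pixel(p, 0.5) for p in (0, 64, 128, 200, 255)])
```

With $R_e = 0.5$, the input range [0, 255] is compressed to [64, 192], which is why the background appears dim until the post-filter restores the scale.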
Setting the center value of the scaling process to 128 is expected to have the following effects.
• Under the assumption that scene conditions cannot be specified, the pixel values of input pictures are distributed around 128. Chroma components in particular have this tendency. Setting the center value to 128 therefore prevents the residual entropy from increasing wastefully, because it minimizes the expected amplitude of the residual pixel values at the switching point between the ROI and the background region.
• When the video is played without the post-filter, the subjective sense of discomfort about color is small. For instance, if the center value of the scaling process is set to 0, the color is biased towards green as in Fig. 5(a), because the pre-filter processes pictures in YUV space, as many video codecs do. On the other hand, when scaling is centered on 128, the color stays close to natural, as shown in Fig. 5(b).
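The green bias in the second point can be reproduced numerically. The sketch below scales a neutral mid-gray YUV sample around 0 and around 128 and converts both results to RGB. The full-range BT.601 conversion coefficients are an assumption of this sketch, since the paper does not state which YUV variant is used, and the helper names are illustrative.

```python
def yuv_to_rgb(y, u, v):
    """Full-range BT.601 YUV to RGB, clipped to [0, 255] (assumed conversion)."""
    r = y + 1.402 * (v - 128)
    g = y - 0.344136 * (u - 128) - 0.714136 * (v - 128)
    b = y + 1.772 * (u - 128)
    return tuple(max(0, min(255, int(round(c)))) for c in (r, g, b))


def scale_about(value, r_e, center):
    """Down-scale one sample by r_e around the given center value."""
    return int(round((value - center) * r_e)) + center


gray = (128, 128, 128)  # neutral mid-gray: Y = 128, U = V = 128
about_0 = tuple(scale_about(c, 0.5, 0) for c in gray)
about_128 = tuple(scale_about(c, 0.5, 128) for c in gray)

print(yuv_to_rgb(*about_0))    # (0, 132, 0): strongly green
print(yuv_to_rgb(*about_128))  # (128, 128, 128): unchanged gray
```

Scaling around 0 pulls both chroma components below their neutral value of 128, which removes red and blue and leaves green, whereas scaling around 128 leaves neutral chroma untouched.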

USP filter
In the QP control scheme, skip mode is likely to be selected in the background region, which influences the allocation of bits. The USP filter realizes a similar function in a filtering scheme. It measures the distance between the current and previous pictures on a block basis, and overwrites a block of the current picture to induce skip mode when the distance is smaller than a predetermined threshold. Figure 6 shows the behavior of the USP filter. Whether the overwriting process is performed is determined by the following threshold test.
$$\mathrm{MAD}\left(B_t(m,n),\, B_{t-1}(m,n)\right) < T \cdot R_e \tag{7}$$
where $B_t(m,n)$ represents a block of 16 × 16 pixels of the input picture to be filtered, and $B_{t-1}(m,n)$ represents the co-located block of $B_t(m,n)$ in the filtered picture that was passed to the encoder immediately before. $t$ is the picture index of the block, such as a timestamp, and $(m, n)$ represents the position of the block in the picture. MAD is a function that calculates the mean absolute difference of the two blocks. $T$ is the parameter used to calculate the threshold. In the proposed filtering scheme, the dynamic range of the background region is compressed by the DRC filter, so the lower the scaling factor $R_e$ becomes, the lower the MAD of $B_t(m,n)$ and $B_{t-1}(m,n)$ becomes. The threshold $T$ is therefore multiplied by $R_e$ so that the decision on whether to overwrite is stable regardless of $R_e$. If the condition in Eq. (7) is true, $B_t(m,n)$ is updated by copying the filtered block $B_{t-1}(m,n)$ over it. This process encourages the H.264/MPEG-4 AVC encoder to encode $B_t(m,n)$ in skip mode.
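The block-wise decision and update can be sketched as below, with pictures represented as lists of rows of luma samples. For brevity the sketch processes every block, whereas in the proposed system only MBs determined to be background would be filtered; the strict inequality and the function names are assumptions of this sketch.

```python
def mad(block_a, block_b):
    """Mean absolute difference between two equally sized blocks."""
    total = sum(abs(a - b)
                for row_a, row_b in zip(block_a, block_b)
                for a, b in zip(row_a, row_b))
    count = sum(len(row) for row in block_a)
    return total / count


def usp_filter(current, previous, t, r_e, n=16):
    """Overwrite near-static n x n blocks of `current` with the co-located
    blocks of `previous`, the filtered picture last passed to the encoder.

    Returns a new picture; the inputs are left unmodified.
    """
    out = [row[:] for row in current]
    height, width = len(current), len(current[0])
    for top in range(0, height, n):
        for left in range(0, width, n):
            cur = [row[left:left + n] for row in current[top:top + n]]
            prev = [row[left:left + n] for row in previous[top:top + n]]
            if mad(cur, prev) < t * r_e:      # threshold T scaled by R_e
                for dy, prev_row in enumerate(prev):
                    out[top + dy][left:left + n] = prev_row
    return out
```

Because the output of this filter becomes the `previous` picture for the next frame, a block that keeps passing the test stays bit-identical over time, which is the condition under which an encoder can choose skip mode cheaply.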

The post-filter
In the proposed system, the dynamic range must be decompressed to the original scale on the decoding side, since the pre-filter generates a picture whose dynamic range in the background region is compressed, as described above. The post-filter is the module that performs this function. It applies the inverse operation of the DRC filter to return the compressed pixel values to the original scale. A decoded picture can be viewed even without this filter, but the background region appears dim. Figure 7 shows the behavior of the post-filter, where $q_i$ and $q_o$ represent the input and output pixel values. The basic behavior is the same as that of the DRC filter. The differences are that the scaling factor is the reciprocal of $R_e$, $R_d = 1/R_e$, and that a cropping process is added at the end.
As described above, $q_i$ is affected by the rounding in the calculation of $p_o$ and by the quantization noise introduced by the video encoder. The result of the inverse operation may therefore take a value less than 0 or a value of 256 or more. However, since the range of $p_i$, the pixel value before compression, is known to be [0, 255], it is reasonable to keep $q_o$ within [0, 255]. In the cropping process, negative pixel values are therefore set to 0, and pixel values of 256 or more are set to 255. In this paper, the range of pixel values is assumed to be [0, 255] as the typical case, but the range must be adjusted as appropriate in applications such as high dynamic range compression.
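The post-filter can be written as $q_o = \mathrm{clip}(\mathrm{round}((q_i - 128) \cdot R_d) + 128,\ 0,\ 255)$ with $R_d = 1/R_e$. A minimal sketch follows, again assuming round-half-up because the rounding mode is not specified; the function name is illustrative.

```python
import math


def post_filter_pixel(q_i, r_e, center=128):
    """Expand a decoded sample back to the original scale and crop it.

    r_e is the scaling factor used by the DRC filter (0 < r_e <= 1).
    Assumption: rounding is round-half-up, implemented as floor(x + 0.5).
    """
    r_d = 1.0 / r_e                                   # inverse scaling factor
    q_o = math.floor((q_i - center) * r_d + 0.5) + center
    return max(0, min(255, q_o))                      # cropping to [0, 255]


if __name__ == "__main__":
    print([post_filter_pixel(q, 0.5) for q in (62, 64, 128, 192, 195)])
```

For example, with $R_e = 0.5$ the input value 255 is compressed to 192 by the DRC filter, and expanding 192 gives 256, which the cropping process brings back to 255. Decoded values pushed slightly outside [64, 192] by quantization noise, such as 62 or 195, are likewise cropped to 0 and 255.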

Simulation
In this section, experiments are performed to evaluate how close the performance of the proposed filtering scheme is to that of the QP control scheme. The comparison is made in terms of the amount of bits reduced by each ROI coding scheme and the picture quality of the decoded pictures. The proposed scheme is also compared with the Gaussian filter as a conventional filtering scheme.

Evaluation method
To evaluate the performance of the ROI coding schemes, graphs are drawn with the reduction rate on the horizontal axis and the picture quality of the decoded pictures in the background region on the vertical axis. In this paper, the reduction rate $Q$ is defined by the following equation.
$$Q = \frac{F_B - F_R}{F_B} \times 100 \ [\%]$$
where $F_B$ is the reference bit-rate obtained by uniform quality coding, which does not use the filters, and $F_R$ is the bit-rate of ROI coding. The PSNR-Y and the SSIM [27] are used as measures of the picture quality of decoded pictures. The PSNR-Y is calculated using only the pixels belonging to the background region, because the ROI coding schemes in this paper maintain the picture quality of the ROI while reducing the amount of bits. Restricting the evaluation to background pixels is possible because the PSNR-Y measures the deterioration of each pixel independently.
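The two measures can be sketched as follows. The reduction rate is taken as the relative bit-rate saving against $F_B$ in percent, which is an assumption consistent with the definitions of $F_B$ and $F_R$ above. The PSNR-Y is restricted to background pixels through a per-pixel ROI mask; the function names and the mask representation are illustrative.

```python
import math


def reduction_rate(f_b, f_r):
    """Reduction rate Q in percent, assuming Q = (F_B - F_R) / F_B * 100."""
    return (f_b - f_r) / f_b * 100.0


def background_psnr_y(original, decoded, is_roi):
    """PSNR of the 8-bit luma plane computed over background pixels only.

    original, decoded: lists of rows of luma samples.
    is_roi: list of rows of booleans, True where the pixel is in the ROI.
    """
    sq_err, count = 0, 0
    for row_o, row_d, row_m in zip(original, decoded, is_roi):
        for o, d, roi in zip(row_o, row_d, row_m):
            if not roi:
                sq_err += (o - d) ** 2
                count += 1
    if count == 0:
        raise ValueError("no background pixels")
    if sq_err == 0:
        return math.inf
    mse = sq_err / count
    return 10.0 * math.log10(255.0 ** 2 / mse)
```

Because the squared error is accumulated pixel by pixel, masking out the ROI is straightforward for the PSNR-Y.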

Test sequences
The test sequences used to draw the graphs are Hall monitor, created by Rensselaer Polytechnic Institute, and Bowing, created by FUJITSU LABORATORIES LTD. Test sequences taken by a fixed camera were selected because surveillance systems are assumed. The resolution of these sequences is 352 × 288, and the frame rate is 30 fps.
Hall monitor is a video sequence that monitors a corridor in which pedestrians appear, and therefore a human detector using the HOG feature is used as the object detection engine. Bowing, on the other hand, is a video sequence in which a face turned toward the camera appears, and therefore a face detector using the Haar-like feature is used as the object detection engine.

Parameter configuration
The QP of the ROI is fixed to 26. In the QP control scheme, the QP of the background region is [...]. In the filtering schemes, there are three parameters: the kernel size $k$ of the Gaussian filter, the scaling factor $R_e$ of the DRC filter, and the parameter $T$ of the USP filter. Many combinations of these parameters are possible. Encoding and decoding are therefore performed with multiple combinations of these parameters, and envelope curves are drawn over the collected data in terms of the reduction rate and the picture quality of the decoded pictures in the background region.

Results
Figures 8 and 9 show the simulated relationship between the reduction rate and the picture quality of the decoded pictures. Compared with the conventional filtering scheme, the proposed filtering scheme reduces the amount of bits more efficiently at all points. For example, for Hall monitor at a PSNR-Y of approximately 27 dB, the reduction rate of the conventional filtering scheme is approximately 22 %, while that of the proposed filtering scheme is about 58 %. In other words, relative to the bit-rate of the conventional filtering scheme, the proposed filtering scheme roughly halves the bit-rate while maintaining picture quality. On the other hand, comparison with the QP control scheme shows that the proposed filtering scheme is slightly inferior. At a PSNR-Y of approximately 27 dB, the reduction rate of the QP control scheme is about 60 %, so the difference between the two approaches is about 2 percentage points. The same tendencies are observed for Bowing.
In terms of the SSIM as well, the proposed filtering scheme is superior to the conventional filtering scheme and inferior to the QP control scheme, as with the PSNR-Y. In the low reduction rate range of the graphs, the difference between the proposed filtering scheme and the QP control scheme shrinks, which indicates that the proposed filtering scheme can achieve almost the same performance as the QP control scheme in that range.
Figures 10 and 11 compare decoded pictures obtained with (a) uniform quality coding, (b) the QP control scheme, (c) the conventional filtering scheme, and (d) the proposed filtering scheme. The PSNR-Y of each decoded picture is approximately 27 dB. In the ROI, every ROI coding scheme achieves picture quality fine enough to identify the individual. In the background region, the character of the degradation differs between schemes even though the same picture quality is achieved in terms of the PSNR-Y. With the QP control scheme, block noise is conspicuous because the amount of bits is reduced by coarse quantization. The conventional filtering scheme, on the other hand, reduces the amount of bits only by the low-pass filter, so block noise is not observed but blur is observed over the picture instead. With the proposed filtering scheme, the tendency is similar to that of the QP control scheme. The QP control scheme is better than the proposed filtering scheme in color reproduction, while the proposed filtering scheme performs well in the reproduction of detail such as edges.
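The statement that the proposed scheme roughly halves the bit-rate of the conventional filtering scheme can be checked from the reduction rates reported for Hall monitor at approximately 27 dB. The check assumes that both reduction rates are measured against the same uniform-quality bit-rate $F_B$; the symbols $F_{\text{conv}}$ and $F_{\text{prop}}$ are introduced here for the bit-rates of the conventional and proposed filtering schemes.

```latex
\begin{align*}
F_{\text{conv}} &\approx (1 - 0.22)\,F_B = 0.78\,F_B, \\
F_{\text{prop}} &\approx (1 - 0.58)\,F_B = 0.42\,F_B, \\
\frac{F_{\text{prop}}}{F_{\text{conv}}} &\approx \frac{0.42}{0.78} \approx 0.54 .
\end{align*}
```

Under this assumption, the proposed scheme needs about 54 % of the bit-rate of the conventional filtering scheme, that is, a saving of about 46 %.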

Conclusion
In this paper, a new filtering scheme was proposed to reduce the amount of bits in the background region efficiently. The proposed filtering scheme is designed by considering how the QP control scheme reduces the amount of bits. It is composed of the DRC filter and the USP filter in addition to the conventional low-pass filter. The simulation results show that the proposed filtering scheme can roughly halve the bit-rate of the conventional filtering scheme while maintaining picture quality when the PSNR-Y is approximately 27 dB. Compared to uniform quality coding, the bit-rate was reduced by 58 % in the simulation in this paper, and we expect a significant reduction in storage cost and in communication cost in wireless environments.