Color Spatio-Temporal JND and Its Application to Video Coding

A method that reveals visual masking in the human visual system (HVS) is useful in perceptual data coding. In this paper, we propose a color DCT-based method to estimate the color spatio-temporal just noticeable distortion (JND) for video coding. The spatio-temporal JND profiles are assessed by incorporating a new temporal masking adjustment into the mathematical model that estimates the DCT-based spatial JND profiles for the luminance and chrominance components of color images. The new block-based temporal masking adjustment mainly considers the variation of local temporal statistics in the luminance component between successive video frames. The spatio-temporal JND profiles are used to preprocess the motion-compensated residue in video coding for higher performance. The simulation results demonstrate the performance of the perceptual video coding in terms of bit rate and visual quality: the bit rates required by the perceptually tuned video codec are lower than those required by the un-tuned video codec at nearly the same visual quality.


Introduction
To achieve higher image compression performance, estimating a JND that approaches the actual JND of images has long been an important topic, and such estimates are used in perceptual coding schemes because human eyes are the ultimate receivers of the visual data. Watson et al. [1], [2] designed quantization matrices for DCT-based compression by exploiting visibility thresholds that were experimentally measured for quantization errors of the DCT coefficients. In [3], the JND threshold was determined by the dominant effect between luminance masking and texture masking and was used to adapt the step size of a uniform quantizer in the proposed subband image coder. In [4], masking thresholds derived in a locally adaptive fashion from a subband decomposition were applied to the design of a locally adaptive perceptual quantization scheme that achieves high performance in terms of quality and bit rate. Yang et al. [5] proposed a nonlinear additive model to estimate the spatial JND profiles for perceptual coding of color images. In [6], Liu proposed a wavelet-based color visual model to increase the efficiency of image coders in compressing color images.
For the compact representation of video data in video coding, JND models that incorporate the temporal properties of the HVS are applied to video codecs to achieve high coding efficiency while maintaining acceptable visual quality at low bit rates. In [7], the spatial JND model in the pixel domain was extended by exploiting the temporal masking effect to obtain the spatio-temporal JND for video coding. The JND models proposed in [8] improve on [7] for higher-performance video coding by introducing the overlapping effect between luminance adaptation and spatial contrast masking. In [10], the estimation of the subband just noticeable distortion for video was developed by combining luminance adaptation, contrast masking, the spatio-temporal contrast sensitivity function (CSF), and eye movements. To improve on the performance demonstrated in [10], Wei and Ngan [11] designed a new perceptual model by considering gamma correction and the temporal CSF. In [12], the proposed JND model takes into account not only the spatial and luminance properties of the HVS but also its temporal and chroma properties.
Accurate and effective estimation of the JND profiles of video signals helps to increase the efficiency of coding the video data. In this paper, a color DCT-based spatio-temporal JND model with integrated formulations is proposed for video signals. Based on the base detection threshold for each DCT coefficient in the luminance and chrominance components of color images, the proposed model uses the masking adjustment presented in our prior work [14] to compute the spatial JND profiles of the three color components of the color image. Then, a new block-based temporal masking adjustment, which mainly considers the variation of local temporal statistics in the luminance component between successive video frames, is proposed to extend these spatial JND profiles into color spatio-temporal JND profiles. The estimated spatio-temporal JND profiles are used to preprocess the motion-compensated residue in video coding.
DOI: 10.12792/icisip2014.034

Improvement of the Spatio-Temporal JND
Based on Ahumada's model proposed in [13], the proposed DCT-based spatio-temporal JND for color video is expressed as
The temporal masking effect refers to the temporal redundancy in video sequences that is due to motion-related blurring and resolution reduction [15], [16]. In [7], the percep-
where means the inter-frame average luminance difference and

Justification of the Improved Spatio-Temporal JND
To justify the color spatio-temporal JND estimation, a subjective test is conducted to inspect whether the estimated JND is perceptually redundant for human visual perception.

Simulation Results
The proposed method is tested on six CIF (352×288) color video sequences. In the experiment, the simulation is carried out to justify the proposed JND model with the scale factor of Eq. (5) set to 1.0. To evaluate the visual quality of the JND-contaminated color video sequences, we adopt a subjective viewing test and viewing conditions similar to those presented in [12], [17], [18].
In each subjective viewing test, each subject is asked to observe a pair of color videos displayed side by side on the monitor. Each pair consists of the original color video and its JND-contaminated version, so that the perceptual difference between the two can be evaluated. For a fair comparison, the video pairs are presented in randomized order. The video pairs are viewed in a dark room at a viewing distance of 6 times the video height.
Fig. 1 shows the experimental results for the "Harbour" color video at visually lossless quality. The 2nd, 12th, and 28th frames of the video sequence are depicted for comparison. The original 2nd frame of the "Harbour" color video is shown in Fig. 1a, while its JND-contaminated version, with a PSNR of 28.87 dB, is shown in Fig. 1b. The two frames are perceptually indistinguishable to human eyes under the viewing conditions described above.
The same observation holds for the other frames. Fig. 1d shows the JND-contaminated 12th frame of the "Harbour" color video (28.92 dB), which is perceptually lossless when compared with the original 12th frame (Fig. 1c). The simulation results for the 28th frame are shown in Figs. 1e and 1f. These experimental results indicate that the proposed model is able to estimate the JND profiles of color video sequences.
Herein, the estimated color DCT-based spatio-temporal JND profiles are used to preprocess the motion-compensated residue in a video coding scheme similar to the one proposed in [9] in order to improve the coding performance. The block diagram of the perceptually tuned video coding scheme is shown in Fig. 1. The more the dynamic range of the motion-compensated residue can be reduced, the less objective distortion the reconstructed color image incurs at a given bit rate [19]. That is, we utilize the JND profiles to process the motion-compensated residue so that the dynamic range of the perceptually tuned signals is reduced, which yields a lower bit rate or better reconstructed image quality. The color video sequences are coded by the video coding scheme with and without tuning the motion-compensated residue, while all frames are intra coded. The experimental results demonstrate that the bit rates required with the perceptually tuned motion-compensated residue are lower than those required without it, while the coded video sequences in both cases are at nearly lossless visual quality.
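The text does not spell out how the JND profiles tune the residue, so the following is only a minimal sketch of one common choice, assumed here rather than taken from the paper: residue coefficients whose magnitude does not exceed the JND are set to zero, and the remaining ones are moved toward zero by the JND. The function name `suppress_residue_below_jnd` and its arguments are hypothetical.

```python
import numpy as np

def suppress_residue_below_jnd(residue, jnd):
    """Shrink residue coefficients toward zero by their JND (assumed rule).

    residue: motion-compensated residue coefficients (any shape).
    jnd:     non-negative JND thresholds, broadcastable to `residue`.
    """
    residue = np.asarray(residue, dtype=float)
    jnd = np.asarray(jnd, dtype=float)
    # Magnitude left after removing the perceptually redundant part;
    # coefficients at or below the JND become zero.
    magnitude = np.maximum(np.abs(residue) - jnd, 0.0)
    # Restore the original sign of each coefficient.
    return np.sign(residue) * magnitude
```

With a residue of [-5, -1, 0, 2, 6] and a JND of 2 for every coefficient, this rule yields [-3, 0, 0, 0, 4], so the dynamic range drops from 11 to 7 while no coefficient changes by more than its JND.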

Conclusions
Here, the spatio-temporal JND value and the spatial JND value are those of the DCT coefficient at location (u,v) in the b-th block of the k-th frame in the O (O = Y, Cb, and Cr) color component of the video sequence. The spatial JND value of a specified coefficient is evaluated from the corresponding base visibility threshold and the masking adjustment described in [14]. The proposed temporal masking adjustment is estimated by using the variation of local temporal statistics of the b-th block between successive frames (the k-th frame and the (k-1)-th frame) in the luminance component (Y) of the video sequence.
tual experiment found that human eyes are not sensitive to changes of luminance along the time axis while video signals are displayed. Based on the variation of local temporal statistics in the luminance component between successive video frames, this paper investigates the corresponding temporal masking adjustment, which is used with the spatial JND to compute the spatio-temporal JND. The experiments designed to obtain the adjustment are similar to the experiment presented in [7]. In this paper, the inter-frame variation of local temporal statistics in the luminance component is computed on 8×8 DCT blocks. The relation between the temporal masking adjustment and the inter-frame variation at the b-th block between the k-th and (k-1)-th frames in the luminance component (Y) is given by

is the b-th block in the k-th frame in the O component. One scale factor describes the human visual sensitivity to the variation of average background luminance between successive frames. f is the function relating this scale factor to the inter-frame average luminance difference; it can be found in [7] and was derived experimentally from a subjective test. A second scale factor describes the human visual sensitivity to the variation of block content between successive frames.
Masking effects also exist in the chrominance components of the color image and affect the sensitivity to the chrominance components of a target color pixel. These effects cannot be identified easily, since masking in the chrominance components involves complex human vision mechanisms, which makes the estimation of noise detection thresholds in chrominance difficult. To reduce the complexity of measuring the temporal masking in the chrominance components, the temporal masking adjustment used in the luminance component is applied directly to the chrominance components in this paper, considering that human visual perception is less sensitive to the chrominance components than to the luminance component.
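The block-based procedure described above can be sketched as follows. This is an illustration under stated assumptions, not the paper's formulation: the adjustment is assumed to multiply the spatial JND, the mapping from luminance difference to adjustment (`adjustment_fn`) is a hypothetical placeholder for the experimentally derived relation, and all three components are assumed to have the same size (no chroma subsampling). All function names are hypothetical.

```python
import numpy as np

def block_means(frame, block=8):
    """Average value of each non-overlapping block x block tile."""
    h, w = frame.shape
    if h % block or w % block:
        raise ValueError("frame size must be a multiple of the block size")
    tiles = frame.reshape(h // block, block, w // block, block)
    return tiles.mean(axis=(1, 3))

def interframe_luminance_difference(y_curr, y_prev, block=8):
    """Per-block difference of average luminance between frames k and k-1."""
    return (block_means(y_curr.astype(float), block)
            - block_means(y_prev.astype(float), block))

def apply_temporal_adjustment(spatial_jnd, y_curr, y_prev, adjustment_fn, block=8):
    """Scale the spatial JND of every component by a luminance-derived factor.

    spatial_jnd:   dict mapping component name (e.g. "Y", "Cb", "Cr") to an
                   array of spatial JND values with the same shape as y_curr.
    adjustment_fn: hypothetical callable mapping per-block luminance
                   differences to per-block adjustment factors.
    """
    diff = interframe_luminance_difference(y_curr, y_prev, block)
    per_block = adjustment_fn(diff)
    # Repeat each block's factor over its block x block coefficients.
    per_coeff = np.kron(per_block, np.ones((block, block)))
    # The same luminance-derived factor is reused for the chroma components.
    return {name: jnd * per_coeff for name, jnd in spatial_jnd.items()}
```

Passing the mapping in as a callable keeps the sketch independent of the actual experimentally fitted relation, which is not reproduced here.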

Suppose a test image represented in the YCbCr color space is contaminated by the associated JND profiles in the DCT domain. That is, the JND-contaminated DCT coefficient at location (u,v) in the b-th block of the k-th frame in the O (O = Y, Cb, and Cr) color component of the video sequence is obtained as in Eq. (5), in which the sign of the injected distortion is given by a uniformly distributed random variable taking the value of either 1 or -1 and its strength is set by a scale factor. The value of the scale factor can be chosen such that the distortion is uniformly distributed over the contaminated image while the contaminating strength is slightly larger than the JND. Herein, we use the scale factor to inspect how closely the estimated JND approaches the actual JND of the video sequences in the following experiment. If the proposed model can accurately estimate the perceptual redundancy, the PSNR of the video contaminated by the associated JND profiles in the DCT domain should be as low as possible while the visual quality is maintained.
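The contamination test described above can be sketched in a few lines. The additive form used here (original coefficient plus scale factor times random sign times JND) is an assumption inferred from the description, since Eq. (5) itself is not legible in this copy; the names `contaminate_with_jnd` and `psnr` and the peak value of 255 are likewise assumptions.

```python
import numpy as np

def contaminate_with_jnd(coeffs, jnd, scale=1.0, rng=None):
    """Add JND-sized noise of random sign to DCT coefficients (assumed form).

    coeffs: DCT coefficients of one color component.
    jnd:    JND values, broadcastable to `coeffs`.
    scale:  scale factor controlling the contaminating strength.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Each coefficient gets +1 or -1 with equal probability.
    signs = rng.choice(np.array([-1.0, 1.0]), size=np.shape(coeffs))
    return np.asarray(coeffs, dtype=float) + scale * signs * jnd

def psnr(reference, distorted, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally shaped arrays."""
    error = np.asarray(reference, dtype=float) - np.asarray(distorted, dtype=float)
    mse = np.mean(error ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(peak ** 2 / mse)
```

In practice the PSNR would be measured on the pixel-domain frames after the inverse DCT. A lower PSNR at unchanged perceived quality indicates that the model has found more perceptual redundancy.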
In this paper, a color DCT-based spatio-temporal JND profile is proposed. It combines a new temporal masking adjustment with the mathematical model that estimates the DCT-based spatial JND profiles for the luminance and chrominance components of color images. The model is verified by a subjective viewing test that compares the visual distortion between each video and its JND-contaminated version. By utilizing the spatio-temporal JND profile, the coding performance of the perceptually tuned video scheme is improved.