Search Region Correction via Spectrum Domain for Online Visual Tracking

Tracking-by-detection methods have drawn extensive attention in recent years. Most algorithms employ a local search region around the previously estimated position, assuming that the trajectory of the moving object is smooth and the displacement of the object between frames is small enough. These assumptions, however, are not always valid. In particular, when the moving object leaves the local search region due to fast or irregular motion, tracking will fail no matter how well-trained the tracker is. In this paper, we correct the search region by utilizing the energy of the spectrum domain, which encodes the motion blur. Moreover, the proposed approach is lightweight and independent of the core tracker, so existing tracking-by-detection methods can easily adopt it and benefit from the improved performance without significant changes to processing time or methodology. We conduct experiments on 10 fast-motion sequences with both conventional methods and the proposed method. The proposed method outperforms the conventional methods even when the object moves fast and is blurred.


Introduction
Visual tracking algorithms that output bounding boxes have drawn extensive attention in recent years (1)(2)(3). Various challenges, benchmarks, and evaluation measures have contributed to easy comparison and standardized evaluation (4)(5)(6)(7). Various methods have tried to learn more powerful features (8,9), build more discriminative classifiers (10,11), exploit robust patches for tracking (3,12), etc. Besides, another important issue that needs to be addressed is how to determine labeled samples for training and unlabeled samples for testing. In order to decrease the computational complexity and exclude irrelevant samples, most existing methods collect samples based on observations from recent frames. Specifically, one conventional way is to set a search region based on the previously estimated location and crop training/test samples within that region. The limitation is obvious: when the estimate of the previous location is not accurate enough, the learning model is updated with noisy samples and potentially hard false positives, which leads to tracking drift. Moreover, objects with fast motion can easily escape from the search region, and the algorithm will lose the target completely. Enlarging the search region can sometimes relieve the problem of fast motion (13), but it can also introduce other false positives and increase the risk of tracking drift, especially when the target and background display similar visual cues (14).
As one of the non-negligible influencing conditions, motion blur widely exists in visual tracking tasks, especially in fast-motion sequences taken with a slow shutter speed. The illumination changes over time are integrated within the exposure time, which leads to the loss of sharp features and significant appearance changes. Such "smoothed" pixels usually increase the difficulty of training models and have a negative impact on target inference. Besides, the motion blur itself can vary from linear to affine, from rotational to space-variant over frames, which makes it hard to model. Blurred images also introduce uncertainty into visual annotation, let alone an accurate estimation of the target position.
There are several methods that correct the search region by utilizing optical flow. One example (15) uses a branching model based on a judgment of the tendency of the motion blur's change to estimate the correction vector in the current frame. However, this method gives no assurance about the correct location of the object in the current frame, since it does not take the movement of the object in the previous frame into account. As a result, it cannot precisely compare the motion blur between the two frames.
It is necessary to overcome the drawback that only global motion blur can be exploited. We propose a novel approach that can also utilize local motion blur by estimating the object in the current frame. In addition, we update the observation model according to the degree of motion blur in order to improve the accuracy of existing methods.

Proposed Method

Framework of Tracking-by-detection
Most tracking-by-detection methods (16)(17)(18) set a search region based on the previously estimated position and crop training/test samples within the region. The samples are used to update the observation model in order to adapt to changes in the object's appearance.
In general, the object position is initialized manually as a bounding box $B_t$ in the first frame $f_t$, where $t = 1$. $B_t$ contains the four elements of a rectangle surrounding the object: the top-left coordinates $(x, y)$, the width $w$, and the height $h$ of the bounding box within the image. Then, we use an enlarged region of $B_t$ as the search region window $W_t$, assuming that the displacement of the object between two frames is small. However, objects with fast motion can easily escape from the search region, in which case the algorithm loses the target completely.
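As a concrete sketch of this cropping step, the following Python snippet enlarges the bounding box around its center to form the search window (the function name and the scale factor are illustrative, not from the paper):

```python
def search_region(box, image_shape, scale=2.0):
    """Enlarge box = (x, y, w, h) about its center by `scale` to form
    the search window, clipped to the image bounds."""
    x, y, w, h = box
    cx, cy = x + w / 2.0, y + h / 2.0    # box center
    sw, sh = w * scale, h * scale        # enlarged size
    H, W = image_shape[:2]
    x0 = int(max(0, cx - sw / 2))
    y0 = int(max(0, cy - sh / 2))
    x1 = int(min(W, cx + sw / 2))
    y1 = int(min(H, cy + sh / 2))
    return (x0, y0, x1 - x0, y1 - y0)
```

Training/test samples would then be cropped from the returned region.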

Overview of Proposed Method
To improve the above framework, we add a preprocessing step that changes the search region and the observation model using information estimated from motion blur. Specifically, we estimate the approximate position of a fast-moving object according to the corrected optical flow, which is estimated using spectrum-domain features. The energy in the spectrum domain between two frames is analyzed. This makes it possible to estimate the displacement of the current object and the degree of motion blur between frames, so a tracker can follow a fast-moving object using the corrected search region. In addition, the classifier updates the observation model taking the motion blur into account. An overview of our method is shown in Fig. 1; we explain the details in the following sections.

Search Region Correction by Optical Flow
In order to correct a search region misaligned due to motion blur, a global displacement vector of the object should be estimated in each frame. We estimate it by analyzing the optical flow vectors within the bounding box.
Optical flow is a field of vectors representing the motion of an object. We employ the Farneback method (19) because it is robust to large object displacements thanks to its iterative, multi-scale displacement estimation.
Furthermore, we calculate the average of the optical flow vectors within $B_t$ as the representative motion vector, assuming the object has linear motion (i.e., we assume the object moves linearly in the image). The average optical flow $\overline{OF_t}$ is calculated as

$$\overline{OF_t} = \frac{1}{N} \sum_{i=1}^{N} OF_t^i, \qquad (1)$$

where $OF_t^i$ is the $i$-th optical flow vector within $B_t$ and $N$ is the number of vectors. From this, the search region $W_t'$ considering the motion of the object is estimated as

$$W_t' = \mathrm{shift}(W_t, \overline{OF_t}), \qquad (2)$$

where the function $\mathrm{shift}(W_t, \overline{OF_t})$ computes the search region obtained by shifting $W_t$ by $\overline{OF_t}$.
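A minimal sketch of this averaging-and-shifting step, given a dense flow field (e.g., from OpenCV's `calcOpticalFlowFarneback`); the function name is illustrative:

```python
import numpy as np

def shift_search_region(flow, box):
    """Shift box = (x, y, w, h) by the mean optical flow inside it.

    flow: H x W x 2 array of per-pixel (dx, dy) vectors, e.g. from
    cv2.calcOpticalFlowFarneback.  The mean vector plays the role of
    the representative motion under the linear-motion assumption.
    """
    x, y, w, h = box
    mean_dx, mean_dy = flow[y:y + h, x:x + w].reshape(-1, 2).mean(axis=0)
    return (x + int(round(mean_dx)), y + int(round(mean_dy)), w, h)
```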
However, optical flow is influenced by motion blur, especially in the case of fast motion. A method that is robust to motion blur is therefore needed.

Motion blur
The situations of motion blur can be quite complicated, as it may be space-variant and non-linear (20). Although most motion blur in real-world applications is non-linear, it is reasonable to assume that the object's motion is linear between two successive frames because the exposure time is usually short. The proposed method assumes that the motion within $B_t$ is space-invariant. A simulated example of the motion blur generation function used in this paper is shown in Fig. 2. Both sides of the object become transparent, blending into the background, as the object moves during the exposure time.

Analyze Blurred Image by Spectral Domain
Since motion blur can usually be modeled by the convolution of a point spread function (PSF) with the original sharp image, the motion information can be recovered by estimating the parameters of the PSF (22). The PSF $h(x, y)$ can be modeled as a rectangular pulse whose orientation and width correspond to the direction and degree of the blur. Regardless of the orientation, the PSF is determined as

$$h(x) = \begin{cases} 1/L, & 0 \le x \le L \\ 0, & \text{otherwise,} \end{cases} \qquad (3)$$

where $L$ is the length of the blur. The blurred image $I_b$ is generated by convolving the PSF with the original image $I$:

$$I_b = h * I. \qquad (4)$$

The orientation can be determined by the direction of the global optical flow vector calculated in equation (1). Thus, the task left is to estimate $L$.
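In the 1-D case, the PSF above reduces to a normalized box filter and the blurring to a convolution. A short sketch under that simplification:

```python
import numpy as np

def motion_psf(length):
    """1-D linear motion-blur PSF: a box of width L with unit sum."""
    return np.ones(length) / length

def motion_blur(signal, length):
    """Blurred signal = PSF convolved with input ('same'-sized output)."""
    return np.convolve(signal, motion_psf(length), mode="same")
```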
We analyze blurred images with different degrees of blur in the spectrum domain. First, the PSF is translated into the optical transfer function (OTF), represented by $H$. Each $H(u, v)$ is calculated by taking the fast Fourier transform (FFT) of the PSF. Secondly, the FFT of the blurred image, $I_b^F$, is obtained by multiplying $H$ with $I^F$, where $I^F$ is defined as the FFT of $I$:

$$I_b^F = H \cdot I^F. \qquad (5)$$

Taking the one-dimensional case as an example, the PSF of motion blur has the following characteristic:

$$\sum_{x} h(x) = 1. \qquad (6)$$

The FFT of the signal $h$, denoted $H$, is computed as

$$H(u) = \sum_{x} h(x)\, e^{-j 2\pi u x / N}, \qquad (7)$$

where $e^{-j 2\pi u x / N}$ is an orthonormal basis. Thus, the maximum value of $H$, the FFT of $h$, is 1. From equations (5), (6), and (7), the sum of the power spectrum satisfies

$$\sum_{u} |I_b^F(u)|^2 = \sum_{u} |H(u)|^2\, |I^F(u)|^2 \le \sum_{u} |I^F(u)|^2. \qquad (8)$$

This means that the sum of the power spectrum decreases as the blur increases. By Parseval's theorem, the sum of the power spectrum equals the average energy of $I_b$:

$$E(I_b) = \frac{1}{N} \sum_{x} |I_b(x)|^2 = \sum_{u} P_b(u), \qquad (9)$$

where $P_b(u) = |I_b^F(u)|^2 / N^2$ is the power spectrum. In summary, the average energy of the image equals the sum of the power spectrum, which decreases as the blur increases.
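This energy argument can be checked numerically: the average energy, which by Parseval's theorem equals the normalized sum of the power spectrum, falls as the blur length grows. A small Python experiment (names and blur lengths are illustrative):

```python
import numpy as np

def avg_energy(x):
    """Average energy of a 1-D signal; by Parseval's theorem this equals
    sum(|FFT(x)|^2) / N^2 for the unnormalized DFT."""
    return np.mean(np.abs(x) ** 2)

def motion_blur(x, L):
    """Blur with a length-L box PSF of unit sum ('same'-sized output)."""
    return np.convolve(x, np.ones(L) / L, mode="same")

rng = np.random.default_rng(0)
x = rng.random(256)

# Parseval check: spatial average energy == normalized power-spectrum sum
X = np.fft.fft(x)
parseval = np.sum(np.abs(X) ** 2) / len(x) ** 2

# Average energy shrinks as the blur length grows
energies = [avg_energy(motion_blur(x, L)) for L in (1, 4, 16, 64)]
```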

Object Position and Blur's Degree Estimation by Average Energy of Image
As mentioned in Sec. 2.3 and Sec. 2.4, accurate optical flow cannot be obtained under motion blur. Thus, we propose to use the estimated average energy (equation (9)) to correct the search region and update the observation model. Specifically, the object position in the current frame is estimated by comparing the current candidate positions and the previously estimated position with respect to average energy.
Firstly, we create a blurred image set $S_{t-1}$ that contains images with different degrees of blur, generated from the previously estimated object within the bounding box $B_{t-1}$. Each element $I_{t-1}^{L_i}$ of $S_{t-1}$ is calculated as

$$I_{t-1}^{L_i} = h_{L_i} * I_{t-1}, \qquad (10)$$

where $h_{L_i}$ is the motion filter with blur length $L_i$. Thus, $S_{t-1}$ consists of

$$S_{t-1} = \{\, I_{t-1}^{L_i} \mid L_i = i\, \Delta L,\ i = 0, 1, \dots \,\}, \qquad (11)$$

where $\Delta L$ is the blur length interval. In fact, $I_{t-1}^{L_i}$ is calculated directly in the spectral domain using equation (5), which has a lower computational cost than performing the DFT on each blurred image. Secondly, we detect the current candidate positions using equation (1). The corrected optical flow $OF_t'(k)$ is calculated as

$$OF_t'(k) = \frac{k}{\mathrm{count}}\, \overline{OF_t}, \qquad k = 1, \dots, \mathrm{count}, \qquad (12)$$

where $\mathrm{count} = |\overline{OF_t}| / \Delta d + 1$ and $\Delta d$ is the sampling interval that determines it. Furthermore, we calculate the difference in average energy between the images in $S_{t-1}$ and the images located at $B_{t,k} = (x + OF_t'(k)_x,\ y + OF_t'(k)_y,\ w,\ h)$ in order to obtain the approximate position of the object. The evaluation criterion is the equality of average image energy:

$$(\hat{h}_L, \widehat{OF_t'}) = \arg\min_{L_i,\, k} \left| E(I_{t-1}^{L_i}) - E(B_{t,k}) \right|, \qquad (13)$$

which estimates $\hat{h}_L$ and $\widehat{OF_t'}$. The search region is then corrected by $\widehat{OF_t'}$ and the observation model is updated with $\hat{h}_L$.
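The matching step can be sketched as follows: blur the previous patch at several candidate lengths and pick the length whose average energy best matches a candidate patch from the current frame (1-D patches and the function names are illustrative simplifications, not the paper's implementation):

```python
import numpy as np

def avg_energy(x):
    """Average energy of a signal."""
    return np.mean(np.abs(x) ** 2)

def estimate_blur_length(prev_patch, cand_patch, lengths):
    """Return the blur length whose blurred version of the previous
    patch is closest to the candidate patch in average energy."""
    target = avg_energy(cand_patch)
    best_len, best_diff = None, float("inf")
    for L in lengths:
        blurred = np.convolve(prev_patch, np.ones(L) / L, mode="same")
        diff = abs(avg_energy(blurred) - target)
        if diff < best_diff:
            best_len, best_diff = L, diff
    return best_len
```

In the full method the same comparison would also run over the candidate positions, jointly selecting the blur filter and the flow correction.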

Experimental Conditions
We evaluate our tracking algorithm with 3 tracking-by-detection methods on 10 sequences from benchmark datasets that belong to the fast-motion category (5). The 3 trackers are real-time compressive tracking (CT) (16), distribution fields for tracking (DFT) (17), and exploiting the circulant structure of tracking-by-detection with kernels (CSK) (18). We check whether the proposed method can improve the accuracy of the 3 conventional methods with respect to fast motion. Each method is tested with four different search region sizes. We carry out the experiments with all tracker parameters fixed. The experiments are conducted with programs written in MATLAB, running on a Core(TM) i7-7700 3.60 GHz CPU with 12 GB RAM.
In this paper, we evaluate the accuracy of the 3 conventional trackers with/without the proposed method by intersection over union (IoU) (23), which is calculated as

$$\mathrm{IoU} = \frac{|B_t \cap B_{gt}|}{|B_t \cup B_{gt}|}, \qquad (14)$$

where $B_t$ is the estimated bounding box and $B_{gt}$ is the ground-truth bounding box. If the IoU is larger than 0.5 in a frame, the tracking result in that frame is considered a success. Each method is initialized with the same position.
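A straightforward implementation of the IoU criterion for axis-aligned (x, y, w, h) boxes:

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned (x, y, w, h) boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))  # overlap width
    ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))  # overlap height
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0
```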

Experimental Results
The average success rates of the three methods are shown in Fig. 3. Fig. 4 shows some qualitative results of tracking scenes by the CT method and our method. Sequence-dependent success rates of the conventional methods and ours are plotted in Figs. 5, 6, and 7, which show the relationship between the IoU threshold and the success rate. In Fig. 3, our method improves the performance of all the original methods by correcting the search regions when the object moves fast. Also, the success rates of the conventional methods decrease when the search region becomes small. However, with the help of our method, the conventional methods maintain their accuracy even with a small search region. In conclusion, our approach can improve the accuracy of conventional methods even when the search region is small and the blur is strong, as shown in Fig. 4. There are two reasons why conventional methods lose the target. First, the object escapes from the search region because its displacement is too large. Second, the object's appearance in a blurred, low-quality image is significantly different from the observation model. In contrast, our method can determine the start position of the classifier and match the degree of motion blur so that the search region fits the object position, by analyzing the spectrum domain.
Also, for the sequences that DFT tracks successfully in its original form, there is still a small increase in the success rate when the proposed method is adopted. Although enlarging the search region can increase the success rate, the possibility of tracking drift also increases, since the search region contains more background. Nevertheless, the proposed method does not perform well when the appearance of the object changes significantly between two frames, due to the limitation of the spectrum information.

Conclusions
This paper presents a preprocessing method for tracking algorithms that deals with fast motion and the strong motion blur that degrades the image, by correcting the search region. We also update the observation model according to the degree of the current motion blur to better approximate the current object's appearance. We conduct experiments on fast-moving objects and succeed in improving tracking accuracy by applying the proposed approach to conventional methods. However, when the object does not have sufficient features, our method may not perform well. To cope with this situation, considering the scaling of the object is a possible direction for future work.

Fig. 1
Fig. 1 Overview of the proposed method.

Fig. 2
Fig. 2 Simulated example of the motion blur generation function.

Fig. 3
Fig. 3 Average success plots over all sequences.

Fig. 4
Fig. 4 Examples of tracking scenes by the CT method.

Fig. 5
Fig. 5 Examples of success rates in different search regions by CSK.

Fig. 6
Fig. 6 Examples of success rates in different search regions by CT.

Fig. 7
Fig. 7 Examples of success rates in different search regions by DFT.