Trajectory Clustering Using Affinity Propagation with Trajectory Entropy Descriptor

To cluster similar trajectories in surveillance videos, we use trajectory entropy descriptors, which are computed from the trajectories of foreground objects. Based on the descriptors, we measure the distances between trajectories with different lengths and convert the distances into a similarity matrix for affinity propagation clustering. Experiments show the effectiveness and efficiency of the proposed method for trajectory clustering compared with state-of-the-art methods.


Introduction
Surveillance cameras are now widely deployed to monitor public environments for security purposes. The resulting videos record traffic conditions, crimes, accidents, and other events. However, manually monitoring such a large volume of video is impractical, so automatic surveillance video analysis is needed to help users analyze video content. Among various visual cues, trajectories of foreground objects provide continuous temporal information for this analysis. Trajectories have been used in many applications, such as temporal frame interpolation (1), video object retrieval (2), human action recognition (3), behavior analysis of traffic flow (4), and rare event detection (5).
In this paper, we aim to analyze trajectories of moving objects, such as pedestrians, motorcycles, and vehicles, in surveillance videos. Extracting trajectories requires foreground object detection and tracking. Foreground objects can be obtained with recent detection methods, including background subtraction (6), vehicle detection (7), pedestrian detection (8), and group detection (9). The detected objects are then tracked across consecutive frames. Tracking methods that link foreground objects into trajectories include updating appearance models (10), dynamic models over time (11), (12), and sparse representation based models (13), (14).
After the trajectories of foreground objects are obtained, measuring the similarity between two trajectories with different lengths remains a problem. Even when two trajectories are similar, their lengths can differ because of motion properties. Many methods have been proposed to cluster similar trajectories. For example, Mo et al. (15) use B-spline curves to represent trajectories; they collect pre-labeled trajectories to train a model and develop a new joint sparsity model for anomaly detection. Nascimento et al. (16) use hidden Markov models (HMM) to model the dynamic motions of trajectories, with the HMM parameters estimated by the EM algorithm. Naftel and Khalid (17) describe trajectories with discrete Fourier transform (DFT) coefficients and group them by using self-organising maps (SOM). Hu et al. (18) also use DFT coefficients, combined with the Dirichlet process mixture model (DPMM), to cluster trajectories. Atev et al. (19) compare three measures for estimating the similarity between trajectories: dynamic time warping (DTW) (20), the longest common subsequence (LCSS) (21), and the Hausdorff distance with modifications. They also compare two clustering methods, agglomerative clustering and spectral clustering, and suggest using the Hausdorff distance with spectral clustering for the trajectory clustering problem.
Other researchers focus on the representation of trajectories. These methods can be categorized into fixed trajectory length and un-fixed trajectory length approaches. Fixed-length representations include down sampling (DS) (22), principal component analysis (PCA) (23), and discrete Fourier transform (DFT) coefficients (17), (18). In contrast, dynamic time warping (DTW) (20), the longest common subsequence (LCSS) (21), and the modified Hausdorff (MH) distance (19), (24) directly measure distances between two trajectories with different lengths. Nevertheless, these methods consider only the similarity between the spatial points of trajectories.
In this paper, we propose using the trajectory entropy descriptor (TED) (25) and affinity propagation (26) to group similar trajectories from surveillance videos. To extract trajectories, background modeling (27) with the galaxy descriptor (28) is first applied to extract foreground objects, and a maximum a posteriori tracking process (29) is then applied to obtain their trajectories. We represent each trajectory with a trajectory entropy descriptor, which allows the similarity between trajectories with different lengths to be calculated. We then apply affinity propagation to cluster the trajectories based on this similarity. After clustering, foreground objects with similar events are grouped together as the result of automatic surveillance video analysis. In the experiments, we compare the proposed method with state-of-the-art trajectory representation methods. The rest of this paper is organized as follows. Section 2 introduces the trajectory representation using the trajectory entropy descriptor. Section 3 describes the trajectory clustering algorithm for event analysis. Section 4 presents the experimental results. Finally, Section 5 concludes the paper.

Trajectory Representation
Trajectories of foreground objects provide motion information in visual surveillance, but comparing trajectories with different lengths remains a problem. Recent approaches either down sample trajectories to the same length or use point-wise matching. In this paper, we use the trajectory entropy descriptor (25) to transform a trajectory into an encoded descriptor, which solves the trajectory length problem. The similarity between two trajectories is then measured by comparing their descriptors. A trajectory $T_i$ of the $i$-th foreground object is defined as the sequence of points
$$T_i = \{(x_i(t), y_i(t)) \mid t = 1, \dots, s_i\},$$
where $s_i$ is the length of $T_i$, and $(x_i(t), y_i(t))$ are the 2-D spatial coordinates of $T_i$ at frame $t$.
To encode $T_i$, we use the trajectory entropy descriptor (TED). The velocities and accelerations of foreground objects are the key factors that affect the lengths of trajectories. However, recent trajectory representation methods cannot be directly applied to measure the similarity between trajectories with different lengths, so the length problem remains unsolved. To solve this problem, the TED (25) transforms the velocities and accelerations of foreground objects into descriptors. Let $v_{i,x}(t)$ and $v_{i,y}(t)$ be the velocities of the $i$-th trajectory in the horizontal and vertical directions at frame $t$, respectively, computed as the change in the corresponding coordinate between consecutive frames divided by the frame time $\Delta t$. Similarly, the horizontal acceleration $a_{i,x}(t)$ and the vertical acceleration $a_{i,y}(t)$ are computed as the change in the corresponding velocity between consecutive frames divided by $\Delta t$. To account for the moving directions of foreground objects, we treat the positive and negative values of the velocities and accelerations separately.
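As a minimal sketch of the quantities described above, the following Python computes velocities and accelerations by forward differences. The forward-difference convention, the function names, and the default `dt` are assumptions made for illustration; the original formulas may index frames differently.

```python
def finite_differences(values, dt):
    """Forward differences between consecutive samples, divided by the frame time."""
    return [(b - a) / dt for a, b in zip(values, values[1:])]


def velocities_and_accelerations(trajectory, dt=1.0):
    """Return (vx, vy, ax, ay) for a trajectory given as a list of (x, y) points.

    Assumption: velocity at frame t uses the points at frames t and t + 1,
    and acceleration is the same difference applied to the velocities.
    """
    xs = [point[0] for point in trajectory]
    ys = [point[1] for point in trajectory]
    vx = finite_differences(xs, dt)
    vy = finite_differences(ys, dt)
    ax = finite_differences(vx, dt)
    ay = finite_differences(vy, dt)
    return vx, vy, ax, ay


# Hypothetical trajectory: speeding up horizontally, constant vertical speed.
vx, vy, ax, ay = velocities_and_accelerations([(0, 0), (1, 2), (3, 4), (6, 6)])
```

Under this convention, a trajectory with s points yields s - 1 velocity samples and s - 2 acceleration samples per axis, and the signs of the samples carry the moving direction.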
To capture the movement variations in each time slot, probability mass functions are defined over the velocities and accelerations. Without loss of generality, consider the horizontal velocity of the $i$-th trajectory: its positive and negative probability mass functions $pv_{i,x+}(t)$ and $pv_{i,x-}(t)$ are obtained by normalizing the positive and negative velocity values over the trajectory, where a small constant $\lambda$ (= 0.001) is added to the denominator to prevent it from becoming zero. The probability mass functions $pv_{i,y+}(t)$ and $pv_{i,y-}(t)$ of the vertical velocity, $pa_{i,x+}(t)$ and $pa_{i,x-}(t)$ of the horizontal acceleration, and $pa_{i,y+}(t)$ and $pa_{i,y-}(t)$ of the vertical acceleration are defined in the same way. From the probability mass functions of the positive and negative horizontal velocity, the positive and negative entropies $Ev_{i,x+}$ and $Ev_{i,x-}$ of the horizontal velocity are computed. The entropies $Ev_{i,y+}$ and $Ev_{i,y-}$ of the vertical velocity, $Ea_{i,x+}$ and $Ea_{i,x-}$ of the horizontal acceleration, and $Ea_{i,y+}$ and $Ea_{i,y-}$ of the vertical acceleration are defined analogously. Based on these entropies, each trajectory $T_i$ is modeled as a trajectory entropy descriptor $T(T_i)$ built from the velocity and acceleration entropies. The distance between two TEDs is defined as the Euclidean distance.
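The descriptor construction can be sketched as follows. This is an illustration under stated assumptions, not the formulation of (25): the probability mass function is assumed to be the positive (or negative) part of each sample divided by the sum of those parts plus λ, the entropy is assumed to be the Shannon entropy with natural logarithm, and the descriptor is assumed to be the vector of the eight entropies in the order shown. All function names are hypothetical.

```python
import math

LAMBDA = 0.001  # small constant from the text that keeps the denominator non-zero


def differences(values, dt=1.0):
    """Forward differences between consecutive samples (assumed convention)."""
    return [(b - a) / dt for a, b in zip(values, values[1:])]


def signed_entropy(values, sign):
    """Entropy of the positive (sign=+1) or negative (sign=-1) part of a sequence.

    Assumed PMF: p(t) = max(sign * v(t), 0) / (sum_t max(sign * v(t), 0) + LAMBDA).
    """
    parts = [max(sign * v, 0.0) for v in values]
    denom = sum(parts) + LAMBDA
    entropy = 0.0
    for part in parts:
        p = part / denom
        if p > 0.0:  # treat 0 * log(0) as 0
            entropy -= p * math.log(p)
    return entropy


def trajectory_entropy_descriptor(trajectory, dt=1.0):
    """Assumed layout: [Ev_x+, Ev_x-, Ev_y+, Ev_y-, Ea_x+, Ea_x-, Ea_y+, Ea_y-]."""
    xs = [point[0] for point in trajectory]
    ys = [point[1] for point in trajectory]
    vx, vy = differences(xs, dt), differences(ys, dt)
    ax, ay = differences(vx, dt), differences(vy, dt)
    return [signed_entropy(seq, sign)
            for seq in (vx, vy, ax, ay)
            for sign in (+1, -1)]


def ted_distance(d1, d2):
    """Euclidean distance between two descriptors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(d1, d2)))


# Two hypothetical trajectories with different lengths map to same-size descriptors.
short = trajectory_entropy_descriptor([(t, 0) for t in range(5)])
long_ = trajectory_entropy_descriptor([(t, 0) for t in range(9)])
distance = ted_distance(short, long_)
```

Whatever the exact formulas, the point of the construction is visible here: trajectories of any length are reduced to descriptors of the same fixed size, so a plain Euclidean distance applies. The constant λ matters when a component never occurs, for example an object that never moves left, since the denominator would otherwise be zero.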

Trajectory Clustering
To group similar trajectories, we apply affinity propagation (26) to the TEDs. Affinity propagation initially treats every TED as a candidate exemplar by assigning it a preference value. Because every TED is a candidate exemplar, the number of clusters does not need to be specified before clustering. Two kinds of messages, responsibility and availability, are exchanged in affinity propagation. Before clustering, an affinity matrix $s$ is constructed. Each element $s(i, j)$ of $s$ represents the similarity between two trajectories $T_i$ and $T_j$ and is computed from $D_{R^2}(T(T_i), T(T_j))$, the Euclidean distance between the TEDs of $T_i$ and $T_j$, and the variance $\sigma$. Based on $s(i, j)$, the responsibility $r(i, j)$, the availability $a(i, j)$, and the self-availability $a(j, j)$ are computed; if $T_j$ has positive responsibility, it is considered an exemplar. By repeatedly exchanging the responsibility and availability messages, the corresponding exemplar $T_{i^*}$ of each trajectory $T_i$ is obtained, i.e., $T_{i^*}$ and $T_i$ belong to the same group. Trajectories with similar TEDs are thus clustered into the same group.
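The message passing described above can be sketched in plain Python. The update rules below are the standard ones from the affinity propagation literature, which the text cites as (26); they are assumed, not confirmed, to match the formulas used here. The Gaussian form of the similarity, the median preference, the damping factor, and all function names are likewise assumptions made for illustration.

```python
import math
from statistics import median


def similarity_matrix(descriptors, sigma=1.0):
    """Assumed similarity: s(i, j) = exp(-D(i, j)^2 / (2 * sigma))."""
    n = len(descriptors)
    S = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2
                         for a, b in zip(descriptors[i], descriptors[j]))
                S[i][j] = math.exp(-d2 / (2.0 * sigma))
    # Assumed preference: median of the off-diagonal similarities.
    preference = median(S[i][j] for i in range(n) for j in range(n) if i != j)
    for i in range(n):
        S[i][i] = preference
    return S


def affinity_propagation(S, damping=0.5, iterations=200):
    """Return, for each item i, the index of its exemplar."""
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(iterations):
        # r(i, k) = s(i, k) - max over k' != k of (a(i, k') + s(i, k'))
        for i in range(n):
            scores = [A[i][k] + S[i][k] for k in range(n)]
            best = max(range(n), key=lambda k: scores[k])
            first = scores[best]
            second = max(scores[k] for k in range(n) if k != best)
            for k in range(n):
                rival = second if k == best else first
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - rival)
        # a(i, k) = min(0, r(k, k) + sum over i' not in {i, k} of max(0, r(i', k)))
        # a(k, k) = sum over i' != k of max(0, r(i', k))
        for k in range(n):
            terms = [max(0.0, R[i][k]) for i in range(n)]
            terms[k] = R[k][k]  # the self-responsibility enters unclipped
            total = sum(terms)
            for i in range(n):
                new = total - R[k][k] if i == k else min(0.0, total - terms[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    return [max(range(n), key=lambda k: A[i][k] + R[i][k]) for i in range(n)]


# Hypothetical one-dimensional descriptors forming two well-separated groups.
points = [[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]]
labels = affinity_propagation(similarity_matrix(points))
```

Items that share an exemplar index form one cluster, so the number of clusters falls out of the message passing and is controlled only indirectly through the preference value. The fixed iteration count is a simplification; practical implementations usually stop once the exemplar set is stable.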

Experimental Setting
We compared the clustering performance of different trajectory representation methods by using affinity propagation. Three surveillance videos (29), Hall, Sidewalk, and Street, were used to evaluate the performance. Sampled frames of each video are shown in Fig. 1. In general, the trajectories extracted from surveillance videos are mostly incomplete and contain many false alarms due to object occlusion and lighting changes, as shown in Fig. 2. The dataset information is shown in Table 1. We implemented the algorithm on an Intel Core i7 computer with a 3.40 GHz CPU and 10 GB of RAM.

Clustering Evaluation Metrics
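To evaluate the clustering results, clustering accuracy (ACC) (30) and normalized mutual information (NMI) (31) are used. Given the $i$-th trajectory $T_i$, $c_i$ is the obtained cluster label and $l_i$ is the ground-truth label. The ACC is defined as
$$\mathrm{ACC} = \frac{1}{\Omega} \sum_{i=1}^{\Omega} \sigma(l_i, \mathrm{map}(c_i)),$$
where $\Omega$ is the total number of trajectories and $\mathrm{map}(\cdot)$ matches the obtained cluster labels to the ground-truth labels. The $\mathrm{map}(\cdot)$ can be obtained by using the Kuhn-Munkres algorithm. The $\sigma(a, b)$ is a delta function: $\sigma(a, b) = 1$ if $a = b$, and $\sigma(a, b) = 0$ otherwise. A larger ACC means a better clustering result.
Besides ACC, the normalized mutual information (NMI) assesses the correlation between the clustering results and the ground-truth labels. It is computed by normalizing the mutual information $I(C, L)$ with the entropies $H(C)$ and $H(L)$, where $C$ and $L$ are the sets of clusters of the clustering results and of the ground-truth groups, respectively. Let $|C_p|$ be the number of samples in the cluster $C_p$, and $|L_q|$ be the number of samples in the class $L_q$. Then, $H(C)$ and $H(L)$ are defined as
$$H(C) = -\sum_{p} \frac{|C_p|}{\Omega} \log \frac{|C_p|}{\Omega}, \qquad H(L) = -\sum_{q} \frac{|L_q|}{\Omega} \log \frac{|L_q|}{\Omega}.$$
The mutual information $I(C, L)$ between $C$ and $L$ is defined as
$$I(C, L) = \sum_{p} \sum_{q} \frac{|C_p \cap L_q|}{\Omega} \log \frac{\Omega \, |C_p \cap L_q|}{|C_p| \, |L_q|}.$$
The NMI value is bounded in [0, 1]. A larger NMI indicates better clustering results.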
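The two metrics can be sketched in plain Python as follows. Two simplifications are assumptions of this sketch and not the original setup: the label matching uses an exhaustive search over one-to-one mappings in place of the Kuhn-Munkres algorithm, which is only practical for a small number of clusters, and the NMI is normalized by the geometric mean of the two entropies, which is one common convention. The function names are hypothetical.

```python
import math
from collections import Counter
from itertools import permutations


def clustering_accuracy(clusters, truth):
    """ACC with the best one-to-one mapping found by exhaustive search."""
    cluster_ids = sorted(set(clusters))
    class_ids = sorted(set(truth))
    # Pad with placeholders so that every cluster receives a distinct target.
    class_ids += [None] * max(0, len(cluster_ids) - len(class_ids))
    best = 0
    for assigned in permutations(class_ids, len(cluster_ids)):
        mapping = dict(zip(cluster_ids, assigned))
        hits = sum(1 for c, l in zip(clusters, truth) if mapping[c] == l)
        best = max(best, hits)
    return best / len(truth)


def entropy(labels):
    """Entropy of a labeling, using the fraction of samples in each group."""
    n = len(labels)
    return -sum((m / n) * math.log(m / n) for m in Counter(labels).values())


def mutual_information(clusters, truth):
    """Mutual information between the cluster labels and the ground truth."""
    n = len(truth)
    joint = Counter(zip(clusters, truth))
    size_c, size_l = Counter(clusters), Counter(truth)
    return sum((m / n) * math.log(n * m / (size_c[c] * size_l[l]))
               for (c, l), m in joint.items())


def normalized_mutual_information(clusters, truth):
    """Assumed normalization: I(C, L) / sqrt(H(C) * H(L))."""
    hc, hl = entropy(clusters), entropy(truth)
    if hc == 0.0 or hl == 0.0:
        return 0.0  # convention chosen here for a single-group labeling
    return mutual_information(clusters, truth) / math.sqrt(hc * hl)


# Hypothetical labels: one of six samples is assigned to the wrong cluster.
acc = clustering_accuracy([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 1, 1])
nmi = normalized_mutual_information([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 1, 1])
```

Both metrics ignore the cluster label values themselves: relabeling the clusters leaves ACC unchanged through the mapping and leaves NMI unchanged because it depends only on group sizes and overlaps.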

Results
In the experiments, we used affinity propagation (26) to cluster trajectories. We manually labeled the ground truth of each trajectory based on its moving direction, starting position, velocity, and acceleration. To demonstrate the performance of the proposed method, we implemented the following trajectory representation methods for comparison: down sampling (DS) (22), principal component analysis (PCA) (23), discrete Fourier transform (DFT) (17), dynamic time warping (DTW) (20), and longest common subsequence (LCSS) (21).
The ACC and NMI values of the clustering results of all methods are shown in Table 2 and Table 3, respectively. As shown in Table 2, our method achieves the best ACC on every dataset except Street. Although LCSS achieves a slightly better ACC than the proposed method on that dataset, the computation time of LCSS is significantly longer than that of TED, as shown in Table 4. Our method also outperforms the other trajectory representations in NMI, as shown in Table 3.
The computation time of each trajectory representation method on each dataset is shown in Table 4. As expected, the fixed trajectory length methods DS and PCA are faster than the other methods, because they require only simple computation procedures. Nevertheless, their clustering results are much worse than those of the un-fixed trajectory length methods, as shown in Table 2 and Table 3. In contrast, the proposed TED is faster than the un-fixed trajectory length methods and achieves the best results on most datasets.
Fig. 3 shows three sampled groups of clustering results for each dataset. To show the moving directions of the trajectories, we marked their starting points as red nodes. As shown in Fig. 3(a), although some trajectories are incomplete or broken due to tracking, three moving directions, from south to north, from east to west, and from west to east, are still separated in the Hall video. Similar clustering results can be observed in Fig. 3(b) for the Sidewalk video: the pedestrians moving from east to west and from west to east are clustered into different groups. Because of their motion properties, the incomplete trajectories are separated into another group, as shown in the third row of Fig. 3(b). In Fig. 3(c), foreground objects with three different moving directions are clustered into three groups. These results show the potential of the proposed method to separate foreground objects with different events into different groups. Thus, the proposed method can be applied to analyze the behavior of foreground objects in automatic surveillance video analysis.

Conclusions
We apply the TED to describe the motion properties of trajectories with different lengths. Based on the distances between the TEDs of different trajectories, trajectories with similar motion properties are clustered into individual groups by using affinity propagation. In a comparison with state-of-the-art trajectory representation methods, our method achieves better ACC and NMI values in most cases. In the future, we will apply affinity propagation with TED to more video datasets for evaluating the usability in automatic surveillance video analysis.

Fig. 1.
The sampled frames of (a) Hall, (b) Sidewalk, and (c) Street videos.

Fig. 2.
The extracted trajectories of (a) Hall, (b) Sidewalk, and (c) Street videos. Each curve represents a trajectory of the detected foreground objects.

Fig. 3.
Three sampled groups of clustering results of (a) Hall, (b) Sidewalk, and (c) Street videos. Each curve represents a trajectory of the detected foreground objects. The red nodes show the starting points of the trajectories.

Table 1.
Video information.

Table 2.
Performance comparison of ACC (%) of different trajectory representations by using affinity propagation.

Table 3.
Performance comparison of NMI (%) of different trajectory representations by using affinity propagation.

Table 4.
Computation time (seconds) of measuring similarity by using different trajectory representations.