Dynamically Weighted DTW for Dynamic Full-Body Gesture Recognition

As sensor technology has advanced, research exploring new forms of human-computer interaction has become possible. In particular, since the release of the Kinect, gesture recognition for human-computer interaction using the Kinect has become a popular research topic. Dynamic Time Warping (DTW) is one of the template matching algorithms used in gesture recognition. A gesture is recognized by using DTW to compare the input sequence with the reference sequences in a database and selecting the most similar one. Recognizing a dynamic full-body gesture in a Kinect environment requires comparing all joints of the body. However, not all joints are equally important when calculating the similarity of two sequences. We propose a dynamically weighted DTW method based on the displacement of each joint. In our experiments, the proposed method improved the recognition rate.


Introduction
The most commonly used human-computer interfaces are the mouse and the keyboard. Recently, however, natural user interfaces (NUI) based on user motion have been applied more widely thanks to improvements in sensor technology. Since the release of the Microsoft Kinect, NUI studies using depth cameras have become popular. Gesture recognition is an important technology for NUI. Many methods have been proposed for gesture recognition, including Dynamic Time Warping (1), Hidden Markov Models (2), and Conditional Random Fields (3). Dynamic time warping is a pattern matching algorithm that allows nonlinear stretching of a sequence. To recognize a human gesture, DTW calculates the similarity between the input data and predefined reference sequences. When recognizing a full-body gesture, not all body joints are equally important. In this paper, we review existing DTW approaches for full-body gestures (4)(5)(6) and introduce our method.

DTW gesture recognition for full-body gesture

Standard dynamic time warping
The DTW algorithm measures the similarity between two sequences of variable length (1). To compare a sequence R of length N with a sequence T of length M, we build an N-by-M accumulated cost matrix g:

$$g(i, j) = d(r_i, t_j) + \min\{\, g(i-1, j),\; g(i-1, j-1),\; g(i, j-1) \,\}$$

where $r_i$ and $t_j$ are the i-th element of R and the j-th element of T. Typically, the function d is the Euclidean distance. The resulting DTW distance is g(N, M).
To compare two full-body sequences, the DTW distance is calculated for each joint, and the average over all joints is taken.
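The recursion and the per-joint averaging can be sketched as follows. This is a minimal illustration, not the implementation used in the experiments: the function names are hypothetical, each frame is assumed to be a list of (x, y, z) joint positions, and the symmetric three-neighbor step pattern is assumed.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length coordinate tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(R, T, d=euclidean):
    """Return g(N, M) for a sequence R of length N and a sequence T of length M."""
    N, M = len(R), len(T)
    inf = float("inf")
    # One extra row and column so that g[0][0] = 0 seeds the recursion.
    g = [[inf] * (M + 1) for _ in range(N + 1)]
    g[0][0] = 0.0
    for i in range(1, N + 1):
        for j in range(1, M + 1):
            cost = d(R[i - 1], T[j - 1])
            g[i][j] = cost + min(g[i - 1][j], g[i - 1][j - 1], g[i][j - 1])
    return g[N][M]


def full_body_distance(seq_a, seq_b):
    """Average the per-joint DTW distances of two skeleton sequences.

    Each sequence is a list of frames; each frame is a list of (x, y, z)
    joint positions, with the same joint order in every frame (assumed).
    """
    num_joints = len(seq_a[0])
    total = 0.0
    for j in range(num_joints):
        track_a = [frame[j] for frame in seq_a]
        track_b = [frame[j] for frame in seq_b]
        total += dtw_distance(track_a, track_b)
    return total / num_joints
```

Because each joint is warped independently in this sketch, two joints of the same pair of sequences may follow different alignment paths.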

Locally weighted DTW
Locally weighted DTW (4) considers the displacement of each joint. The total displacement of joint j in gesture class g is

$$\sum_{n} \mathrm{Dist}_j\left(f_{n-1}^{g}, f_{n}^{g}\right)$$

where g is the gesture index, j is the joint index, and n is the frame number. $\mathrm{Dist}_j$ computes the displacement of the j-th joint between its coordinates in two consecutive feature vectors $f_{n-1}^{g}$ and $f_{n}^{g}$.
After the total displacements are calculated, noise is filtered out by thresholding, where $C_a$ and $C_b$ are threshold values, and $T_1$ and $T_2$ are experimentally determined boundary values for threshold assignment. The weights of gesture class g are then calculated from the filtered displacements, where $lw_j^{g}$ is the weight of joint j for gesture class g.
The optimum value of the weighting parameter is chosen as the one that maximizes R.
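The total displacement per joint can be sketched as follows. This is a minimal illustration under stated assumptions: each frame is a list of (x, y, z) joint positions, the displacement between consecutive frames is taken as the Euclidean distance, and the function name is hypothetical. The threshold filtering and the weight computation of (4) are not reproduced here.

```python
import math


def total_displacements(sequence):
    """Sum each joint's frame-to-frame displacement over one sequence.

    sequence: list of frames, each frame a list of (x, y, z) joint positions.
    Returns a list with one total displacement per joint.
    """
    num_joints = len(sequence[0])
    totals = [0.0] * num_joints
    # Pair every frame with its successor: (f_{n-1}, f_n).
    for prev, curr in zip(sequence, sequence[1:]):
        for j in range(num_joints):
            totals[j] += math.dist(prev[j], curr[j])
    return totals
```

A joint that barely moves accumulates only a small total, which is the quantity that the noise filtering step operates on.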

Globally weighted DTW
Globally weighted DTW (GDTW) (5) uses each joint's average DTW distance between gesture classes and within each class. The global weight of joint j has the form $\max(0, \cdot)$, with the argument computed from $D^{b}_{j}$ and $D^{w}_{j}$, where $D^{b}_{j}$ is the average DTW distance between gesture classes for joint j and $D^{w}_{j}$ is the average within-class DTW distance for joint j.

Enhanced globally weighted DTW
Enhanced globally weighted DTW (EGDTW) (6) is an extension of GDTW. EGDTW additionally calculates the weights separately for each class. The EGDTW weight has the form $\max(0, \cdot)$, with the argument computed from $D^{b}_{gj}$ and $D^{w}_{gj}$, where $D^{b}_{gj}$ is the average DTW distance between gesture classes for joint j of class g and $D^{w}_{gj}$ is the average within-class DTW distance for joint j of class g.

Dynamically weighted DTW
Locally weighted DTW and globally weighted DTW were proposed because not all joints of a full-body gesture are equally important. These methods assign a weight to each joint, and that weight stays the same from the start to the end of the gesture. However, a joint's movement changes over the course of a gesture, so its importance is not the same from start to end.
Therefore, we propose a method in which the weights change over time. The dynamic weight of class g is defined for each joint j and each frame n, where g is the gesture index, j is the joint index, and n is the frame number. The definition includes a minimum-contribution term and a parameter that is calculated as in Equation (4).

The gesture set used in the experiment consists of 15 kinds of full-body gestures inspired by Ubisoft's Just Dance 2015 (7). The capture device is a Kinect operating at 30 frames per second. The Kinect SDK produces a body skeleton from which the joint coordinates can be obtained. In this paper, we use the x, y, and z coordinates of the joints. Gestures from 8 people are accumulated in the database, with each gesture repeated 30 times. All sequences in the database are normalized by the average position of the two shoulders.
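The shoulder-based normalization can be sketched as follows. This is an illustration only: it assumes that "normalized by the average position of the two shoulders" means subtracting the shoulder midpoint from every joint in each frame, and the function name and the joint indices passed to it are hypothetical.

```python
def normalize_by_shoulder_center(sequence, left_shoulder, right_shoulder):
    """Translate every frame so the midpoint of the two shoulders is the origin.

    sequence: list of frames, each frame a list of (x, y, z) joint positions.
    left_shoulder, right_shoulder: indices of the shoulder joints in a frame
    (assumed; they depend on the skeleton layout in use).
    """
    normalized = []
    for frame in sequence:
        ls, rs = frame[left_shoulder], frame[right_shoulder]
        # Midpoint of the two shoulders for this frame.
        center = tuple((a + b) / 2.0 for a, b in zip(ls, rs))
        normalized.append(
            [tuple(c - o for c, o in zip(joint, center)) for joint in frame]
        )
    return normalized
```

Under this assumption the sequences become independent of where the user stands relative to the sensor, but not of body size or orientation.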

Experimental result
Before measuring the recognition rate, we select one sequence to represent each class: the sequence with the lowest within-class distance. All sequences other than the selected representatives are used as test sequences.
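Selecting the representative sequence of one class can be sketched as follows. The sketch assumes that the within-class distance of a sequence is its mean distance to the other sequences of the same class; the function name is hypothetical, and `distance` stands for any sequence distance, such as a DTW-based one.

```python
def select_representative(sequences, distance):
    """Return the index of the sequence with the lowest within-class distance.

    sequences: all sequences of one gesture class.
    distance: callable taking two sequences and returning a number.
    """
    best_index, best_score = None, float("inf")
    for i, candidate in enumerate(sequences):
        others = [s for k, s in enumerate(sequences) if k != i]
        # Mean distance from the candidate to the rest of its class.
        score = sum(distance(candidate, o) for o in others) / max(len(others), 1)
        if score < best_score:
            best_index, best_score = i, score
    return best_index
```

The remaining indices of each class then form the test set for that class.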
Fig. 1. Fox gesture and its joint displacement graph
Fig. 2. Maps gesture example and its joint displacement graph

Using the 1-nearest neighbor (8) method, we measured the gesture recognition rate. Table 1 shows the recognition rates obtained in the experiment. When dynamic weighting was used in combination with the other methods, performance improved by 10.75% on average. Figure 1 and Figure 2 each show an example gesture from the database and its joint displacement graph. The advantage of dynamic weighting is smaller for the Fox gesture (2.15%) than for the Maps gesture (11.35%). We judge that dynamic weighting gives a large advantage when the displacement of a gesture over time has features that distinguish it from other gestures.
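The 1-nearest neighbor evaluation can be sketched as follows. This is a minimal illustration, assuming one representative sequence per class; the function names and the data layout are hypothetical, and `distance` stands for whichever weighted or unweighted DTW distance is being evaluated.

```python
def classify_1nn(sample, templates, distance):
    """Return the label of the representative closest to the sample.

    templates: dict mapping a class label to its representative sequence.
    """
    return min(templates, key=lambda label: distance(sample, templates[label]))


def recognition_rate(test_set, templates, distance):
    """Fraction of (sample, true_label) pairs that are classified correctly."""
    correct = sum(
        1
        for sample, label in test_set
        if classify_1nn(sample, templates, distance) == label
    )
    return correct / len(test_set)
```

Comparing weighting schemes then amounts to calling `recognition_rate` with the same test set and templates but a different `distance`.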

Conclusions
We have developed a dynamically weighted DTW method to improve the recognition rate in gesture recognition. The experimental results show that dynamic weighting improves the recognition rate. For gestures in which the joints move one after another, dynamic weighting performed well. For gestures in which many joints move at the same time, however, the improvement was relatively small.

Table 1. Gesture recognition results