IIAE CONFERENCE SYSTEM, The 1st International Conference on Industrial Application Engineering 2013 (ICIAE2013)

One-Shot-Learning Gesture Recognition Using Motion History Based Gesture Silhouettes
Upal Mahbub, Tonmoy Roy, Md. Shafiur Rahman, Hafiz Imtiaz, Seiichi Serikawa, Md. Atiqur Rahman Ahad

Last modified: 2013-03-28


This paper proposes a novel approach to one-shot-learning gesture recognition based on motion history images. The challenge is to achieve satisfactory recognition with only one training example of each action, with no prior knowledge about the actions and no foreground/background segmentation, motion estimation, or tracking available. In the proposed scheme, a motion history imaging technique is applied to track the motion flow across consecutive frames. The motion flow information is then used to calculate the percent change of motion flow for an action in different spatial regions of the frame. The space-time descriptor computed this way from the query video serves as a measure of its likeness to a gesture in a lexicon. Finally, gesture classification is performed with correlation-based and Euclidean-distance-based classifiers, and the results are compared. Extensive experiments on a highly diverse dataset establish the effectiveness of the proposed scheme.
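
The pipeline outlined in the abstract (motion history image, region-wise descriptor, nearest-template classification) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the exact descriptor or parameters, so the per-cell share of motion-history energy stands in for the "percent change of motion flow", and the names `motion_history_image`, `region_descriptor`, `classify`, `tau`, `diff_threshold`, and `grid` are all assumptions made for illustration.

```python
import numpy as np


def motion_history_image(frames, tau=None, diff_threshold=30):
    """Build a motion history image (MHI) from grayscale frames of shape (T, H, W).

    Pixels that moved in the latest frame pair are set to tau; elsewhere the
    stored value decays by 1 per frame, floored at 0. (tau and diff_threshold
    are illustrative assumptions, not values from the paper.)
    """
    frames = np.asarray(frames, dtype=np.float64)
    if tau is None:
        tau = len(frames) - 1
    mhi = np.zeros(frames.shape[1:], dtype=np.float64)
    for prev, curr in zip(frames[:-1], frames[1:]):
        moving = np.abs(curr - prev) > diff_threshold
        mhi = np.where(moving, tau, np.maximum(mhi - 1, 0))
    return mhi


def region_descriptor(mhi, grid=(4, 4)):
    """Share of the total MHI energy that falls in each cell of a grid.

    A simplified stand-in for a region-wise motion-flow percentage.
    """
    bands = np.array_split(mhi, grid[0], axis=0)
    cells = [cell.sum() for band in bands
             for cell in np.array_split(band, grid[1], axis=1)]
    desc = np.array(cells, dtype=np.float64)
    total = desc.sum()
    return desc / total if total > 0 else desc


def classify(query, lexicon, metric="euclidean"):
    """Return the label of the lexicon descriptor that best matches the query.

    metric="euclidean" picks the smallest distance; any other value uses the
    Pearson correlation coefficient and picks the largest. Undefined (NaN)
    correlations, e.g. for constant descriptors, are skipped.
    """
    best_label, best_score = None, -np.inf
    for label, ref in lexicon.items():
        if metric == "euclidean":
            score = -np.linalg.norm(query - ref)
        else:
            score = np.corrcoef(query, ref)[0, 1]
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

In a one-shot setting, the lexicon would hold exactly one descriptor per gesture, each computed from its single training clip, and a query clip would be passed through the same two functions before calling `classify` with either metric.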


One-Shot-Learning, Gesture Recognition, Gesture Silhouettes


