Optimization for Greedy Non-maximum Suppression Based on Multi-task Convolutional Neural Network

  • Qingqing Hong
  • Qian Fan
  • Lifeng Zhang

Abstract

Non-maximum suppression (NMS) is an essential part of face detection pipelines based on convolutional neural networks (CNNs). The typical NMS approach used in face detection is a greedy, locally optimal strategy that localizes objects from a set of candidate locations. However, NMS has shortcomings: for example, a high classification score does not always correspond to an accurately localized detection box, which can lead NMS to misjudge face localization. In this paper, we observe that the multi-task convolutional neural network (MTCNN), on which NMS is implemented, is a cascaded network, and we propose an enhanced NMS based on MTCNN to achieve high performance in face detection and alignment. We use WIDER FACE as the test dataset to evaluate our proposal. Precision and recall curves are drawn for three subsets at different thresholds. The results show that the proposed approach outperforms traditional NMS.
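For readers unfamiliar with the baseline, the following is a minimal sketch of traditional greedy NMS as it is commonly described, not the enhanced method proposed in the paper. The function names `iou` and `greedy_nms`, the `(x1, y1, x2, y2)` box layout, and the default threshold of 0.5 are illustrative assumptions rather than details taken from the article.

```python
from typing import List, Sequence, Tuple

# Assumed box layout for this sketch: (x1, y1, x2, y2) with x2 > x1 and y2 > y1.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(
    boxes: Sequence[Box],
    scores: Sequence[float],
    iou_threshold: float = 0.5,  # hypothetical default, not from the paper
) -> List[int]:
    """Return indices of kept boxes, highest classification score first."""
    # Visit candidates in order of decreasing classification score.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    while order:
        best = order.pop(0)  # greedy step: accept the top-scoring box
        keep.append(best)
        # Suppress every remaining box that overlaps the accepted one too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


if __name__ == "__main__":
    candidates = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
    confidences = [0.9, 0.8, 0.7]
    print(greedy_nms(candidates, confidences))
```

In this sketch the second box is suppressed purely because its score is lower than that of the first, whether or not it fits the face more tightly. This is the kind of mismatch between classification score and localization quality that the abstract identifies as a shortcoming.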

Published
2020-01-26
How to Cite
Hong, Q., Fan, Q., & Zhang, L. (2020). Optimization for Greedy Non-maximum Suppression Based on Multi-task Convolutional Neural Network. Journal of the Institute of Industrial Applications Engineers, 8(1), 39. https://doi.org/10.12792/jiiae.8.39