Optimization for Greedy Non-maximum Suppression Based on Multi-task Convolutional Neural Network
Non-maximum suppression (NMS) is an essential part of face detection pipelines based on convolutional neural networks (CNNs). The typical NMS approach in face detection is a greedy, locally optimal strategy that localizes objects by selecting among a set of candidate locations. However, NMS still has shortcomings. For example, a high classification score does not always correspond to an accurate detection box, which can lead NMS to misjudge the face location. In this paper, we observe that the multi-task convolutional neural network (MTCNN), on which NMS is implemented, is a cascaded network, and we propose an enhanced NMS based on MTCNN to achieve high performance in face detection and alignment. We use WIDER FACE as the test dataset to evaluate our proposal and plot precision-recall curves on its three subsets at different thresholds. The results show that the proposed approach performs better than traditional NMS.
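The greedy strategy described above can be sketched as follows. This is a minimal illustration of standard (traditional) greedy NMS, the baseline the abstract compares against, and not the enhanced method proposed in the paper. The box format `(x1, y1, x2, y2)`, the function names, and the default threshold of 0.5 are assumptions. The `mode="min"` option (intersection over the smaller box's area) mirrors an option found in some public MTCNN reimplementations; treat it as an assumption rather than as part of the authors' method.

```python
from typing import List, Sequence, Tuple

# Hypothetical box format: (x1, y1, x2, y2) with x2 > x1 and y2 > y1.
Box = Tuple[float, float, float, float]


def area(b: Box) -> float:
    """Area of an axis-aligned box; zero for degenerate boxes."""
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def overlap(a: Box, b: Box, mode: str = "union") -> float:
    """Intersection over union, or over the smaller area if mode == 'min' (assumed option)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    if mode == "min":
        denom = min(area(a), area(b))
    else:
        denom = area(a) + area(b) - inter
    return inter / denom if denom > 0 else 0.0


def greedy_nms(boxes: Sequence[Box], scores: Sequence[float],
               threshold: float = 0.5, mode: str = "union") -> List[int]:
    """Return indices of the kept boxes, in descending score order."""
    if len(boxes) != len(scores):
        raise ValueError("boxes and scores must have the same length")
    # Visit candidates from the highest classification score to the lowest.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    while order:
        best = order.pop(0)  # greedy step: keep the current top-scoring box
        keep.append(best)
        # Suppress every remaining box that overlaps the kept box too much.
        order = [i for i in order
                 if overlap(boxes[best], boxes[i], mode) <= threshold]
    return keep


if __name__ == "__main__":
    boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140), (11, 9, 49, 51)]
    scores = [0.9, 0.8, 0.75, 0.95]
    print(greedy_nms(boxes, scores))  # [3, 2]
```

Because each step keeps only the locally highest-scoring box, a box with a high score but poor localization can suppress a better-placed neighbor. This is the weakness of greedy NMS that the abstract points to.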