Evaluation of a SNIP pruning method for a state-of-the-art face detection model

2023, pp. 18–22
National Technical University of Ukraine “Igor Sikorsky Kyiv Polytechnic Institute”

With the rapid development of machine learning and, subsequently, deep learning, deep neural networks have achieved remarkable results on a wide range of tasks. However, as the accuracy of trained models grows, new neural network architectures present new challenges, since they require a significant amount of computing power for training and inference. This paper reviews existing approaches to reducing the computational cost and training time of neural networks, and evaluates and improves one of the existing pruning methods on a face detection model. The obtained results show that the presented method can eliminate 69% of the parameters while accuracy declines by only 1.4%, which can be further improved to 0.7% by excluding the context network modules from the pruning procedure.
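As an illustration of the pruning criterion the evaluation builds on, the following minimal sketch shows how SNIP-style connection-sensitivity pruning could be applied to a PyTorch detector using a single labelled calibration batch. It is an assumed reconstruction rather than the authors' code: the function name snip_prune_masks, the exclude_keywords filter used to skip the context modules, and the 0.69 sparsity default are illustrative.

import torch

def snip_prune_masks(model, loss_fn, inputs, targets, sparsity=0.69,
                     exclude_keywords=("context",)):
    # Hypothetical sketch of SNIP-style pruning (Lee et al., ICLR 2019);
    # not the authors' implementation. One forward/backward pass on a
    # calibration batch yields connection-sensitivity scores, from which
    # a {parameter_name: 0/1 mask} keeping the most sensitive weights is built.
    model.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()

    scores, prunable = {}, []
    for name, param in model.named_parameters():
        if param.grad is None or param.dim() < 2:
            continue  # prune only conv/linear weight tensors, not biases or norms
        if any(key in name for key in exclude_keywords):
            continue  # optionally skip context modules, as in the improved variant
        # Connection sensitivity: |dL/dw * w|, i.e. |dL/dc| for a mask c on weight w.
        scores[name] = (param.grad * param).abs()
        prunable.append(name)

    all_scores = torch.cat([scores[n].flatten() for n in prunable])
    keep = int((1.0 - sparsity) * all_scores.numel())  # number of weights to keep
    threshold = torch.topk(all_scores, keep).values[-1]
    return {n: (scores[n] >= threshold).float() for n in prunable}

The returned masks would then be applied multiplicatively to the corresponding weight tensors (or via hooks) after every optimisation step, so that the pruned connections stay at zero throughout subsequent training and inference.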

  1. G. Hinton et al., "Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups," in IEEE Signal Processing Magazine, vol. 29, no. 6, pp. 82-97, Nov. 2012.
    https://doi.org/10.1109/MSP.2012.2205597
  2. K. He, X. Zhang, S. Ren and J. Sun, "Deep Residual Learning for Image Recognition," 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 2016, pp. 770-778.
    https://doi.org/10.1109/CVPR.2016.90
  3. K. Zhang, Z. Zhang, Z. Li and Y. Qiao, "Joint Face Detection and Alignment Using Multitask Cascaded Convolutional Networks," in IEEE Signal Processing Letters, vol. 23, no. 10, pp. 1499-1503, Oct. 2016.
    https://doi.org/10.1109/LSP.2016.2603342
  4. Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding”, Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, vol. 1 (Long and Short Papers), pp. 4171–4186, Minneapolis, Minnesota.
  5. T. Brown et al., “Language Models are Few-Shot Learners”, Advances in Neural Information Processing Systems, vol. 33, pp. 1877–1901, 2020.
  6. R. Schwartz, J. Dodge, N. A. Smith and O. Etzioni, “Green AI”, Communications of the ACM, vol. 63, no. 12, pp. 54–63, 2020.
    https://doi.org/10.1145/3381831
  7. Ben Taylor, Vicent Sanz Marco, Willy Wolff, Yehia Elkhatib, and Zheng Wang. “Adaptive deep learning model selection on embedded systems”, in Proc.  19th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems, New York, USA, pp. 31–43, 2018.
    https://doi.org/10.1145/3211332.3211336
  8. S. Han, H. Mao and W. J. Dally, “Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding”, arXiv preprint arXiv:1510.00149, 2015.
  9. S. Teerapittayanon, B. McDanel and H. T. Kung, "Distributed Deep Neural Networks Over the Cloud, the Edge and End Devices", 2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS), Atlanta, USA, pp. 328-339, 2017.
    https://doi.org/10.1109/ICDCS.2017.226
  10. Misha Denil, Babak Shakibi, Laurent Dinh, Marc'Aurelio Ranzato, and Nando de Freitas. “Predicting parameters in deep learning”, in Proc. 26th International Conference on Neural Information Processing Systems, vol. 2 (NIPS'13). Curran Associates Inc., Red Hook, USA, pp. 2148–2156, 2013.
  11. Max Jaderberg, Andrea Vedaldi, and Andrew Zisserman. “Speeding up Convolutional Neural Networks with Low Rank Expansions”, In Proceedings of the British Machine Vision Conference. BMVA Press, September 2014.
    https://doi.org/10.5244/C.28.88
  12. A. Novikov, D. Podoprikhin, A. Osokin and D. P. Vetrov, “Tensorizing Neural Networks”, Advances in Neural Information Processing Systems, vol. 28, 2015.
  13. Song Han, Jeff Pool, John Tran, and William J. Dally. “Learning both weights and connections for efficient neural networks”, In Proceedings of the 28th International Conference on Neural Information Processing Systems – vol. 1 (NIPS'15), MIT Press, Cambridge, USA, pp. 1135–1143, 2015.
  14. S. Park, J. Lee, S. Mo and J. Shin, “Lookahead: A Far-sighted Alternative of Magnitude-based Pruning”, arXiv preprint arXiv:2002.04809, 2020.
  15. B. Hassibi and D. G. Stork. “Second order derivatives for network pruning: optimal brain surgeon”, in Proc. 5th International Conference on Neural Information Processing Systems (NIPS'92), Morgan Kaufmann Publishers Inc., San Francisco, USA, pp. 164–171, 1992.
  16. J. Frankle and M. Carbin, “The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks”, in Proc. 7th International Conference on Learning Representations (ICLR), New Orleans, USA, May 6–9, 2019.
  17. N. Lee, T. Ajanthan and P. H. S. Torr, “SNIP: Single-shot Network Pruning based on Connection Sensitivity”, in Proc. 7th International Conference on Learning Representations (ICLR), New Orleans, USA, May 6–9, 2019.
  18. J. Deng, J. Guo, E. Ververas, I. Kotsia and S. Zafeiriou, “RetinaFace: Single-Stage Dense Face Localisation in the Wild”, in Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5203–5212, 2020.
    https://doi.org/10.1109/CVPR42600.2020.00525