Improving pedestrian segmentation using region proposal-based CNN semantic segmentation

https://doi.org/10.23939/mmc2023.03.854
Received: March 26, 2023
Accepted: May 18, 2023

Mathematical Modeling and Computing, Vol. 10, No. 3, pp. 854–863 (2023)

1 Faculty of Sciences and Technics, Cadi Ayyad University, Marrakesh, Morocco
2 MAST-EMGCU, Université Gustave Eiffel, IFSTTAR, F-77477 Marne-la-Vallée, France
3 Faculty of Sciences and Technics, Cadi Ayyad University, Marrakesh, Morocco

Pedestrian segmentation is a critical task in computer vision, but segmentation models struggle to classify pedestrians accurately in images with challenging backgrounds, luminosity changes, and occlusions.  The difficulty is compounded for compressed models, which are designed to reduce the high computational demands of deep neural networks.  To address these issues, we propose a novel approach that integrates a region proposal-based framework into the segmentation process.  We evaluate the proposed framework on the PASCAL VOC dataset, which presents such backgrounds, using two segmentation models, UNet and SqueezeUNet, to measure the impact of region proposals on segmentation performance.  Our experiments show that incorporating region proposals significantly improves segmentation accuracy and reduces false positive pixels in the background.  Specifically, the SqueezeUNet model achieves a mean Intersection over Union (mIoU) of $0.682$, a 12% improvement over the baseline SqueezeUNet model without region proposals.  Similarly, the UNet model achieves an mIoU of $0.678$, a 13% improvement over the baseline UNet model without region proposals.
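To make the reported metric concrete, the following is a minimal sketch of how mIoU can be computed for a binary pedestrian/background mask, and of how background false positives fall away once predictions are restricted to proposed regions. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not say how the region proposals are combined with the segmentation network, so the simple box-based filtering in `keep_inside_boxes` is a hypothetical stand-in, and the function names, the box format `(x0, y0, x1, y1)`, and the toy masks are all invented for this example.

```python
import numpy as np


def iou(pred, target):
    """Intersection over Union of two boolean masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:
        # Both masks are empty; counting this as perfect agreement is a convention.
        return 1.0
    return np.logical_and(pred, target).sum() / union


def mean_iou(pred, target):
    """Average of the pedestrian-class IoU and the background-class IoU."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    return 0.5 * (iou(pred, target) + iou(~pred, ~target))


def keep_inside_boxes(mask, boxes):
    """Discard predicted pixels lying outside every (x0, y0, x1, y1) box.

    Hypothetical post-filtering step, assumed here for illustration only.
    """
    mask = np.asarray(mask, dtype=bool)
    allowed = np.zeros_like(mask)
    for x0, y0, x1, y1 in boxes:
        allowed[y0:y1, x0:x1] = True
    return mask & allowed


# Toy 6x6 ground truth: one pedestrian covering rows 1-4, columns 1-2.
target = np.zeros((6, 6), dtype=bool)
target[1:5, 1:3] = True

# Toy prediction: the pedestrian plus two false positive background pixels.
pred = target.copy()
pred[0, 5] = True
pred[5, 5] = True

# A single hypothetical region proposal enclosing the pedestrian.
boxes = [(1, 1, 3, 5)]
filtered = keep_inside_boxes(pred, boxes)

print("mIoU without proposals:", mean_iou(pred, target))
print("mIoU with proposals:   ", mean_iou(filtered, target))
```

In this toy case the two stray background pixels lower both the pedestrian IoU and the background IoU, and removing them restores a perfect score. Real proposals are imperfect, so a box that misses part of a pedestrian would also remove true positives; the sketch ignores that trade-off.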
