Hardware Optimization of Video Quality Improvement Methods Based on Deep Neural Networks

M. R. Maksymiv; Taras Rak

This paper addresses the optimization of deep video enhancement models for efficient execution on modern hardware. It focuses on MST-GAN, a multi-frame generative adversarial network with a multi-scale structure and frame-by-frame smoothing. The proposed hardware acceleration strategy combines structured pruning, reduced-precision quantization (FP16/INT8), pipelining, parallelization, and model compilation with TensorRT. The model is compared before and after optimization in terms of frames per second (FPS), latency, memory consumption, and FLOPs. After optimization, the model runs 4.3x faster with minimal loss of quality, which makes it suitable for real-time applications. The hardware efficiency of MST-GAN is also compared with that of other modern video super-resolution (VSR) models: BasicVSR, RSDN, and EDVR.
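As a rough illustration of the INT8 quantization step only, the sketch below applies symmetric per-tensor quantization to a short list of weights in plain Python. It is not the authors' implementation, which relies on TensorRT. The function names and the sample weights are hypothetical and chosen for illustration.

```python
from typing import List, Tuple


def quantize_int8(values: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor INT8 quantization onto the integer range [-127, 127]."""
    max_abs = max((abs(v) for v in values), default=0.0)
    # The scale maps the largest magnitude onto 127; use 1.0 for an all-zero tensor.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer and clamp to the representable range.
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize_int8(quantized: List[int], scale: float) -> List[float]:
    """Map INT8 codes back to approximate floating-point values."""
    return [q * scale for q in quantized]


# Hypothetical weights, not taken from the paper.
weights = [0.50, -1.27, 0.03, 0.90]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)

# With round-to-nearest, the per-value error is bounded by half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_error)
```

Each value is stored in one byte instead of four, and the reconstruction error stays within half a quantization step. This is why INT8 inference can reduce memory traffic at a small cost in quality, provided the value range is calibrated well.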

hardware acceleration

Generative Adversarial Networks

pruning

super-resolution

TensorRT
