Hardware optimization of video quality enhancement methods based on deep neural networks

Mykola Maksymiv; Taras Rak

This paper examines the problems and various aspects of optimizing deep video quality enhancement models for efficient execution on modern hardware. The focus is on a multi-frame generative network with a multi-scale structure and frame-by-frame alignment (MST-GAN). A comprehensive hardware acceleration strategy is proposed that covers structured pruning, quantization (FP16/INT8), pipelining, parallelization, and model compilation with TensorRT. A comparative analysis before and after optimization is carried out, covering changes in FPS, latency, memory consumption, and FLOPs. The results show that the optimized neural model achieves a 4.3x speedup with minimal quality loss, which makes real-time use possible. A comparison with other modern VSR models (BasicVSR, RSDN, EDVR) in terms of their hardware efficiency is also presented.

hardware acceleration

generative adversarial networks

Maksymiv, M., & Rak, T. (2023). Methods of video quality-improving. Artificial Intelligence, (3(97)), 47–62. DOI: 10.15407/jai2023.03.047.
Rak, T., & Maksymiv, M. (2021). Methods to increase the contrast of the image with preserving the visual quality. Achievements in Cyber-Physical Systems, 6(2), 140–145. DOI: 10.23939/acps2021.02.140.
Wang, X., Chan, K., Yu, K., Dong, C., & Loy, C. C. (2019). EDVR: Video restoration with enhanced deformable convolutions. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). DOI: 10.1109/CVPRW.2019.00291.
Isobe, T., Li, X., Li, Y., Li, H., & Shan, Y. (2020). Recurrent structure-detail network for video super-resolution. In European Conference on Computer Vision (ECCV). DOI: 10.1007/978-3-030-58568-6_3.
Chan, K., Wang, X., Yu, K., Dong, C., & Loy, C. C. (2021). BasicVSR: The search for essential components in video super-resolution and beyond. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). DOI: 10.1109/CVPR46437.2021.00874.
Maksymiv, M., & Rak, T. (2025). Multi-scale temporal GAN-based method for high-resolution and motion-stable video enhancement. Radio Electronics, Computer Science, Control, (3(74)), 86–94. DOI: 10.15407/jai2023.03.047.
Chan, K., Xie, L., Dong, X., & Loy, C. C. (2022). BasicVSR++: Improving video super-resolution with enhanced propagation and alignment. In European Conference on Computer Vision (ECCV). DOI: 10.1007/978-3-031-19818-6_23.
Chan, K., Dong, X., & Loy, C. C. (2023). Efficient video super-resolution through recurrent latent propagation. arXiv preprint arXiv:2304.03804. DOI: 10.48550/arXiv.2304.03804.
OpenMMLab. (n.d.). MMagic VSR Library. OpenMMLab documentation. URL: https://github.com/open-mmlab/mmediting.
Jo, Y., Wug Oh, S., Kang, J., & Kim, S. J. (2018). Deep video super-resolution network using dynamic upsampling filters without explicit motion compensation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). DOI: 10.1109/CVPR.2018.00799.
Huang, W., & Chen, X. (2022). Improved EDVR for efficient video super-resolution. In Proceedings of the IEEE Winter Conference on Applications of Computer Vision Workshops (WACV-W). DOI: 10.1109/WACVW54576.2022.00093.
Cao, X., Wang, L., Zhang, C., Wu, J., & Ding, E. (2021). EGVSR: Efficient generative video super-resolution. In ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing. arXiv preprint arXiv:2107.05307. DOI: 10.1109/ICASSP39728.2021.9414952.
Chu, M., Xie, T., Mayer, H., & Thuerey, N. (2019). TecoGAN: Temporally coherent GAN for video super-resolution. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). DOI: 10.1109/ICCV.2019.01071.
Hinton, G., Vinyals, O., & Dean, J. (2015). Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531. DOI: 10.48550/arXiv.1503.02531.
Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., ... & Chintala, S. (2019). PyTorch: An imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems (NeurIPS). DOI: 10.48550/arXiv.1912.01703.
Xia, M., Zhang, Y., Liu, Y., & Chen, X. (2023). Structured sparsity learning for efficient video super-resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). DOI: 10.1109/CVPR52729.2023.00899.
Molchanov, P., Tyree, S., Karras, T., Aila, T., & Kautz, J. (2017). Pruning convolutional neural networks for resource efficient inference. In International Conference on Learning Representations (ICLR). DOI: 10.48550/arXiv.1611.06440.
Zhou, W., Chen, Z., Liu, Y., & Qiao, Y. (2022). Adaptive inference for efficient video super-resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). DOI: 10.1109/CVPR52688.2022.01320.
Liu, S., Ma, X., Zhang, Y., & Lin, S. (2021). Dynamic temporal pyramid network: A closer look at efficient video super-resolution. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). DOI: 10.1109/ICCV48922.2021.00937.
Chen, T., Moreau, T., Jiang, Z., Zheng, L., Yan, E., Shen, H., ... & Guestrin, C. (2018). TVM: An automated end-to-end optimizing compiler for deep learning. In Proceedings of the 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI). DOI: 10.48550/arXiv.1802.04799.
NVIDIA Corporation. (2023). TensorRT 8.6: Developer Guide. URL: https://docs.nvidia.com/deeplearning/tensorrt. DOI: 10.5281/zenodo.7863686.
Jain, A., Shah, A., Hegarty, S., & Pienaar, J. (2020). Compiling deep learning models for custom ASICs and FPGAs with Vitis AI. In Proceedings of the ACM SIGDA International Symposium on Field-Programmable Gate Arrays. DOI: 10.1145/3386265.3400658.
Yang, F., Shi, B., Yu, W., & Li, Y. (2020). Benchmarking deep video super-resolution models on large-scale datasets.