ENSEMBLE IMAGE SUPER-RESOLUTION FOR UAV GEO-LOCALIZATION

2025; 68–76
https://doi.org/10.23939/ujit2025.01.068
Received: April 15, 2025
Revised: April 22, 2025
Accepted: June 01, 2025
1 Lviv Polytechnic National University, Lviv, Ukraine
2 Lviv Polytechnic National University, Lviv, Ukraine
3 Lviv Polytechnic National University, Department of Automated Control Systems, Lviv, Ukraine

In this paper, we address the challenge of visual geo-localization from low-quality UAV imagery captured in real-world environments. We propose a two-stage architecture that combines Super-Resolution with visual geo-localization. We introduce a novel, non-learnable Ensemble Super-Resolution (ESR) module that first refines upscaled aerial frames and then feeds the enhanced imagery into a visual geo-localization pipeline. Designed as a parallelizable block that can be integrated directly into any SR computation graph, ESR combines classical Bicubic interpolation with neural SR models – boosting image fidelity and overall system accuracy without additional training and executing efficiently on most hardware accelerators. We validate our approach on a dataset of 37 000 real-world UAV images, each downscaled by a factor of four and then restored via baseline methods (Bicubic, Bilinear, Nearest Neighbour, DRCT, HMA, HAT, SwinFIR) as well as our ESR-enhanced pipeline. Quantitative evaluation shows that standalone Super-Resolution methods yield PSNR in the low 20s dB and SSIM of 0.6–0.7 – far below standard benchmarks – leading to a marked drop in geo-localization accuracy (Recall@1 and AP).
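To make the evaluation protocol concrete, the listing below is a minimal sketch (not the authors' exact code) of how a single frame can be downscaled by a factor of four, restored by a baseline upscaler, and scored with PSNR and SSIM. The Bicubic restorer, the scikit-image metric calls, and the file path are illustrative assumptions.

import numpy as np
from PIL import Image
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

SCALE = 4  # downscaling factor used in the study

def degrade(img: Image.Image) -> Image.Image:
    # Simulate a low-quality UAV frame by 4x bicubic downscaling (assumption:
    # the actual degradation may additionally include sensor noise and blur).
    w, h = img.size
    return img.resize((w // SCALE, h // SCALE), Image.BICUBIC)

def restore_bicubic(lr: Image.Image, size) -> Image.Image:
    # Baseline restoration: plain bicubic upscaling back to the original size;
    # a neural SR model (DRCT, HMA, HAT, SwinFIR) would replace this call.
    return lr.resize(size, Image.BICUBIC)

def evaluate(hr_path: str):
    hr = Image.open(hr_path).convert("RGB")
    sr = restore_bicubic(degrade(hr), hr.size)
    hr_np, sr_np = np.asarray(hr), np.asarray(sr)
    psnr = peak_signal_noise_ratio(hr_np, sr_np, data_range=255)
    ssim = structural_similarity(hr_np, sr_np, channel_axis=2, data_range=255)
    return psnr, ssim

# Example (hypothetical file name): psnr, ssim = evaluate("uav_frame_0001.png")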

In contrast, our ESR module stabilizes SR outputs and recovers image fidelity, raising geo-localization Recall@1 to 87.0 % (vs. 84.96 % with HMA restoration) and AP to 89.1 % (against 87.41 % with HMA restoration).

Our contributions are:

Two-stage framework combining Image Super-Resolution and visual geo-localization approaches, tailored for low-resolution, noisy UAV data.

Non-learnable, parallelizable ESR block that fuses Bicubic interpolation with neural restoration within the Super-Resolution network graph – requiring no retraining and remaining fully compatible with most accelerators (a minimal sketch follows this list).

Comprehensive empirical study demonstrating that ESR substantially narrows the domain gap and boosts geo-localization performance in real-world conditions.
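As a rough illustration of the ESR block from the second contribution, the listing below wraps an arbitrary pretrained SR network and fuses its output with a Bicubic branch by a fixed element-wise average inside one forward pass. The fusion weight alpha and the averaging rule are assumptions made for clarity; the exact combination strategy used in the paper may differ.

import torch
import torch.nn as nn
import torch.nn.functional as F

class EnsembleSR(nn.Module):
    # Non-learnable ensemble: no new parameters, so no retraining is required
    # and the wrapper stays compatible with most hardware accelerators.
    def __init__(self, sr_model: nn.Module, scale: int = 4, alpha: float = 0.5):
        super().__init__()
        self.sr_model = sr_model   # any pretrained neural SR network (e.g. DRCT, HAT)
        self.scale = scale
        self.alpha = alpha         # fixed fusion weight (assumed value)

    def forward(self, lr: torch.Tensor) -> torch.Tensor:
        # Classical branch: bicubic interpolation of the low-resolution input.
        bic = F.interpolate(lr, scale_factor=self.scale,
                            mode="bicubic", align_corners=False)
        # Neural branch: restoration by the wrapped SR model.
        neural = self.sr_model(lr)
        # Both branches are independent and can run in parallel; the fusion
        # itself is a cheap element-wise operation inside the same graph.
        return self.alpha * bic + (1.0 - self.alpha) * neural

Because the fusion adds no trainable parameters, the wrapped model can feed the geo-localization stage directly, without retraining either component.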

We conclude that embedding lightweight, hardware-agnostic ensemble strategies into SR pipelines is a promising direction for robust UAV-based visual localization. Future work will explore adaptive ensemble weighting and domain-aware SR architectures to further mitigate aerial imaging noise and variability.

[1] Zheng, Z., Wei, Y., & Yang, Y. (2020). University-1652: A Multi-view Multi-source Benchmark for Drone-based Geo-localization. In Proceedings of the 28th ACM International Conference on Multimedia (pp. 1395–1403). Association for Computing Machinery. https://doi.org/10.1145/3394171.3413896

[2] Li, K., Yang, S., Dong, R., Wang, X., & Huang, J. (2020). Survey of single image super-resolution reconstruction. IET Image Processing, 14(11), 2273–2290. https://doi.org/10.1049/iet-ipr.2019.1438

[3] Hsu, C.C., Lee, C.M., & Chou, Y.S. (2024). DRCT: Saving Image Super-Resolution Away from Information Bottleneck. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops (pp. 6133–6142). https://doi.org/10.1109/CVPRW63382.2024.00618

[4] Chen, X., Wang, X., Zhou, J., Qiao, Y., & Dong, C. (2023). Activating More Pixels in Image Super-Resolution Transformer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 22367–22377). https://doi.org/10.1109/CVPR52729.2023.02142

[5] Chu, S. C., Dou, Z. C., Pan, J. S., Weng, S., & Li, J. (2024). HMANet: Hybrid Multi-Axis Aggregation Network for Image Super-Resolution. In 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) (pp. 6257–6266). https://doi.org/10.1109/CVPRW63382.2024.00629

[6] Zhang, D., Huang, F., Liu, S., Wang, X., & Jin, Z. (2023). SwinFIR: Revisiting the SwinIR with Fast Fourier Convolution and Improved Training for Image Super-Resolution. arXiv. https://doi.org/10.48550/arXiv.2208.11247

[7] Deuser, F., Habel, K., & Oswald, N. (2023). Sample4Geo: Hard Negative Sampling For Cross-View Geo-Localisation. In 2023 IEEE/CVF International Conference on Computer Vision (ICCV) (pp. 16801–16810). https://doi.org/10.1109/ICCV51070.2023.01545

[8] Lin, J., Zheng, Z., Zhong, Z., Luo, Z., Li, S., Yang, Y., & Sebe, N. (2022). Joint Representation Learning and Keypoint Detection for Cross-View Geo-Localization. IEEE Transactions on Image Processing, 31, 3780–3792. https://doi.org/10.1109/TIP.2022.3175601

[9] Wang, T., Zheng, Z., Yan, C., Zhang, J., Sun, Y., Zheng, B., & Yang, Y. (2022). Each Part Matters: Local Patterns Facilitate Cross-View Geo-Localization. IEEE Transactions on Circuits and Systems for Video Technology, 32(2), 867–879. https://doi.org/10.1109/TCSVT.2021.3061265

[10] He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 770–778). https://doi.org/10.1109/CVPR.2016.90

[11] Dai, M., Hu, J., Zhuang, J., & Zheng, E. (2022). A Transformer-Based Feature Segmentation and Region Alignment Method for UAV-View Geo-Localization. IEEE Transactions on Circuits and Systems for Video Technology, 32(7), 4376–4389. https://doi.org/10.1109/TCSVT.2021.3135013

[12] van den Oord, A., Li, Y., & Vinyals, O. (2018). Representation Learning with Contrastive Predictive Coding. arXiv, abs/1807.03748. https://doi.org/10.48550/arXiv.1807.03748

[13] Yang, J., Wright, J., Huang, T., & Ma, Y. (2010). Image Super-Resolution Via Sparse Representation. IEEE Transactions on Image Processing, 19(11), 2861–2873. https://doi.org/10.1109/TIP.2010.2050625

[14] Dong, C., Loy, C., He, K., & Tang, X. (2016). Image Super-Resolution Using Deep Convolutional Networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(2), 295–307. https://doi.org/10.1109/TPAMI.2015.2439281

[15] Dong, C., Loy, C. C., & Tang, X. (2016). Accelerating the Super-Resolution Convolutional Neural Network. In Computer Vision – ECCV 2016 (pp. 391–407). Springer International Publishing. https://doi.org/10.1007/978-3-319-46475-6_25

[16] Kim, J., Lee, J., & Lee, K. (2016). Accurate Image Super-Resolution Using Very Deep Convolutional Networks. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 1646–1654). https://doi.org/10.1109/CVPR.2016.182

[17] Kim, J., Lee, J., & Lee, K. (2016). Deeply-Recursive Convolutional Network for Image Super-Resolution. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 1637–1645). https://doi.org/10.1109/CVPR.2016.181

[18] Lim, B., Son, S., Kim, H., Nah, S., & Lee, K. (2017). Enhanced Deep Residual Networks for Single Image Super-Resolution. In 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) (pp. 1132–1140). https://doi.org/10.1109/CVPRW.2017.151

[19] Zhang, Y., Tian, Y., Kong, Y., Zhong, B., & Fu, Y. (2018). Residual Dense Network for Image Super-Resolution. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 2472–2481). https://doi.org/10.1109/CVPR.2018.00262

[20] Zhang, Y., Li, K., Li, K., Wang, L., Zhong, B., & Fu, Y. (2018). Image Super-Resolution Using Very Deep Residual Channel Attention Networks. In ECCV. https://doi.org/10.1007/978-3-030-01234-2_18

[21] Liang, J., Cao, J., Sun, G., Zhang, K., Van Gool, L., & Timofte, R. (2021). SwinIR: Image Restoration Using Swin Transformer. In 2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW) (pp. 1833–1844). https://doi.org/10.1109/ICCVW54120.2021.00210

[22] Chen, K., Li, L., Liu, H., Li, Y., Tang, C., & Chen, J. (2023). SwinFSR: Stereo Image Super-Resolution using SwinIR and Frequency Domain Knowledge. In 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) (pp. 1764–1774). https://doi.org/10.1109/CVPRW59228.2023.00177