Numerical Optimization Method for Clustering in Content-Based Image Retrieval Systems

2025; pp. 30-46
1 Kharkiv National University of Radio Electronics, Software Engineering Department, Ukraine
2 Kharkiv National University of Radio Electronics, Software Engineering Department, Ukraine

The object of the study is the process of organizing a descriptor repository in content-based image retrieval systems. The subject of the study is a method for numerically optimizing descriptor clustering in a multidimensional space. The aim of this work is to develop a clustering optimization method for the Multidimensional Cube model that improves search efficiency. The core idea is to distribute descriptors more uniformly across clusters by adjusting the interval boundaries in each dimension, which reduces imbalance in cluster density and improves retrieval performance. The research methodology combines analytical determination of the number of clusters with numerical optimization of the descriptor distribution across intervals. The proposed method, named the Dimension Intervals Numeric Optimization Algorithm, has been implemented in two variants: one for deployment in an external relational database and another for deployment in main memory. Theoretical analysis of computational complexity showed that, unlike the competing methods considered for comparison (k-means and the Inverted Multi-Index), the proposed approach does not require multiple iterations and has lower asymptotic complexity. Experimental evaluation was carried out on a dataset of image descriptors. The results showed that k-means provides the highest clustering quality in terms of cluster density but requires significantly more time. The main-memory variant of the proposed method demonstrated the best balance between clustering quality and execution time, approaching the quality of the Inverted Multi-Index while outperforming it in runtime. The external-database variant proved slower because of query-processing overhead but is appropriate for scalable systems with centralized data repositories. The conclusion is that the developed numerical optimization method yields a more uniform distribution of descriptors across clusters and reduces imbalance in their density.
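The core idea stated above, adjusting interval boundaries in each dimension so that descriptors spread more evenly, can be illustrated with a minimal sketch. This is not the Dimension Intervals Numeric Optimization Algorithm itself, whose details are not given here. It swaps in plain equal-frequency (quantile) boundaries as a stand-in and compares them with equal-width boundaries. All function names, the synthetic exponential data, and the imbalance measure (fullest interval divided by mean occupancy) are assumptions made for illustration.

```python
import numpy as np


def equal_width_boundaries(x, k):
    """Interior boundaries splitting each dimension's range into k equal-width intervals."""
    lo, hi = x.min(axis=0), x.max(axis=0)
    fractions = np.linspace(0.0, 1.0, k + 1)[1:-1]  # k - 1 interior cut points
    return lo[:, None] + (hi - lo)[:, None] * fractions[None, :]


def equal_frequency_boundaries(x, k):
    """Interior boundaries placed at empirical quantiles of each dimension."""
    fractions = np.linspace(0.0, 1.0, k + 1)[1:-1]
    return np.quantile(x, fractions, axis=0).T  # shape (dims, k - 1)


def interval_indices(x, boundaries):
    """Interval index of every descriptor in every dimension, shape (n, dims)."""
    columns = [
        np.searchsorted(boundaries[d], x[:, d], side="right")
        for d in range(x.shape[1])
    ]
    return np.stack(columns, axis=1)


def occupancy_imbalance(indices, k):
    """Per-dimension ratio of the fullest interval to the mean interval occupancy."""
    n, dims = indices.shape
    counts = np.stack(
        [np.bincount(indices[:, d], minlength=k) for d in range(dims)]
    )
    return counts.max(axis=1) / (n / k)


# Synthetic, skewed stand-in for real image descriptors (hypothetical data).
rng = np.random.default_rng(0)
descriptors = rng.exponential(size=(1000, 3))
k = 4  # intervals per dimension, chosen arbitrarily for the demo

for name, make_boundaries in [
    ("equal-width", equal_width_boundaries),
    ("equal-frequency", equal_frequency_boundaries),
]:
    idx = interval_indices(descriptors, make_boundaries(descriptors, k))
    print(name, occupancy_imbalance(idx, k))
```

In this sketch, each row of `interval_indices` is the tuple of interval numbers that would identify a cell of the cube. A ratio close to 1 means the intervals of that dimension are evenly filled. Balancing each dimension separately does not by itself guarantee evenly filled cells when dimensions are correlated, which is one reason a dedicated optimization step may be needed.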
