The article examines data augmentation methods for image recognition, introducing an exponential augmentation approach to enhance the performance of deep neural networks, particularly YOLO, in object detection tasks. The proposed methodology is based on the sequential and repeated application of various transformations, including horizontal and vertical flipping, 90° rotation, Gaussian blur, and brightness and contrast adjustment. This approach yields exponential dataset growth and substantially increases the diversity of training data, which is critical for improving the model's generalization capability. Experimental results demonstrate that exponential augmentation significantly improves detection performance, as indicated by increased mean Average Precision (mAP), Precision, and Recall, even when the initial dataset is limited. Additionally, the integration of the proposed approach with other effective augmentation techniques, such as Mosaic and MixUp, has been explored. The results indicate that combining exponential augmentation with these methods yields more robust models that better recognize objects under varying lighting conditions, viewpoints, and noise levels. Beyond accuracy analysis, the study also investigates the impact of exponential augmentation on training stability, including the convergence speed of gradient descent and resistance to overfitting. It is shown that multiple data-enrichment cycles allow neural networks to adapt more efficiently to challenging conditions and reduce the likelihood of memorizing only specific examples from the training set. The proposed method can be particularly useful in computer vision tasks with limited or imbalanced datasets, as well as in scenarios where model accuracy must be improved without significantly increasing computational costs.
The obtained results confirm that exponential augmentation is a promising approach for enhancing the performance of YOLO and other modern object detectors in complex image recognition scenarios.
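The multiplicative growth described above can be illustrated with a minimal sketch. Images are modeled here as 2-D lists of pixel values, and the transform set and per-cycle growth scheme are illustrative assumptions rather than the authors' exact pipeline (a real implementation would also deduplicate and apply photometric transforms such as Gaussian blur):

```python
# Minimal sketch of the exponential-augmentation idea: each enrichment cycle
# keeps the originals and adds every transform of every image, so the dataset
# size is multiplied by (1 + number of transforms) per cycle.
# Images are plain 2-D lists; transform choice here is an assumption.

def hflip(img):
    """Horizontal flip: reverse each row."""
    return [row[::-1] for row in img]

def vflip(img):
    """Vertical flip: reverse the order of the rows."""
    return img[::-1]

def rot90(img):
    """Rotate 90 degrees clockwise: reverse rows, then transpose."""
    return [list(col) for col in zip(*img[::-1])]

TRANSFORMS = [hflip, vflip, rot90]

def augment_cycle(dataset):
    """One enrichment cycle: originals plus each transform of each image."""
    out = list(dataset)
    for img in dataset:
        for t in TRANSFORMS:
            out.append(t(img))
    return out

def exponential_augment(dataset, cycles):
    """Repeat the cycle: size grows by a factor of 1 + len(TRANSFORMS) each time."""
    for _ in range(cycles):
        dataset = augment_cycle(dataset)
    return dataset

base = [[[1, 2], [3, 4]]]            # a single 2x2 "image"
grown = exponential_augment(base, 3)
print(len(grown))                    # 1 * 4**3 = 64 images after three cycles
```

Exponential growth in the number of cycles is what distinguishes this scheme from a single augmentation pass, which only adds a fixed multiple of the original data.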
- Bochkovskiy, A., Wang, C. Y., & Liao, H. Y. M. (2020). YOLOv4: Optimal speed and accuracy of object detection. arXiv preprint. doi: 10.48550/arXiv.2004.10934
- Buslaev, A., Iglovikov, V. I., Khvedchenya, E., Parinov, A., Druzhinin, M., & Seferbekov, S. (2020). Albumentations: Fast and flexible image augmentations. Information, 11(2), 125. doi: 10.3390/info11020125
- Cubuk, E. D., Zoph, B., Mané, D., Vasudevan, V., & Le, Q. V. (2019). AutoAugment: Learning augmentation strategies from data. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 113–123. doi: 10.1109/CVPR.2019.00020
- Ghiasi, G., Cui, Y., Qian, R., Lin, T. Y., & Le, Q. V. (2021). Simple copy-paste is a strong data augmentation method for instance segmentation. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2918–2928. doi: 10.1109/CVPR46437.2021.00293
- Luo, P., Zhu, Z., Liu, Z., Wang, X., & Tang, X. (2016). Face model compression by distilling knowledge from neurons. Proceedings of the AAAI Conference on Artificial Intelligence, 30(1). Retrieved from https://www.aaai.org/ocs/index.php/AAAI/AAAI16/paper/view/12311
- Mumuni, A., & Mumuni, F. (2022). Data augmentation: A comprehensive survey of modern approaches. Array, 16, 100258. doi: 10.1016/j.array.2022.100258
- Myshkovskyi, Y., & Nazarkevych, M. (2024). Method of fingerprint identification based on convolutional neural networks. SISN, 15, 1–14. doi: 10.23939/sisn2024.15.001
- Redmon, J., Divvala, S., Girshick, R., & Farhadi, A. (2016). You only look once: Unified, real-time object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 779–788. doi: 10.1109/CVPR.2016.91
- Shorten, C., & Khoshgoftaar, T. M. (2019). A survey on image data augmentation for deep learning. Journal of Big Data, 6(1), 60. doi: 10.1186/s40537-019-0197-0
- Tan, M., & Le, Q. (2019). EfficientNet: Rethinking model scaling for convolutional neural networks. Proceedings of the 36th International Conference on Machine Learning (ICML), 6105–6114. Retrieved from http://proceedings.mlr.press/v97/tan19a.html
- Wang, C. Y., Bochkovskiy, A., & Liao, H. Y. M. (2021). Scaled-YOLOv4: Scaling cross stage partial network. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 13029–13038. doi: 10.1109/CVPR46437.2021.01283
- Yun, S., Han, D., Oh, S. J., Chun, S., Choe, J., & Yoo, Y. (2019). CutMix: Regularization strategy to train strong classifiers with localizable features. Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 6023–6032. doi: 10.1109/ICCV.2019.00612
- Zhang, H., Cisse, M., Dauphin, Y. N., & Lopez-Paz, D. (2018). mixup: Beyond empirical risk minimization. International Conference on Learning Representations (ICLR). Retrieved from https://openreview.net/forum?id=r1Ddp1-Rb
- Zhong, Z., Zheng, L., Kang, G., Li, S., & Yang, Y. (2020). Random Erasing Data Augmentation. Proceedings of the AAAI Conference on Artificial Intelligence, 34(7), 13001–13008. doi: 10.1609/aaai.v34i07.7000