Comprehensive Analysis of Few-shot Image Classification Method Using Triplet Loss

Mykola Baranov; Yurii Shcherbyna

Image classification is a central problem in computer vision. The first approaches relied on classic, hand-designed algorithms. Despite their successful applications, many image classification tasks remained unsolved until machine learning was introduced into the field. Early machine learning models made it possible to classify extracted features, which had not been feasible before. However, these models still required handcrafted features, which left the most complicated classification tasks out of reach. Recent success in deep learning allows researchers to implement automatic, trainable feature extraction, which brought significant progress to computer vision. Processing large-scale datasets greatly advanced automatic feature extraction, and combining such features with previous approaches led to breakthroughs in computer vision. But a new limitation has emerged: dependency on large amounts of data. Deep learning approaches to image classification usually require large-scale datasets. Moreover, modern models can behave unexpectedly on data whose distribution differs from the training data. Few-shot learning allows us to dramatically reduce the amount of required data while retaining promising results. Even so, there is still a tradeoff between the amount of available data and the performance of the trained model. In this paper, we implement a siamese network based on triplet loss. We then investigate the relationship between the amount of available data and few-shot model performance. Finally, we compare the models obtained by metric learning with baseline models trained on large-scale datasets.
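To make the training objective concrete, the following is a minimal sketch of the commonly used margin-based triplet loss, assuming Euclidean distance and a margin of 1.0. The abstract does not state the exact loss variant, distance metric, or margin used in the paper, so the function names, the margin value, and the toy embeddings below are illustrative assumptions only.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Margin-based triplet loss: max(d(a, p) - d(a, n) + margin, 0).

    The margin of 1.0 is an assumed default, not a value from the paper.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(d_pos - d_neg + margin, 0.0)


# Toy 2-D embeddings (made-up values, not data from the paper).
anchor = [0.0, 0.0]
positive = [0.0, 1.0]  # same class as the anchor, distance 1
negative = [3.0, 4.0]  # different class, distance 5

# The negative is already farther than the positive by more than the margin.
easy = triplet_loss(anchor, positive, negative)

# Swapping the roles violates the margin, so the loss becomes positive.
hard = triplet_loss(anchor, negative, positive)
```

In a siamese setup, the three embeddings would come from the same network with shared weights, and minimizing this loss pulls same-class images together while pushing different-class images apart in the embedding space.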
