Contrastive language-image pre-training (CLIP) in e-commerce: applications, methodologies, and performance

Oleksandr Khainas; Nataliia Melnykova; Solomiia Fedushko

This article examines the architecture of the Contrastive Language-Image Pre-training (CLIP) model and its applications in e-commerce, focusing on three key tasks: visual search, product recommendation, and attribute extraction. It also analyzes the methodologies used to adapt CLIP to these tasks and the datasets employed. By highlighting CLIP’s distinctive capabilities, such as zero-shot learning enabled by contrastive pre-training, the article underscores the model’s potential impact on the industry while acknowledging its limitations, including the ‘domain gap’ and the resulting need for adaptation strategies. Finally, the article explores future research directions for improving CLIP’s performance in specialized e-commerce contexts and compares CLIP with traditional and other multimodal AI techniques.
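The two mechanisms named above, contrastive pre-training and zero-shot classification, can be illustrated with a minimal NumPy sketch. It is not the authors' implementation and does not call any CLIP library: the function names, the toy embeddings, and the product prompts are hypothetical, and the temperature of 0.07 is an assumed value chosen for illustration. Real image and text embeddings would come from CLIP's encoders.

```python
import numpy as np

def l2_normalize(x):
    """Scale each row to unit length so dot products equal cosine similarity."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def zero_shot_classify(image_emb, text_embs, temperature=0.07):
    """Pick the text prompt whose embedding is most similar to the image.

    image_emb: shape (d,); text_embs: shape (k, d), one row per candidate prompt.
    Returns (index of best prompt, softmax probabilities over prompts).
    """
    img = l2_normalize(image_emb[None, :])
    txt = l2_normalize(text_embs)
    logits = (img @ txt.T)[0] / temperature
    logits -= logits.max()  # numerical stability before exponentiation
    probs = np.exp(logits) / np.exp(logits).sum()
    return int(np.argmax(probs)), probs

def contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric cross-entropy over a batch where row i of each input is a matched pair."""
    img = l2_normalize(image_embs)
    txt = l2_normalize(text_embs)
    logits = img @ txt.T / temperature  # (n, n) pairwise similarities

    def cross_entropy_on_diagonal(l):
        l = l - l.max(axis=1, keepdims=True)
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    # Average the image-to-text and text-to-image directions.
    return 0.5 * (cross_entropy_on_diagonal(logits) + cross_entropy_on_diagonal(logits.T))

if __name__ == "__main__":
    # Hypothetical prompts and toy 4-dimensional embeddings (stand-ins for encoder outputs).
    prompts = ["a photo of a sneaker", "a photo of a handbag", "a photo of a watch"]
    text_embs = np.eye(3, 4)
    image_emb = np.array([0.1, 0.9, 0.2, 0.0])  # closest to the second prompt

    best, probs = zero_shot_classify(image_emb, text_embs)
    print(prompts[best], probs)

    # Matched pairs should give a lower loss than deliberately mismatched pairs.
    print(contrastive_loss(text_embs, text_embs))
    print(contrastive_loss(text_embs, np.roll(text_embs, 1, axis=0)))
```

In this sketch no classifier is trained for the product categories: the categories exist only as text prompts, which is what makes the classification zero-shot. The loss function shows the pre-training objective that makes such similarity comparisons meaningful, pulling matched image-text pairs together and pushing mismatched pairs apart.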
