Comparative analysis of the use of instructions for language models and automated metrics for assessing the quality of images generated by GAN models

V. S. Yakymiv; Y. Z. Piskozub; N. R. Oliiarnyk

This study explores the use of AI language models in combination with Generative Adversarial Networks (GANs) to generate images from textual descriptions derived from literary works. It analyzes the effectiveness of various prompt types used to create abstractions and compares the performance of three leading contemporary image generation models: MidJourney, DALL-E, and Stable Diffusion. The results indicate that, while language models can produce meaningful abstractions that partially reflect the content of the text, current GAN models do not yet provide the necessary level of semantic correspondence and visual realism. MidJourney demonstrated the highest performance, with DALL-E trailing by only 0.69% and Stable Diffusion by 11.7%. Prompts that rely on generating generalized abstractions proved superior: the prompt that fully delegated abstraction generation to the language model outperformed the others by 4.52% to 11.18%. In contrast, automated evaluation metrics such as CLIPScore and Inception Score proved inadequate for this specific task. The study also discusses the limitations of current GAN training approaches based on "keyword–image" pairs and substantiates the need to improve generation methods through the use of comprehensive textual descriptions.

artificial intelligence

computing

AI-generated images

text-to-image generation