Comparative analysis of the use of instructions for language models and automated metrics for assessing the quality of images generated by GAN models

This study explores how AI language models can be combined with Generative Adversarial Networks (GANs) to generate images from textual descriptions derived from literary works. The effectiveness of various prompt types used to create abstractions was analyzed, and the performance of three leading contemporary image generation models (MidJourney, DALL-E, and Stable Diffusion) was compared. The results indicate that, while language models can produce meaningful abstractions that partially reflect the content of the text, current GAN models do not yet provide the necessary level of semantic correspondence and visual realism. MidJourney demonstrated the highest performance, with DALL-E trailing by only 0.69% and Stable Diffusion by 11.7%. Prompts that rely on generating generalized abstractions proved superior: the prompt that fully delegated abstraction generation to the language model outperformed the others by 4.52% to 11.18%. In contrast, the automated evaluation metrics CLIPScore and Inception Score proved inadequate for this specific task. The study also discusses the limitations of current GAN training approaches based on "keyword–image" pairs and substantiates the need to improve generation methods by using comprehensive textual descriptions.
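For orientation, the two automated metrics named above can be sketched from their commonly cited definitions: CLIPScore as a rescaled, clipped cosine similarity between an image embedding and a text embedding, and Inception Score as the exponential of the mean KL divergence between per-image class distributions and their marginal. This is a minimal illustration, not the evaluation pipeline used in the study. The function names are hypothetical, and the inputs are assumed to be precomputed (embeddings from a CLIP model, class probabilities from an Inception classifier).

```python
import math


def clip_score(image_emb, text_emb, w=2.5):
    """CLIPScore: w * max(cos(image_emb, text_emb), 0).

    The embeddings are assumed to come from a CLIP model; w = 2.5 is the
    rescaling constant used in the CLIPScore paper.
    """
    dot = sum(a * b for a, b in zip(image_emb, text_emb))
    norm_i = math.sqrt(sum(a * a for a in image_emb))
    norm_t = math.sqrt(sum(b * b for b in text_emb))
    return w * max(dot / (norm_i * norm_t), 0.0)


def inception_score(class_probs):
    """Inception Score: exp of the mean KL(p(y|x) || p(y)) over all images.

    class_probs holds one probability vector p(y|x) per generated image,
    assumed to be produced by an Inception classifier.
    """
    n = len(class_probs)
    k = len(class_probs[0])
    # Marginal class distribution p(y), averaged over the image set.
    marginal = [sum(p[j] for p in class_probs) / n for j in range(k)]
    mean_kl = sum(
        sum(pj * math.log(pj / marginal[j]) for j, pj in enumerate(p) if pj > 0)
        for p in class_probs
    ) / n
    return math.exp(mean_kl)
```

With toy inputs, identical embeddings give the maximum CLIPScore of 2.5, and a set of images whose predicted class distributions are all the same gives the minimum Inception Score of 1.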
