Comparison of the use of AI services based on general natural language for generating images for fiction

The article describes a method for generating images with the Artificial Intelligence services DALL-E, Midjourney and Stable Diffusion from a text abstraction produced by Artificial Intelligence services that work with natural language: ChatGPT, Claude, Copilot, Pi and Gemini.  The new approach gives a significant gain in image quality and in consistency with the analysed text.  The methodology relies on neural network API services, rather than the commonly used natural language processing algorithms, to extract keywords or sentences.  A proposed evaluation procedure is applied to the generated images.  The evaluation results are analysed by Artificial Intelligence service, by tested book, by length of the resulting abstract, by number of errors of each type, and by the number of times an AI service can identify the title of the tested book from the proposed abstraction alone.
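The workflow and the evaluation criteria named above can be sketched as follows. This is only an illustrative outline, not the authors' implementation: the names `Trial`, `illustrate` and `summarise`, the error-type labels, and the use of word count as the measure of abstract length are all assumptions, and the text and image services are passed in as plain callables rather than real API clients.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Trial:
    """One evaluated run (hypothetical record layout)."""
    service: str                     # text service that produced the abstraction
    book: str                        # title of the tested book
    abstraction: str                 # text abstraction passed to the image generator
    errors: Counter = field(default_factory=Counter)  # error type -> count, assigned by a reviewer
    title_recognised: bool = False   # was the book title identified from the abstraction alone?


def illustrate(book_text: str,
               extract: Callable[[str], str],
               generate: Callable[[str], str]) -> tuple[str, str]:
    """Two-stage pipeline: a text service extracts an abstraction,
    then an image service renders it. Both stages are injected callables."""
    abstraction = extract(book_text)
    return abstraction, generate(abstraction)


def summarise(trials: list[Trial]) -> dict[str, dict]:
    """Aggregate the evaluation criteria per text service."""
    acc: dict[str, dict] = {}
    for t in trials:
        s = acc.setdefault(
            t.service, {"n": 0, "words": 0, "errors": Counter(), "recognised": 0}
        )
        s["n"] += 1
        s["words"] += len(t.abstraction.split())   # assumed length measure: word count
        s["errors"].update(t.errors)               # sum error counts by type
        s["recognised"] += int(t.title_recognised)
    return {
        name: {
            "mean_abstract_words": s["words"] / s["n"],
            "errors_by_type": dict(s["errors"]),
            "title_recognition_rate": s["recognised"] / s["n"],
        }
        for name, s in acc.items()
    }
```

In such a sketch, `extract` would wrap a call to one of the text services and `generate` a call to one of the image services; keeping them as parameters lets the same evaluation loop run over every service pairing.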
