UNDERSTANDING LARGE LANGUAGE MODELS: THE FUTURE OF ARTIFICIAL INTELLIGENCE

I. Yu. Yurchak; A. O. Khich; Vira Oksentyuk

The article examines the newest direction in artificial intelligence: large language models, which open a new era in natural language processing and make it possible to create more flexible and adaptive systems. These models achieve a high level of contextual understanding, which enriches the user experience and expands the fields of application of artificial intelligence. Large language models have enormous potential to redefine human interaction with technology and to change the way we think about machine learning. The article reviews the historical development of large language models and identifies the leading companies engaged in research and in the development of effective systems. It describes the internal structure of the models and the way knowledge is represented in them. The main principles of training are highlighted: data collection and pre-processing, and the selection of an appropriate neural network architecture. It is noted that the greatest progress has been achieved with the Transformer neural network, which is based on the attention mechanism. The steps that contribute significantly to training, post-training, and optimizing training speed are highlighted. The effectiveness and quality of language models are evaluated with various metrics, the choice of which depends on the task to be solved. Despite their advantages, however, large language models are not without problems. Their capacity to generate false information, fabricated facts, and unethical remarks presents a challenge for researchers and developers. It is important to continue work on making models more responsible, to develop effective content filtering methods, and to improve training mechanisms. Understanding these problems and finding solutions to them are key steps towards building more efficient and reliable large language models. Openness, collective participation, and dialogue between society, the scientific community, and developers are becoming an integral part of ensuring the sustainable development of this technology.

Keywords: large language models; machine learning; deep learning; data set; Transformer neural network; prompt engineering
