Information Technologies for Errors Correction in Ukrainian-Language Texts Based on Machine Learning

Rostyslav Fedchuk; Victoria Vysotska

The research is motivated by the growing need to automate text analysis and correction, particularly for Ukrainian-language content, which has a rich morphological and syntactic structure. Because errors in texts range from spelling to contextual ones, there is a pressing need for systems that can accurately identify errors and propose appropriate corrections. The specifics of the Ukrainian language, including its grammatical complexity, require machine learning models to be adapted to the features of the language. The purpose of the research is to develop a mathematical model of a decision support system for identifying and correcting errors in Ukrainian-language texts. The task covers both the formalization and mathematical description of text processing and the construction of a model oriented toward classification and text generation. Special attention is paid to accounting for structural features specific to Ukrainian in order to increase the accuracy and performance of the system. The research method is based on a mathematical model of error correction formulated as a context-aware text generation problem. The study uses statistical methods and machine learning approaches. The training sample combines texts with real and artificial errors to ensure a balanced learning process. The correction module includes generation mechanisms based on contextual models that predict the correct replacement for erroneous tokens. Approaches to text vectorization are mathematically substantiated, taking into account the morphology and syntax of the Ukrainian language. The resulting model serves as a general-purpose basis for building intelligent systems for automatic editing of Ukrainian-language text.
As a result of the research, approaches to building an error correction model for Ukrainian-language texts are formulated and mathematically substantiated. The main result is an integrated system that uses contextual information to achieve high accuracy in error recognition and correction. The applied mathematical methods include probabilistic approaches and vector representations of tokens, which allow the system to be adapted to the Ukrainian language, with its high morphological and syntactic complexity. The resulting model can be scaled and applied to practical tasks such as automatic text editing or improving content quality in the Ukrainian-speaking environment.
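
The idea of a training sample that combines texts with real and artificial errors can be illustrated with a minimal sketch. It is not the authors' procedure: the confusion table `CONFUSIONS`, the helper names `inject_error` and `build_training_sample`, and the `synthetic_share` parameter are all hypothetical, and character substitution is only one of many possible ways to generate artificial errors.

```python
import random

# Hypothetical confusion pairs of easily mixed-up Ukrainian letters (an assumption,
# not a list taken from the paper).
CONFUSIONS = {"и": "і", "і": "и", "е": "є", "є": "е", "г": "ґ", "ґ": "г"}


def inject_error(sentence: str, rng: random.Random) -> str:
    """Return a copy of `sentence` with one character swapped for a confusable one."""
    positions = [i for i, ch in enumerate(sentence) if ch in CONFUSIONS]
    if not positions:
        return sentence  # nothing to corrupt; the sentence is returned unchanged
    i = rng.choice(positions)
    return sentence[:i] + CONFUSIONS[sentence[i]] + sentence[i + 1:]


def build_training_sample(real_pairs, clean_sentences, synthetic_share=0.5, seed=0):
    """Mix real (erroneous, corrected) pairs with synthetically corrupted ones.

    `synthetic_share` is the target fraction of synthetic pairs in the result;
    it is capped by the number of clean sentences available.
    """
    if not 0 <= synthetic_share < 1:
        raise ValueError("synthetic_share must be in [0, 1)")
    rng = random.Random(seed)
    # Number of synthetic pairs needed to reach the requested share of the total.
    n_synth = round(len(real_pairs) * synthetic_share / (1 - synthetic_share))
    n_synth = min(n_synth, len(clean_sentences))
    chosen = rng.sample(clean_sentences, n_synth)
    synthetic = [(inject_error(s, rng), s) for s in chosen]
    sample = list(real_pairs) + synthetic
    rng.shuffle(sample)
    return sample


if __name__ == "__main__":
    real = [("Я люблю читате книги.", "Я люблю читати книги.")]
    clean = ["Київ є столицею України.", "Вона пише листа."]
    for source, target in build_training_sample(real, clean, synthetic_share=0.5):
        print(source, "->", target)
```

Each resulting pair has the form (erroneous text, corrected text), which is the input-output shape a context-aware generation model would be trained on. A fixed seed keeps the sample reproducible.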
