IT Project Concept for Detecting and Automatically Correcting Spelling Errors in German-Language Texts

R. B. Fedchuk; V. A. Vysotska; Anastasia Hrushetska

This paper presents the concept and technical justification for a software product that automatically detects and corrects spelling errors in German-language texts. The topic is relevant because German grammar, spelling and word formation are complex and pose significant difficulties for non-native speakers, especially those who study the language or use it professionally, where accurate writing is crucial. The concept focuses on identifying typical spelling errors, including omitted letters, transposed letters and incorrectly used words. Instead of the classic approach of large dictionary databases combined with edit-distance search, the project works with machine-learning data on typical errors, which reduces resource requirements and speeds up text processing. This makes the system more portable and autonomous and better suited to integration into educational or professional environments. During development, a comprehensive system analysis was carried out: a goal tree was built, a context diagram was drawn up to describe information flows, the main stages of the system life cycle were defined, and the system's operating logic was modeled with a UML activity diagram. Particular attention was paid to the choice of implementation methods and tools, including an analysis of the software tools and Python libraries best suited to automatic spelling correction. The possibility of integrating machine learning algorithms to improve the system's adaptation to new types of errors was also considered. The results form the basis for an effective, compact and adaptive tool that can be extended in the future, in particular to detect grammatical errors, which would significantly increase the system's functionality and practical value.
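The idea of correcting from data on typical errors, rather than searching a large dictionary by edit distance, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the table `TYPICAL_ERRORS`, its three entries and the helper names `match_case` and `correct_text` are hypothetical, and in the described concept such data would presumably be derived from real error corpora rather than written by hand.

```python
import re

# Hypothetical table of typical misspellings and their corrections.
# Hand-written for illustration only; the real data source is an assumption.
TYPICAL_ERRORS = {
    "veilleicht": "vielleicht",          # transposed letters (ei <-> ie)
    "warscheinlich": "wahrscheinlich",   # omitted letter (h)
    "standart": "standard",              # frequently confused spelling
}

# Matches runs of German letters, including umlauts and eszett.
WORD_RE = re.compile(r"[A-Za-zÄÖÜäöüß]+")


def match_case(original: str, corrected: str) -> str:
    """Carry the capitalisation of the original word over to the correction."""
    if original.isupper():
        return corrected.upper()
    if original[0].isupper():
        return corrected[0].upper() + corrected[1:]
    return corrected


def correct_text(text: str) -> tuple[str, list[tuple[str, str]]]:
    """Replace known typical errors; return the new text and the changes made."""
    changes: list[tuple[str, str]] = []

    def fix(match: re.Match) -> str:
        word = match.group(0)
        replacement = TYPICAL_ERRORS.get(word.lower())
        if replacement is None:
            return word  # not a known typical error: leave untouched
        fixed = match_case(word, replacement)
        changes.append((word, fixed))
        return fixed

    return WORD_RE.sub(fix, text), changes


if __name__ == "__main__":
    fixed_text, changes = correct_text(
        "Veilleicht ist das warscheinlich der Standart."
    )
    print(fixed_text)
    print(changes)
```

Each word costs one dictionary lookup, so no candidate generation or distance computation is needed. The trade-off is that only errors already present in the table are corrected, which is why adapting the system to new error types matters.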
