Information system for converting Ukrainian-language audio into its textual representation using NLP methods and machine learning

Yurii Tyshchuk; Victoria Vysotska; Olha Vlasenko

Speech recognition involves various models, methods and algorithms for analysing and processing a user's recorded voice. It allows people to control systems that support some form of speech input. A speech-to-text conversion system is a type of speech recognition system that turns spoken input into text for further processing. Such a system processes an audio file in several stages: it relies on electroacoustic means to capture sound, filtering algorithms to isolate the relevant sounds in the recording, electronic data arrays for the selected language, and mathematical models that assemble the most likely words from the recognised phonemes. Speech-to-text conversion significantly speeds up and eases the work of people whose professions involve typing large amounts of text on a keyboard, and it also reduces their stress. Such systems help businesses as well: as remote work becomes increasingly common, companies need tools to record meetings and systematise them as written text. The object of the research is the process of converting Ukrainian-language speech into written text based on NLP and machine learning methods. The subject of the research is the file processing algorithms for extracting relevant sounds and recognising phonemes, as well as the mathematical models for recognising sequences of phonemes as specific words. The purpose of the work is to design and develop an information system for converting Ukrainian-language audio into written text in the form of the Ukrainian Speech-to-text web application, which provides accurate and easy analysis of Ukrainian-language audio files and their subsequent transcription into text. The application supports uploading files from the file system and recording with a microphone, as well as saving the analysed data.
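The abstract mentions mathematical models that assemble the most likely words from recognised phonemes but does not specify them. As a rough illustration only, the Python sketch below scores every word of a small lexicon against per-position phoneme probabilities and returns the highest-scoring word. The lexicon `LEXICON`, its simplified transcriptions, the function `most_likely_word` and the assumption of exactly one phoneme per position (no time alignment, no language model) are all hypothetical and are not the authors' method.

```python
import math

# Hypothetical toy lexicon: Ukrainian words mapped to simplified phoneme
# sequences (one symbol per phoneme). Not taken from the described system.
LEXICON = {
    "кіт": ["k", "i", "t"],
    "кит": ["k", "y", "t"],
    "дім": ["d", "i", "m"],
}


def most_likely_word(phoneme_probs, lexicon=LEXICON, floor=1e-6):
    """Return (word, log_score) for the lexicon word that best matches the
    observed per-position phoneme probabilities.

    phoneme_probs: list of dicts, one per position, mapping phoneme -> probability
    (assumed to come from some acoustic model). Words whose length differs from
    the observation are skipped; phonemes missing from a position's distribution
    get a small floor probability. Returns (None, -inf) if no word fits.
    """
    best_word, best_score = None, -math.inf
    for word, phonemes in lexicon.items():
        if len(phonemes) != len(phoneme_probs):
            continue
        # Sum of log-probabilities = log of the product of probabilities.
        score = sum(
            math.log(dist.get(p, floor))
            for p, dist in zip(phonemes, phoneme_probs)
        )
        if score > best_score:
            best_word, best_score = word, score
    return best_word, best_score


# Usage: an ambiguous middle phoneme ("i" vs "y") resolved by probability.
observed = [
    {"k": 0.9, "d": 0.1},
    {"i": 0.6, "y": 0.4},
    {"t": 0.8, "m": 0.2},
]
word, score = most_likely_word(observed)
print(word)  # кіт
```

Working in log-probabilities avoids numerical underflow when probabilities for longer phoneme sequences are multiplied; a practical recogniser would additionally have to align phonemes with audio frames and combine acoustic scores with a language model.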
The article also describes the design stages and the general architecture of such a system for converting Ukrainian-language audio into written text. Experimental testing of the developed system showed that the number of words does not affect the accuracy of the conversion algorithm; the small decreases in accuracy that did occur were caused by complex words and by low microphone quality, and therefore lower quality of the recorded file.
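The article does not state how conversion accuracy was measured. One common choice, assumed here, is word-level accuracy defined as 1 - WER, where the word error rate (WER) is the word-level edit distance (substitutions, deletions and insertions) between a reference transcript and the system output, divided by the number of reference words. The functions `word_error_rate` and `word_accuracy` are illustrative names, not part of the described system.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words,
    computed as a word-level Levenshtein distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the previous ref prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of an extra word
                prev[j - 1] + cost,  # substitution (or match if cost == 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Accuracy as 1 - WER; can be negative if the output has many insertions."""
    return 1.0 - word_error_rate(reference, hypothesis)


# Usage: one substituted word out of three reference words.
ref = "доброго ранку україно"
hyp = "доброго ранку україна"
print(round(word_accuracy(ref, hyp), 3))  # 0.667
```

Because WER is normalised by the length of the reference, it can be compared across recordings with different word counts, which is what a claim such as "the number of words does not affect accuracy" requires.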
