IoT System for Real-Time Audio Information Processing

2025; pp. 16-21
1 Ivan Franko National University
2 Ivan Franko National University

This paper presents the development and investigation of a speech-to-text conversion and speaker identification system based on a Raspberry Pi microcomputer, designed for local audio data processing in environments with limited network connectivity. The system integrates Silero and WebRTC models for voice activity detection, SpeechBrain for speaker identification, and the Whisper family of models for speech recognition. In particular, a comparative analysis has been conducted of the efficiency of local speech processing using the Whisper Tiny and Whisper Large 2 models versus cloud-based processing through the Whisper-1 and Whisper-1-en APIs (the latter applied exclusively to English-language speech). The study evaluates the impact of sentence length, processing time, memory consumption, and recognition accuracy on system performance. The advantages and resource-related limitations of the models in local and cloud-based IoT environments have been analyzed, and the feasibility of their application in real-time and data-privacy contexts has been determined. Performance metrics of the models under various conditions have been used for the analysis.
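The evaluation criteria named in the abstract (sentence length, processing time, memory consumption, recognition accuracy) can be illustrated with a small measurement harness. The following is a minimal sketch, not the authors' benchmark code. It assumes that accuracy is scored as word error rate (WER), which the abstract does not state, and the names `word_error_rate`, `benchmark`, and `fake_transcribe` are hypothetical. Note that `tracemalloc` only tracks memory allocated by the Python interpreter, so it would under-report the footprint of a model running in native code.

```python
import time
import tracemalloc


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed ref prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def benchmark(transcribe, audio, reference: str) -> dict:
    """Run one transcription and record time, peak Python memory, and WER."""
    tracemalloc.start()
    start = time.perf_counter()
    hypothesis = transcribe(audio)
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return {
        "words": len(reference.split()),  # sentence length in words
        "seconds": elapsed,
        "peak_bytes": peak,
        "wer": word_error_rate(reference, hypothesis),
    }


def fake_transcribe(audio):
    # Hypothetical stand-in for a local model or a cloud API call.
    return "turn on the kitchen lights"


result = benchmark(fake_transcribe, b"", "turn on the kitchen light")
```

Passing the recognizer as a callable lets the same harness wrap either a local model or a cloud request, so both are scored against the same reference sentences.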
