Audio Reading Assistant for Visually Impaired People

pp. 81–88
Lviv Polytechnic National University, Ukraine

This paper describes an Android mobile application designed for blind and visually impaired people. The main aim of the system is to create an automatic text-reading assistant that combines the hardware capabilities of a mobile phone with innovative algorithms. The Android platform was chosen so that people who already own a mobile phone do not need to buy new hardware. Four key technologies are required: camera capture, text detection, speech synthesis, and voice detection. Moreover, a voice recognition subsystem has been created that meets the needs of blind users, allowing them to control the application effectively by voice. It requires three key technologies: voice capture through the embedded microphone, speech-to-text conversion, and user request interpretation. Based on these technologies, an application for the Android platform was developed.
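The last stage of the voice-control subsystem, user request interpretation, maps a recognized utterance to an application command. A minimal keyword-based sketch is shown below; the command names and trigger phrases are illustrative assumptions, not the paper's actual mapping, and a real implementation would take the utterance from Android's speech recognizer rather than a string literal.

```java
import java.util.Locale;

/**
 * Hypothetical sketch of user request interpretation for a voice-controlled
 * reading assistant: maps a recognized utterance to an app command.
 * Command set and trigger words are assumptions for illustration.
 */
public class CommandInterpreter {

    public enum Command { READ_ALOUD, PAUSE, RESUME, STOP, UNKNOWN }

    public static Command interpret(String utterance) {
        // Normalize case and whitespace before keyword matching.
        String s = utterance.toLowerCase(Locale.ROOT).trim();
        // Check more specific control words before the generic "read".
        if (s.contains("pause")) return Command.PAUSE;
        if (s.contains("resume") || s.contains("continue")) return Command.RESUME;
        if (s.contains("stop")) return Command.STOP;
        if (s.contains("read")) return Command.READ_ALOUD;
        return Command.UNKNOWN;
    }

    public static void main(String[] args) {
        System.out.println(interpret("Please read this page")); // READ_ALOUD
        System.out.println(interpret("pause for a moment"));    // PAUSE
        System.out.println(interpret("stop reading"));          // STOP
    }
}
```

In practice such keyword matching would sit behind Android's `SpeechRecognizer` callback, with the recognized text passed to `interpret` and the resulting command dispatched to the text-to-speech playback controller.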


  1. Ramoa G., Moured O., Schwarz T., Muller K., Stiefelha- gen R., (2023). Enabling People with Blindness to Distin- guish Lines of Mathematical Charts with Audio-Tactile Graphic Readers. PETRA '23: Proceedings of the 16th International Conference on PErvasive Technologies Re- lated to Assistive Environments. Pp. 384—391. DOI: 3594806.3594818
  2. Yang P., Zhang J., Xu J., Li Y., (2022). An OCR System: Towards Mobile Device. ICDLT '22: Proceedings of the 2022 6th International Conference on Deep Learning Technolo- gies. Pp. 1–7. DOI: 3556677.3556685
  3. Hildebrandt P., Schulze M., Cohen S., (2022). Optical character recognition guided image super-resolution. Do- cEng '22: Proceedings of the 22nd ACM Symposium on Document Engineering. Article No. 14. Pp. 1—4. DOI:
  4. Thi-Tuyet-Hai N., Jatowt A., Coustaty A., Nhu-Van N., Doucet A., (2019). Deep statistical analysis of OCR errors for effective post-OCR processing. JCDL ’19: Proceed- ings of the 18th Joint Conference on Digital Libraries. Pp. 29–38. DOI: 00015
  5. Liu R., Sisman B., Gao G., Li H., (2022). Decoding Knowledge Transfer for Neural Text-to-Speech Training. IEEE/ACM Transactions on Audio, Speech and Language Processing. vol. 30. Pp. 1—5. DOI: 10.1109/TASLP.2022.3171974
  6. Alexanderson S., Székely É., Henter G. E., Kucherenko T., Beskow J., (2020). Generating coherent  spontaneous speech and gesture from text. IVA '20: Proceedings of the 20th ACM International Conference on Intelligent Virtual Agents. Pp. 1—3. DOI: 3383652.3423874
  7. Zhou Y., Tian X., Li H., (2021). Language Agnostic Speaker Embedding for Cross-Lingual Personalized Speech Generation. IEEE/ACM Transactions on Audio, Speech and Language Processing. vol. 29. Pp. 3427— 3439. DOI: 3125142
  8. Langlois Q., Jodogne S., (2023). Practical Study of Deep Learning Models for Speech Synthesis. PETRA '23: Proceed- ings of the 16th International Conference on PErvasive Tech- nologies Related to Assistive Environments.  Pp. 700—706.DOI:
  9. Yakubovskyi R., Morozov Y., (2023). Speech Models Training Technologies Comparison Using Word Error Rate. Advances in Cyber-Physical Systems. vol. 8, num. 1. Pp. 74–80. DOI: [10]  Liao J., Eskimez S., Lu L., Shi Y., Gong M., Shou L., Qu H., (2023). Improving Readability for Automatic Speech Recognition  Transcription.  ACM  Transactions  on  Asian and Low-Resource Language Information Processing. vol. 22,   num.   5.   Pp. 1–23.   DOI: