Audio Reading Assistant for Visually Impaired People

pp. 81–88
Lviv Polytechnic National University, Ukraine

This paper describes an Android mobile application designed for blind and visually impaired people. The main aim of the system is to create an automatic text-reading assistant that combines the hardware capabilities of a mobile phone with innovative algorithms. The Android platform was chosen so that people who already own a mobile phone do not need to buy new hardware. Four key technologies are required: camera capture, text detection, speech synthesis, and voice detection. Moreover, a voice recognition subsystem has been created that meets the needs of blind users, allowing them to control the application effectively by voice. It requires three key technologies: voice capture through the embedded microphone, speech-to-text conversion, and user request interpretation. Based on these technologies, an application for the Android platform was developed.
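The last stage of the voice-control subsystem, user request interpretation, maps a recognized utterance to an application command. A minimal keyword-based sketch is shown below; the command names and trigger phrases are illustrative assumptions, not the paper's actual mapping, and a real implementation would take the utterance from Android's speech recognizer rather than a string literal.

```java
import java.util.Locale;

/**
 * Hypothetical sketch of user request interpretation for a voice-controlled
 * reading assistant: maps a recognized utterance to an app command.
 * Command set and trigger words are assumptions for illustration.
 */
public class CommandInterpreter {

    public enum Command { READ_ALOUD, PAUSE, RESUME, STOP, UNKNOWN }

    public static Command interpret(String utterance) {
        // Normalize case and whitespace before keyword matching.
        String s = utterance.toLowerCase(Locale.ROOT).trim();
        // Check more specific control words before the generic "read".
        if (s.contains("pause")) return Command.PAUSE;
        if (s.contains("resume") || s.contains("continue")) return Command.RESUME;
        if (s.contains("stop")) return Command.STOP;
        if (s.contains("read")) return Command.READ_ALOUD;
        return Command.UNKNOWN;
    }

    public static void main(String[] args) {
        System.out.println(interpret("Please read this page")); // READ_ALOUD
        System.out.println(interpret("pause for a moment"));    // PAUSE
        System.out.println(interpret("stop reading"));          // STOP
    }
}
```

In practice such keyword matching would sit behind Android's `SpeechRecognizer` callback, with the recognized text passed to `interpret` and the resulting command dispatched to the text-to-speech playback controller.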


  1. Ramoa G., Moured O., Schwarz T., Muller K., Stiefelha- gen R., (2023). Enabling People with Blindness to Distin- guish Lines of Mathematical Charts with Audio-Tactile Graphic Readers. PETRA '23: Proceedings of the 16th International Conference on PErvasive Technologies Re- lated to Assistive Environments. Pp. 384—391. DOI: 3594806.3594818
  2. Yang P., Zhang J., Xu J., Li Y., (2022). An OCR System: Towards Mobile Device. ICDLT '22: Proceedings of the 2022 6th International Conference on Deep Learning Technolo- gies. Pp. 1–7. DOI: 3556677.3556685
  3. Hildebrandt P., Schulze M., Cohen S., (2022). Optical character recognition guided image super-resolution. Do- cEng '22: Proceedings of the 22nd ACM Symposium on Document Engineering. Article No. 14. Pp. 1—4. DOI:
  4. Thi-Tuyet-Hai N., Jatowt A., Coustaty A., Nhu-Van N., Doucet A., (2019). Deep statistical analysis of OCR errors for effective post-OCR processing. JCDL ’19: Proceed- ings of the 18th Joint Conference on Digital Libraries. Pp. 29–38. DOI: 00015
  5. Liu R., Sisman B., Gao G., Li H., (2022). Decoding Knowledge Transfer for Neural Text-to-Speech Training. IEEE/ACM Transactions on Audio, Speech and Language Processing. vol. 30. Pp. 1—5. DOI: 10.1109/TASLP.2022.3171974
  6. Alexanderson S., Székely É., Henter G. E., Kucherenko T., Beskow J., (2020). Generating coherent  spontaneous speech and gesture from text. IVA '20: Proceedings of the 20th ACM International Conference on Intelligent Virtual Agents. Pp. 1—3. DOI: 3383652.3423874
  7. Zhou Y., Tian X., Li H., (2021). Language Agnostic Speaker Embedding for Cross-Lingual Personalized Speech Generation. IEEE/ACM Transactions on Audio, Speech and Language Processing. vol. 29. Pp. 3427— 3439. DOI: 3125142
  8. Langlois Q., Jodogne S., (2023). Practical Study of Deep Learning Models for Speech Synthesis. PETRA '23: Proceed- ings of the 16th International Conference on PErvasive Tech- nologies Related to Assistive Environments.  Pp. 700—706.DOI:
  9. Yakubovskyi R., Morozov Y., (2023). Speech Models Training Technologies Comparison Using Word Error Rate. Advances in Cyber-Physical Systems. vol. 8, num. 1. Pp. 74–80. DOI: [10]  Liao J., Eskimez S., Lu L., Shi Y., Gong M., Shou L., Qu H., (2023). Improving Readability for Automatic Speech Recognition  Transcription.  ACM  Transactions  on  Asian and Low-Resource Language Information Processing. vol. 22,   num.   5.   Pp. 1–23.   DOI: