Software Implementation of Gesture Recognition Algorithm Using Computer Vision

Vladyslav Kotyk; Oksana Lashko

This paper examines the main methods and principles of image formation and presents a sign language recognition algorithm based on computer vision, intended to improve communication for people with hearing and speech impairments. The algorithm recognizes gestures effectively and displays the result as labels. A system comprising the main modules needed to implement the algorithm has been designed. These modules cover image perception, transformation and processing, and the creation of a neural network, using artificial intelligence tools, to train a model that predicts the label of an input gesture. The aim of this work is to create a complete program that implements a real-time gesture recognition algorithm using computer vision and machine learning.
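
As an illustration of the image-processing module, the following is a minimal sketch of a preprocessing chain built from the operations named in the keywords: grayscale conversion, Gaussian blur, and thresholding. It is not the authors' implementation, which relies on ML.NET; Python with plain lists is used here only for readability. The function names, the order of the steps, the BT.601 luminance weights, the kernel radius, and the threshold level of 128 are all assumptions made for the example.

```python
import math


def to_grayscale(rgb):
    """Map rows of (r, g, b) tuples to luminance values (assumed BT.601 weights)."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in rgb]


def gaussian_kernel(radius, sigma):
    """Return a normalized 1-D Gaussian kernel with 2 * radius + 1 taps."""
    weights = [math.exp(-(i * i) / (2 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_1d(values, kernel):
    """Convolve one row or column with the kernel, clamping indices at the borders."""
    radius = len(kernel) // 2
    last = len(values) - 1
    out = []
    for i in range(len(values)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), last)
            acc += w * values[j]
        out.append(acc)
    return out


def gaussian_blur(gray, radius=1, sigma=1.0):
    """Separable Gaussian blur: filter every row, then every column."""
    kernel = gaussian_kernel(radius, sigma)
    rows = [blur_1d(row, kernel) for row in gray]
    cols = [blur_1d(list(col), kernel) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def threshold(gray, level=128):
    """Binarize: 1 where the pixel is at least `level`, otherwise 0."""
    return [[1 if v >= level else 0 for v in row] for row in gray]


def preprocess(rgb, level=128):
    """Hypothetical pipeline: grayscale, then blur, then threshold."""
    return threshold(gaussian_blur(to_grayscale(rgb)), level)
```

Under these assumptions, calling `preprocess` on a frame given as rows of RGB tuples yields a binary mask of the same size, which could then serve as input to a classifier that predicts the gesture label.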

machine vision system

Deep Learning

Gaussian Blur

grayscale

image classification

segmentation

ML.NET

Thresholding
