Methods and means for real-time object recognition accuracy increase in video images on iOS mobile platform

Dmytro Kushnir

An analytical review established that the Yolo family of models is a promising direction for object detection and recognition. However, existing implementations do not support running these models on the iOS platform. To address this gap, a comprehensive, scalable, Docker-based system was developed that converts arbitrary models and improves their recognition accuracy. The improvement method adds a layer with the Mish activation function to the original model. The conversion method quickly converts any Yolo model to the CoreML format. To study these techniques, the Yolov4_TCAR neural network model was created. Additionally, a method for accelerating processing on the CPU was added, which uses an additional neural network layer with the Mish activation function implemented in Swift for the iOS mobile platform. The study evaluated the effectiveness of the Mish activation function, the CPU load of the mobile device, the amount of RAM used, and the frame rate when using the improved Yolov4_TCAR model. The results confirmed that the algorithm for converting a neural network model and increasing its accuracy works in real time.
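For reference, the Mish activation function mentioned above is defined as Mish(x) = x · tanh(softplus(x)), where softplus(x) = ln(1 + e^x). The sketch below is a minimal scalar illustration in Python; it is not the Swift layer described in the abstract, and the function names `softplus` and `mish` are chosen here for illustration only.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable softplus: ln(1 + e^x)."""
    # max(x, 0) + log1p(exp(-|x|)) avoids overflow for large |x|.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x: float) -> float:
    """Mish activation: x * tanh(softplus(x))."""
    return x * math.tanh(softplus(x))


if __name__ == "__main__":
    # Print Mish at a few sample points to show its shape.
    for v in (-5.0, -1.0, 0.0, 1.0, 5.0):
        print(f"mish({v:+.1f}) = {mish(v):+.6f}")
```

Mish is smooth and non-monotonic: it dips slightly below zero for negative inputs and approaches the identity for large positive inputs.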

Yolo

input model conversion and improvement algorithm
