ML MODELS AND OPTIMIZATION STRATEGIES FOR ENHANCING THE PERFORMANCE OF CLASSIFICATION ON MOBILE DEVICES

2024, pp. 74–82
https://doi.org/10.23939/ujit2024.02.074
Received: October 15, 2024; Accepted: November 19, 2024
Lviv Polytechnic National University, Lviv, Ukraine

The paper highlights the increasing importance of machine learning (ML) in mobile applications, with mobile devices becoming ubiquitous due to their accessibility and functionality. Various ML models, including Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs), are explored for their applications in real-time classification on mobile devices. The paper identifies key challenges in deploying these models, such as limited computational resources, battery consumption, and the need for real-time performance.

Central to the research is the comparison of MobileNetV2, a lightweight CNN designed for mobile applications, and Vision Transformers, which have shown strong results in image recognition tasks. MobileNetV2, with its depthwise separable convolutions and residual connections, is optimized for resource efficiency, while ViTs apply self-attention mechanisms to achieve competitive accuracy in image classification. The study evaluates the performance of both models before and after applying optimization techniques such as quantization and graph optimization.
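The efficiency advantage of depthwise separable convolutions can be illustrated with a simple parameter count. The sketch below compares a standard convolution with the depthwise-plus-pointwise factorization used in MobileNet blocks; the layer shape (3×3 kernel, 32 input channels, 64 output channels) is an illustrative example, not a configuration from the paper:

```python
def conv_params(k, c_in, c_out):
    """Parameters of a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out

def dw_separable_params(k, c_in, c_out):
    """Depthwise k x k convolution (one filter per input channel)
    followed by a 1x1 pointwise convolution, as in MobileNet blocks."""
    depthwise = k * k * c_in       # spatial filtering, per channel
    pointwise = c_in * c_out       # channel mixing
    return depthwise + pointwise

# Example layer: 3x3 kernel, 32 input channels, 64 output channels.
standard = conv_params(3, 32, 64)            # 18432 parameters
separable = dw_separable_params(3, 32, 64)   # 288 + 2048 = 2336
print(standard, separable, round(standard / separable, 1))  # 18432 2336 7.9
```

For this layer the factorization cuts the parameter (and multiply-accumulate) count by roughly 8x, which is the basic reason MobileNetV2 fits resource-constrained devices.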

Quantization proved to be one of the most effective optimization strategies for mobile environments, reducing model size by up to 74 % and improving inference speed by 44 % for ViTs. Additionally, graph optimization techniques, such as operator fusion, pruning, and node reordering, are examined for their role in reducing computational complexity and improving performance on resource-constrained devices.
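The mechanics of post-training quantization can be sketched in NumPy: weights are mapped from float32 to int8 with an affine (scale/zero-point) transform. This is a minimal illustration, not the TFLite pipeline used in the paper; the random weight tensor and its shape are hypothetical. Note that float32 → int8 storage alone yields a 75 % size reduction, in line with the ~74 % reported:

```python
import numpy as np

def quantize_int8(w):
    """Affine post-training quantization of a float32 tensor to int8.
    Returns (quantized tensor, scale, zero_point)."""
    w_min, w_max = float(w.min()), float(w.max())
    scale = (w_max - w_min) / 255.0          # int8 covers 256 levels
    zero_point = -128 - round(w_min / scale)  # maps w_min to -128
    q = np.clip(np.round(w / scale) + zero_point, -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover an approximate float32 tensor from int8 values."""
    return (q.astype(np.float32) - zero_point) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256)).astype(np.float32)  # illustrative weights
q, s, z = quantize_int8(w)

size_reduction = 1 - q.nbytes / w.nbytes        # 0.75: int8 is 4x smaller
max_err = np.abs(w - dequantize(q, s, z)).max() # bounded by the scale step
print(size_reduction, max_err)
```

The trade-off is visible directly: storage shrinks fourfold while the per-weight reconstruction error stays within one quantization step, which is why accuracy typically degrades only slightly after post-training quantization.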

Experimental results on different datasets, including MNIST and the ASL Alphabet dataset, demonstrate the significant performance improvements achieved through optimization. The study shows that post-training quantization and graph optimization can reduce model size, inference time, and CPU usage, making ML models more suitable for mobile applications. The experiments were conducted on a Xiaomi Redmi Note 8 Pro device, showcasing the practical benefits of these optimizations in real-world mobile deployments.

The research concludes that optimization techniques like quantization and graph optimization are essential for deploying ML models on mobile devices, where resource constraints and real-time performance are critical. It also provides valuable insights into how ML architectures can be optimized for mobile environments, contributing to the advancement of efficient AI-driven mobile applications.
