ML MODELS AND OPTIMIZATION STRATEGIES FOR ENHANCING THE PERFORMANCE OF CLASSIFICATION ON MOBILE DEVICES

2024, pp. 74–82
https://doi.org/10.23939/ujit2024.02.074
Received: October 15, 2024; Accepted: November 19, 2024
Lviv Polytechnic National University, Lviv, Ukraine

The paper highlights the increasing importance of machine learning (ML) in mobile applications, with mobile devices becoming ubiquitous due to their accessibility and functionality. Various ML models, including Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs), are explored for their applications in real-time classification on mobile devices. The paper identifies key challenges in deploying these models, such as limited computational resources, battery consumption, and the need for real-time performance.

Central to the research is the comparison of MobileNetV2, a lightweight CNN designed for mobile applications, and Vision Transformers, which have shown strong results in image recognition tasks. MobileNetV2, with its depthwise separable convolutions and residual connections, is optimized for resource efficiency, while ViTs apply self-attention mechanisms to achieve competitive accuracy in image classification. The study evaluates the performance of both models before and after applying optimization techniques such as quantization and graph optimization.
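The efficiency advantage of depthwise separable convolutions can be illustrated with a simple parameter count. The sketch below compares a standard convolution with the depthwise-plus-pointwise factorization used in MobileNet blocks; the layer shape (3×3 kernel, 32 input channels, 64 output channels) is an illustrative example, not a configuration from the paper:

```python
def conv_params(k, c_in, c_out):
    """Parameters of a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out

def dw_separable_params(k, c_in, c_out):
    """Depthwise k x k convolution (one filter per input channel)
    followed by a 1x1 pointwise convolution, as in MobileNet blocks."""
    depthwise = k * k * c_in       # spatial filtering, per channel
    pointwise = c_in * c_out       # channel mixing
    return depthwise + pointwise

# Example layer: 3x3 kernel, 32 input channels, 64 output channels.
standard = conv_params(3, 32, 64)            # 18432 parameters
separable = dw_separable_params(3, 32, 64)   # 288 + 2048 = 2336
print(standard, separable, round(standard / separable, 1))  # 18432 2336 7.9
```

For this layer the factorization cuts the parameter (and multiply-accumulate) count by roughly 8x, which is the basic reason MobileNetV2 fits resource-constrained devices.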

Quantization proved to be one of the most effective optimization strategies for mobile environments, reducing model size by up to 74 % and improving inference speed by 44 % for ViTs. Additionally, graph optimization techniques, such as operator fusion, pruning, and node reordering, are examined for their role in reducing computational complexity and improving performance on resource-constrained devices.
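The mechanics of post-training quantization can be sketched in NumPy: weights are mapped from float32 to int8 with an affine (scale/zero-point) transform. This is a minimal illustration, not the TFLite pipeline used in the paper; the random weight tensor and its shape are hypothetical. Note that float32 → int8 storage alone yields a 75 % size reduction, in line with the ~74 % reported:

```python
import numpy as np

def quantize_int8(w):
    """Affine post-training quantization of a float32 tensor to int8.
    Returns (quantized tensor, scale, zero_point)."""
    w_min, w_max = float(w.min()), float(w.max())
    scale = (w_max - w_min) / 255.0          # int8 covers 256 levels
    zero_point = -128 - round(w_min / scale)  # maps w_min to -128
    q = np.clip(np.round(w / scale) + zero_point, -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover an approximate float32 tensor from int8 values."""
    return (q.astype(np.float32) - zero_point) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256)).astype(np.float32)  # illustrative weights
q, s, z = quantize_int8(w)

size_reduction = 1 - q.nbytes / w.nbytes        # 0.75: int8 is 4x smaller
max_err = np.abs(w - dequantize(q, s, z)).max() # bounded by the scale step
print(size_reduction, max_err)
```

The trade-off is visible directly: storage shrinks fourfold while the per-weight reconstruction error stays within one quantization step, which is why accuracy typically degrades only slightly after post-training quantization.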

Experimental results on different datasets, including MNIST and the ASL Alphabet dataset, demonstrate the significant performance improvements achieved through optimization. The study shows that post-training quantization and graph optimization can reduce model size, inference time, and CPU usage, making ML models more suitable for mobile applications. The experiments were conducted on a Xiaomi Redmi Note 8 Pro device, showcasing the practical benefits of these optimizations in real-world mobile deployments.

The research concludes that optimization techniques like quantization and graph optimization are essential for deploying ML models on mobile devices, where resource constraints and real-time performance are critical. It also provides valuable insights into how ML architectures can be optimized for mobile environments, contributing to the advancement of efficient AI-driven mobile applications.
