METHOD FOR WORKLOAD-BASED SELECTION OF LIGHTWEIGHT PREDICTION MODELS IN MICROSERVICE AUTOSCALING

2025; 1–14

1 Lviv Polytechnic National University
2 Lviv Polytechnic National University

The paper investigates the feasibility of using lightweight predictive models for proactive microservice autoscaling, addressing the limitations of two alternative approaches: reactive threshold-based scaling, which delays resource adjustment and degrades response time, and deep learning models such as LSTM, which require substantial computational resources. Using the Alibaba Cluster Trace dataset, microservice workloads are analyzed and classified into four distinct categories (Stable, Periodic, Spiky, Mixed) based on the coefficient of variation and the peak-to-mean ratio. Six forecasting methods were selected for evaluation on the grounds of implementation simplicity, low computational complexity, and coverage of the main methodological categories: simple moving average (SMA), exponential moving average (EMA), Holt-Winters smoothing, Kalman filter, autoregressive integrated moving average (ARIMA), and percentile-based estimation. Each method is tested over different forecast horizons in both vertical and horizontal scaling scenarios. The evaluation criteria were forecast accuracy (RMSE, MAE, MAPE), computational efficiency (execution time and memory usage), and model suitability for specific workload types. The results showed that lightweight approaches provide acceptable forecast accuracy (RMSE 0.0621–0.0846) at minimal computational cost (0.43–11.76 ms per forecast). Among the compared algorithms, SMA offers optimal efficiency for stable workloads, Holt-Winters is most effective for periodic patterns, the Kalman filter excels at spiky and mixed workloads, and percentile-based estimation is advantageous for volatile patterns over long horizons. Aggregation at the service level significantly reduced errors for spiky workloads. Based on these findings, a method for workload-aware selection of lightweight prediction models is proposed; it maps workload type, scaling objective, and prediction horizon to the most suitable model and parameters, enabling resource-efficient autoscaling.
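To make the classification and selection steps described above more concrete, the sketch below shows how a workload trace could be categorized by its coefficient of variation and peak-to-mean ratio and then mapped to a lightweight model for a given prediction horizon. All thresholds, window sizes, seasonal periods, and the mapping table itself are illustrative assumptions for this sketch, not the empirically derived values reported in the paper.

import numpy as np

# NOTE: thresholds and the model mapping below are illustrative placeholders;
# the paper derives its values empirically from the Alibaba Cluster Trace.

def classify_workload(utilization, cv_low=0.3, cv_high=0.8, p2m_spiky=3.0):
    """Assign one of the four workload categories using the two statistics
    named in the abstract: coefficient of variation (CV) and peak-to-mean ratio."""
    x = np.asarray(utilization, dtype=float)
    mean = x.mean()
    if mean <= 0:
        return "Stable"                  # degenerate series: nothing to scale
    cv = x.std() / mean                  # relative variability
    p2m = x.max() / mean                 # burstiness of peaks
    if cv <= cv_low and p2m < p2m_spiky:
        return "Stable"                  # low variability, no pronounced peaks
    if p2m >= p2m_spiky:
        if cv >= cv_high:
            return "Mixed"               # heavy bursts on top of strong variability
        return "Spiky"                   # occasional bursts over a calmer baseline
    return "Periodic"                    # sustained variability with bounded peaks

# Hypothetical mapping of (workload type, horizon) -> (model, parameters),
# following the qualitative findings summarized in the abstract.
MODEL_MAP = {
    ("Stable", "short"):   ("SMA",          {"window": 12}),
    ("Stable", "long"):    ("SMA",          {"window": 48}),
    ("Periodic", "short"): ("Holt-Winters", {"seasonal_periods": 288}),
    ("Periodic", "long"):  ("Holt-Winters", {"seasonal_periods": 288}),
    ("Spiky", "short"):    ("Kalman",       {}),
    ("Mixed", "short"):    ("Kalman",       {}),
    ("Spiky", "long"):     ("Percentile",   {"q": 95}),
    ("Mixed", "long"):     ("Percentile",   {"q": 95}),
}

def select_model(workload_type, horizon="short"):
    """Return the model name and parameters suggested for the given workload
    type and prediction horizon ('short' or 'long')."""
    return MODEL_MAP.get((workload_type, horizon), ("SMA", {"window": 12}))

# Example: classify a synthetic spiky trace and pick a model for a long horizon.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    trace = 0.2 + 0.02 * rng.standard_normal(1000)
    trace[::100] += 1.0                  # inject sparse spikes
    kind = classify_workload(trace)      # -> "Spiky"
    print(kind, select_model(kind, horizon="long"))

In this sketch the selection is a static lookup; in practice the mapping would also take the scaling objective (vertical vs. horizontal) into account, as the proposed method does.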
