The continuously growing number of users, and of the requests they send to the server, demands substantial resources to ensure fast responses without delays. However, server load is inherently distributed unevenly over the day, week, or month. Accurately predicting the required resources and dynamically managing their allocation is therefore crucial: it can yield significant savings in server maintenance costs without compromising the user experience. This study investigates the influence of the choice of activation function on the forecasting accuracy of Long Short-Term Memory (LSTM) neural networks applied to real-world server request data. A dataset of incoming server requests was collected and aggregated into 20-minute intervals over 16 consecutive days. Several activation functions, including ReLU, Swish, and Softplus, were evaluated using mean squared error (MSE) as the primary performance metric. Each model configuration was trained six times to ensure statistical reliability, and the reported results were taken from one of the most stable runs. The experiments demonstrate that the choice of activation function has a significant impact on prediction accuracy: Swish and ReLU achieved the lowest MSE values, reducing the error by 6.6–12.3% and 10.5–16.3%, respectively, relative to the baseline. Although the sigmoid function yielded the lowest test loss, further analysis revealed that this result was misleading: the model systematically underestimated peak loads, which produced low error values but poor fidelity to the actual server load dynamics. These findings support the hypothesis that the choice of activation function is a critical factor in optimizing LSTM-based forecasting models for server load prediction.
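As a reproduction aid, the minimal PyTorch sketch below illustrates the kind of experiment the study describes: an LSTM forecaster whose activation function is a swappable hyperparameter, trained and scored with MSE. The placement of the compared activation (here, between the last LSTM output and the regression head), the hidden size, the 12-interval look-back window, and the synthetic stand-in series are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn

# Candidate activations under comparison. The exact set and where each is
# applied inside the network are assumptions for illustration; the study
# compares ReLU, Swish (SiLU), Softplus, and sigmoid against a baseline.
ACTIVATIONS = {
    "tanh": nn.Tanh(),
    "relu": nn.ReLU(),
    "swish": nn.SiLU(),        # Swish with beta = 1
    "softplus": nn.Softplus(),
    "sigmoid": nn.Sigmoid(),
}


class LSTMForecaster(nn.Module):
    """One-step-ahead load forecaster: LSTM encoder plus a dense head.

    The compared activation is applied between the final LSTM output and
    the regression head; this placement is an assumption of the sketch.
    """

    def __init__(self, activation: str, input_size: int = 1, hidden_size: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.act = ACTIVATIONS[activation]
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, x):                  # x: (batch, seq_len, 1)
        out, _ = self.lstm(x)              # (batch, seq_len, hidden)
        last = out[:, -1, :]               # last time step only
        return self.head(self.act(last))   # predict the next interval


def train_once(model, x, y, epochs: int = 100, lr: float = 1e-3) -> float:
    """Full-batch training against MSE, the study's primary metric."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
    return loss.item()


if __name__ == "__main__":
    # Synthetic stand-in for the request-count series: 20-minute intervals
    # over 16 days gives 72 points per day, 1152 in total.
    t = torch.arange(0, 72 * 16, dtype=torch.float32)
    series = torch.sin(2 * torch.pi * t / 72) + 0.1 * torch.randn_like(t)

    window = 12  # look back 12 intervals (4 hours); an assumed value
    x = torch.stack([series[i:i + window] for i in range(len(series) - window)])
    x = x.unsqueeze(-1)                    # (samples, window, 1)
    y = series[window:].unsqueeze(-1)      # (samples, 1)

    for name in ACTIVATIONS:
        mse = train_once(LSTMForecaster(name), x, y)
        print(f"{name:9s} -> train MSE {mse:.4f}")
```

Under the study's protocol, each configuration would be trained six times and one of the most stable runs reported; the loop above can be wrapped with repeated seeds to mimic that, and the sigmoid caveat from the abstract suggests inspecting predicted peaks rather than relying on the loss value alone.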