Study of Regression Model Optimization by Means of Regularization

Ihor Popel; Yuriy Shcherbyna

The article addresses the problem of optimizing linear regression models under high dimensionality and multicollinearity, conditions typical of modern machine learning applications. The relevance of the study stems from the need to balance a model's generalization ability against its interpretability, especially when working with noisy and limited datasets. The aim of the study is to investigate and compare the effectiveness of Ridge (L2) and Lasso (L1) regularization for regression model optimization through selection of the optimal regularization hyperparameter.
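
For reference, the two penalized least-squares problems being compared can be written in their standard textbook form. The notation here is an assumption, not taken from the article: n samples, feature vectors x_i, responses y_i, intercept β_0, coefficient vector β, and regularization parameter λ ≥ 0. Scaling conventions for the loss term differ between texts and software libraries, so a given value of λ is not directly comparable across implementations.

```latex
\hat{\beta}^{\mathrm{ridge}}
  = \arg\min_{\beta_0,\,\beta}\;
    \sum_{i=1}^{n} \bigl(y_i - \beta_0 - x_i^{\top}\beta\bigr)^2
    + \lambda \,\lVert \beta \rVert_2^2 ,
\qquad
\hat{\beta}^{\mathrm{lasso}}
  = \arg\min_{\beta_0,\,\beta}\;
    \sum_{i=1}^{n} \bigl(y_i - \beta_0 - x_i^{\top}\beta\bigr)^2
    + \lambda \,\lVert \beta \rVert_1 .
```

The squared L2 penalty shrinks all coefficients smoothly toward zero, whereas the L1 penalty can set some coefficients exactly to zero, which is the source of the sparsity attributed to Lasso.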

k-fold cross-validation is used to determine the optimal value of the regularization parameter, which allows an objective assessment of model generalization and helps prevent overfitting. The open Wine Quality dataset, which contains physicochemical characteristics of red wine samples, serves as the experimental benchmark because it is representative of problems with correlated features. Data preprocessing, feature scaling, model training, and evaluation are performed using standard machine learning procedures.
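
The selection procedure described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It uses the closed-form ridge solution, a hypothetical grid of candidate values, and synthetic correlated features as a stand-in for the Wine Quality data. The function names (`ridge_fit`, `kfold_cv_mse`, `select_lambda`) and all numeric settings are illustrative assumptions.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge fit minimizing ||y - b - Xw||^2 + lam * ||w||^2.
    Data are centered so that the intercept b is not penalized."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    p = X.shape[1]
    w = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, b

def kfold_cv_mse(X, y, lam, k=5, seed=0):
    """Mean validation MSE over k folds for one value of lam."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    errors = []
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        # Scale with training-fold statistics only, to avoid leaking
        # information from the validation fold.
        mu = X[train].mean(axis=0)
        sigma = X[train].std(axis=0)
        sigma[sigma == 0] = 1.0
        w, b = ridge_fit((X[train] - mu) / sigma, y[train], lam)
        pred = ((X[val] - mu) / sigma) @ w + b
        errors.append(np.mean((y[val] - pred) ** 2))
    return float(np.mean(errors))

def select_lambda(X, y, grid, k=5, seed=0):
    """Return the grid value with the lowest cross-validated MSE."""
    scores = {float(lam): kfold_cv_mse(X, y, lam, k, seed) for lam in grid}
    best = min(scores, key=scores.get)
    return best, scores

# Synthetic stand-in data (assumption): six features forming three
# strongly correlated pairs, mimicking a multicollinear design.
rng = np.random.default_rng(42)
n = 200
z = rng.normal(size=(n, 3))
X = np.hstack([z, z + 0.05 * rng.normal(size=(n, 3))])
y = z @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.5, size=n)

grid = np.logspace(-3, 3, 13)  # hypothetical candidate values
best_lam, cv_scores = select_lambda(X, y, grid)
print("selected lambda:", best_lam)
```

The same fold assignment is reused for every candidate value, so differences in the cross-validated error reflect the regularization strength rather than the random split.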

The models were assessed based on mean squared error (MSE), mean absolute error (MAE), and the coefficient of determination (R²). The results demonstrate that both approaches provide effective predictive performance but exhibit different properties. The Ridge model shows greater stability and better generalization ability, while the Lasso model enables automatic feature selection by inducing sparsity in the model structure.
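
The three evaluation criteria and the sparsity effect of the L1 penalty can be sketched as follows. This is an illustrative example under stated assumptions, not the article's code. The Lasso is solved by plain cyclic coordinate descent on the objective (1/(2n))·||y − Xw||² + λ·||w||₁, a common scaling convention. The data are synthetic, with only the first two of eight features carrying signal, and the value λ = 0.3 is chosen arbitrarily.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error."""
    return float(np.mean((y_true - y_pred) ** 2))

def mae(y_true, y_pred):
    """Mean absolute error."""
    return float(np.mean(np.abs(y_true - y_pred)))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)

def soft_threshold(v, t):
    """Soft-thresholding operator: shrink v toward zero by t."""
    return np.sign(v) * max(abs(v) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Cyclic coordinate descent for
    (1 / (2n)) * ||y - Xw||^2 + lam * ||w||_1
    on centered data (no intercept term)."""
    n, p = X.shape
    w = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.copy()  # residual y - Xw, starting from w = 0
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0:
                continue
            resid += X[:, j] * w[j]          # remove feature j's contribution
            rho = X[:, j] @ resid / n        # correlation with partial residual
            w[j] = soft_threshold(rho, lam) / col_sq[j]
            resid -= X[:, j] * w[j]          # add the updated contribution back
    return w

# Synthetic data (assumption): 8 features, only the first two are relevant.
rng = np.random.default_rng(0)
n, p = 300, 8
X = rng.normal(size=(n, p))
X = (X - X.mean(axis=0)) / X.std(axis=0)     # standardize features
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(size=n)
y = y - y.mean()                             # center the response

w = lasso_cd(X, y, lam=0.3)
pred = X @ w
print("non-zero coefficients:", int(np.count_nonzero(w)), "of", p)
print("MSE:", mse(y, pred), "MAE:", mae(y, pred), "R2:", r2(y, pred))
```

The metrics here are computed on the training data for brevity. An actual comparison of the kind the abstract describes would report them on held-out data.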

The study identifies relationships between the regularization parameter, prediction accuracy, and structural characteristics of the models. The scientific novelty of the research lies in improving the methodology for comparative analysis of regularized regression models through a comprehensive evaluation of the regularization parameter using cross-validation and multiple performance criteria. The practical significance of the study lies in the applicability of the proposed approach to developing robust, interpretable, and efficient regression models across various application domains.

hyperparameter optimization

machine learning

Wine Quality
