The article addresses the problem of optimizing linear regression models under high dimensionality and multicollinearity, conditions that are typical of modern machine learning applications. The relevance of the study stems from the need to balance a model's generalization ability against its interpretability, especially when working with noisy and limited datasets. The aim of the study is to compare the effectiveness of Ridge (L2) and Lasso (L1) regularization for optimizing regression models, with a focus on selecting the optimal value of the regularization hyperparameter.
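For reference, the two penalties compared here are commonly written as the penalized least-squares objectives below (a standard textbook formulation, not necessarily the exact scaling used in the study; the constant in front of the squared-error term is a convention and varies between sources, and this form follows the one used by scikit-learn's `Ridge` and `Lasso`). Here $X$ is the standardized feature matrix, $y$ the centred target, and $\lambda \ge 0$ the regularization parameter:

$$
\hat{\beta}^{\text{ridge}} = \arg\min_{\beta}\; \lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_2^2 = (X^\top X + \lambda I)^{-1} X^\top y,
\qquad
\hat{\beta}^{\text{lasso}} = \arg\min_{\beta}\; \frac{1}{2n}\lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_1 .
$$

The $\lambda I$ term keeps $X^\top X + \lambda I$ invertible for any $\lambda > 0$, even when features are strongly collinear, which is one source of Ridge's stability. The $\ell_1$ term is not differentiable at zero, which is what allows Lasso to set some coefficients exactly to zero.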
The optimal value of the regularization parameter is selected by k-fold cross-validation, which provides an objective estimate of generalization performance and guards against overfitting. The open Wine Quality dataset, which contains physicochemical characteristics of red wine samples, serves as the experimental benchmark because it is representative of problems with correlated features. Data preprocessing, feature scaling, model training, and evaluation followed standard machine learning procedures.
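The selection procedure can be sketched in plain NumPy as a grid search over λ scored by k-fold cross-validated MSE. This is a minimal illustration under stated assumptions, not the study's code: the data are hypothetical synthetic correlated features standing in for Wine Quality, Ridge is fitted in closed form, and the function names are illustrative. Feature scaling is fitted on the training folds only so that validation data do not leak into preprocessing. In scikit-learn the same search is usually done with `RidgeCV`/`LassoCV`, or `GridSearchCV` over a `Pipeline` containing `StandardScaler`.

```python
import numpy as np


def ridge_fit(Z, yc, lam):
    """Closed-form Ridge: solve (Z^T Z + lam*I) w = Z^T yc.

    Assumes Z is standardized and yc is centred, so no intercept is penalized.
    """
    p = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + lam * np.eye(p), Z.T @ yc)


def cv_mse_ridge(X, y, lam, k=5, seed=0):
    """Average validation MSE of Ridge over k folds.

    A fixed seed gives identical folds for every lam, so candidates are compared fairly.
    """
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    fold_mse = []
    for i, val_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        X_tr, X_val = X[train_idx], X[val_idx]
        y_tr, y_val = y[train_idx], y[val_idx]
        # Scaling statistics come from the training folds only (no leakage).
        mu, sd = X_tr.mean(axis=0), X_tr.std(axis=0)
        sd[sd == 0] = 1.0
        y_mean = y_tr.mean()
        w = ridge_fit((X_tr - mu) / sd, y_tr - y_mean, lam)
        pred = ((X_val - mu) / sd) @ w + y_mean
        fold_mse.append(np.mean((y_val - pred) ** 2))
    return float(np.mean(fold_mse))


# Hypothetical synthetic data: 8 features sharing a common factor (pairwise corr ~0.8).
rng = np.random.default_rng(42)
n, p = 200, 8
base = rng.normal(size=(n, 1))
X = base + 0.5 * rng.normal(size=(n, p))
true_w = np.array([1.5, -1.0, 0.0, 0.0, 0.5, 0.0, 0.0, 0.0])
y = X @ true_w + rng.normal(scale=0.5, size=n)

# Search a log-spaced grid of candidate regularization strengths.
lambdas = np.logspace(-3, 3, 13)
cv_scores = [cv_mse_ridge(X, y, lam) for lam in lambdas]
best_lambda = float(lambdas[int(np.argmin(cv_scores))])
print(f"best lambda = {best_lambda:.3g}, CV MSE = {min(cv_scores):.3f}")
```

A log-spaced grid is used because useful values of λ typically span several orders of magnitude; the same loop works for Lasso by swapping in a Lasso solver for `ridge_fit`.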
The models were evaluated using mean squared error (MSE), mean absolute error (MAE), and the coefficient of determination (R²). The results show that both approaches deliver good predictive performance but differ in their properties: the Ridge model is more stable and generalizes better, whereas the Lasso model performs automatic feature selection by driving some coefficients to exactly zero, which yields a sparse model.
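The three evaluation criteria can be written out directly. The sketch below uses the usual definitions (R² = 1 − SS_res / SS_tot, the same convention as scikit-learn's `r2_score`); the toy quality scores are hypothetical and only illustrate the arithmetic. In practice `sklearn.metrics` provides `mean_squared_error`, `mean_absolute_error`, and `r2_score`.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return MSE, MAE and R^2 for a set of predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    resid = y_true - y_pred
    mse = np.mean(resid ** 2)            # penalizes large errors quadratically
    mae = np.mean(np.abs(resid))         # average absolute error, same units as y
    ss_res = np.sum(resid ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    # R^2 is undefined when the target is constant (ss_tot == 0).
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return {"MSE": float(mse), "MAE": float(mae), "R2": float(r2)}


# Toy example with hypothetical quality scores.
y_true = [5, 6, 7, 5, 6]
y_pred = [5.5, 6.0, 6.5, 5.0, 6.5]
print(regression_metrics(y_true, y_pred))
# MSE = 0.15, MAE = 0.3, R2 ≈ 0.732
```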
The study identifies relationships between the regularization parameter, prediction accuracy, and the structural characteristics of the models. The scientific novelty of the research lies in an improved methodology for the comparative analysis of regularized regression models, based on a comprehensive evaluation of the regularization parameter using cross-validation and multiple performance criteria. The practical significance of the study lies in the applicability of the proposed approach to building robust, interpretable, and efficient regression models across various application domains.
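The link between the regularization parameter, accuracy, and model structure can be illustrated with a minimal NumPy sketch of Lasso fitted by cyclic coordinate descent (one standard solver; scikit-learn's `Lasso` also uses coordinate descent), using the 1/(2n) scaling of the squared-error term. The data are hypothetical synthetic correlated features, not Wine Quality, and the function names are illustrative. As λ grows, the number of nonzero coefficients tends to fall while training error rises; a Ridge fit on the same data is included for contrast.

```python
import numpy as np


def soft_threshold(rho, lam):
    """S(rho, lam) = sign(rho) * max(|rho| - lam, 0): the proximal step for the L1 penalty."""
    return float(np.sign(rho)) * max(abs(rho) - lam, 0.0)


def lasso_cd(Z, yc, lam, n_iter=1000, tol=1e-8):
    """Cyclic coordinate descent for (1/(2n))*||yc - Z w||^2 + lam*||w||_1.

    Assumes standardized columns of Z and centred yc (no intercept term).
    """
    n, p = Z.shape
    w = np.zeros(p)
    col_sq = (Z ** 2).sum(axis=0) / n  # equals 1 for standardized columns
    resid = yc.copy()                  # resid = yc - Z @ w, kept up to date
    for _ in range(n_iter):
        max_step = 0.0
        for j in range(p):
            w_old = w[j]
            # Correlation of feature j with the residual that excludes feature j.
            rho = Z[:, j] @ resid / n + col_sq[j] * w_old
            w[j] = soft_threshold(rho, lam) / col_sq[j]
            step = w[j] - w_old
            if step != 0.0:
                resid -= Z[:, j] * step
                max_step = max(max_step, abs(step))
        if max_step < tol:
            break
    return w


# Hypothetical synthetic data with strongly correlated features.
rng = np.random.default_rng(0)
n, p = 200, 8
base = rng.normal(size=(n, 1))
X = base + 0.5 * rng.normal(size=(n, p))
true_w = np.array([1.5, -1.0, 0.0, 0.0, 0.5, 0.0, 0.0, 0.0])
y = X @ true_w + rng.normal(scale=0.5, size=n)
Z = (X - X.mean(axis=0)) / X.std(axis=0)
yc = y - y.mean()

# Trace sparsity and training error as lambda grows.
lambdas = [0.001, 0.01, 0.1, 0.5, 2.0]
nonzero, train_mse = [], []
for lam in lambdas:
    w = lasso_cd(Z, yc, lam)
    nonzero.append(int(np.count_nonzero(w)))
    train_mse.append(float(np.mean((yc - Z @ w) ** 2)))
    print(f"lambda={lam:<6} nonzero={nonzero[-1]}  train MSE={train_mse[-1]:.3f}")

# For contrast: Ridge shrinks coefficients but generally leaves all of them nonzero.
w_ridge = np.linalg.solve(Z.T @ Z + 10.0 * np.eye(p), Z.T @ yc)
print("ridge nonzero:", int(np.count_nonzero(w_ridge)))
```

Under this scaling, once λ is at least max_j |z_jᵀy|/n every Lasso coefficient is exactly zero, so the largest λ in the grid returns an empty model. Training error alone always favours small λ; choosing λ itself still requires a held-out criterion such as cross-validated MSE.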