Mathematical model of logistic regression for binary classification. Part 2. Data preparation, training, and testing processes

Petro Kravets; Volodymyr Pasichnyk; Mykola Prodaniuk

This article examines the theoretical aspects of logistic regression for binary data classification, including data preparation, training, testing, and model evaluation metrics.

Requirements for input datasets are formulated, methods for encoding categorical data are described, and methods for scaling input features are defined and justified.
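The kind of preparation step described above can be illustrated with a minimal sketch. It assumes one-hot encoding for a categorical feature and two common scaling schemes (min-max scaling and standardization); the abstract does not name the specific methods the paper adopts, so the function names `one_hot`, `min_max_scale`, and `standardize`, as well as the toy data, are hypothetical.

```python
from statistics import mean, pstdev


def one_hot(values):
    """Encode category labels as 0/1 indicator vectors (assumed encoding scheme)."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    encoded = [
        [1 if index[v] == j else 0 for j in range(len(categories))]
        for v in values
    ]
    return categories, encoded


def min_max_scale(column):
    """Map a numeric feature linearly onto the interval [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:  # constant feature: nothing to scale
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


def standardize(column):
    """Shift a numeric feature to zero mean and unit (population) variance."""
    mu, sigma = mean(column), pstdev(column)
    if sigma == 0:  # constant feature: nothing to scale
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]


# Hypothetical toy data: one categorical and one numeric feature.
colors = ["red", "green", "red", "blue"]
ages = [20.0, 30.0, 40.0, 60.0]

categories, encoded = one_hot(colors)
scaled = min_max_scale(ages)
zscores = standardize(ages)
```

Scaling matters for gradient-based training because features on very different scales make the loss surface elongated, which slows convergence for a fixed step size.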

A scheme is developed for training logistic regression by gradient descent, which minimizes the loss function by adjusting the feature weights for the sample of objects to be classified. The specifics of constructing recurrent methods of classical and stochastic gradient descent are identified. Requirements are described for organizing the data sample in a multi-stage learning model so as to avoid overfitting or underfitting of the logistic regression.
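The difference between classical (full-batch) and stochastic gradient descent can be sketched as follows. This is an illustrative implementation under stated assumptions, not the authors' scheme: it assumes the mean logarithmic loss, a constant step size `lr`, and zero initial weights; the helper names and the toy dataset are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, x):
    """Probability of class 1; w[0] is the bias, w[1:] are the feature weights."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def log_loss(w, X, y, eps=1e-12):
    """Mean logarithmic loss over the sample."""
    total = 0.0
    for xi, yi in zip(X, y):
        p = min(max(predict_proba(w, xi), eps), 1.0 - eps)
        total -= yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return total / len(X)


def gradient(w, X, y):
    """Gradient of the mean log loss with respect to w."""
    g = [0.0] * len(w)
    for xi, yi in zip(X, y):
        err = predict_proba(w, xi) - yi
        g[0] += err
        for j, xj in enumerate(xi):
            g[j + 1] += err * xj
    return [gj / len(X) for gj in g]


def batch_gd(X, y, lr=0.5, epochs=500):
    """Classical gradient descent: one update per pass over the whole sample."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        g = gradient(w, X, y)
        w = [wj - lr * gj for wj, gj in zip(w, g)]
    return w


def sgd(X, y, lr=0.1, epochs=200, seed=0):
    """Stochastic gradient descent: one update per object, in shuffled order."""
    rng = random.Random(seed)
    w = [0.0] * (len(X[0]) + 1)
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            g = gradient(w, [X[i]], [y[i]])
            w = [wj - lr * gj for wj, gj in zip(w, g)]
    return w


# Hypothetical, deliberately non-separable toy sample with one feature.
X = [[0.0], [0.2], [0.4], [0.6], [0.8], [1.0]]
y = [0, 0, 1, 0, 1, 1]

w0 = [0.0, 0.0]
w_batch = batch_gd(X, y)
w_sgd = sgd(X, y)
print(log_loss(w0, X, y), log_loss(w_batch, X, y), log_loss(w_sgd, X, y))
```

Both recurrences share the same update rule and differ only in how many objects contribute to each gradient estimate. The stochastic variant trades noisier steps for cheaper iterations.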

A scheme for testing the trained logistic regression is presented, and the main quality metrics of binary classification are described. The influence of the classification threshold level on the effectiveness of logistic regression is noted.
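The effect of the threshold on classification quality can be shown with a small sketch. It assumes the usual confusion-matrix metrics (accuracy, precision, recall, F1), which may be a subset of or differ from the metrics the paper covers; the labels and predicted probabilities are invented for illustration.

```python
def confusion_counts(y_true, probs, threshold=0.5):
    """Count TP, FP, TN, FN when class 1 is predicted for p >= threshold."""
    tp = fp = tn = fn = 0
    for yt, p in zip(y_true, probs):
        pred = 1 if p >= threshold else 0
        if pred == 1 and yt == 1:
            tp += 1
        elif pred == 1 and yt == 0:
            fp += 1
        elif pred == 0 and yt == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def safe_div(a, b):
    """Return a / b, or 0.0 when the denominator is zero."""
    return a / b if b else 0.0


def metrics(y_true, probs, threshold=0.5):
    """Compute common binary classification metrics at a given threshold."""
    tp, fp, tn, fn = confusion_counts(y_true, probs, threshold)
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    return {
        "accuracy": safe_div(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": safe_div(2 * precision * recall, precision + recall),
    }


# Hypothetical test labels and model outputs.
y_true = [0, 0, 1, 0, 1, 1, 1, 0]
probs = [0.1, 0.4, 0.35, 0.6, 0.7, 0.8, 0.55, 0.2]

low = metrics(y_true, probs, threshold=0.3)
mid = metrics(y_true, probs, threshold=0.5)
high = metrics(y_true, probs, threshold=0.75)
```

On this toy sample, lowering the threshold raises recall at the cost of precision, and raising it does the opposite, so the threshold should be chosen according to the relative cost of false positives and false negatives.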

Based on the results of the work, promising directions for further research on logistic regression are outlined.

logarithmic loss function

gradient descent

classification threshold

classification quality metrics
