Mathematical Model of Logistic Regression for Binary Classification. Part 2. Data Preparation, Learning and Testing Processes

2024; pp. 322–340

1 Lviv Polytechnic National University, Information Systems and Networks Department
2 Lviv Polytechnic National University
3 Lviv Polytechnic National University, Ukraine

This article reviews the theoretical aspects of logistic regression for binary data classification, including data preparation processes, training, testing, and model evaluation metrics.
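
For context, the binary logistic regression model discussed throughout can be written in its standard form; the notation below (feature vector x, weight vector w, bias b) is ours and is not taken verbatim from the article:

$$
P(y = 1 \mid \mathbf{x}) = \sigma(\mathbf{w}^{\top}\mathbf{x} + b) = \frac{1}{1 + e^{-(\mathbf{w}^{\top}\mathbf{x} + b)}}
$$

An object is assigned to class 1 when this probability exceeds the chosen classification threshold, and to class 0 otherwise.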

Requirements for the input data sets are formulated, methods for encoding categorical data are described, and methods for scaling input features are defined and substantiated.
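
As an illustration of these data-preparation steps, the sketch below one-hot encodes a categorical feature and standardizes a numeric one; the column names, toy values, and 70/30 split are illustrative assumptions rather than the article's data set, and scikit-learn is used here purely for brevity:

```python
# Hypothetical data-preparation pipeline: one-hot encoding of a categorical
# feature and standardization of a numeric feature before training.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, StandardScaler

df = pd.DataFrame({
    "age": [25, 47, 35, 52, 41, 29],                              # numeric feature -> scaled
    "region": ["west", "east", "west", "north", "east", "west"],  # categorical -> one-hot
    "target": [0, 1, 0, 1, 1, 0],                                 # binary class label
})
X, y = df.drop(columns="target"), df["target"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["age"]),                           # zero mean, unit variance
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["region"]),  # one column per category
])
X_train_prep = preprocess.fit_transform(X_train)   # fit statistics on training data only
X_test_prep = preprocess.transform(X_test)         # reuse the same statistics for testing
```

Fitting the scaler and encoder on the training part only, and reusing them on the test part, keeps information from the test sample out of the training procedure.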

A scheme for training logistic regression by the gradient descent method is developed: the loss function is minimized by appropriately adjusting the weights of the features of the objects to be classified. The construction of the recurrent update rules for classical and stochastic gradient descent is examined. Requirements for organizing the data sample in a multi-stage training procedure are formulated so as to avoid overfitting or underfitting of the logistic regression model.
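
A minimal sketch of such a training loop is shown below, assuming full-batch gradient descent on the mean cross-entropy loss; the learning rate, epoch count, and function names are illustrative assumptions, not values or code from the article:

```python
# Hypothetical training of logistic regression by classical gradient descent.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_gd(X, y, lr=0.1, epochs=1000):
    """X: (n_samples, n_features) array, y: (n_samples,) array of 0/1 labels."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        p = sigmoid(X @ w + b)        # predicted class-1 probabilities
        grad_w = X.T @ (p - y) / n    # gradient of the mean log-loss w.r.t. the weights
        grad_b = np.mean(p - y)       # gradient w.r.t. the bias
        w -= lr * grad_w              # recurrent weight update
        b -= lr * grad_b
    return w, b

# A stochastic (or mini-batch) variant would perform the same update after each
# single example or small batch drawn from the shuffled training sample.
```

In a multi-stage setup, the loss would also be monitored on a held-out validation part after each pass over the data to detect overfitting or underfitting early.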

A scheme for testing the trained logistic regression model is given, and the main quality metrics of binary classification are described. The influence of the classification threshold value on the effectiveness of logistic regression is noted.
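
The sketch below illustrates this testing stage: predicted probabilities are thresholded and the usual confusion-matrix metrics are computed. The toy arrays and threshold values are illustrative assumptions:

```python
# Hypothetical evaluation of a trained binary classifier at a given threshold.
import numpy as np

def binary_metrics(y_true, p_pred, threshold=0.5):
    y_hat = (p_pred >= threshold).astype(int)       # apply the classification threshold
    tp = int(np.sum((y_hat == 1) & (y_true == 1)))  # true positives
    tn = int(np.sum((y_hat == 0) & (y_true == 0)))  # true negatives
    fp = int(np.sum((y_hat == 1) & (y_true == 0)))  # false positives
    fn = int(np.sum((y_hat == 0) & (y_true == 1)))  # false negatives
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

y_true = np.array([0, 1, 1, 0, 1, 0])
p_pred = np.array([0.2, 0.8, 0.4, 0.35, 0.9, 0.1])
print(binary_metrics(y_true, p_pred, threshold=0.5))
print(binary_metrics(y_true, p_pred, threshold=0.3))  # lowering the threshold here raises recall but lowers precision
```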

Based on the results of this work, directions for future research on logistic regression are outlined.
