Mathematical Model of Logistic Regression for Binary Classification. Part 1. Regression Models of Data Generalization

2024; pp. 290-321
1 Lviv Polytechnic National University, Information Systems and Networks Department
2 Lviv Polytechnic National University
3 Lviv Polytechnic National University, Ukraine

This article presents the mathematical justification of logistic regression as an effective and simple-to-implement machine learning method.

A literature review on the statistical processing, analysis, and classification of data with logistic regression confirmed the popularity of the method across a range of subject areas.

Logistic regression is compared with linear and probit regression with respect to their ability to predict event probabilities. In this context, the drawbacks of linear regression are noted, along with the advantages of logit and probit regression and the close similarity between them. Probability forecasting and binary classification with logistic regression are made possible by the sigmoid function, which compresses an unbounded real argument into a function value in the range from 0 to 1. Two derivations of the sigmoid function are described: one from the model of the logarithm of the odds of an event, and one from the logistic model of population growth.
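As an illustration of the compressive property described above, the following minimal Python sketch evaluates the sigmoid and checks that it inverts the log-odds. The function names `sigmoid` and `log_odds` are chosen here for illustration and are not taken from the article.

```python
import math

def sigmoid(z: float) -> float:
    """Map any real z into the interval from 0 to 1.

    The two branches avoid overflow in exp() for large |z|.
    """
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def log_odds(p: float) -> float:
    """Logarithm of the odds p / (1 - p) for a probability 0 < p < 1."""
    return math.log(p / (1.0 - p))

# An unbounded argument is squeezed into (0, 1); taking the log-odds
# of the result recovers the original argument.
for z in (-5.0, 0.0, 5.0):
    p = sigmoid(z)
    print(f"z = {z:+.1f}  sigmoid(z) = {p:.4f}  log_odds = {log_odds(p):+.4f}")
```

Solving log(p / (1 - p)) = z for p gives p = 1 / (1 + exp(-z)), which is the relationship the loop verifies numerically.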

Using the maximum likelihood method, the construction of the logarithmic loss function is demonstrated; its use makes it possible to move from a multi-extremal nonlinear regression problem to a unimodal optimization problem. Regularization methods for the loss function are presented, which control model complexity and prevent overfitting of the logistic regression model.
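The logarithmic loss and its regularization can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a single feature, plain gradient descent, an L2 penalty on the weight only, and a made-up toy data set; the names `log_loss`, `fit`, `lam`, `lr`, and `steps` are hypothetical.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def log_loss(w, b, xs, ys, lam=0.0):
    """Mean negative log-likelihood plus an L2 penalty lam * w**2.

    The bias b is left unpenalised (an assumption of this sketch).
    """
    eps = 1e-12  # keeps log() away from 0
    total = 0.0
    for x, y in zip(xs, ys):
        p = min(max(sigmoid(w * x + b), eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(xs) + lam * w * w

def fit(xs, ys, lam=0.0, lr=0.1, steps=2000):
    """Minimise log_loss by gradient descent, starting from w = b = 0."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errs = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errs, xs)) / n + 2.0 * lam * w
        grad_b = sum(errs) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Toy, non-separable data (illustrative only).
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 1, 0, 1, 1]

w_plain, b_plain = fit(xs, ys, lam=0.0)
w_reg, b_reg = fit(xs, ys, lam=0.1)
print("no penalty:", w_plain, log_loss(w_plain, b_plain, xs, ys))
print("L2 penalty:", w_reg, log_loss(w_reg, b_reg, xs, ys, lam=0.1))
```

Because the logarithmic loss is convex in the parameters, gradient descent from any starting point heads toward the same minimum. The L2 term shrinks the fitted weight, which is one way to limit model complexity; an L1 penalty is a common alternative.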
