GROUP SELECTION OF ELEMENTARY TRAITS IN SCHEMES FOR CONSTRUCTING HYBRID DECISION TREE STRUCTURES

I. F. Povkhan; A. V. Leheza

The object of research is classification trees. The subject of research is the methods, algorithms, and schemes for constructing them. The aim of this work is to build an effective method (scheme) for synthesizing classification tree models based on a group assessment of the importance of discrete features within a branched feature selection scheme. A method for constructing classification trees is proposed that, for a given training sample, determines the individual information content (importance) of each group of features (and of their combinations) with respect to the original values of the classification function (the data of the training sample). When constructing the next node of the classification tree, the developed logical tree method tries to identify a group of the most closely interrelated discrete features. This reduces the overall structural complexity of the model (the number of levels of the classification tree), speeds up calculations when recognizing objects with the model, and improves the generalizing properties of the model and its interpretability. The proposed scheme for selecting groups of discrete features also allows the constructed decision tree to be used to assess the informative value (importance) of features. The developed classification tree method is implemented in software and studied on the problem of classifying discrete objects represented by sets of features. The experiments confirmed that the proposed mathematical support is workable and allow us to recommend it for practical use in applied problems of classifying discrete objects with logical classification trees. Prospects for further research include creating a modified logical classification tree method that efficiently enumerates and evaluates sets of elementary features based on the proposed method, optimizing its software implementations, and studying the proposed method experimentally on a wider set of applied problems.
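
The idea of scoring groups of discrete features rather than single features can be illustrated with a minimal Python sketch. It is not the authors' criterion: the score used here (the share of training objects classified correctly when each value combination of the group predicts its majority class), the function names `group_importance` and `best_group`, the `max_size` cap on group size, and the toy data are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations


def group_importance(rows, labels, group):
    """Share of training objects classified correctly when every distinct
    value combination of the features in `group` predicts its majority class.
    (Assumed importance score, chosen for illustration only.)"""
    cells = defaultdict(Counter)
    for row, label in zip(rows, labels):
        key = tuple(row[i] for i in group)
        cells[key][label] += 1
    correct = sum(max(counter.values()) for counter in cells.values())
    return correct / len(rows)


def best_group(rows, labels, max_size=2):
    """Enumerate all feature groups of up to `max_size` features and return
    (group, score) for the best one; ties keep the smaller, earlier group."""
    n_features = len(rows[0])
    best, best_score = None, -1.0
    for size in range(1, max_size + 1):
        for group in combinations(range(n_features), size):
            score = group_importance(rows, labels, group)
            if score > best_score:
                best, best_score = group, score
    return best, best_score


# Hypothetical training sample: the class is the XOR of features 0 and 1,
# and feature 2 carries no information about the class.
rows = [(0, 0, 1), (0, 1, 0), (1, 0, 0), (1, 1, 1), (0, 0, 0), (1, 1, 0)]
labels = [0, 1, 1, 0, 0, 0]

if __name__ == "__main__":
    print(best_group(rows, labels))
```

On this toy sample no single feature separates the classes, while the pair of features 0 and 1 does, which is the kind of interrelated group a node of the tree would be built on. Because this score can only grow as a group gets larger, the cap on group size is what keeps the sketch from always preferring the largest group.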
