THE PROBLEM OF CONVERGENCE OF THE CLASSIFIER CONSTRUCTION PROCEDURE IN LOGICAL AND ALGORITHMIC CLASSIFICATION TREE SCHEMES

І. Ф. Повхан

This paper considers the convergence of the procedure for synthesizing classifier schemes in logical and algorithmic classification tree methods. We propose an upper bound on the complexity of an algorithm tree scheme for the problem of approximating a real data array by a set of generalized features, with a fixed stopping criterion for the branching procedure at the classification tree construction stage. This approach makes it possible to ensure the required model accuracy, estimate the model's complexity, reduce the number of branchings, and achieve the required efficiency indicators. For the first time, an upper bound on the convergence of classification tree construction is given for methods that build logical and algorithmic classification tree structures (LCT and ACT). The proposed convergence bound for the classifier construction procedure in LCT/ACT structures makes it possible to build economical and efficient classification models of a given accuracy. The method of constructing an algorithmic classification tree is based on a stepwise approximation of an initial training sample of arbitrary size and structure by a set of independent classification algorithms. When forming the current vertex of the algorithmic tree (a node, i.e. a generalized feature), the method selects the most effective, high-quality autonomous classification algorithms from the initial set. The methods for synthesizing logical and algorithmic classification trees were implemented in the algorithm library of the "ORION III" software system and applied to a variety of practical artificial intelligence problems. These applications confirmed the operability of the constructed classification tree models and of the developed software. The paper presents a convergence estimate for the procedure of constructing recognition schemes in the case of logical and algorithmic classification trees under weak and strong separation of the classes in the initial training sample.
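The abstract describes the algorithmic tree construction only in words. The sketch below is a hypothetical illustration, not the author's implementation. It assumes a pool of independent classifiers (here, simple one-feature threshold rules), a greedy rule that, at each step, picks the algorithm that correctly classifies the most objects still left in the training subsample, and a fixed cap `max_nodes` on the number of vertices as the stopping criterion. The names `build_algorithmic_tree`, `stump`, `Node` and `max_nodes` are all illustrative assumptions. Under these assumptions the convergence argument is short: every accepted vertex removes at least one object from the remaining subsample, so construction stops after at most min(max_nodes, m) steps, where m is the training sample size.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

Sample = Tuple[Tuple[float, ...], int]          # (feature vector, class label)
Classifier = Callable[[Sequence[float]], int]   # an independent classification algorithm


@dataclass
class Node:
    """Hypothetical tree vertex (generalized feature): the chosen algorithm
    and how many objects of the current subsample it classified correctly."""
    name: str
    algorithm: Classifier
    covered: int


def build_algorithmic_tree(
    sample: List[Sample],
    algorithms: List[Tuple[str, Classifier]],
    max_nodes: int,
) -> Tuple[List[Node], List[Sample]]:
    """Assumed greedy stepwise approximation of the training sample.

    Each step selects the algorithm with the most correct answers on the
    remaining objects; only its misclassified objects pass to the next step.
    Because each accepted node covers at least one object, the loop runs at
    most min(max_nodes, len(sample)) times.
    """
    remaining = list(sample)
    nodes: List[Node] = []
    while remaining and len(nodes) < max_nodes:  # fixed stopping criterion
        best: Optional[Tuple[str, Classifier]] = None
        best_hits = 0
        for name, alg in algorithms:
            hits = sum(1 for x, y in remaining if alg(x) == y)
            if hits > best_hits:
                best, best_hits = (name, alg), hits
        if best is None:  # no algorithm recognizes any remaining object
            break
        nodes.append(Node(best[0], best[1], best_hits))
        remaining = [(x, y) for x, y in remaining if best[1](x) != y]
    return nodes, remaining


def stump(feature: int, threshold: float, low: int, high: int) -> Classifier:
    """Hypothetical elementary classifier: a threshold on one feature."""
    return lambda x: low if x[feature] <= threshold else high


if __name__ == "__main__":
    data = [((0.1, 0.9), 0), ((0.4, 0.2), 0), ((0.6, 0.8), 1),
            ((0.9, 0.1), 1), ((0.2, 0.7), 1)]
    pool = [("f0<=0.5", stump(0, 0.5, 0, 1)), ("f1<=0.5", stump(1, 0.5, 0, 1))]
    tree, left = build_algorithmic_tree(data, pool, max_nodes=10)
    print([(n.name, n.covered) for n in tree], len(left))
```

On the toy sample, the first vertex covers four of the five objects, and a second vertex is needed only for the single object the first rule misclassifies. With weakly separated classes, each step would typically cover fewer objects, so more vertices would be needed before the stopping criterion is reached.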

logical tree

algorithmic tree

classifier

pattern recognition

feature

training sample
