A Study of Machine Learning Algorithms for Building Mathematical Models of Multimodal Data Classification Problems

Nataliya Boyko

Machine learning (ML) algorithms are increasingly integrated into everyday life, and classification methods are already applied in many areas of it. This paper studies methods that produce a classification result by taking into account previous predictions and the errors computed when the data are combined to obtain forecasts. It gives a general overview of classification methods and reports experiments with machine learning algorithms on multimodal data. When ML algorithms are used for prediction on multimodal data, it is important to account for all characteristics of the metrics and features. The paper analyzes the main advantages and disadvantages of the Gradient Boosting, Random Forest, Logistic Regression, and XGBoost algorithms.
