Research into machine learning algorithms for the construction of mathematical models of multimodal data classification problems

2021, pp. 1-11
Authors:
¹ Lviv Polytechnic National University, ISN Department

Machine learning (ML) algorithms are increasingly integrated into everyday life, and classification methods are already applied in many of its domains. This work investigates ensemble methods that take previous predictions and their errors into account when combining data to produce a classification result. A general overview of classification methods is given, and experiments with machine learning algorithms on multimodal data are carried out. When applying ML algorithms to multimodal data, it is important to account for the characteristics of both the evaluation metrics and the features. The main advantages and disadvantages of the Gradient Boosting, Random Forest, Logistic Regression, and XGBoost algorithms are analyzed in this work.
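As a minimal sketch of the kind of comparison the abstract describes (not the authors' exact pipeline), the snippet below evaluates the four named classifiers with stratified cross-validated accuracy on a synthetic binary-classification dataset. The dataset, feature set, and hyperparameters used in the paper are assumptions and are not reproduced here.

```python
# Hedged sketch: compare the four classifiers named in the abstract on a
# synthetic stand-in dataset; the paper's actual data and settings differ.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from xgboost import XGBClassifier  # assumes the xgboost package is installed

# Synthetic binary-classification data standing in for the multimodal tabular data.
X, y = make_classification(n_samples=1000, n_features=20, n_informative=10,
                           random_state=42)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=42),
    "Gradient Boosting": GradientBoostingClassifier(random_state=42),
    "XGBoost": XGBClassifier(eval_metric="logloss", random_state=42),
}

# Stratified 5-fold cross-validation keeps class proportions in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    print(f"{name}: mean accuracy = {scores.mean():.3f} (+/- {scores.std():.3f})")
```

In practice, accuracy alone may be misleading for imbalanced classes; metrics such as precision, recall, or ROC AUC can be substituted via the `scoring` argument.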
