METHODS AND MODELS OF MACHINE LEARNING IN CHEMISTRY AND MATERIAL SCIENCE USING SOLUTE DIFFUSION EXPERIMENT

Oleksii Veretiuk; Nazariy Andrushchak

Machine learning is a logical extension of automation by computer systems. While many areas of human activity have been improved by algorithmic software, many other problems remain unsolved because writing an explicit algorithm for them is almost impossible. Science is one such field. The empirical approach is still the main way of obtaining results, because many studies still lack a clear mathematical apparatus. Machine learning offers a way to save resources and speed up the research process. Conducting experiments always produces data about their results, and machine learning algorithms can use this data to build a model capable of predicting the outcomes of experiments or the properties of new compounds. This article tests the effectiveness of different algorithms, both standard and ensemble, on data obtained from solute diffusion experiments. The effectiveness of each algorithm is evaluated using the root mean square error (RMSE) and the mean absolute percentage error (MAPE). An example and a description of the process of building different types of machine learning models are given.
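The two error measures named above can be sketched in a few lines of Python. This is a minimal illustration using the standard textbook definitions, not the authors' code: the function names `rmse` and `mape` and the sample values are hypothetical and do not come from the solute diffusion dataset.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error: sqrt of the mean squared difference."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent.

    Undefined when any true value is zero, so that case is rejected.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(t == 0 for t in y_true):
        raise ValueError("MAPE is undefined when a true value is zero")
    ratios = [abs((t - p) / t) for t, p in zip(y_true, y_pred)]
    return 100.0 * sum(ratios) / len(ratios)


# Hypothetical measured values and model predictions (illustrative only).
y_true = [2.0, 4.0, 5.0]
y_pred = [1.0, 4.0, 7.0]

rmse_value = rmse(y_true, y_pred)
mape_value = mape(y_true, y_pred)
print(f"RMSE = {rmse_value:.4f}, MAPE = {mape_value:.1f}%")
```

RMSE is expressed in the units of the predicted quantity and penalizes large errors more heavily, while MAPE is a relative measure, which is why the two are often reported together when comparing models.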

limited amount of data

solute diffusion
