Machine learning for the analysis of quality of life using the World Happiness Index and Human Development Indicators

Machine learning algorithms play an important role in analyzing complex data across many research fields. In this paper, we apply multiple regression algorithms and statistical techniques to data from the Human Development Index (HDI) and the World Happiness Index covering 2015–2021. Our aims are to investigate the relationship between objective and subjective quality-of-life indicators and to identify the key factors affecting happiness at the international level. Pearson correlation analysis showed that happiness is correlated with the HDI score and with gross national income (GNI) per capita. The best-performing model for predicting happiness was random forest regression, with an R² score of 0.93667, a mean squared error (MSE) of 0.0033048, and a root mean squared error (RMSE) of 0.05748. It was followed, in that order, by XGBoost regression and decision tree regression. These models indicated that GNI per capita is the most important feature for predicting happiness.
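The following is a minimal sketch of the evaluation quantities named above, written with only the Python standard library. It assumes the standard textbook definitions: Pearson's r, R² = 1 − SS_res/SS_tot, MSE as the mean squared residual, and RMSE = √MSE. Under these definitions, √0.0033048 ≈ 0.0575, which is consistent with the reported RMSE. The function names `pearson_r` and `regression_metrics` and the toy values are hypothetical. They are not the study's code or data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of co-deviations (covariance times n) and sums of squared deviations.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("Pearson r is undefined for a constant input")
    return cov / math.sqrt(var_x * var_y)


def regression_metrics(y_true, y_pred):
    """Return (R^2, MSE, RMSE) for a set of predictions."""
    n = len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    mean_t = sum(y_true) / n
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)  # total sum of squares
    mse = ss_res / n
    r2 = 1 - ss_res / ss_tot
    rmse = math.sqrt(mse)
    return r2, mse, rmse


if __name__ == "__main__":
    # Hypothetical toy values, not the HDI / happiness data used in the study.
    print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # perfectly linear, r = 1
    # Errors are 0, 0, 0, 1, so R^2 = 0.8, MSE = 0.25, RMSE = 0.5.
    print(regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0]))
```

In scikit-learn, the same quantities are available through `sklearn.metrics.r2_score` and `sklearn.metrics.mean_squared_error`. A fitted `RandomForestRegressor` exposes impurity-based importances through its `feature_importances_` attribute.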
