Large-scale recommender systems using Hadoop and collaborative filtering: a comparative study

With the rapid advancement of internet technologies over the past two decades, the amount of information available online has grown exponentially. This data explosion has driven the development of recommender systems, which learn individual preferences and provide personalized recommendations of new content. These systems act as guides, helping users discover relevant information tailored to their tastes and interests. The primary objective of this study is to assess and compare recent recommender system methods within a distributed architecture based on Hadoop. The analysis focuses on collaborative filtering and is conducted on a large dataset. We implemented the algorithms in Python and PySpark, which allows large datasets to be processed with Apache Hadoop and Spark. The studied approaches were evaluated on the MovieLens dataset and compared using the following metrics: RMSE, precision, recall, and F1 score. Their training times were also compared.
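The evaluation metrics named above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names and the relevance threshold of 3.5 (ratings at or above it count as "relevant") are assumptions made only for this example, and the abstract does not state how precision and recall were derived from predicted ratings.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length rating sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def precision_recall_f1(actual, predicted, threshold=3.5):
    """Precision, recall and F1 after binarizing ratings.

    Assumption: a rating >= threshold means "relevant" (for actual ratings)
    or "recommended" (for predicted ratings). The threshold is hypothetical.
    """
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a >= threshold and p >= threshold)
    fp = sum(1 for a, p in pairs if a < threshold and p >= threshold)
    fn = sum(1 for a, p in pairs if a >= threshold and p < threshold)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Toy ratings on a 1-5 scale, invented purely for illustration.
actual_ratings = [4.0, 2.0, 5.0, 3.0]
predicted_ratings = [3.5, 2.5, 3.0, 4.0]

print("RMSE:", rmse(actual_ratings, predicted_ratings))
print("precision, recall, F1:",
      precision_recall_f1(actual_ratings, predicted_ratings))
```

RMSE measures how close predicted ratings are to the true ones, while precision, recall, and F1 measure how well the set of recommended items matches the set of items the user actually liked, so the two groups of metrics can rank algorithms differently.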
