Topic Modeling for News Recommendations: Evaluating the Performance of LDA and BERTopic

Text analysis is an important component in the evolution of recommender systems, as it enables meaningful information to be extracted from large volumes of textual data. This study compares two widely used topic modeling techniques, Latent Dirichlet Allocation (LDA) and BERTopic, in the context of news recommender systems. Using a dataset of Moroccan news articles, we evaluate the ability of these models to generate coherent and interpretable topics. Our results show that BERTopic outperforms LDA in terms of topic consistency and semantic richness, particularly when applied to large and diverse datasets. Moreover, integrating BERTopic into a content-based recommender system improves the relevance and accuracy of recommendations. This research highlights the potential of advanced topic modeling techniques for refining recommender systems and suggests future directions, including hybrid models and the integration of sentiment analysis.
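To make the content-based recommendation step concrete, the following is a minimal sketch of how per-article topic distributions could drive recommendations. It is not the pipeline used in this study: the article identifiers, the three-topic vectors, and the helper names `cosine` and `recommend` are all hypothetical, and the vectors stand in for the output of whichever topic model (LDA or BERTopic) is fitted upstream. The sketch assumes a user profile is the mean topic vector of the articles the user has read, and that candidates are ranked by cosine similarity to that profile.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend(read_ids, topic_vectors, k=2):
    """Rank unread articles by topic similarity to the user's reading history.

    read_ids: set of article ids the user has already read (assumed non-empty).
    topic_vectors: dict mapping article id -> topic distribution (list of floats).
    """
    n_topics = len(next(iter(topic_vectors.values())))
    # User profile: mean topic vector over the articles already read.
    profile = [
        sum(topic_vectors[a][t] for a in read_ids) / len(read_ids)
        for t in range(n_topics)
    ]
    candidates = [a for a in topic_vectors if a not in read_ids]
    ranked = sorted(
        candidates,
        key=lambda a: cosine(profile, topic_vectors[a]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical topic distributions; in practice these would come from a fitted topic model.
topic_vectors = {
    "a1": [0.8, 0.1, 0.1],
    "a2": [0.7, 0.2, 0.1],
    "a3": [0.1, 0.8, 0.1],
    "a4": [0.1, 0.1, 0.8],
    "a5": [0.6, 0.1, 0.3],
}

recommendations = recommend({"a1"}, topic_vectors, k=2)
print(recommendations)
```

In this toy setup, a user who has read only `a1` is recommended the two articles whose topic mix is closest to it. Averaging the read articles is one simple profile choice among several; weighting by recency or click count would be equally plausible.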
