Intelligent Fake News Prediction System Based on NLP and Machine Learning Technologies

2024, pp. 325–347
1 Lviv Polytechnic National University, Information Systems and Networks Department; Osnabrück University, Institute of Computer Science
2 Ivan Franko National University of Lviv, Applied Mathematics Department
3 Lviv Polytechnic National University, Information Systems and Networks Department
4 Lviv Polytechnic National University, Information Systems and Networks Department
5 Lviv Polytechnic National University, Information Systems and Networks Department

The article describes a study on identifying fake news using natural language processing, big data analysis and deep learning. The developed system automatically checks news items for characteristic signs of fake news, such as manipulative language, unverified sources and unreliable information. The analysis results are visualized through a user-friendly interface that presents them in a clear, accessible format. For news classification, a neural network with bidirectional LSTM layers (a bidirectional recurrent neural network, BRNN) was developed. Trained for 8 epochs, the LSTM-based model outperforms similar works trained for 3–4 epochs (99% vs. 85–96%). Deep learning models such as bidirectional LSTMs recognize patterns in textual data with high accuracy, which yields better results. The model achieved high accuracy on the test sample, indicating that it can effectively recognize fake news. The confusion matrix showed that all news items were classified correctly, and the classification report confirmed high precision, recall and F1 score for both classes (real and fake news).
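The abstract does not give the network's layer sizes, embeddings or training procedure, so the following is only a minimal sketch of how a bidirectional LSTM turns a token sequence into a real/fake probability. It is a forward pass in plain Python with random, untrained weights; the class names (`LSTMCell`, `BiLSTMClassifier`), the dimensions and the convention that the output is the probability of the "fake" class are hypothetical. One LSTM reads the sequence left to right, a second reads it right to left, and their final hidden states are concatenated before a single sigmoid output unit. A practical system would train such a model in a framework, for example with the Keras `Bidirectional` wrapper around an `LSTM` layer.

```python
import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def matvec(w, v):
    """Multiply matrix w (list of rows) by vector v."""
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class LSTMCell:
    """One LSTM direction: input (i), forget (f), output (o) gates and candidate (g)."""

    def __init__(self, input_dim, hidden_dim, rng):
        self.hidden_dim = hidden_dim
        concat_dim = input_dim + hidden_dim
        # One weight matrix and bias vector per gate, applied to [x_t; h_{t-1}].
        self.w = {gate: rand_matrix(hidden_dim, concat_dim, rng) for gate in "ifog"}
        self.b = {gate: [0.0] * hidden_dim for gate in "ifog"}
        self.b["f"] = [1.0] * hidden_dim  # common choice: start with the forget gate open

    def step(self, x, h, c):
        z = x + h  # list concatenation [x_t; h_{t-1}]
        pre = {gate: [s + b for s, b in zip(matvec(self.w[gate], z), self.b[gate])]
               for gate in "ifog"}
        i = [sigmoid(v) for v in pre["i"]]
        f = [sigmoid(v) for v in pre["f"]]
        o = [sigmoid(v) for v in pre["o"]]
        g = [math.tanh(v) for v in pre["g"]]
        c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
        h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
        return h_new, c_new

    def run(self, sequence):
        """Process the whole sequence and return the final hidden state."""
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        for x in sequence:
            h, c = self.step(x, h, c)
        return h


class BiLSTMClassifier:
    """Hypothetical, untrained model: embedding -> BiLSTM -> sigmoid unit."""

    def __init__(self, vocab_size, embed_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.embedding = rand_matrix(vocab_size, embed_dim, rng)
        self.forward_lstm = LSTMCell(embed_dim, hidden_dim, rng)
        self.backward_lstm = LSTMCell(embed_dim, hidden_dim, rng)
        self.out_w = [rng.uniform(-0.1, 0.1) for _ in range(2 * hidden_dim)]
        self.out_b = 0.0

    def encode(self, token_ids):
        vectors = [self.embedding[t] for t in token_ids]
        h_fwd = self.forward_lstm.run(vectors)                   # left to right
        h_bwd = self.backward_lstm.run(list(reversed(vectors)))  # right to left
        return h_fwd + h_bwd  # concatenated features, length 2 * hidden_dim

    def predict_proba(self, token_ids):
        """Return a probability in (0, 1); reading it as P(fake) is an assumption."""
        features = self.encode(token_ids)
        logit = sum(w * v for w, v in zip(self.out_w, features)) + self.out_b
        return sigmoid(logit)
```

For example, `BiLSTMClassifier(vocab_size=50, embed_dim=8, hidden_dim=4).encode([3, 17, 42, 5])` yields 8 features (4 per direction), and `predict_proba` on the same token IDs returns a probability strictly between 0 and 1 that carries no meaning until the weights are trained. Concatenating both directions lets the classifier see context on each side of every token, which is the motivation for bidirectional layers.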

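The abstract evaluates the model with a confusion matrix and a classification report for the two classes. As a hedged illustration of how such a report is derived (not the authors' code), the sketch below computes per-class precision, recall and F1 from a 2×2 confusion matrix, the same quantities that scikit-learn's `classification_report` prints. The label encoding (0 = real, 1 = fake) and the function names `confusion_matrix`, `per_class_metrics` and `accuracy` are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class ClassMetrics:
    precision: float
    recall: float
    f1: float
    support: int  # number of items that truly belong to this class


def confusion_matrix(y_true, y_pred, labels=(0, 1)):
    """Rows are true classes, columns are predicted classes (0 = real, 1 = fake assumed)."""
    index = {label: k for k, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def per_class_metrics(matrix, labels=(0, 1)):
    report = {}
    for k, label in enumerate(labels):
        tp = matrix[k][k]
        predicted = sum(row[k] for row in matrix)  # column sum: items predicted as this class
        actual = sum(matrix[k])                    # row sum: items truly in this class
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[label] = ClassMetrics(precision, recall, f1, actual)
    return report


def accuracy(matrix):
    correct = sum(matrix[k][k] for k in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total if total else 0.0
```

With toy labels, true `[0, 0, 1, 1, 1, 0]` against predicted `[0, 1, 1, 1, 0, 0]` gives the matrix `[[2, 1], [1, 2]]`, an accuracy of 4/6, and precision = recall = F1 ≈ 0.667 for each class. When every item is classified correctly, as the abstract reports for its test sample, all off-diagonal cells are zero and every metric equals 1.0.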