Intelligent Fake News Prediction System Based on NLP and Machine Learning Technologies

Victoria Vysotska; Lyubomyr Chyrun; Sofia Chyrun; Roman Romanchuk; Dmytro Svyshch

The article describes a study on identifying fake news using natural language processing, big data analysis, and deep learning technologies. The developed system automatically checks news items for signs of fakes, such as manipulative language, unverified sources, and unreliable information. The results of the news analysis are visualized through a user-friendly interface that presents them in a convenient and understandable format. For news classification, a neural network was developed based on a bidirectional LSTM recurrent neural network (BRNN) with bidirectional layers in the model. The LSTM-based model trained for 8 epochs outperforms the models in similar works trained for 3–4 epochs (99% vs. 85–96%). Deep learning models such as bidirectional LSTM recognize patterns in textual data with high accuracy, which yields better results. The model achieved high accuracy on the test sample, indicating that it can effectively recognize fake news. The confusion matrix showed that all news items were classified correctly. The classification report confirmed high precision, recall, and F1 score for both classes (real and fake news).
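
The bidirectional architecture described above can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation: the layer sizes, the random weights, the seed, and the token ids are hypothetical, and the model is untrained. One LSTM reads the embedded token sequence left to right, a second LSTM reads it right to left, and their final hidden states are concatenated before a sigmoid output that can be read as the probability that a news item is fake.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


class LSTMCell:
    """Standard LSTM cell with input (i), forget (f), output (o) gates and candidate (g)."""

    def __init__(self, input_dim, hidden_dim, rng):
        self.hidden_dim = hidden_dim
        cols = input_dim + hidden_dim  # every gate sees the concatenation [x_t ; h_{t-1}]
        # Hypothetical small random weights; a real model would learn these.
        self.W = {k: [[rng.uniform(-0.5, 0.5) for _ in range(cols)]
                      for _ in range(hidden_dim)] for k in "ifog"}
        self.b = {k: [0.0] * hidden_dim for k in "ifog"}

    def step(self, x, h, c):
        z = x + h  # list concatenation [x_t ; h_{t-1}]
        pre = {k: [s + b for s, b in zip(matvec(self.W[k], z), self.b[k])] for k in "ifog"}
        i = [sigmoid(v) for v in pre["i"]]
        f = [sigmoid(v) for v in pre["f"]]
        o = [sigmoid(v) for v in pre["o"]]
        g = [math.tanh(v) for v in pre["g"]]
        c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]  # new cell state
        h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]              # new hidden state
        return h, c

    def final_state(self, seq):
        """Run the cell over a sequence of vectors and return the last hidden state."""
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        for x in seq:
            h, c = self.step(x, h, c)
        return h


class BiLSTMClassifier:
    """Embedding -> bidirectional LSTM -> dense sigmoid output, read as P(fake)."""

    def __init__(self, vocab_size, embed_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.embedding = [[rng.uniform(-0.5, 0.5) for _ in range(embed_dim)]
                          for _ in range(vocab_size)]
        self.forward_cell = LSTMCell(embed_dim, hidden_dim, rng)
        self.backward_cell = LSTMCell(embed_dim, hidden_dim, rng)
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(2 * hidden_dim)]
        self.b_out = 0.0

    def encode(self, token_ids):
        vectors = [self.embedding[t] for t in token_ids]
        h_fwd = self.forward_cell.final_state(vectors)         # reads left to right
        h_bwd = self.backward_cell.final_state(vectors[::-1])  # reads right to left
        return h_fwd + h_bwd                                   # concatenated context vector

    def predict_proba(self, token_ids):
        features = self.encode(token_ids)
        return sigmoid(sum(w * x for w, x in zip(self.w_out, features)) + self.b_out)


if __name__ == "__main__":
    model = BiLSTMClassifier(vocab_size=50, embed_dim=8, hidden_dim=4)
    tokens = [3, 17, 42, 5]  # hypothetical token ids of one news item
    print(len(model.encode(tokens)))  # 8, i.e. 2 * hidden_dim
    print(model.predict_proba(tokens))
```

Concatenating the two directions mirrors the default `merge_mode="concat"` of the Keras `Bidirectional` wrapper. In a framework, an equivalent model would typically stack `layers.Embedding`, `layers.Bidirectional(layers.LSTM(...))`, and `layers.Dense(1, activation="sigmoid")`, and would be trained with `model.fit(..., epochs=8)` to match the epoch count reported above.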

Keywords: fake news identification; fake news recognition; natural language processing
