DECISION SUPPORT SYSTEM FOR DISINFORMATION, FAKES AND PROPAGANDA DETECTION BASED ON MACHINE LEARNING

2024: 105–116. https://doi.org/10.23939/ujit2024.02.105
Received: September 09, 2024
Accepted: November 19, 2024
1 Lviv Polytechnic National University, Information Systems and Networks Department; Osnabrück University, Institute of Computer Science
2 Lviv Polytechnic National University, Information Systems and Networks Department

Due to the simplification of creating and distributing news via the Internet, and the physical impossibility of checking the large volumes of information circulating in the network, the spread of disinformation and fake news has increased significantly. A decision support system for identifying disinformation, fakes and propaganda based on machine learning has been built, and a method of news text analysis for identifying fakes and predicting the detection of disinformation in news texts has been studied. Detecting fake news is therefore a critical task: it not only ensures that users receive verified and reliable information, but also helps prevent manipulation of public consciousness. Strengthening control over the credibility of news is important for maintaining a reliable ecosystem of the information space. Combining information retrieval (IR) and natural language processing (NLP) allows systems to automatically analyse and track information to detect potential misinformation or fake news; it is also important to consider the context, the source of information, and other factors to determine credibility accurately. Such automated methods can help detect and resolve problems related to the spread of misinformation in social networks in real time. For our experiment, we use a dataset of 20,000 articles: 10,000 entries for fake news and 10,000 for non-fake news, most of them related to politics. For both subsets of the data, basic text cleaning procedures were performed, such as lowercasing the text, removing punctuation marks, cleaning location and author tags, and removing stop words.
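The cleaning step described above can be sketched with the Python standard library alone. This is an illustrative sketch, not the study's implementation: the stop-word list here is a small invented subset (a full list such as NLTK's would normally be used), and tag removal is omitted.

```python
import string

# Illustrative stop-word subset; a full stop-word list was used in the study.
STOP_WORDS = {"the", "a", "an", "in", "of", "to", "and", "is", "are"}

def clean_text(text: str) -> str:
    """Lowercase the text, strip punctuation, and drop stop words."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    return " ".join(tokens)

print(clean_text("The Senate, in a vote of 51-49, approved the bill."))
# senate vote 5149 approved bill
```

After this normalization, the resulting token streams feed the downstream tokenization and lemmatization steps.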
After cleaning, tokenization and lemmatization were performed. For better lemmatization results, each token was labelled with a part-of-speech (POS) tag; using POS tags helps perform lemmatization more accurately. For both subsets of the data, bigrams and trigrams were created to better capture the context of the articles in the dataset. It was found that non-fake news uses a more formal language style. Next, we performed sentiment analysis on both subsets of the data: the fake sub-dataset contains more negative scores, while the non-fake sub-dataset has mostly positive scores. The subsets were combined before building the prediction model, which used bag-of-words (BOW) features and logistic regression. The F1 score is 0.98 for both the fake and non-fake classes.
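The prediction step (bag-of-words features plus a logistic-regression classifier) can be sketched end to end on a toy corpus. This stdlib-only sketch illustrates the technique rather than the study's actual pipeline, which a practitioner would typically build with a library such as scikit-learn; the four labelled headlines are invented for illustration.

```python
import math
from collections import Counter

# Toy labelled corpus (1 = fake, 0 = non-fake); headlines invented for illustration.
docs = [
    ("shocking secret cure they hide from you", 1),
    ("you will not believe this shocking miracle", 1),
    ("parliament approves annual budget report", 0),
    ("minister presents official economic report", 0),
]

# Bag-of-words: fixed vocabulary, one count per vocabulary word.
vocab = sorted({w for text, _ in docs for w in text.split()})

def vectorize(text):
    counts = Counter(text.split())
    return [counts.get(w, 0) for w in vocab]

X = [vectorize(text) for text, _ in docs]
y = [label for _, label in docs]

# Logistic regression trained with plain stochastic gradient descent.
w = [0.0] * len(vocab)
b = 0.0
lr = 0.5
for _ in range(200):
    for xi, yi in zip(X, y):
        z = b + sum(wj * xj for wj, xj in zip(w, xi))
        p = 1.0 / (1.0 + math.exp(-z))   # sigmoid
        err = p - yi                     # gradient of log loss w.r.t. z
        w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
        b -= lr * err

def predict(text):
    """Return the estimated probability that the text is fake."""
    z = b + sum(wj * xj for wj, xj in zip(w, vectorize(text)))
    return 1.0 / (1.0 + math.exp(-z))

print(predict("shocking secret miracle"))  # high probability -> fake
print(predict("official budget report"))   # low probability -> non-fake
```

On the real 20,000-article dataset the same idea scales by replacing the toy vocabulary with the full corpus vocabulary and evaluating with the F1 score on held-out data.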
