Intelligent System for Fake News Prediction Based on NLP and Machine Learning Technologies

Victoria Vysotska; Lyubomyr Chyrun; Sofia Chyrun; Roman Romanchuk; Dmytro Svyshch

This article describes a study of fake news identification based on natural language processing, big data analysis, and deep learning. The developed system automatically checks news items for signs of fake news, such as manipulative language, unverified sources, and unreliable information. Data visualization is implemented through a user-friendly interface that presents the analysis results in a convenient, easy-to-understand format. For news classification, a neural network was developed as a bidirectional recurrent neural network (BRNN) with bidirectional LSTM layers. Trained for 8 epochs, the LSTM-based model achieves better news analysis scores than comparable studies that trained for 3–4 epochs (99% versus 85–96%). Deep learning models such as the bidirectional LSTM recognize patterns in text data with high accuracy, which yields better results. The model showed high accuracy on the test set, indicating that it can detect fake news effectively. The confusion matrix showed that all news items were classified correctly. The classification report confirmed high precision, recall, and F1 score for both classes (real and fake news).
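The evaluation quantities named in the abstract (confusion matrix, precision, recall, F1 score per class) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the label strings `real` and `fake`, and the six sample predictions are all hypothetical and chosen only to make the arithmetic visible.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def classification_report(y_true, y_pred, labels):
    """Return the confusion matrix and per-class precision, recall and F1."""
    matrix = confusion_matrix(y_true, y_pred, labels)
    report = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]                          # correctly predicted as this class
        fn = sum(matrix[i]) - tp                   # this class, predicted as another
        fp = sum(row[i] for row in matrix) - tp    # another class, predicted as this
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[label] = {"precision": precision, "recall": recall, "f1": f1}
    return matrix, report


# Hypothetical labels for illustration only; not data from the study.
y_true = ["real", "real", "real", "fake", "fake", "fake"]
y_pred = ["real", "real", "fake", "fake", "fake", "fake"]

matrix, report = classification_report(y_true, y_pred, ["real", "fake"])
print(matrix)
for label, scores in report.items():
    print(label, scores)
```

In this toy sample one real item is mislabeled as fake, so the matrix has one non-zero off-diagonal cell. When every item is classified correctly, as the abstract reports for the test set, the matrix is diagonal and precision, recall, and F1 all equal 1.0 for both classes.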

fake news identification

fake news detection

fake news, machine learning

natural language processing

big data analysis

disinformation, propaganda

fake

neural network
