Machine Learning of the Classifier of Authors of Social Network Messages

Authors:

Serhii Holub^a, Nataliia Khymytsia^b, Maria Holub^a and Oleksandr Morushko^b

^a Cherkasy State Technological University

^b Lviv Polytechnic National University

This paper presents the results of research into grouping the authors of written text messages in social networks. The hypothesis that authors can be grouped based on the classification of their text messages is confirmed. To this end, a virtual robot builds an intelligent monitoring agent that groups the authors of social network text messages. These messages are characteristically short texts, so the classifier models were trained on observation points that describe message windows of 100 characters. During this training, the primary description of each text message was formed by the method of profiled formation. The structure of the virtual robot that performed the monitoring task of grouping authors with common properties is described, and an example is given of building one of its structural elements, an agent model of a classifier. Messages from two authors belonging to one of the expert-defined classes were used as a benchmark. Analysis of the virtual robot's results indicates that authors whose texts are recognized as similar to the benchmark are similar in the form and content of their statements. The results can be used to identify groups of authors (trolls) who engage in joint destructive activity in social networks to the detriment of Ukraine.
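The windowing and grouping idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the non-overlapping windows, the dropping of a trailing fragment shorter than 100 characters, the 0.5 threshold, and the `classify` callable (a stand-in for a trained classifier model) are all assumptions made for illustration.

```python
from collections import defaultdict

WINDOW = 100  # window length in characters, as stated in the abstract


def split_into_windows(text, size=WINDOW):
    """Cut a message into consecutive, non-overlapping windows of `size` characters.

    Assumption: a trailing fragment shorter than `size` is dropped.
    """
    return [text[i:i + size] for i in range(0, len(text) - size + 1, size)]


def group_authors(messages, classify, threshold=0.5):
    """Return the authors whose windows mostly resemble the benchmark.

    messages  -- iterable of (author, text) pairs
    classify  -- hypothetical callable: window -> True if the window is
                 recognized as similar to the benchmark texts
    threshold -- assumed minimum share of matching windows per author
    """
    hits = defaultdict(int)    # windows per author labelled as similar
    totals = defaultdict(int)  # all windows per author
    for author, text in messages:
        for window in split_into_windows(text):
            totals[author] += 1
            hits[author] += bool(classify(window))
    # Authors with no full window never enter `totals` and are skipped.
    return sorted(a for a in totals if hits[a] / totals[a] >= threshold)
```

In this sketch an author joins the group when at least half of that author's 100-character windows are labelled as similar to the benchmark; the actual decision rule used by the virtual robot is not given in the abstract.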

Intelligent monitoring
