Machine learning of software agents for classification of sociotypes

2024
Authors: 

Serhii Holub¹, Oleksandr Morushko², Nataliia Khymytsia², Maria Holub¹, and Marek Aleksander³

1. Cherkasy State Technological University 

2. Lviv Polytechnic National University

3. State Higher Vocational School in Nowy Sacz

This work proposes, for the first time, using the information technology of intellectual monitoring to classify the authors of printed texts by sociotype. The work is motivated by the need to automate the description of message authors' characteristics, particularly in social networks, in order to develop control influences on groups and on individuals. The properties of a person belonging to a given sociotype have been described in relative detail, which makes it possible to develop effective management influences. This tool of information warfare is intended to protect the information space of Ukraine from hostile actions. The research task is formalized as the classification of authors of printed texts, with sociotypes as the classes. Each class contained texts by three authors whose sociotype had already been established by examination. The text classifier models were built with a machine learning method, the Group Method of Data Handling (GMDH). For the experiment, a monitoring software agent was built to perform text mining tasks. For each class of messages, a separate dictionary of features was formed, and an array of numerical characteristics of the texts based on these features, called the array of input data, was built. The agent was trained on the arrays of input data describing the texts of two authors from each class. The agent's models were tested on the numerical descriptions of the texts of the third author of each class, that is, on data that took no part in building the models. In testing, all texts, and accordingly their authors, were classified correctly.
The experiment confirmed the hypothesis that people can be classified by sociotype through the intellectual analysis of their text messages.
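The data preparation described in the abstract (a per-class feature dictionary, an array of input data, and testing on an author held out from training) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration: the toy texts, the placeholder class labels "A" and "B", the dictionary size, and the use of relative word frequencies as features. The classifier is a simple dictionary-coverage score used as a stand-in; it is not the Group Method of Data Handling and does not reproduce the authors' agent.

```python
import re
from collections import Counter

# Toy corpus invented for illustration: class -> author -> list of texts.
# "A" and "B" are placeholder labels, not real sociotype names.
CORPUS = {
    "A": {
        "a1": ["facts and plans matter", "plans need clear facts"],
        "a2": ["clear plans and facts first"],
        "a3": ["facts first then plans"],
    },
    "B": {
        "b1": ["feelings and moods matter", "moods shape warm feelings"],
        "b2": ["warm feelings and moods first"],
        "b3": ["moods first then feelings"],
    },
}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def build_dictionary(texts, size=3):
    """Return the most frequent tokens of one class's training texts."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return [word for word, _ in counts.most_common(size)]


def feature_row(text, dictionary):
    """Relative frequency of each dictionary word in the text."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    total = max(len(tokens), 1)  # avoid division by zero on empty text
    return [counts[word] / total for word in dictionary]


def split_by_author(authors, n_train=2):
    """Use the first n_train authors for training, the rest for testing."""
    names = sorted(authors)
    return names[:n_train], names[n_train:]


dictionaries, arrays, held_out = {}, {}, {}
for label, authors in CORPUS.items():
    train_names, test_names = split_by_author(authors)
    train_texts = [t for name in train_names for t in authors[name]]
    # The dictionary is built from training authors only, so the
    # held-out author never influences the features.
    dictionaries[label] = build_dictionary(train_texts)
    # One numeric row per training text. A model-synthesis step such as
    # GMDH would consume this array; the stand-in below does not train on it.
    arrays[label] = [feature_row(t, dictionaries[label]) for t in train_texts]
    held_out[label] = {name: authors[name] for name in test_names}


def classify(text):
    """Stand-in classifier: choose the class whose dictionary covers
    the largest share of the text's tokens."""
    scores = {
        label: sum(feature_row(text, dictionary))
        for label, dictionary in dictionaries.items()
    }
    return max(scores, key=scores.get)


# Classify each held-out author from the concatenation of their texts.
predictions = {
    name: classify(" ".join(texts))
    for authors in held_out.values()
    for name, texts in authors.items()
}
print(predictions)
```

Splitting by author rather than by text is the point of the design: if texts by the same author appeared in both the training and test sets, a model could succeed by recognizing the author's individual style instead of the class.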
