Intellectual Analysis of Textual Data in Social Networks Using BERT and XGBoost

2025; pp. 44-60
1 Lviv Polytechnic National University, Information Systems and Networks Department, Lviv, Ukraine
2 Lviv Polytechnic National University, Information Systems and Networks Department, Lviv, Ukraine

This article presents an approach to sentiment analysis in social networks that combines the Sentence-BERT model for text vectorization with XGBoost for sentiment classification. The study uses the Sentiment140 dataset, a collection of text messages labeled with sentiment annotations. Sentence-BERT produces dense vector representations of text that preserve both lexical and contextual relationships between words, which gives the classifier a more accurate semantic picture of each message and improves classification performance. The proposed model achieves an overall classification accuracy of 90%. An area under the ROC curve (AUC) of 0.88 further confirms the model’s ability to distinguish between sentiment classes. The Precision-Recall curve shows a strong balance between precision and recall, which is particularly important when handling imbalanced datasets. Calibration curves indicate that predicted probabilities are consistent with actual outcomes, and the cosine similarity matrix confirms that the model captures semantic proximity between texts. Beyond classification, the study examines the F1-score at various decision thresholds in order to identify the optimal operating range of the model. The cumulative gain chart illustrates how classification performance improves as more messages are processed and points to the model’s stability on large-scale textual data. The proposed approach can serve as a tool for sentiment analysis, text clustering, and trend identification in social networks. The findings have practical implications for marketing, public opinion analysis, automated content moderation, and social trend prediction.
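Two of the evaluation steps named above, the F1-score at various thresholds and the cosine similarity between text vectors, can be illustrated with a minimal standard-library sketch. It is not the article's implementation: the labels, probabilities, and threshold grid are toy values assumed for illustration, and the function names are hypothetical. In the setting described here, the probabilities would come from the XGBoost classifier and the vectors from Sentence-BERT.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def f1_at_threshold(y_true, y_prob, threshold):
    """F1-score of the positive class when predicting 1 for prob >= threshold."""
    tp = fp = fn = 0
    for label, prob in zip(y_true, y_prob):
        pred = 1 if prob >= threshold else 0
        if pred == 1 and label == 1:
            tp += 1
        elif pred == 1 and label == 0:
            fp += 1
        elif pred == 0 and label == 1:
            fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_threshold(y_true, y_prob, thresholds):
    """Threshold from the grid with the highest F1 (first one wins on ties)."""
    return max(thresholds, key=lambda t: f1_at_threshold(y_true, y_prob, t))


# Toy data (assumed): 1 = positive sentiment, 0 = negative sentiment.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_prob = [0.9, 0.8, 0.55, 0.6, 0.3, 0.2, 0.45, 0.1]
thresholds = [i / 10 for i in range(1, 10)]  # 0.1, 0.2, ..., 0.9

for t in thresholds:
    print(f"threshold={t:.1f}  F1={f1_at_threshold(y_true, y_prob, t):.3f}")
print("best threshold:", best_threshold(y_true, y_prob, thresholds))
```

Scanning a grid of thresholds this way shows where the classifier's precision and recall are best balanced, which is the kind of operating range the F1 analysis is meant to identify.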
