Mathematical modeling of multi-label classification of job descriptions using transformer-based neural networks

This article presents a mathematical model of the multi-label classification of job description texts aimed at the automatic detection of working conditions and social benefits, which can improve the efficiency of communication between employers and job seekers. The proposed approach is based on the transformer-based BERT neural network pre-trained on a multilingual corpus. The dataset was constructed by collecting job postings from the three largest Ukrainian job search platforms: Work.ua, Robota.ua, and Jooble.org. To ensure class balance, the collected texts were augmented with examples artificially generated by large language models. An architecture was implemented for fine-tuning the BERT model in multi-label classification mode with the binary cross-entropy (BCE) loss function. To determine the optimal training configuration, four popular optimizers (SGD, AdaGrad, RMSprop, AdamW) were compared across a range of learning rates. Model performance was evaluated with the precision, recall, and F1-score metrics. The experimental results showed that the highest classification quality was achieved with the AdamW optimizer and an appropriately selected learning rate. The novelty of the study lies in applying a transformer architecture to the applied task of job description text processing, which increases the informativeness of postings and automates the preliminary analysis of working conditions. The proposed approach can serve as a foundation for developing tools in HR systems and can be integrated into recruitment platforms to improve the relevance of job postings to the needs of their target audiences.
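
For reference, the binary cross-entropy objective mentioned above takes the following standard form in the multi-label setting. The notation here (N training documents, L labels, logits z_ij, multi-hot targets y_ij) is ours and is not taken from the article:

```latex
\mathcal{L}(\theta) = -\frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{L}
  \Bigl[\, y_{ij}\,\log\sigma(z_{ij}) + (1 - y_{ij})\,\log\bigl(1 - \sigma(z_{ij})\bigr) \Bigr],
\qquad \sigma(z) = \frac{1}{1 + e^{-z}}.
```

Unlike the softmax cross-entropy used for single-label classification, each label is scored by an independent sigmoid, so a posting can be assigned any subset of the L labels.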

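A minimal sketch of how such a fine-tuning setup can be assembled with Hugging Face Transformers and PyTorch is given below. The checkpoint name, the number of labels, the 0.5 decision threshold, and all hyperparameters are illustrative assumptions rather than values reported in the article:

```python
# A sketch of BERT fine-tuning in multi-label mode, per the setup described
# in the abstract. Checkpoint, label count, and hyperparameters are assumed.
import torch
from torch.optim import AdamW
from transformers import AutoModelForSequenceClassification, AutoTokenizer

NUM_LABELS = 12  # hypothetical number of working-condition/benefit labels

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-multilingual-cased",
    num_labels=NUM_LABELS,
    # problem_type selects BCEWithLogitsLoss internally: one independent
    # sigmoid per label instead of a softmax over mutually exclusive classes.
    problem_type="multi_label_classification",
)
optimizer = AdamW(model.parameters(), lr=2e-5, weight_decay=0.01)

texts = ["Remote work, flexible schedule, health insurance provided."]
batch = tokenizer(texts, padding=True, truncation=True,
                  max_length=512, return_tensors="pt")
labels = torch.zeros(1, NUM_LABELS)  # multi-hot target vector (float)
labels[0, [0, 3]] = 1.0              # hypothetical active labels

model.train()
out = model(**batch, labels=labels)  # out.loss is the BCE loss
out.loss.backward()
optimizer.step()
optimizer.zero_grad()

# Inference: per-label probabilities, thresholded independently at 0.5.
model.eval()
with torch.no_grad():
    probs = torch.sigmoid(model(**batch).logits)
preds = (probs >= 0.5).int()
```

Precision, recall, and F1 can then be computed from preds and the ground-truth multi-hot matrix with scikit-learn's precision_score, recall_score, and f1_score (e.g. with average="micro" or average="macro"), matching the metrics named in the abstract.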