Intelligent Automated System for Parsing and Ranking Resumes

Resume parsing extracts key information from resumes so that candidates can then be selected and ranked. In traditional recruitment, companies often process thousands of resumes by hand or require applicants to follow a predefined template. The evolving recruitment environment, however, calls for more efficient, automated resume analysis. Basic techniques can analyze structured documents, but they are inadequate for resumes submitted as unstructured PDF, DOC, or DOCX files. Current resume parsing methods rely mainly on Natural Language Processing (NLP) techniques such as BERT, keyword-based models, and named entity recognition (NER) models. To address these limitations, the proposed system introduces an approach that combines Computer Vision, through YOLOv8, with Large Language Models (LLMs) for enhanced performance and broader API integration. YOLOv8 is used for resume segmentation, while Tesseract OCR extracts the relevant information as free-form text. The extracted data are then processed by two LLMs, accessed through the Gemini and OpenAI APIs, which compute similarity scores and rank candidates according to specific criteria.
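
The final scoring and ranking stage can be illustrated with a minimal Python sketch. It rests on several assumptions that are not taken from the paper: the names `Scorer`, `jaccard_scorer`, and `rank_candidates` are hypothetical, the token-overlap scorer is only a stand-in for a call to an LLM API, and averaging the scores of the two models is one possible way to combine them, not necessarily the one the system uses.

```python
from statistics import mean
from typing import Callable, Dict, List, Tuple

# A scorer maps (resume_text, job_description) to a similarity in [0, 1].
# In the described system this role is played by an LLM behind an API;
# here it is any plain Python callable (assumption for illustration).
Scorer = Callable[[str, str], float]


def jaccard_scorer(resume_text: str, job_description: str) -> float:
    """Placeholder scorer based on token overlap, NOT an LLM score."""
    resume_tokens = set(resume_text.lower().split())
    job_tokens = set(job_description.lower().split())
    if not resume_tokens or not job_tokens:
        return 0.0
    return len(resume_tokens & job_tokens) / len(resume_tokens | job_tokens)


def rank_candidates(
    resumes: Dict[str, str],
    job_description: str,
    scorers: List[Scorer],
) -> List[Tuple[str, float]]:
    """Average the scores of all scorers and sort candidates, best first.

    Averaging is an assumption; `scorers` must contain at least one scorer.
    """
    ranked = []
    for candidate, text in resumes.items():
        combined = mean(score(text, job_description) for score in scorers)
        ranked.append((candidate, combined))
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked


if __name__ == "__main__":
    # Text as it might come out of the OCR stage (made-up examples).
    resumes = {
        "alice": "python machine learning engineer",
        "bob": "graphic designer photoshop",
    }
    # Two scorers stand in for the two LLMs.
    for candidate, score in rank_candidates(
        resumes, "python engineer", [jaccard_scorer, jaccard_scorer]
    ):
        print(f"{candidate}: {score:.2f}")
```

Because each scorer is just a callable, the placeholder could be swapped for a function that sends the resume text and the job description to a model and parses a numeric score from the reply, without changing the ranking logic.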
