A Graph-Based Model for Automatic Test Case Generation from Textual Requirements with Hierarchical Coverage

The automation of software test case generation from natural language requirements remains a critical challenge in software engineering. While large language models (LLMs) demonstrate impressive generation capabilities, they exhibit high discrepancy rates (up to 57% for direct generation), hallucinate test steps, and lack formal verification mechanisms for safety-critical constraints. This paper presents a novel algorithmic framework that addresses these limitations through five principal contributions. First, we introduce the Neuro-Symbolic Requirements Graph (NSRG) model $G_{NS} = (V, E, w, \Gamma, \Psi)$, which integrates transformer-derived semantic dependencies with Linear Temporal Logic (LTL) constraints ($\Psi$) and control flow subgraphs ($\Gamma$). Second, we derive the Extended Minimum Coverage Theorem, which establishes theoretical lower bounds for hierarchical coverage. Third, we propose the Hybrid Semantic-Temporal Coverage (HSTC) metric, which uses Determinantal Point Processes (DPP) to systematically optimize test suite diversity. Fourth, we develop the MCTS-Guided Neuro-Symbolic Traversal (MCTS-NST) algorithm, based on Monte Carlo Tree Search (MCTS). Fifth, we implement Dynamic Hierarchical Navigable Small World (HNSW) Indexing, which reduces graph construction complexity from $\mathcal{O}(|V|^2)$ to $\mathcal{O}(|V|\log |V|)$. Experimental evaluation on four benchmark datasets (3,581 requirements) yields 96.8% precision, 98.4% recall, a 99.1% LTL satisfaction rate for negative requirements, and a 0.94% discrepancy rate. The framework achieves a practitioner acceptance score of 4.85/5.
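
To illustrate how a Determinantal Point Process can favor diverse test suites, the following minimal Python sketch greedily selects the subset of candidate test cases whose similarity-kernel submatrix has the largest determinant. It is not the HSTC metric itself, whose definition is not given here. The cosine-similarity kernel, the greedy selection strategy, and the names `det`, `cosine`, and `greedy_dpp_select` are assumptions made purely for illustration.

```python
import math


def det(matrix):
    """Determinant via Gaussian elimination with partial pivoting."""
    n = len(matrix)
    a = [row[:] for row in matrix]  # work on a copy
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0  # singular (or numerically singular) matrix
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result  # a row swap flips the sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def greedy_dpp_select(embeddings, k):
    """Greedily pick up to k indices that maximize det(L_S).

    L is an assumed cosine-similarity kernel over hypothetical
    test-case embeddings; a larger determinant means the selected
    items are less similar to one another.
    """
    n = len(embeddings)
    kernel = [[cosine(embeddings[i], embeddings[j]) for j in range(n)]
              for i in range(n)]
    selected = []
    for _ in range(min(k, n)):
        best, best_det = None, 0.0
        for i in range(n):
            if i in selected:
                continue
            idx = selected + [i]
            sub = [[kernel[r][c] for c in idx] for r in idx]
            d = det(sub)
            if d > best_det + 1e-12:  # ties keep the earlier index
                best, best_det = i, d
        if best is None:
            break  # no remaining candidate adds diversity
        selected.append(best)
    return selected


# Hypothetical 2-D embeddings: items 0 and 1 are near-duplicates.
candidates = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]]
print(greedy_dpp_select(candidates, 2))
```

In this toy input the near-duplicate of the first candidate is skipped in favor of the orthogonal one, which is the behavior a DPP-based diversity term is meant to reward. Greedy selection only approximates the most probable subset under the DPP.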
