Hybridizing Large Language Models and Markov Processes: a New Paradigm for Autonomous Penetration Testing

2025; pp. 146-150
1 Lviv Polytechnic National University, Ukraine
2 Lviv Polytechnic National University, Ukraine

This article explores a hybrid framework for autonomous penetration testing that integrates Large Language Models (LLMs) with Markov decision processes (MDPs), their partially observable variant (POMDPs), and reinforcement learning (RL). Conventional penetration testing is increasingly insufficient against modern, complex cyber threats. In the proposed framework, LLMs handle high-level strategic planning by generating candidate attack paths, while MDP/POMDP models combined with RL execute low-level actions under uncertainty. A feedback loop lets the outcomes of executed actions refine strategies in dynamic, partially observable environments. The article proposes a conceptual hybrid architecture, accompanied by a workflow diagram and an illustrative table of potential decision outcomes. The paradigm is intended to improve automation, adaptability, efficiency, and scalability, offering a pathway toward next-generation AI-driven cybersecurity assessment tools.
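The division of labor described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the planner is a fixed lookup standing in for an LLM, the environment is a hypothetical three-state toy model, and every state name, action name, probability, and reward below is an assumption chosen for illustration. Tabular Q-learning stands in for the RL component, and the feedback loop is reduced to handing the learned greedy policy back to the planner.

```python
import random
from collections import defaultdict

# Toy environment (entirely hypothetical): two non-terminal states and a goal.
GOAL = "root"
# (state, action) -> (success probability, next state on success); assumed values.
TRANSITIONS = {
    ("outside", "exploit"): (0.6, "foothold"),
    ("foothold", "escalate"): (0.5, GOAL),
}

def step(state, action, rng):
    """Apply one low-level action; failed or irrelevant actions leave the state unchanged."""
    p_success, target = TRANSITIONS.get((state, action), (0.0, state))
    next_state = target if rng.random() < p_success else state
    reward = -1.0 + (10.0 if next_state == GOAL else 0.0)  # step cost plus goal bonus
    return next_state, reward

def propose_candidates(state, feedback=None):
    """Stand-in for the LLM planner: returns candidate actions for a state.

    A real system would query a language model here. This stub is a fixed lookup
    that narrows to the executor's preferred action once feedback is available.
    """
    if feedback and state in feedback:
        return [feedback[state]]
    return {"outside": ["scan", "exploit"], "foothold": ["exploit", "escalate"]}[state]

def train_executor(episodes, rng, alpha=0.05, gamma=0.9, epsilon=0.1):
    """Tabular Q-learning restricted to the planner's candidate actions."""
    q = defaultdict(float)
    for _ in range(episodes):
        state = "outside"
        for _ in range(50):  # cap episode length
            actions = propose_candidates(state)
            if rng.random() < epsilon:
                action = rng.choice(actions)  # explore
            else:
                action = max(actions, key=lambda a: q[(state, a)])  # exploit estimate
            next_state, reward = step(state, action, rng)
            if next_state == GOAL:
                future = 0.0  # terminal state has no future value
            else:
                future = max(q[(next_state, a)] for a in propose_candidates(next_state))
            q[(state, action)] += alpha * (reward + gamma * future - q[(state, action)])
            state = next_state
            if state == GOAL:
                break
    return q

def summarize_feedback(q):
    """Feedback loop: greedy action per state, to hand back to the planner."""
    return {
        s: max(propose_candidates(s), key=lambda a: q[(s, a)])
        for s in ("outside", "foothold")
    }

rng = random.Random(0)  # fixed seed so the run is repeatable
q_table = train_executor(2000, rng)
feedback = summarize_feedback(q_table)
print(feedback)
print(propose_candidates("outside", feedback))
```

Under these assumed probabilities the executor should come to prefer `exploit` from outside and `escalate` from the foothold, and the planner stub then narrows its suggestions accordingly. Restricting Q-learning to the planner's candidates is one possible way to couple the two layers; a POMDP variant would replace the observed state with a belief over states.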
