Hybridizing Large Language Models and Markov Processes: a New Paradigm for Autonomous Penetration Testing

Mariia Kozlovska; Andrian Piskozub

This article explores a hybrid framework for autonomous penetration testing that integrates Large Language Models (LLMs) with Markov decision processes (MDPs), partially observable Markov decision processes (POMDPs), and reinforcement learning (RL). Conventional penetration testing is increasingly insufficient against modern, complex cyber threats. In the proposed framework, LLMs handle high-level strategic planning by generating candidate attack paths, while MDP/POMDP models combined with RL execute low-level actions under uncertainty. A feedback loop lets the outcomes of executed actions refine the strategy in dynamic and partially observable environments. The article proposes a conceptual hybrid architecture, accompanied by a workflow diagram and an illustrative table of potential decision outcomes. The paradigm aims to improve automation, adaptability, efficiency, and scalability, offering a pathway toward next-generation AI-driven cybersecurity assessment tools.
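The division of labour described above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy network in `TRANSITIONS`, the `StubPlanner` class (a stand-in for an LLM that would propose attack paths), the reward values, and all hyperparameters are assumptions made for illustration. For brevity the low-level executor is tabular Q-learning on a fully observable MDP; a POMDP variant would act on belief states instead.

```python
import random
from collections import defaultdict

# Hypothetical toy network: (state, action) -> [(next_state, probability), ...].
TRANSITIONS = {
    ("external", "scan"): [("recon_done", 0.9), ("external", 0.1)],
    ("recon_done", "exploit_web"): [("web_shell", 0.6), ("recon_done", 0.4)],
    ("recon_done", "phish"): [("user_access", 0.3), ("recon_done", 0.7)],
    ("web_shell", "escalate"): [("goal", 0.5), ("web_shell", 0.5)],
    ("user_access", "escalate"): [("goal", 0.2), ("user_access", 0.8)],
}


def step(state, action, rng):
    """Sample one transition: reward is +10 on reaching the goal, -1 otherwise."""
    outcomes = TRANSITIONS.get((state, action))
    if outcomes is None:  # action not applicable in this state: wasted step
        return state, -1.0, False
    draw, cumulative = rng.random(), 0.0
    for next_state, prob in outcomes:
        cumulative += prob
        if draw < cumulative:
            break
    done = next_state == "goal"
    return next_state, (10.0 if done else -1.0), done


class StubPlanner:
    """Stand-in for the LLM planner: proposes named candidate attack paths."""

    def __init__(self):
        self.paths = {
            "web": ["scan", "exploit_web", "escalate"],
            "social": ["scan", "phish", "escalate"],
        }
        self.score = {name: 0.0 for name in self.paths}
        self.count = {name: 0 for name in self.paths}

    def propose(self, rng, eps=0.2):
        names = sorted(self.paths)
        if rng.random() < eps:  # occasionally try a different strategy
            return rng.choice(names)
        return max(names, key=lambda n: self.score[n])

    def feedback(self, name, episode_return):
        # Feedback loop: keep a running mean of the return observed per path.
        self.count[name] += 1
        self.score[name] += (episode_return - self.score[name]) / self.count[name]


def run(episodes=500, max_steps=30, alpha=0.2, gamma=0.95, eps=0.1, seed=0):
    rng = random.Random(seed)
    planner = StubPlanner()
    q = defaultdict(float)  # Q-values keyed by (state, action)
    for _ in range(episodes):
        name = planner.propose(rng)
        actions = planner.paths[name]  # the plan restricts the low-level action set
        state, total = "external", 0.0
        for _ in range(max_steps):
            if rng.random() < eps:
                action = rng.choice(actions)
            else:
                action = max(actions, key=lambda a: q[(state, a)])
            nxt, reward, done = step(state, action, rng)
            best_next = 0.0 if done else max(q[(nxt, a)] for a in actions)
            # Standard Q-learning update toward the one-step bootstrapped target.
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state, total = nxt, total + reward
            if done:
                break
        planner.feedback(name, total)
    return planner, q
```

Calling `planner, q = run()` and inspecting `planner.score` shows which candidate path the feedback loop has come to prefer. In a real system the stub would be replaced by prompts to an LLM, and the feedback would be returned to it as context rather than as a scalar score.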

Autonomous Penetration Testing

Large Language Models

Markov Processes

Reinforcement Learning

Cybersecurity

Hybrid AI Architectures
