OPTIMIZATION OF DISASTER RECOVERY PROCESSES OF INFORMATION INFRASTRUCTURE SERVICES

Marian Kyryk; Zablotskyi S.; Pohranychnyi V.; Tarasenko A.

The article describes how disaster recovery of information infrastructure services can be optimized by restoring service functionality without a full recovery from backup storage. It identifies the network criteria and parameters that most strongly affect recovery after an emergency and that make it possible to assess the effectiveness of a post-disaster recovery solution. A modification of the MTTR (Mean Time To Recovery) parameter is proposed for two cases: a system element is recovered from a backup, or the service configuration is restored through infrastructure as code while the data needed for service operation is retrieved from backup storage. This approach accelerates the recovery of a failed information infrastructure service. The article presents a scheme for organizing infrastructure recovery by creating a backup location (Cold Site) for the local infrastructure with dedicated cloud providers. The proposed solution uses Proxmox Backup Server for regular backups of critical data center components. A flowchart of the method for recovering services from the Cold Site was developed, and the results indicate that, for some services, restoring the configuration from code is faster than a complete restoration of the service from backup storage.
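The comparison between the two recovery paths can be illustrated with a minimal sketch. It is not the article's modified MTTR formula, which is not reproduced in this abstract. It assumes a simple model in which a full restore takes the image size divided by the backup throughput, and a restore from code takes a fixed provisioning time plus the time to retrieve only the service data. The class `ServiceProfile`, the function names, and every number below are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass
class ServiceProfile:
    """Hypothetical size and timing figures for one service (illustrative only)."""
    name: str
    image_gb: float       # full service image held in backup storage
    data_gb: float        # only the data the service needs to operate
    provision_min: float  # time to rebuild the configuration from code


def full_restore_minutes(svc, throughput_gb_per_min):
    """Estimated time to restore the whole service image from backup."""
    return svc.image_gb / throughput_gb_per_min


def code_restore_minutes(svc, throughput_gb_per_min):
    """Estimated time to rebuild from code, then restore only the data."""
    return svc.provision_min + svc.data_gb / throughput_gb_per_min


def faster_path(svc, throughput_gb_per_min):
    """Return 'code' or 'backup', whichever has the lower estimated time."""
    full = full_restore_minutes(svc, throughput_gb_per_min)
    code = code_restore_minutes(svc, throughput_gb_per_min)
    return "code" if code < full else "backup"


if __name__ == "__main__":
    # Assumed example services; the figures are not taken from the article.
    services = [
        ServiceProfile("web-proxy", image_gb=40.0, data_gb=0.5, provision_min=6.0),
        ServiceProfile("database", image_gb=120.0, data_gb=110.0, provision_min=15.0),
    ]
    for svc in services:
        print(svc.name, faster_path(svc, throughput_gb_per_min=2.0))
```

Under these assumed figures, a service with little data relative to its image is recovered sooner from code, while a data-heavy service gains nothing from skipping the full restore. This matches the abstract's statement that the benefit holds for some services rather than all.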

reliability

fault-tolerant system
