NEURAL NETWORK BASED CONTROL MODEL FOR WALKING PLATFORMS

Ukrainian Journal of Information Technology, 2025: 59-67
https://doi.org/10.23939/ujit2025.01.059
Received: January 23, 2025
Revised: February 05, 2025
Accepted: May 01, 2025
Lviv Polytechnic National University, Lviv, Ukraine

This article presents a comprehensive study of a control model for legged robotic platforms, particularly hexapods, based on deep reinforcement learning techniques. The relevance of employing artificial neural networks to form adaptive robot behavior under uncertain conditions is substantiated, enabling greater flexibility and robustness in dynamic environments.

The study includes a detailed analysis of modern simulation tools: MuJoCo, PyBullet, Webots, Unity ML-Agents, Gazebo, and Nvidia Isaac Gym. Based on criteria such as computational efficiency, compatibility with popular deep learning frameworks, and scalability, the Nvidia Isaac Gym simulator is selected as the primary environment for agent training and simulation.
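
For illustration, a minimal sketch of how a vectorized Isaac Gym simulation of this kind is typically set up is shown below. The asset file name ("hexapod.urdf"), the environment count, and the parameter values are illustrative assumptions, not the study's actual configuration.

```python
# Minimal sketch: creating parallel training environments in Nvidia Isaac Gym.
from isaacgym import gymapi

gym = gymapi.acquire_gym()

# Configure PhysX simulation parameters (GPU pipeline for training throughput).
sim_params = gymapi.SimParams()
sim_params.dt = 1.0 / 60.0
sim_params.use_gpu_pipeline = True
sim = gym.create_sim(0, 0, gymapi.SIM_PHYSX, sim_params)

# Add a flat ground plane.
gym.add_ground(sim, gymapi.PlaneParams())

# Load the robot description (hypothetical URDF of the hexapod).
asset_opts = gymapi.AssetOptions()
asset_opts.fix_base_link = False
asset = gym.load_asset(sim, "./assets", "hexapod.urdf", asset_opts)

# Create a grid of parallel environments with one robot actor in each.
num_envs = 1024
lower = gymapi.Vec3(-1.0, 0.0, -1.0)
upper = gymapi.Vec3(1.0, 1.0, 1.0)
for i in range(num_envs):
    env = gym.create_env(sim, lower, upper, 32)  # 32 environments per row
    pose = gymapi.Transform()
    pose.p = gymapi.Vec3(0.0, 0.3, 0.0)          # spawn slightly above ground
    gym.create_actor(env, asset, pose, "hexapod", i, 1)
```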

A control system model was developed that integrates data from multiple sensors: proprioceptive sensors, inertial measurement units (IMU), and stereo cameras. A data preprocessing pipeline is proposed, involving filtering, normalization, and the generation of a height map of the surrounding environment in the form of a fixed-size tensor. This approach enhances the generalization capability and stability of the trained neural network when transitioning from simulated to real-world environments.
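
A minimal sketch of such a preprocessing pipeline is given below, assuming NumPy, an exponential low-pass filter, and an illustrative 32×32 height-map grid; the shapes and constants are assumptions for illustration, not the article's actual values.

```python
# Sketch: filter and normalize sensor readings, then pack the surrounding
# terrain into a fixed-size height-map tensor for the policy network.
import numpy as np

GRID = 32     # height-map resolution (GRID x GRID cells), illustrative
ALPHA = 0.8   # low-pass filter coefficient, illustrative


def low_pass(prev: np.ndarray, raw: np.ndarray, alpha: float = ALPHA) -> np.ndarray:
    """Simple exponential filter to suppress high-frequency sensor noise."""
    return alpha * prev + (1.0 - alpha) * raw


def normalize(x: np.ndarray, mean: np.ndarray, std: np.ndarray) -> np.ndarray:
    """Zero-mean, unit-variance normalization with running statistics."""
    return (x - mean) / (std + 1e-8)


def height_map(points: np.ndarray, extent: float = 1.0) -> np.ndarray:
    """Rasterize stereo-camera points of shape (N, 3) around the robot
    into a fixed-size (GRID, GRID) tensor of maximum terrain heights."""
    hmap = np.zeros((GRID, GRID), dtype=np.float32)
    # Map x/y coordinates in [-extent, extent] to grid cell indices.
    ij = ((points[:, :2] + extent) / (2 * extent) * GRID).astype(int)
    mask = np.all((ij >= 0) & (ij < GRID), axis=1)
    for (i, j), z in zip(ij[mask], points[mask, 2]):
        hmap[i, j] = max(hmap[i, j], z)
    return hmap


def build_observation(joints, imu, points, stats):
    """Concatenate normalized proprioception, IMU data, and the flattened
    height map into the fixed-size observation vector fed to the network.
    `stats` holds (mean, std) arrays per sensor modality."""
    proprio = normalize(joints, *stats["joints"])
    inertial = normalize(imu, *stats["imu"])
    terrain = height_map(points).ravel()
    return np.concatenate([proprio, inertial, terrain]).astype(np.float32)
```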

A supervisory module has been introduced to perform short-term prediction of neural network actions and validate their outputs against the platform's physical constraints using a mathematical model of the robot. This real-time mechanism enables early detection of potentially hazardous actions and proactive mitigation of dangerous situations.
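
The sketch below illustrates the idea of such a supervisory check, assuming a simplified one-step kinematic prediction and illustrative joint limits in place of the article's full mathematical model of the robot.

```python
# Sketch: a supervisory module that predicts the next state one control step
# ahead and corrects policy actions that would violate physical constraints.
import numpy as np

JOINT_MIN = np.deg2rad(-90.0)   # assumed joint position limits
JOINT_MAX = np.deg2rad(90.0)
VEL_MAX = 4.0                   # assumed joint velocity limit, rad/s
DT = 0.02                       # control period, s


def predict_next(q: np.ndarray, action: np.ndarray):
    """One-step forward prediction: treat the policy action as a joint
    velocity command and integrate it over one control period."""
    dq_next = np.clip(action, -VEL_MAX, VEL_MAX)
    q_next = q + dq_next * DT
    return q_next, dq_next


def supervise(q: np.ndarray, action: np.ndarray) -> np.ndarray:
    """Validate the policy output against the constraints; if the predicted
    state violates them, substitute a corrected safe velocity command."""
    q_next, dq_next = predict_next(q, action)
    if np.any(q_next < JOINT_MIN) or np.any(q_next > JOINT_MAX):
        # Project the command so the predicted position stays within limits.
        safe_q = np.clip(q_next, JOINT_MIN, JOINT_MAX)
        return (safe_q - q) / DT
    return dq_next
```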

The scientific novelty of the research lies in the comprehensive development of a hybrid control system that combines deep learning techniques with elements of classical control, thereby improving the reliability and practical applicability of the system in real-world scenarios. The practical significance of the work is determined by the scalability and adaptability of the proposed architecture for deployment in autonomous mobile robotic systems operating in complex, dynamic, and unpredictable environments.

The conclusions highlight the importance of the hybrid approach that combines the strengths of classical control algorithms with the adaptability of deep learning. Future research directions include enhancing safety mechanisms and conducting real-world testing of the proposed system.
