Maskable PPO

ACTION-MASKED REINFORCEMENT LEARNING TECHNOLOGY FOR ORDER SCHEDULING

Efficient order scheduling is a combinatorial optimization problem that arises in many industrial contexts. Building a model that generates schedules with a good balance between solution quality and computation time is challenging because of the large action space. This study proposes a high-performance environment and a reinforcement learning model that allocates orders to resources using invalid action masking.
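To illustrate the general idea of invalid action masking, the following minimal sketch sets the logits of invalid actions to negative infinity before the softmax, so those actions receive zero probability and are never sampled. It is not the implementation used in this study: the function name `masked_softmax`, the example logits, and the interpretation of each action as one candidate order-to-resource assignment are all assumptions made for illustration.

```python
import math


def masked_softmax(logits, mask):
    """Return action probabilities with invalid actions forced to zero.

    logits: list of floats, one raw policy score per action.
    mask:   list of bools, True where the action is currently valid.
    (Hypothetical helper; names and shapes are illustrative only.)
    """
    if len(logits) != len(mask):
        raise ValueError("logits and mask must have the same length")
    if not any(mask):
        raise ValueError("at least one action must be valid")
    # Invalid actions get a logit of -inf, so exp() yields exactly 0.0.
    masked = [l if m else -math.inf for l, m in zip(logits, mask)]
    # Subtract the largest valid logit for numerical stability.
    top = max(masked)
    exps = [math.exp(l - top) for l in masked]
    total = sum(exps)
    return [e / total for e in exps]


# Assumed example: four candidate assignments, two of them infeasible
# (for instance because the resource is busy).
logits = [2.0, 0.5, 1.0, -1.0]
mask = [True, False, True, False]
probs = masked_softmax(logits, mask)
print(probs)
```

Because the probability mass is renormalized over valid actions only, the policy never has to learn to avoid infeasible assignments through penalties, which is the usual motivation for masking in large action spaces.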