4/7/2022: CVRP Through Reinforcement Learning, Julio Godoy, University of Concepción
Reinforcement learning (RL) is a branch of machine learning which is gaining more attraction from the research community, as well as from the industry, given its recent application in challenging problems (nuclear fusion stabilization, protein folding, etc..).
RL requires that the problem to be applied on is formulated as a Markov Decision Problem. An RL agent is situated in one of many possible states, it has a set of actions at its disposal and receives rewards for its behavior. The goal of an RL is to maximize its sum of rewards through interaction with the environment. There are many types of RL algorithms, and a common distinction is between value-based and policy-based methods. Value-based methods try to estimate a value for each state or state/action pair and use these values to compute a policy (which defines its behavior). A policy-based method directly modifies an initial policy, aiming at increasing the rewards that the agent receives. Recent advances in the field of Deep Learning have allowed RL to scale to large problems. Specifically, it has also been applied for combinatorial optimization problems such as the CVRP (Lu et al 2020). In Lu et al., the authors formulate the problem of choosing the best improvement operator to apply to a given CVRP solution as an RL problem and apply a specific algorithm called REINFORCE for this task. Results show that using the proposed approach, the authors have achieved a new state-of-the-art result for the CVRP of 20, 50, and 100 locations, while using much less compute time than existing heuristic approaches.
Reference: Hao Lu, Xingwen Zhang and Shuang Yang, “A learning-based iterative method for solving vehicle routing problems”. International Conference on Learning Representations, 2020