Logical Team Q-learning: An approach towards factored policies in cooperative MARL
Reinforcement learning (RL) is a powerful machine learning paradigm that studies the interaction between a single agent and an unknown environment. The interaction works as follows: at any given time the agent finds itself in some state and has a number of actions to choose from; it chooses an action and, as a consequence, the environment provides a reward and a new state to which the agent transitions. This interaction may go on forever or for a limited amount of time. The goal of the agent is to learn a policy (a function that determines which action is chosen in each possible state) that maximizes the long-term cumulative reward. A plethora of applications fit into the RL framework. In many cases of interest, however, a team of agents needs to interact with the environment and with each other to achieve a common goal. This is the object of study of collaborative multi-agent RL (MARL). Collaborative MARL introduces many challenges, one of which is that the number of possible actions the team can choose from grows exponentially with the number of agents. Because of this exponential growth of the joint action set, learning a "joint team policy" using conventional single-agent RL algorithms becomes infeasible, and it is necessary to learn factored policies instead. In this talk, I will further clarify the challenge of learning factored policies in cooperative MARL, explain why it is an important problem to study, and introduce the Logical Team Q-learning algorithm, which is one possible solution to this problem.
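To make the exponential growth of the joint action set concrete, here is a minimal sketch. It is not part of the talk and not the Logical Team Q-learning algorithm. It assumes every agent has the same number of discrete actions, and it reads "factored" as one output per agent per individual action. The function names are hypothetical.

```python
from itertools import product


def joint_action_count(num_agents: int, actions_per_agent: int) -> int:
    """Size of the joint action set: one action picked per agent."""
    return actions_per_agent ** num_agents


def factored_output_count(num_agents: int, actions_per_agent: int) -> int:
    """Outputs needed per state if each agent scores only its own actions
    (an assumed reading of 'factored policies')."""
    return num_agents * actions_per_agent


def enumerate_joint_actions(num_agents: int, actions_per_agent: int):
    """Explicitly list every joint action as a tuple of per-agent action indices."""
    return list(product(range(actions_per_agent), repeat=num_agents))


if __name__ == "__main__":
    for n in (1, 2, 5, 10):
        print(n, joint_action_count(n, 5), factored_output_count(n, 5))
```

Under these assumptions, ten agents with five actions each give 5^10 = 9,765,625 joint actions per state, while a factored representation needs only 50 outputs per state.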
Zoom video available at: https://uniroma1.zoom.us/j/83592522402?pwd=Yzhuak4zQnYvNGthQlZCeGdjWTBkUT09
Lucas Cassano received his Electronics Engineer degree from Buenos Aires Institute of Technology in 2013 and then joined Satellogic as a full-time engineer, developing and implementing star tracker algorithms for micro-satellites. Afterwards he went to UCLA, where he received his M.S. and Ph.D. degrees in 2015 and 2020, respectively. After obtaining his Ph.D., he joined EPFL as a postdoctoral researcher. The main focus of his academic research has been cooperative multi-agent reinforcement learning. Currently, he holds an Applied Scientist position at Amazon, where he works on ranking.