Title
Learning Fair Policies in Decentralized Cooperative Multi-Agent Reinforcement Learning
Authors
Abstract
We consider the problem of learning fair policies in (deep) cooperative multi-agent reinforcement learning (MARL). We formalize it in a principled way as the problem of optimizing a welfare function that explicitly encodes two important aspects of fairness: efficiency and equity. As a solution method, we propose a novel neural network architecture, which is composed of two sub-networks specifically designed for taking into account the two aspects of fairness. In experiments, we demonstrate the importance of the two sub-networks for fair optimization. Our overall approach is general as it can accommodate any (sub)differentiable welfare function. Therefore, it is compatible with various notions of fairness that have been proposed in the literature (e.g., lexicographic maximin, generalized Gini social welfare function, proportional fairness). Our solution method is generic and can be implemented in various MARL settings: centralized training and decentralized execution, or fully decentralized. Finally, we experimentally validate our approach in various domains and show that it can perform much better than previous methods.
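The generalized Gini social welfare function (GGF) mentioned in the abstract is one example of a (sub)differentiable welfare function that trades off efficiency against equity. A minimal sketch of how such a function scores a vector of per-agent utilities (the weight values here are illustrative, not taken from the paper):

```python
import numpy as np

def generalized_gini_welfare(utilities, weights):
    """GGF: weighted sum of utilities sorted in increasing order.

    With strictly decreasing positive weights, the largest weight
    multiplies the worst-off agent, so more equitable utility
    profiles receive higher welfare for the same total utility.
    """
    sorted_u = np.sort(utilities)  # worst-off agent first
    return float(np.dot(weights, sorted_u))

# Illustrative decreasing weights (e.g., w_i = 2^-(i+1)).
weights = np.array([0.5, 0.25, 0.125])

# Two utility vectors with the same total (efficiency = 9)
# but different equity: the equal split scores higher.
equal = generalized_gini_welfare(np.array([3.0, 3.0, 3.0]), weights)   # 2.625
skewed = generalized_gini_welfare(np.array([1.0, 3.0, 5.0]), weights)  # 1.875
```

Because sorting is piecewise linear, the resulting welfare is subdifferentiable in the utilities, which is what allows gradient-based MARL training to optimize it directly.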