A Deep Reinforcement Learning Method Based on Deterministic Policy Gradient for Multi-Agent Cooperative Competition

Xuan Zuo


Deep reinforcement learning in multi-agent scenarios is important for real-world applications but presents challenges beyond those seen in single-agent settings. This paper considers deterministic policy gradient algorithms for multi-agent control and extends the actor-critic method, in which each agent's critic takes the action policies of the other agents into account, to cooperative-competitive missions. Our method is applicable not only to cooperative settings with shared rewards, but also to adversarial settings such as attacking and defending. In the computer experiments, agents are divided into attacking agents and defending agents. The results show that attacking agents acting as deceivers can attract most of the defending agents and thereby help the remaining attacking agents reach their targets. Choosing an appropriate training length helps agents learn better action policies. The experimental results also reveal that the number of agents affects the performance of the proposed method: increasing the number of deceivers among the attacking agents can significantly raise the attacking party's mission success rate, but the computational complexity grows and more training episodes are needed.
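The core idea described above, a deterministic policy gradient with a centralized critic that conditions on all agents' actions, can be illustrated with a minimal sketch. This is not the paper's implementation; the two-agent linear critic and policies, the 1-D state, and all hyperparameters below are illustrative assumptions in the spirit of multi-agent actor-critic methods such as MADDPG.

```python
# Minimal sketch (assumed, not the authors' code): two agents with
# deterministic linear policies a_i = theta_i * s, and one shared
# centralized critic Q(s, a1, a2) that sees both agents' actions.

def critic_q(w, s, a1, a2):
    """Linear centralized critic: Q(s, a1, a2) = w0*s + w1*a1 + w2*a2 + w3."""
    return w[0] * s + w[1] * a1 + w[2] * a2 + w[3]

def actor(theta, s):
    """Deterministic linear policy: a = theta * s."""
    return theta * s

def update(w, theta1, theta2, s, r, s_next, gamma=0.9, lr=0.01):
    """One TD update of the critic and one deterministic policy
    gradient step for each actor."""
    a1, a2 = actor(theta1, s), actor(theta2, s)
    a1n, a2n = actor(theta1, s_next), actor(theta2, s_next)
    # TD error, with next actions drawn from the current policies.
    td = r + gamma * critic_q(w, s_next, a1n, a2n) - critic_q(w, s, a1, a2)
    # Critic step: gradient of the squared TD error w.r.t. w.
    w = [w[0] + lr * td * s,
         w[1] + lr * td * a1,
         w[2] + lr * td * a2,
         w[3] + lr * td]
    # Actor steps: dQ/da_i * da_i/dtheta_i (here dQ/da1 = w1, da1/dtheta1 = s).
    theta1 += lr * w[1] * s
    theta2 += lr * w[2] * s
    return w, theta1, theta2
```

In the cooperative-competitive setting of this paper, attacking and defending agents would receive opposing rewards, so each party's actors ascend their own critic's value while the centralized critics stabilize training against the non-stationarity caused by the other agents' changing policies.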


Keywords: machine learning, reinforcement learning, multi-agent, cooperative competition, artificial intelligence.