Recently hyped ML content linked in one simple page
Sources: reddit/r/{MachineLearning,datasets}, arxivsanity, twitter, kaggle/kernels, hackernews, awesomedatasets, sota changes
Made by: Deep Phrase HK Limited


[1910.13012] Multiplayer AlphaZero
We define measures of success that can be applied in future AlphaZero research, and create an independent AlphaZero reimplementation with a multiplayer modification.
Abstract: The AlphaZero algorithm has achieved superhuman performance in two-player, deterministic, zero-sum games where perfect information of the game state is available. This success has been demonstrated in Chess, Shogi, and Go, where learning occurs solely through self-play. Many real-world applications (e.g., equity trading) require the consideration of a multiplayer environment. In this work, we suggest novel modifications of the AlphaZero algorithm to support multiplayer environments, and evaluate the approach in two simple three-player games. Our experiments show that multiplayer AlphaZero learns successfully and consistently outperforms a competing approach: Monte Carlo tree search. These results suggest that our modified AlphaZero can learn effective strategies in multiplayer game scenarios. Our work supports the use of AlphaZero in multiplayer games and suggests future research for more complex environments.
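The abstract's core change — supporting more than two players — can be sketched as a modified MCTS backup. This is a minimal illustration, assuming (the abstract does not specify) that the value head outputs one scalar per player, that each node accumulates a per-player value vector, and that the player to move maximizes its own component during selection; the class and function names are illustrative, not from the paper.

```python
import math

class Node:
    """Search-tree node for an N-player game.

    Instead of a single scalar value (the two-player zero-sum case),
    each node accumulates a length-N vector of per-player value
    estimates, so every player can maximize its own component.
    """

    def __init__(self, num_players, prior, player_to_move):
        self.n = num_players
        self.prior = prior                    # P(s, a) from the policy head
        self.player = player_to_move          # whose turn it is at this node
        self.visits = 0
        self.value_sum = [0.0] * num_players  # one accumulator per player
        self.children = {}                    # action -> Node

    def q(self, player):
        """Mean value of this node from `player`'s perspective."""
        return self.value_sum[player] / self.visits if self.visits else 0.0

def select_child(node, c_puct=1.5):
    """PUCT selection: the player to move maximizes *its own* component."""
    total = sum(ch.visits for ch in node.children.values())
    def score(ch):
        u = c_puct * ch.prior * math.sqrt(total + 1) / (1 + ch.visits)
        return ch.q(node.player) + u
    return max(node.children.items(), key=lambda kv: score(kv[1]))

def backup(path, value_vector):
    """Propagate a length-N value vector up the visited path unchanged
    (no sign flip, unlike the two-player zero-sum case)."""
    for node in path:
        node.visits += 1
        for p in range(node.n):
            node.value_sum[p] += value_vector[p]
```

The key departure from standard AlphaZero is in `backup`: with two zero-sum players a scalar value is negated at alternating depths, whereas here the full vector is stored as-is and `select_child` simply reads out the component belonging to the node's player.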
Figure captions:
Figure 1: The change in neural network structure with the novel multiplayer approach. (Multiplayer extensions)
Figure 2: The state representation of a TicTacMo board passed into the neural network. Size is 3x5x6. Each player owns one piece-location plane and one turn-indicator plane.
Figure 3: The loss of our SENet steadily converges.
Figure 4: State representation and loss curve for TicTacMo. (Multiplayer Tic-Tac-Toe)
Figure 5: AlphaZero and opponent scores accumulated over six games as opponent rollouts increase. AlphaZero's rollouts remain fixed at 50, while its MCTS opponents use an increasing number of rollouts. The two MCTS agents have identical performance across each match.
Figure 6: Score difference as opponent rollouts increase for AlphaZero and a control MCTS using the same number of rollouts. Score difference is our score minus the opponent's score: values below 0 indicate more games lost than won, values above 0 indicate more games won than lost, and 0 indicates equal wins and losses.
Figure 7: TicTacMo experiments against MCTS opponents of increasing strength. (Multiplayer Tic-Tac-Toe)
Figure 8: The state representation of a Connect 3x3 board passed into the neural network. Size is 6x7x6. Each player owns one piece-location plane and one turn-indicator plane.
Figure 9: The loss of our SENet is stable.
Figure 10: State representation and loss curve for Connect 3x3. (Multiplayer Connect 4)
Figure 11: AlphaZero and opponent scores over six games as opponent rollouts increase.
Figure 12: Score difference as opponent rollouts increase for AlphaZero and a control.
Figure 13: Connect 3x3 experiments against MCTS opponents of increasing strength. (Multiplayer Connect 4)
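Figure 2's tensor layout (a 3x5 board encoded as six planes: one piece-location plane and one turn-indicator plane per player) can be sketched as below. The function name and the cell convention (0 for empty, 1-3 for the owning player) are illustrative assumptions, not taken from the paper.

```python
import numpy as np

NUM_PLAYERS = 3
ROWS, COLS = 3, 5  # board size implied by the 3x5x6 tensor in Figure 2

def encode_state(board, player_to_move):
    """Encode a TicTacMo board as a ROWS x COLS x 6 tensor.

    board: ROWS x COLS integer array; 0 = empty, 1..3 = owning player
           (this cell convention is an assumption).
    Planes 0..2: binary piece-location plane, one per player.
    Planes 3..5: turn indicator, an all-ones plane for the player to move.
    """
    planes = np.zeros((ROWS, COLS, 2 * NUM_PLAYERS), dtype=np.float32)
    for p in range(NUM_PLAYERS):
        planes[:, :, p] = (board == p + 1)            # piece locations
    planes[:, :, NUM_PLAYERS + player_to_move] = 1.0  # turn indicator
    return planes
```

The same scheme with a 6x7 board yields the 6x7x6 tensor described for Connect 3x3 in Figure 8.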



Related (TF-IDF):
[1903.01747] Towards Understanding Chinese Checkers with Heuristics, Monte Carlo Tree Search, and Deep Reinforcement Learning
[1907.11703] Action Guidance with MCTS for Deep Reinforcement Learning
[1903.12328] Improved Reinforcement Learning with Curriculum
[1808.04794] Improving Hearthstone AI by Combining MCTS and Supervised Learning Algorithms
[1904.05759] Safer Deep RL with Shallow MCTS: A Case Study in Pommerman
[1904.03646] Policy Gradient Search: Online Planning and Expert Iteration without Search Trees
[1802.05944] Monte Carlo Q-learning for General Game Playing
[1812.00045] Using Monte Carlo Tree Search as a Demonstrator within Asynchronous Deep RL
[1903.09569] Monte Carlo Neural Fictitious Self-Play: Approach to Approximate Nash Equilibrium of Imperfect-Information Games
[1806.00683] Deep Pepper: Expert Iteration based Chess agent in the Reinforcement Learning Setting
