The Impact of MuZero and How AI Could Change Strategy Making
Until a few years ago, the best-known example of an AI agent capable of strategy making was OpenAI Five, OpenAI's multi-agent system. OpenAI Five could play the complex strategy game Dota 2 at superhuman levels. While the OpenAI Five engineers coded the various features, rules and reward functions of the game, the system learned by playing against itself, accumulating the equivalent of 180 years of gameplay per day. It was a remarkable experiment in collaboration and strategy making by an AI agent.
Recently, the engineers of DeepMind went a step further in the development of a general-purpose algorithm.
At the end of 2020, after roughly five years of work that started with AlphaGo in 2016, DeepMind published a paper in Nature describing a computer program that can learn to play games without being told the rules. This ground-breaking development will most likely prove as fundamental as AlphaGo was back in 2016. The algorithm can plan winning strategies in unknown environments and learns simply by doing.
The algorithm, called MuZero, set a new state of the art on the benchmark of 57 Atari games while matching the superhuman performance of its predecessor AlphaZero in Go, chess and shogi. Earlier planning algorithms such as AlphaZero relied on knowledge of the environment's rules and dynamics supplied by their developers.
Reinforcement Learning Capabilities
The earlier algorithms are not very good at transfer learning because one game's environment is very different from another's. Consequently, they would not perform well in complex environments whose dynamics are hard to capture in simple rules, such as real-life environments.
MuZero is a significant step forward because it can develop winning strategies in a completely unknown environment. DeepMind's research also showed that its playing strength increased with the time available to plan a move, since more time allowed it to consider more simulations per move.
The extensive reinforcement learning capabilities of MuZero suggest that algorithms can also learn from messy real-life environments where the rules are not very clear. This could help organisations tackle new (strategic) challenges in areas ranging from logistics and manufacturing to robotics and self-driving cars. The better we become at creating algorithms that can learn from their environment, the closer we get to general-purpose algorithms (which are still a far cry from Artificial General Intelligence).
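To make the idea of planning inside a learned model concrete, here is a minimal sketch in Python. It is not MuZero: MuZero learns its model with neural networks and plans with Monte Carlo tree search, whereas this toy records transitions in a lookup table and plans with exhaustive depth-limited lookahead. The corridor environment and the names `env_step`, `learn_model` and `plan` are all hypothetical, chosen only for illustration. The point it shows is the one made above: the planner never reads the rules, and a larger planning budget (more simulated steps per move) lets it find a better move.

```python
# Hypothetical toy environment: a corridor of cells 0..5 with a reward at cell 5.
GOAL = 5
ACTIONS = (-1, +1)  # step left, step right


def env_step(state, action):
    """True dynamics. The planner never reads this; it only observes results."""
    next_state = min(max(state + action, 0), GOAL)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL


def learn_model():
    """Learn by doing: try each action in each state and record what happens."""
    model = {}
    for state in range(GOAL):
        for action in ACTIONS:
            model[(state, action)] = env_step(state, action)
    return model


def value(model, state, depth, gamma, stats):
    """Best discounted return reachable within `depth` simulated steps."""
    if depth == 0:
        return 0.0
    returns = []
    for action in ACTIONS:
        if (state, action) not in model:
            continue  # transition never observed, so it cannot be simulated
        stats["simulations"] += 1
        next_state, reward, done = model[(state, action)]
        future = 0.0 if done else value(model, next_state, depth - 1, gamma, stats)
        returns.append(reward + gamma * future)
    return max(returns, default=0.0)


def plan(model, state, depth, gamma=0.9):
    """Score each first move by looking `depth` steps ahead in the learned model."""
    stats = {"simulations": 0}
    scores = {}
    for action in ACTIONS:
        stats["simulations"] += 1
        next_state, reward, done = model[(state, action)]
        future = 0.0 if done else value(model, next_state, depth - 1, gamma, stats)
        scores[action] = reward + gamma * future
    best_action = max(scores, key=scores.get)
    return best_action, scores, stats["simulations"]


if __name__ == "__main__":
    model = learn_model()
    for depth in (2, 3):
        action, scores, simulations = plan(model, state=2, depth=depth)
        print(f"depth={depth} simulations={simulations} scores={scores} move={action}")
```

Starting three cells from the goal, a two-step lookahead cannot see the reward, so both moves score zero and the choice is arbitrary. A three-step lookahead uses more simulated steps and scores the move to the right higher.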
The Impact of AI on Strategy
The better algorithms become, the more significant the impact they will have on organisations. AI can already have a significant impact in simple environments, where the rules are clear or at least easy to learn. Such environments include, for example, warehouses where robots can be deployed to pick products autonomously. Within a warehouse, the options for an algorithm are limited and the boundaries are clear, enabling organisations to automate their processes.
However, applying AI to strategy-making is a lot more difficult, as the rules are often not very clear and there are many dependencies; in other words, it is a fuzzy environment. Executives who make strategic decisions need to base them on many information streams, including customer data, changes in the market or supply chains, and macroeconomic conditions. With the information overload that we already experience, making the best decisions can be difficult.
With the advances made by DeepMind with MuZero, applying AI within strategy-making can become more common, although I would avoid having AI make autonomous strategic decisions. Instead, organisations should focus on human-machine symbiosis, where AI gives recommendations and humans make the decisions. Strategy-making is, therefore, a clear example of where human-machine collaboration could thrive, resulting in better strategic decisions and outcomes for the business.
Developing AI agents with planning capabilities in complex environments with many unknowns is one of the holy grails of artificial intelligence. The development of MuZero shows that it is indeed possible to develop such an AI agent in controlled but complex environments. The next step for MuZero would be to apply the innovation to real-world cases. And as we saw with AlphaGo, we can expect some fascinating applications of MuZero in the years to come.
Dr Mark van Rijmenam is The Digital Speaker. He offers inspirational (virtual) keynotes on the future of work, either in person, as an avatar or as a hologram.