In March of 2016, Google’s AlphaGo beat the world champion Lee Sedol at the game of Go, a feat hailed as an important milestone for Artificial Intelligence (AI). It was also a big deal with Deep Learning and Reinforcement Learning. But what was the big deal?
Let’s start with a simple game of Noughts and Crosses, also known as Tic-Tac-Toe. A game played on a 3×3 grid by 2 players placing O (noughts) and X (crosses) in turns with the objective of getting 3 noughts or crosses in a row.
Naive counting leads to 19,683 possible board layouts (39 since each of the nine spaces can be X, O or blank), and 362,880 (i.e., 9!) possible games (different sequences for placing the Xs and Os on the board). – Wikipedia
Now we (i.e. humans) play the game without enumerating all possible board layouts or exploring all possible games. However, that is how computers are typically programmed to play the game. After each move, computers would generate the tree of moves, where each branch represents the sequence of moves. The computer generates the tree to a certain depth and then identify the ‘branches’ most likely to lead to a victory, and selects that as its next move. The process is repeated after the other players make a move, until the computer or human wins. This brute-force search and prune strategy are fundamentally how Deep Blue beat Garry Kasporav in 1997, and the game of chess has about 1043 number of legal positions.
Now let us look at the game of Go.
There are 1,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,
000,000,000,000,000,000 possible positions—that’s more than the number of atoms in the universe, and more than a googol times larger than chess. – Google Blog
With 10171 possible positions in Go (I counted the zeros), a brute-force search used by Deep Blue to beat Kasparov in chess, just won’t work with Go. So AlphaGo had to use intuition when selecting between moves, a bit like the way humans do.
In this context, intuition means working with limited information and taking a shortcut in selecting only a tiny subset of options to arrive at a good move. The idea is similar to the idea of “thin-slicing” that Malcolm Gladwell discusses in his book Blink [ kindle ]. Intuition is a function of experience and practice – leading to both desirable efficiencies and undesirable biases into our daily decision making, but that is a different post for another day.
During the match against Fan Hui, AlphaGo evaluated thousands of times fewer positions than Deep Blue did in its chess match against Kasparov4; compensating by selecting those positions more intelligently, using the policy network, and evaluating them more precisely, using the value network—an approach that is perhaps closer to how humans play. Furthermore, while Deep Blue relied on a handcrafted evaluation function, the neural networks of AlphaGo are trained directly from gameplay purely through general-purpose supervised and reinforcement learning methods. – Nature Paper
This means, that given data for supervised learning (i.e. deep learning) and time to practice (i.e. reinforcement learning) AlphaGo could continue to get better and better, and not rely on human experts to come up with heuristics or evaluation functions.
This was an extremely promising sign for Artificial Intelligence (AI) and getting us one tiny tiny (did I mention tiny?) step towards general purpose AI. Deep learning is used to reduce the feature engineering that would typically go in setting up a machine learning algorithm. Reinforcement learning mimics the idea of practice or experimentation that we as humans use in learning how to play tennis, play an instrument or make a drawing.
Learning to mimic intuition, without explicit feature engineering with deep learning and practicing it via reinforcement learning offers a very interesting template for teaching machines how to deal with kind of problems that humans excel at with their intuition. It also represents a way for machines to operate on large search space problems (i.e. traveling salesman problems, scheduling, and optimizations) and get reasonably good solutions given a fair trade-off of time and compute resources. This is why AlphaGo is a big deal.
Credits & References: