Monte-Carlo Tree Search
Intro
Monte-Carlo Tree Search (MCTS) is an algorithm for exploring a tree of states by sampling random episodes with a guidance policy that can be thought of as a heuristic. When we sample many episodes, a lot of them share the same beginning (the number of possibilities grows exponentially with the number of timesteps, depending on the state branching factor), so the early part of an episode is typically shared by many sampled episodes.
You can think of Monte-Carlo Tree Search as generating the state tree under the guidance of a policy $\pi^s$, similar to heuristic-driven search like $A^{*}$.
Monte-Carlo Tree Search Algorithm (a minimal sketch of the full loop follows the list):
- Node selection:
    - use a (stochastic) selection policy $\pi^{s}$ to descend the tree until we reach a node with no children
- Expansion:
    - expand this node (add one or more children according to the policy $\pi^{s}$)
- Simulation:
    - sample one episode according to $\pi$ (which can be a different policy than $\pi^s$)
    - observe the value/reward (e.g. the outcome of a game, i.e. whether you won or lost)
- Backpropagation:
    - update $V^{\pi}$ (or $Q^{\pi}$) for all nodes along the trajectory, starting from the bottom (= bootstrapping)
    - i.e. start with the bootstrapped target $r_9 + \gamma V^{\pi}(s_9)$ for updating the value of state $s_8$
Parallelization
There are three main approaches to parallelization; the one that sounds most interesting to me (because of its simplicity) is the first:
leaf parallelization
:- execute multiple playouts (= rollouts) from one leaf node in parallel (a sketch follows below).
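A rough sketch of leaf parallelization, reusing the helpers from the sketch above; the choice of `concurrent.futures.ProcessPoolExecutor` is mine, and a real implementation would reuse one pool instead of creating it every iteration.

```python
from concurrent.futures import ProcessPoolExecutor


def leaf_parallel_iteration(root, n_rollouts=8):
    """One MCTS iteration with leaf parallelization: selection and expansion
    stay sequential, but several rollouts start from the same new leaf."""
    leaf = select(root)
    child = expand(leaf)
    with ProcessPoolExecutor(max_workers=n_rollouts) as pool:
        returns = list(pool.map(simulate, [child] * n_rollouts))
    for ret in returns:               # back up every playout result
        backpropagate(child, ret)
```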
For completeness, the other two approaches are:
root parallelization
:- build independent trees in parallel and combine their root statistics (e.g. the mean value of each successor $s'$ one step from the root, with $S_t = s_{\text{root}}, S_{t+1} = s'$, across all trees) to make the final decision (sketched below)
- probably takes too much memory for a small project
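A possible sketch of root parallelization, again assuming `build_tree` from the first snippet; averaging the root children's values across trees is one of several reasonable ways to combine the roots. Note that every worker process holds a full tree, which is where the memory cost comes from.

```python
from collections import defaultdict
from concurrent.futures import ProcessPoolExecutor


def root_parallel_decision(root_state, n_trees=4, n_iterations=2000):
    """Root parallelization: build independent trees in parallel, then average
    the value of each successor state s' across all roots."""
    with ProcessPoolExecutor(max_workers=n_trees) as pool:
        roots = list(pool.map(build_tree,
                              [root_state] * n_trees,
                              [n_iterations] * n_trees))
    values = defaultdict(list)
    for root in roots:
        for child in root.children:
            values[child.state].append(child.value)
    # final decision: the successor state with the highest mean value
    return max(values, key=lambda s: sum(values[s]) / len(values[s]))
```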
tree parallelization
:- build one tree in parallel, using mutually exclusive write-access (i.e. a mutex)
- this is hard to implement (a sketch with a single coarse lock follows)
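For completeness, a sketch of tree parallelization with a single coarse `threading.Lock` guarding the shared tree, reusing the helpers from the first snippet. A single lock sidesteps most of the implementation difficulty but also most of the speed-up (and, in CPython, the GIL keeps the rollouts from running truly in parallel), so this only illustrates the locking pattern.

```python
import threading


def tree_parallel_mcts(root, n_threads=4, n_iterations=2000):
    """Tree parallelization: several threads grow one shared tree; a mutex
    guards every read-modify-write of shared nodes."""
    lock = threading.Lock()

    def worker():
        for _ in range(n_iterations // n_threads):
            with lock:                 # tree structure is shared state
                leaf = select(root)
                child = expand(leaf)
            ret = simulate(child)      # the rollout only reads the child's state
            with lock:
                backpropagate(child, ret)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return best_child(root)
```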
References
- Thumbnail from Wikipedia: MCTS_Algorithm
- Tree example from Scheffer, T. Intelligent Data Analysis 2: Reinforcement Learning 2
- Sutton & Barto: Reinforcement Learning
- Parallelization techniques from Wikipedia: Monte Carlo tree search