Main Results
        The SR+CR MCTS Scheme
        
          - Selects an action at the current root aimed at minimizing the simple regret.

          - Thereafter selects actions according to UCB, which approximately minimizes the cumulative regret.
 
        
        Rollout(node, depth=1)
          if IsLeaf(node, depth)
            return 0
          else
            if depth=1 then action $\gets$ FirstAction(node)
            else action $\gets$ NextAction(node)
            next $\gets$ NextState(node, action)
            reward $\gets$ Reward(node, action, next) + Rollout(next, depth+1)
            UpdateStats(node, action, reward)
            return reward
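
        A minimal Python sketch of this rollout, assuming a simple node type with per-action statistics. The helpers `first_action` (simple-regret sampling at the root, here $\varepsilon$-greedy with $\varepsilon=\frac 1 2$), `next_action` (UCB1 below the root), and the `is_leaf`, `next_state`, `reward_fn` callbacks are illustrative stand-ins for FirstAction, NextAction, IsLeaf, NextState, and Reward above, not names from the source.

```python
import math
import random

class Node:
    """Search-tree node with per-action visit counts and running mean rewards."""
    def __init__(self, actions):
        self.n = {a: 0 for a in actions}       # visits per action
        self.mean = {a: 0.0 for a in actions}  # mean reward per action

    def total_visits(self):
        return sum(self.n.values())

def first_action(node, eps=0.5):
    """Root-level sampling aimed at simple regret (eps-greedy, eps = 1/2)."""
    if node.total_visits() == 0 or random.random() < eps:
        return random.choice(list(node.n))
    return max(node.n, key=lambda a: node.mean[a])

def next_action(node, c=math.sqrt(2)):
    """Deeper-level sampling by UCB1, approximately minimizing cumulative regret."""
    total = node.total_visits()
    def ucb(a):
        if node.n[a] == 0:
            return float('inf')
        return node.mean[a] + c * math.sqrt(math.log(total) / node.n[a])
    return max(node.n, key=ucb)

def rollout(node, depth, is_leaf, next_state, reward_fn):
    """SR+CR rollout: simple-regret sampling at the root, UCB below it."""
    if is_leaf(node, depth):
        return 0.0
    action = first_action(node) if depth == 1 else next_action(node)
    nxt = next_state(node, action)
    reward = reward_fn(node, action, nxt) + rollout(nxt, depth + 1,
                                                    is_leaf, next_state, reward_fn)
    # UpdateStats: incremental update of the chosen action's statistics.
    node.n[action] += 1
    node.mean[action] += (reward - node.mean[action]) / node.n[action]
    return reward
```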
        Sampling for Simple Regret
        
          - $\varepsilon$-greedy sampling $\left(\varepsilon=\frac 1 2\right)$.
 
          - Modified version of UCB (optimized for simple regret).
 
          - VOI-aware sampling, where $\alpha$ is the action with the highest sample mean, $\beta$ the second-best action, and $\overline X_i$, $n_i$ the sample mean and sample count of action $i$:
          $$VOI_\alpha\approx\frac{\overline X_\beta}{n_\alpha+1}\exp\left(-2(\overline X_\alpha - \overline X_\beta)^2 n_\alpha\right)$$
$$VOI_i\approx\frac{1-\overline X_\alpha}{n_i+1}\exp\left(-2(\overline X_\alpha - \overline X_i)^2 n_i\right),\quad i\ne\alpha$$
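
        A short sketch of VOI-aware selection at the root using these estimates; rewards are assumed to lie in $[0, 1]$ (as the Hoeffding-style bounds require), and `voi_select` and its arguments are illustrative names, not taken from the source.

```python
import math

def voi_select(means, counts):
    """Pick the root action with the largest estimated value of information.

    means[i], counts[i]: sample mean and sample count of action i, rewards in [0, 1].
    """
    alpha = max(range(len(means)), key=lambda i: means[i])       # current best action
    beta = max((i for i in range(len(means)) if i != alpha),
               key=lambda i: means[i])                           # second-best action

    def voi(i):
        if i == alpha:
            # Sampling alpha is valuable if its mean may drop below the second best.
            return (means[beta] / (counts[alpha] + 1)) * \
                   math.exp(-2 * (means[alpha] - means[beta]) ** 2 * counts[alpha])
        # Sampling i is valuable if its mean may overtake the current best.
        return ((1 - means[alpha]) / (counts[i] + 1)) * \
               math.exp(-2 * (means[alpha] - means[i]) ** 2 * counts[i])

    return max(range(len(means)), key=voi)

# Example: three root actions with their current statistics.
print(voi_select([0.6, 0.55, 0.2], [40, 35, 10]))
```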