A constraint satisfaction problem (CSP) is defined by:
The net VOI $V(S_j)$ of action $S_j$ is the intrinsic VOI $\Lambda_j$ less the cost of $S_j$: \[V(S_j)=\Lambda(S_j)-C(S_j)\]
The intrinsic VOI $\Lambda(S_j)$ is the expected difference between the intrinsic expected utilities of the new and the old selected base-level action, computed after the meta-level action is taken: \[\Lambda(S_j)=\mathbb{E}\left(\mathbb{E}(U(A_\alpha^j))-\mathbb{E}(U(A_\alpha))\right)\]
CSP benchmarks from CSP Solver Competition 2005 were used. 14 benchmarks were solved for $\gamma=0$ and the exponential range $\gamma \in \{10^{-7},$ $10^{-6},$ $...,$ $1\}$, as well as with the minimum-conflicts heuristic and the pAC heuristic.
The maximum improvement is achieved when the solution count is estimated only in a small fraction of occasions selected using rational metareasoning.
Based on the results on benchmarks, we chose $\gamma=10^{-3}$, and applied it to two sets of 100 problem instances. Exhaustive deployment, rational deployment, the minimum conflicts heuristic, and the pAC heuristic were compared.
The value of $\gamma$ chosen based on a small set of hard instances gave good results on a set of instances with different parameters and of varying hardness.
Value ordering heuristics provide information about:
The expected remaining search time in the subtree under $X_k$ for ordering $\omega$ is given by: \[T^{s|\omega}=T_{\omega(1)}+\sum_{i=2}^{|D_k|}T_{\omega(i)}\prod_{j=1}^{i-1}p_{\omega(j)}\]
The intrinsic VOI $\Lambda_i$ of invoking the heuristic can be approximated as: \[\Lambda_i\approx \mathbb{E}\left[(T_1-T_i)|D_k|\,\Big|\; T_i < T_1 \right]\]