Path Finding under Uncertainty through Probabilistic Inference
David Tolpin, Jan Willem van de Meent, Brooks Paige, Frank Wood
University of Oxford, Oxford, UK
Preliminaries
Probabilistic Program
A program with random computations.
Distributions are conditioned by `observations'.
Values of certain expressions are `predicted'.
(let [;; Model
      dist (sample (categorical
                     [[normal 1/2]
                      [gamma 1/2]]))
      a (sample (gamma 1 1))
      b (sample (gamma 1 1))
      d (dist a b)]
  ;; Observations
  (observe d 1) (observe d 2)
  (observe d 4) (observe d 7)
  ;; Explanation
  (predict :d (type d))
  (predict :a a) (predict :b b))
Inference over this program yields the posterior over the distribution type (:d) and its parameters (:a, :b) given the four observed values.
Inference Objectives
Generate an unbounded sequence of samples from the posterior distribution.
Approximately compute integrals of the form $\Phi=\int_{-\infty}^{\infty} \varphi(x)\,p(x)\,dx$ (see the sketch below).
Suggest the most probable explanation (MPE): the most likely assignment to all random variables. ✓
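The second objective admits a simple Monte Carlo estimator. A minimal Python sketch, not part of the original poster: the stand-in for the inference engine, the function names, and the choice $\varphi(x)=x^2$ are all illustrative assumptions.

import math, random

def posterior_samples(n):
    # Hypothetical stand-in for an inference engine: yields weighted samples (x, w).
    # Here x is drawn from a standard normal via Box-Muller and every weight is 1.
    for _ in range(n):
        u1, u2 = 1.0 - random.random(), random.random()
        x = math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)
        yield x, 1.0

def estimate_phi(phi, n=100000):
    # Self-normalized estimate of Phi = integral of phi(x) p(x) dx from samples:
    # sum_i w_i * phi(x_i) / sum_i w_i.
    num = den = 0.0
    for x, w in posterior_samples(n):
        num += w * phi(x)
        den += w
    return num / den

print(estimate_phi(lambda x: x * x))  # approximately E[x^2] = 1 under a standard normal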
Canadian Traveller Problem
The CTP is the problem of finding the shortest travel distance in a graph in which some edges may be blocked; whether an edge is blocked becomes known only when the agent reaches one of its endpoints.
Given
an undirected weighted graph $G=(V,E)$;
the initial and final location nodes $s$ and $t$;
find the shortest travel distance from $s$ to $t$, i.e. the sum of weights of all traversed edges.
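For concreteness, a tiny Python sketch of one way to represent and sample a stochastic CTP instance; the graph, weights, and blocking probability below are made-up illustrations, not from the poster.

import random

edges = {('s', 'a'): 1.0, ('a', 't'): 1.0, ('s', 'b'): 2.0, ('b', 't'): 2.0}

def sample_instance(edges, p=0.3):
    # Each edge is blocked independently with probability p; only open edges
    # remain in the sampled world. The agent discovers an edge's status only
    # upon reaching one of its endpoints.
    return {e: w for e, w in edges.items() if random.random() >= p}

print(sample_instance(edges))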
References
Wood, F.; van de Meent, J. W.; and Mansinghka, V. 2014. A new approach to probabilistic programming inference. In Proc. of AISTATS-2014.
Bar-Noy, A., and Schieber, B. 1991. The Canadian traveller problem. In Proc. of the Second Annual ACM-SIAM Symposium on Discrete Algorithms, SODA '91, 261–270.
Eyerich, P.; Keller, T.; and Helmert, M. 2010. High-quality policies for the Canadian traveler's problem. In Proc. of AAAI-2010.
Shimony, S. E., and Charniak, E. 1991. A new algorithm for finding MAP assignments to belief networks. In Proc. of UAI '90, 185–196.
Kappen, H. J.; Gómez, V.; and Opper, M. 2012. Optimal control as a graphical model inference problem. Machine Learning 87(2):159–182.
Case Study: Canadian Traveller Problem
Policy
A depth-first-search-based policy:
the agent traverses $G$ in depth-first order;
at every node, the policy specifies the probability of selecting each adjacent edge.
Probabilistic Program
require CTP$(G,s,t,w,p)$
for $v \in V$  // prior: a Dirichlet edge-selection distribution at every node
    $policy(v)$ $\gets$ Draw(Dirichlet($\pmb{1}^{\deg(v)}$))
end for
repeat  // simulate traversals until the target $t$ is reached
    $instance$ $\gets$ Draw(CTP$(G,w,p)$)  // sample which edges are blocked
    ($reached$, $distance$) $\gets$ StDFS($instance$, $policy$)  // policy-guided depth-first search
until $reached$
Observe(1, Bernoulli$\left(e^{-distance}\right)$)  // weight the trace by $e^{-distance}$
Print($policy$)  // predict the learned policy
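A minimal Python sketch of this generative process; it is my illustration rather than the authors' implementation, and the toy graph, blocking probability, and the names draw_policy, draw_instance, and st_dfs (standing in for StDFS) are all assumptions.

import random

# Assumed toy graph: symmetric adjacency with edge weights.
graph = {'s': {'a': 1.0, 'b': 2.0}, 'a': {'s': 1.0, 't': 1.0},
         'b': {'s': 2.0, 't': 2.0}, 't': {'a': 1.0, 'b': 2.0}}
p_block = 0.3

def draw_policy(graph):
    # Dirichlet(1,...,1) edge-selection probabilities per node
    # (a symmetric Dirichlet sample is a normalized vector of Exp(1) draws).
    policy = {}
    for v, nbrs in graph.items():
        g = {u: random.expovariate(1.0) for u in nbrs}
        z = sum(g.values())
        policy[v] = {u: x / z for u, x in g.items()}
    return policy

def draw_instance(graph):
    # Sample which undirected edges are open in this world.
    status = {}
    for v, nbrs in graph.items():
        for u in nbrs:
            e = frozenset((v, u))
            if e not in status:
                status[e] = random.random() >= p_block
    return status

def st_dfs(instance, policy, s='s', t='t'):
    # Policy-guided depth-first search; returns (reached, distance travelled),
    # counting backtracking moves in the distance.
    visited, stack, distance = {s}, [s], 0.0
    while stack:
        v = stack[-1]
        if v == t:
            return True, distance
        options = [u for u in graph[v]
                   if u not in visited and instance[frozenset((v, u))]]
        if options:
            u = random.choices(options, weights=[policy[v][o] for o in options])[0]
            distance += graph[v][u]
            visited.add(u)
            stack.append(u)
        else:
            stack.pop()                       # dead end: backtrack
            if stack:
                distance += graph[v][stack[-1]]
    return False, distance

policy = draw_policy(graph)
reached = False
while not reached:                            # 'repeat ... until reached'
    reached, distance = st_dfs(draw_instance(graph), policy)
log_weight = -distance                        # log Bernoulli(1; e^{-distance})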
Learned Policies
Acknowledgments
This material is based on research sponsored by DARPA through the U.S.
Air Force Research Laboratory under Cooperative Agreement number
FA8750-14-2-0004.
The log probability of the output policy is
$$\log p_{\mathcal{P}}(policy) = -cost(policy) + \log p_{Policies}(policy) + C$$
When policies are drawn uniformly, this reduces to
$$\log p_{\mathcal{P}}(policy) = - cost(policy) + C'$$
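For clarity, a one-line derivation (added here) of where the $-cost(policy)$ term comes from, stated per simulated traversal:
$$\log p\left(\mathrm{Bernoulli}\left(e^{-distance}\right)=1\right)=\log e^{-distance}=-distance,$$
so every simulated traversal is weighted by $e^{-distance}$, and policies that yield shorter routes receive exponentially larger posterior probability.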
Connection between MAP and Shortest Path
In a probabilistic program $\mathcal{P}$:
$F_i$ is the prior distribution of random variable $x_i$.
$G_j$ is the conditional distribution of observation $y_j$.
Maximizing the (logarithm of) trace probability
$$\log p_{\mathcal{P}}(\pmb{x}) = \sum_{i=1}^{\left|\pmb{x}\right|}
\log p_{F_i}(x_i) +
\sum_{j=1}^{\left|\pmb{y}\right|}\log p_{G_j}(y_{j}) + C$$
corresponds to finding the shortest path in a graph $G=(V,E)$:
$V=\{(F_i, x_i)\} \cup \{(G_j, y_j)\}$.
Edge costs are $-\log p_{F_i}(x_i)$ or $-\log p_{G_j}(y_j)$.
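To make the correspondence concrete, a small Python sketch for a chain-structured trace; the model, its probabilities, and the observations are invented for illustration. Nodes are (variable, value) pairs, edge costs are $-\log p$, and the MAP assignment is the shortest path through the layered graph, found here by dynamic programming.

import math

prior = {0: 0.6, 1: 0.4}
trans = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}
obs   = {0: {'lo': 0.9, 'hi': 0.1}, 1: {'lo': 0.2, 'hi': 0.8}}
ys    = ['lo', 'hi', 'hi']

def map_by_shortest_path(ys):
    # cost[x]: length of the cheapest path ending with value x at the current layer;
    # path[x]: the corresponding assignment so far.
    cost = {x: -math.log(prior[x]) - math.log(obs[x][ys[0]]) for x in prior}
    path = {x: [x] for x in prior}
    for y in ys[1:]:
        new_cost, new_path = {}, {}
        for x in trans:
            # Relax every edge (x_prev -> x); its cost is -log trans - log obs.
            cand = {xp: cost[xp] - math.log(trans[xp][x]) - math.log(obs[x][y])
                    for xp in cost}
            xp = min(cand, key=cand.get)
            new_cost[x], new_path[x] = cand[xp], path[xp] + [x]
        cost, path = new_cost, new_path
    x_end = min(cost, key=cost.get)
    return path[x_end], cost[x_end]

assignment, neg_log_p = map_by_shortest_path(ys)
print(assignment, math.exp(-neg_log_p))  # MAP assignment and its joint probability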
Marginal MAP as Policy Learning
In marginal MAP (MMAP), only the assignment to a part of the trace, $\pmb{x}^\theta$, is inferred; the remaining variables are marginalized out.
Determining $\pmb{x}_{MAP}^\theta$ corresponds to learning a
policy $\pmb{x}^\theta$ which minimizes the expected path length
\begin{equation*}
\pmb{x}_{MAP}^\theta=\arg \min_{\pmb{x}^\theta}\mathbb{E}_{\pmb{x}\setminus\pmb{x}^\theta}\left[-\sum_{i=1}^{\left|\pmb{x}^\theta\right|} \log p_{F_i^\theta}(x_i^\theta)
-\sum_{j=1}^{\left|\pmb{y}\right|}\log p_{G_j}(y_{j})\right]
\end{equation*}
By constructing an appropriate probabilistic model, policies can be learned by probabilistic inference.
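A minimal Python sketch of MMAP by simulation; the scalar $\pmb{x}^\theta$ searched over a grid, the single marginalized variable, the made-up distributions, and plain Monte Carlo in place of a dedicated MMAP algorithm are all illustrative assumptions.

import math, random

# Hypothetical model: theta ~ Uniform(grid); x | theta ~ Normal(theta, 1) is marginalized;
# we observe y = 2.5 with likelihood Normal(y; x, 0.5).
y, grid, n = 2.5, [i / 10 for i in range(0, 41)], 2000

def normal_pdf(v, mu, sigma):
    return math.exp(-0.5 * ((v - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def marginal_likelihood(theta):
    # Monte Carlo estimate of E_{x | theta}[ p(y | x) ].
    return sum(normal_pdf(y, random.gauss(theta, 1.0), 0.5) for _ in range(n)) / n

theta_mmap = max(grid, key=marginal_likelihood)   # uniform prior: maximize the marginal
print(theta_mmap)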
Contributions
Bilateral correspondence between probabilistic inference and policy learning for path finding.
A new approach to policy learning.
A realization of the approach for the Canadian traveller problem.
Future Work
Online policy learning.
Better (faster and more robust) inference algorithms for MMAP.