Path Finding under Uncertainty through Probabilistic Inference
David Tolpin, Jan Willem van de Meent, Brooks Paige, Frank Wood
University of Oxford, Oxford, UK
Preliminaries
Probabilistic Program
A program with random computations.
Distributions are conditioned by `observations'.
Values of certain expressions are `predicted'.
(let [;; Model
      dist (sample (categorical
                     [[normal 1/2]
                      [gamma 1/2]]))
      a (sample (gamma 1 1))
      b (sample (gamma 1 1))
      d (dist a b)]
  ;; Observations
  (observe d 1) (observe d 2)
  (observe d 4) (observe d 7)
  ;; Explanation
  (predict :d (type d))
  (predict :a a) (predict :b b))
Inference over this program yields the posterior over the distribution type (:d) and its parameters (:a, :b) given the four observed values.
Inference Objectives
Generate an unbounded sequence of samples from the posterior distribution.
Approximately compute integrals of the form $\Phi=\int_{-\infty}^{\infty} \varphi(x)\,p(x)\,dx$ (see the sketch below).
Suggest the most probable explanation (MPE): the most likely assignment to all random variables. ✓
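The second objective admits a simple Monte Carlo estimator. A minimal Python sketch, not part of the original poster: the stand-in for the inference engine, the function names, and the choice $\varphi(x)=x^2$ are all illustrative assumptions.

import math, random

def posterior_samples(n):
    # Hypothetical stand-in for an inference engine: yields weighted samples (x, w).
    # Here x is drawn from a standard normal via Box-Muller and every weight is 1.
    for _ in range(n):
        u1, u2 = 1.0 - random.random(), random.random()
        x = math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)
        yield x, 1.0

def estimate_phi(phi, n=100000):
    # Self-normalized estimate of Phi = integral of phi(x) p(x) dx from samples:
    # sum_i w_i * phi(x_i) / sum_i w_i.
    num = den = 0.0
    for x, w in posterior_samples(n):
        num += w * phi(x)
        den += w
    return num / den

print(estimate_phi(lambda x: x * x))  # approximately E[x^2] = 1 under a standard normal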
Canadian Traveller Problem
The CTP is the problem of finding the shortest travel distance in a graph in which some edges may be blocked; whether an edge is blocked becomes known only when the agent reaches one of its endpoints.
Given
an undirected weighted graph $G=(V,E)$;
the initial and final location nodes $s$ and $t$;
find the shortest travel distance from $s$ to $t$, i.e. the sum of weights of all traversed edges.
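For concreteness, a tiny Python sketch of one way to represent and sample a stochastic CTP instance; the graph, weights, and blocking probability below are made-up illustrations, not from the poster.

import random

edges = {('s', 'a'): 1.0, ('a', 't'): 1.0, ('s', 'b'): 2.0, ('b', 't'): 2.0}

def sample_instance(edges, p=0.3):
    # Each edge is blocked independently with probability p; only open edges
    # remain in the sampled world. The agent discovers an edge's status only
    # upon reaching one of its endpoints.
    return {e: w for e, w in edges.items() if random.random() >= p}

print(sample_instance(edges))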
References
Wood, F.; van de Meent, J. W.; and Mansinghka, V. 2014. A new approach to probabilistic programming inference. In Proc. of AISTATS-2014.
Bar-Noy, A., and Schieber, B. 1991. The Canadian traveller problem. In Proc. of the Second Annual ACM-SIAM Symposium on Discrete Algorithms, SODA '91, 261–270.
Eyerich, P.; Keller, T.; and Helmert, M. 2010. High-quality policies for the Canadian traveler's problem. In Proc. of AAAI-2010.
Shimony, S. E., and Charniak, E. 1991. A new algorithm for finding MAP assignments to belief networks. In Proc. of UAI '90, 185–196.
Kappen, H. J.; Gómez, V.; and Opper, M. 2012. Optimal control as a graphical model inference problem. Machine Learning 87(2):159–182.
Case Study: Canadian Traveller Problem
Policy
A depth-first-search-based policy:
the agent traverses $G$ in depth-first order;
at every node, the policy specifies the probability of selecting each adjacent edge.
Probabilistic Program
require CTP$(G,s,t,w,p)$
for $v \in V$  // prior: a Dirichlet edge-selection distribution at every node
    $policy(v)$ $\gets$ Draw(Dirichlet($\pmb{1}^{\deg(v)}$))
end for
repeat  // simulate traversals until the target $t$ is reached
    $instance$ $\gets$ Draw(CTP$(G,w,p)$)  // sample which edges are blocked
    ($reached$, $distance$) $\gets$ StDFS($instance$, $policy$)  // policy-guided depth-first search
until $reached$
Observe(1, Bernoulli$\left(e^{-distance}\right)$)  // weight the trace by $e^{-distance}$
Print($policy$)  // predict the learned policy
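A minimal Python sketch of this generative process; it is my illustration rather than the authors' implementation, and the toy graph, blocking probability, and the names draw_policy, draw_instance, and st_dfs (standing in for StDFS) are all assumptions.

import random

# Assumed toy graph: symmetric adjacency with edge weights.
graph = {'s': {'a': 1.0, 'b': 2.0}, 'a': {'s': 1.0, 't': 1.0},
         'b': {'s': 2.0, 't': 2.0}, 't': {'a': 1.0, 'b': 2.0}}
p_block = 0.3

def draw_policy(graph):
    # Dirichlet(1,...,1) edge-selection probabilities per node
    # (a symmetric Dirichlet sample is a normalized vector of Exp(1) draws).
    policy = {}
    for v, nbrs in graph.items():
        g = {u: random.expovariate(1.0) for u in nbrs}
        z = sum(g.values())
        policy[v] = {u: x / z for u, x in g.items()}
    return policy

def draw_instance(graph):
    # Sample which undirected edges are open in this world.
    status = {}
    for v, nbrs in graph.items():
        for u in nbrs:
            e = frozenset((v, u))
            if e not in status:
                status[e] = random.random() >= p_block
    return status

def st_dfs(instance, policy, s='s', t='t'):
    # Policy-guided depth-first search; returns (reached, distance travelled),
    # counting backtracking moves in the distance.
    visited, stack, distance = {s}, [s], 0.0
    while stack:
        v = stack[-1]
        if v == t:
            return True, distance
        options = [u for u in graph[v]
                   if u not in visited and instance[frozenset((v, u))]]
        if options:
            u = random.choices(options, weights=[policy[v][o] for o in options])[0]
            distance += graph[v][u]
            visited.add(u)
            stack.append(u)
        else:
            stack.pop()                       # dead end: backtrack
            if stack:
                distance += graph[v][stack[-1]]
    return False, distance

policy = draw_policy(graph)
reached = False
while not reached:                            # 'repeat ... until reached'
    reached, distance = st_dfs(draw_instance(graph), policy)
log_weight = -distance                        # log Bernoulli(1; e^{-distance})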
Learned Policies
Acknowledgments
This material is based on research sponsored by DARPA through the U.S.
Air Force Research Laboratory under Cooperative Agreement number
FA8750-14-2-0004.
The log probability of the output policy is
$$\log p_{\mathcal{P}}(policy) = -cost(policy) + \log p_{Policies}(policy) + C$$
When policies are drawn uniformly, this reduces to
$$\log p_{\mathcal{P}}(policy) = - cost(policy) + C'$$
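For clarity, a one-line derivation (added here) of where the $-cost(policy)$ term comes from, stated per simulated traversal:
$$\log p\left(\mathrm{Bernoulli}\left(e^{-distance}\right)=1\right)=\log e^{-distance}=-distance,$$
so every simulated traversal is weighted by $e^{-distance}$, and policies that yield shorter routes receive exponentially larger posterior probability.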
Connection between MAP and Shortest Path
In a probabilistic program $\mathcal{P}$:
$F_i$ is the prior distribution of random variable $x_i$.
$G_j$ is the conditional distribution of observation $y_j$.
Maximizing the (logarithm of) trace probability
$$\log p_{\mathcal{P}}(\pmb{x}) = \sum_{i=1}^{\left|\pmb{x}\right|}
\log p_{F_i}(x_i) +
\sum_{j=1}^{\left|\pmb{y}\right|}\log p_{G_j}(y_{j}) + C$$
corresponds to finding the shortest path in a graph $G=(V,E)$:
$V=\{(F_i, x_i)\} \cup \{(G_j, y_j)\}$.
Edge costs are $-\log p_{F_i}(x_i)$ or $-\log p_{G_j}(y_j)$.
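To make the correspondence concrete, a small Python sketch for a chain-structured trace; the model, its probabilities, and the observations are invented for illustration. Nodes are (variable, value) pairs, edge costs are $-\log p$, and the MAP assignment is the shortest path through the layered graph, found here by dynamic programming.

import math

prior = {0: 0.6, 1: 0.4}
trans = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}
obs   = {0: {'lo': 0.9, 'hi': 0.1}, 1: {'lo': 0.2, 'hi': 0.8}}
ys    = ['lo', 'hi', 'hi']

def map_by_shortest_path(ys):
    # cost[x]: length of the cheapest path ending with value x at the current layer;
    # path[x]: the corresponding assignment so far.
    cost = {x: -math.log(prior[x]) - math.log(obs[x][ys[0]]) for x in prior}
    path = {x: [x] for x in prior}
    for y in ys[1:]:
        new_cost, new_path = {}, {}
        for x in trans:
            # Relax every edge (x_prev -> x); its cost is -log trans - log obs.
            cand = {xp: cost[xp] - math.log(trans[xp][x]) - math.log(obs[x][y])
                    for xp in cost}
            xp = min(cand, key=cand.get)
            new_cost[x], new_path[x] = cand[xp], path[xp] + [x]
        cost, path = new_cost, new_path
    x_end = min(cost, key=cost.get)
    return path[x_end], cost[x_end]

assignment, neg_log_p = map_by_shortest_path(ys)
print(assignment, math.exp(-neg_log_p))  # MAP assignment and its joint probability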
Marginal MAP as Policy Learning
In marginal MAP (MMAP), only the assignment to a part of the trace, $\pmb{x}^\theta$, is inferred; the remaining variables are marginalized out.
Determining $\pmb{x}_{MAP}^\theta$ corresponds to learning a
policy $\pmb{x}^\theta$ which minimizes the expected path length
\begin{equation*}
\pmb{x}_{MAP}^\theta=\arg \min_{\pmb{x}^\theta}\mathbb{E}_{\pmb{x}\setminus\pmb{x}^\theta}\left[-\sum_{i=1}^{\left|\pmb{x}^\theta\right|} \log p_{F_i^\theta}(x_i^\theta)
-\sum_{j=1}^{\left|\pmb{y}\right|}\log p_{G_j}(y_{j})\right]
\end{equation*}
By constructing an appropriate probabilistic model, policies can be learned by probabilistic inference.
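A minimal Python sketch of MMAP by simulation; the scalar $\pmb{x}^\theta$ searched over a grid, the single marginalized variable, the made-up distributions, and plain Monte Carlo in place of a dedicated MMAP algorithm are all illustrative assumptions.

import math, random

# Hypothetical model: theta ~ Uniform(grid); x | theta ~ Normal(theta, 1) is marginalized;
# we observe y = 2.5 with likelihood Normal(y; x, 0.5).
y, grid, n = 2.5, [i / 10 for i in range(0, 41)], 2000

def normal_pdf(v, mu, sigma):
    return math.exp(-0.5 * ((v - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def marginal_likelihood(theta):
    # Monte Carlo estimate of E_{x | theta}[ p(y | x) ].
    return sum(normal_pdf(y, random.gauss(theta, 1.0), 0.5) for _ in range(n)) / n

theta_mmap = max(grid, key=marginal_likelihood)   # uniform prior: maximize the marginal
print(theta_mmap)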
Contributions
Bilateral correspondence between probabilistic inference and policy learning for path finding.
A new approach to policy learning.
A realization of the approach for the Canadian traveller problem.
Future Work
Online policy learning.
Better (faster and more robust) inference algorithms for MMAP.