Output-Sensitive Adaptive Metropolis-Hastings for Probabilistic Programs
David Tolpin, Jan Willem van de Meent, Brooks Paige, Frank Wood
University of Oxford, Oxford, UK
Preliminaries
Probabilistic Program
• A program with random computations.
• Distributions are conditioned on `observations'.
• Values of certain expressions are `predicted' — the output.
(let [;; Model
dist (sample (categorical [[normal 1/2] [gamma 1/2]]))
a (sample (gamma 1 1))
b (sample (gamma 1 1))
d (dist a b)]
;; Observations
(observe d 1) (observe d 2)
(observe d 4) (observe d 7)
;; Explanation
(predict :d (type d))
(predict :a a) (predict :b b)))
Inference Objectives
• Suggest the most probable explanation (MPE): the most likely assignment
to all non-evidence variables given the evidence.
• Approximately compute an integral of the
form $\Phi=\int_{-\infty}^{\infty} \varphi(x)p(x)\,dx$.
• Continuously and infinitely generate a sequence of samples. ✓
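The second objective is the standard Monte Carlo estimate: draw samples from $p$ and average $\varphi$ over them. A minimal Python sketch (the example integrand and distribution are illustrative choices, not from the poster):

```python
import random

def mc_estimate(phi, sample_p, n=100_000):
    """Approximate Phi = E_p[phi(x)] by averaging phi over n draws from p."""
    return sum(phi(sample_p()) for _ in range(n)) / n

# Example: E[x^2] under a standard normal is exactly 1.
random.seed(0)
est = mc_estimate(lambda x: x * x, lambda: random.gauss(0.0, 1.0))
```

The same estimator underlies the sampling-based objective checked above: a sampler that generates draws from the posterior lets us estimate any such expectation.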
Lightweight Metropolis-Hastings (LMH)
$\mathcal{P}$ — probabilistic program.
$\pmb{x}$ — random variables.
$\pmb{z}$ — output.
Run $\mathcal{P}$ once, remember $\pmb{x}, \pmb{z}$.
Loop
Uniformly select $x_i$.
Propose a value for $x_i$.
Run $\mathcal{P}$, remember $\pmb{x'}, \pmb{z'}$.
Accept ($\pmb{x,z}=\pmb{x',z'}$)
or reject with MH probability.
Output $\pmb{z}$.
end loop
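The loop above can be sketched in Python on a toy model. The model (two latent normals `a`, `b` with one observation of their sum) and the proposal scale are assumptions for illustration; LMH proper traces the program's random choices, which here are flattened into a dict:

```python
import math
import random

def log_joint(x):
    """Toy model (illustrative assumption): a, b ~ N(0,1) priors,
    one observation y = 3 ~ N(a + b, 1)."""
    lp = -0.5 * x["a"] ** 2 - 0.5 * x["b"] ** 2      # priors
    lp += -0.5 * (3.0 - (x["a"] + x["b"])) ** 2      # observe
    return lp

def lmh(n_iters=20_000, seed=1):
    rng = random.Random(seed)
    # Run P once, remember x (here z = a + b is the predicted output).
    x = {"a": rng.gauss(0, 1), "b": rng.gauss(0, 1)}
    outputs = []
    for _ in range(n_iters):
        name = rng.choice(list(x))                   # uniformly select x_i
        prop = dict(x)
        prop[name] = x[name] + rng.gauss(0, 1)       # propose a value for x_i
        # Symmetric proposal: the MH ratio reduces to the joint-density ratio.
        if rng.random() < math.exp(min(0.0, log_joint(prop) - log_joint(x))):
            x = prop                                 # accept
        outputs.append(x["a"] + x["b"])              # output z
    return outputs

outputs = lmh()
```

For this conjugate toy model the posterior of $a+b$ is $N(2, 2/3)$, so the chain's output mean should settle near 2.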
Can we do better?
References
Christophe Andrieu and Johannes Thoms. A tutorial on adaptive MCMC. Statistics and Computing, 18(4):343–373, 2008.
Brooks Paige, Frank Wood, Arnaud Doucet, and Yee Whye Teh. Asynchronous anytime sequential Monte Carlo. In NIPS-2014, to appear.
David Wingate, Andreas Stuhlmüller, and Noah D. Goodman. Lightweight implementations of probabilistic programming languages via transformational compilation. In AISTATS-2011.
Frank Wood, Jan Willem van de Meent, and Vikash Mansinghka. A new approach to probabilistic programming inference. In AISTATS-2014.
[Figure: 100 16-dimensional observations; 500 samples collected after 10,000 samples of burn-in.]
Acknowledgments
This material is based on research sponsored by DARPA through the U.S.
Air Force Research Laboratory under Cooperative Agreement number
FA8750-14-2-0004.
Metropolis Hastings with Adaptive Scheduling
• Selects each $x_i$ with a different probability.
• Maintains vector of weights $\pmb{W}$ of random choices:
Initialize $\pmb{W}^0$ to a constant.
Run $\mathcal{P}$ once.
for $t = 1 \ldots \infty$
Select $x^t_i$ with probability $\alpha^t_i= {W_i^t} / \sum\limits_{j=1} ^{|\pmb{x}^t|} W_j^t$.
Propose a value for $x^t_i$.
Run $\mathcal{P}$, accept or reject with MH probability.
if accepted
Compute $\pmb{W}^{t+1}$ based on the program output.
else
$\pmb{W}^{t+1} \gets \pmb{W}^{t}$
end if
end for
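The only mechanical change from LMH is the weighted selection step. A minimal sketch of sampling $x_i$ with probability $\alpha_i = W_i / \sum_j W_j$ by inverse-CDF sampling over unnormalized weights:

```python
import random

def select_index(weights, rng):
    """Draw index i with probability weights[i] / sum(weights)."""
    total = sum(weights)
    u = rng.random() * total
    acc = 0.0
    for i, w in enumerate(weights):
        acc += w
        if u < acc:
            return i
    return len(weights) - 1  # guard against floating-point round-off

# Quick check: with W = [1, 2, 7], index 2 is drawn ~70% of the time.
rng = random.Random(0)
W = [1.0, 2.0, 7.0]
counts = [0, 0, 0]
for _ in range(10_000):
    counts[select_index(W, rng)] += 1
```

Because the weights are adapted online, they are kept unnormalized and renormalized on every draw.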
Quantifying the Influence
• Objective: faster convergence of program output $\pmb{z}$.
• Adaptation parameter: probabilities of selecting random choices for modification.
• Optimization target: maximize the change in the program output: $$R^t = \frac 1 {|\pmb{z}^t|}\sum_{k=1}^{|\pmb{z}^t|} \pmb{1}(z_k^t \ne z_k^{t-1}).$$
$W_i$ reflects the anticipated change in $\pmb{z}$ from modifying $x_i$.
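The objective $R^t$ is simply the fraction of predicted values that changed between consecutive samples. A one-function sketch (the function name is mine, and equal-length outputs are assumed for simplicity):

```python
def output_change(z_prev, z_curr):
    """R^t: fraction of predicted values that differ between two samples.
    Assumes the two output vectors have the same length."""
    assert len(z_prev) == len(z_curr)
    return sum(a != b for a, b in zip(z_prev, z_curr)) / len(z_curr)

r = output_change([1, 2, 3, 4], [1, 5, 3, 7])  # two of four entries changed
```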
• For each $x_i$, reward $r_i$ and count $c_i$ are kept.
• A history of modified random choices is attached to every $z_j$.
When modification of $x_k$ accepted:
Append $x_k$ to the history.
if $\pmb{z}^{t+1} \ne \pmb{z}^{t}$
$w \gets \frac{1}{|\mathit{history}|}$
for $x_m$ in history
$r_m \gets r_m + w,\; c_m \gets c_m + w$
end for
Flush the history.
else
$c_k \gets c_k + 1$
end if
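The bookkeeping above can be sketched directly. This follows the poster's scheme; the dict-based storage and function name are implementation choices of mine:

```python
def update_rewards(rewards, counts, history, x_k, z_changed):
    """On an accepted modification of x_k: append it to the history; if the
    output z changed, share a unit reward among all choices in the history
    and flush it; otherwise only increment x_k's count."""
    history.append(x_k)
    if z_changed:
        w = 1.0 / len(history)
        for x_m in history:
            rewards[x_m] = rewards.get(x_m, 0.0) + w
            counts[x_m] = counts.get(x_m, 0.0) + w
        history.clear()  # flush the history
    else:
        counts[x_k] = counts.get(x_k, 0.0) + 1.0

# Two accepted modifications: first leaves z unchanged, second changes it.
rewards, counts, history = {}, {}, []
update_rewards(rewards, counts, history, "a", z_changed=False)
update_rewards(rewards, counts, history, "b", z_changed=True)
```

After the second call the unit reward is split between `a` and `b` (each receiving 0.5), and the history is empty again; choices whose modifications never move the output accumulate count without reward, driving their selection weight $W_i$ down.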
Convergence:
For any partitioning of $\pmb{x}$, Adaptive LMH selects variables from each partition with non-zero probability.
Contributions
• A scheme of rewarding random samples based on program output.
• An approach to propagation of sample rewards to MH proposal scheduling parameters.
• An application of this approach to LMH, where the probabilities of selecting each variable for modification are adjusted.
Future Work
• Extending the adaptation approach to other sampling methods.
• Reward scheme that takes into account the amount of difference
between samples.
• Acquisition of dependencies between predicted expressions and
random variables.