In a previous post, I summarized the Bayesian approach to model comparison, which requires the calculation of the Bayes factor between two models. Here I will show one computational approach that I use called thermodynamic integration, borrowed from molecular dynamics. Recall that we need to compute the model likelihood function

$$P(D|M) = \int P(D|\theta, M)\, P(\theta|M)\, d\theta \qquad (1)$$

for each model, where $P(D|\theta, M)$ is just the parameter-dependent likelihood function we used to find the posterior probabilities for the parameters $\theta$ of the model.
The integration over the parameters can be accomplished using Markov Chain Monte Carlo (MCMC), which I summarized previously here. We will start by defining the partition function

$$Z(\beta) = \int P(D|\theta, M)^{\beta}\, P(\theta|M)\, d\theta \qquad (2)$$

where $\beta$ is an inverse temperature. The derivative of the log of the partition function gives

$$\frac{d}{d\beta} \ln Z(\beta) = \frac{\int \ln P(D|\theta, M)\, P(D|\theta, M)^{\beta}\, P(\theta|M)\, d\theta}{Z(\beta)} \qquad (3)$$

which is equal to the ensemble average of $\ln P(D|\theta, M)$ at inverse temperature $\beta$. However, if we assume that the MCMC has reached stationarity, then we can replace the ensemble average with a time average over the MCMC samples, $\langle \ln P(D|\theta, M) \rangle_{\beta}$. Integrating (3) over $\beta$ from 0 to 1 gives

$$\ln Z(1) - \ln Z(0) = \int_0^1 \langle \ln P(D|\theta, M) \rangle_{\beta}\, d\beta \qquad (4)$$

From (1) and (2), we see that $Z(1) = P(D|M)$, which is what we want to compute, and $Z(0) = 1$ since the prior is normalized.
Hence, to perform Bayesian model comparison, we simply run the MCMC for each model at different temperatures (i.e. use $P(D|\theta, M)^{\beta}$ as the likelihood in the standard MCMC) and then integrate the average log likelihoods over $\beta$ at the end. For a Gaussian likelihood function, changing the temperature is equivalent to changing the data “error”: the higher the temperature (i.e. the smaller $\beta$), the larger the presumed error. In practice, I usually run at seven to ten different values of $\beta$ and use a simple trapezoidal rule to integrate over $\beta$. I can even do parameter inference and model comparison in the same MCMC run, since the $\beta = 1$ chain samples the posterior.
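To make the recipe concrete, here is a minimal sketch in Python/NumPy for a toy problem of my own devising (not the post's actual code): data drawn from a Gaussian with unknown mean $\theta$ and known $\sigma$, with a standard normal prior on $\theta$. A Metropolis chain is run at each $\beta$ targeting $P(D|\theta)^{\beta} P(\theta)$, the time average of $\ln P(D|\theta)$ is recorded, and the averages are integrated over $\beta$ with the trapezoidal rule. The cubic spacing of the $\beta$ grid is a common refinement that puts more points where the integrand changes fastest, near $\beta = 0$.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 1.0
data = rng.normal(0.5, sigma, size=20)   # toy data set, hypothetical

def log_like(theta):
    # ln P(D|theta): Gaussian likelihood with known sigma
    return (-0.5 * np.sum((data - theta) ** 2) / sigma**2
            - 0.5 * len(data) * np.log(2 * np.pi * sigma**2))

def log_prior(theta):
    # ln P(theta): standard normal prior
    return -0.5 * theta**2 - 0.5 * np.log(2 * np.pi)

def mcmc_mean_loglike(beta, n_steps=20000, step=0.5):
    """Metropolis chain targeting P(D|theta)^beta P(theta);
    returns the time average of ln P(D|theta)."""
    theta = 0.0
    lp = beta * log_like(theta) + log_prior(theta)
    traj = []
    for _ in range(n_steps):
        prop = theta + step * rng.normal()
        lp_prop = beta * log_like(prop) + log_prior(prop)
        if np.log(rng.random()) < lp_prop - lp:   # Metropolis accept
            theta, lp = prop, lp_prop
        traj.append(log_like(theta))
    return np.mean(traj[n_steps // 2:])   # discard first half as burn-in

# Ten temperatures, spaced densely near beta = 0, then trapezoidal rule.
betas = np.linspace(0.0, 1.0, 10) ** 3
means = [mcmc_mean_loglike(b) for b in betas]
log_evidence = np.trapz(means, betas)    # estimate of ln P(D|M)
print(log_evidence)
```

For this conjugate Gaussian toy model the evidence is available in closed form, so the estimate can be checked directly; in real problems the accuracy is controlled by the number of $\beta$ points and the chain length at each one.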
Erratum, 2013-5-2013, I just fixed an error in the final formula
8 thoughts on “Bayesian model comparison Part 2”
nice post though i don’t know many of the terms well. does remind me a (‘it-for’) bit of arxiv.org/abs/0901.4075v1 and arxiv.org/abs/1204.648
i do like using the idea of ‘temperature’ for some reason—‘getting hot i here’. i b(r)ought this up to some economists and they werent too sure. some say temperature is intuitively how we feel at times, or average kinetic energy, or a lagrangian multiplier (curve fitting parameter). yakovenko at u md says something its like the average income in economics — like bentham from the 1800’s). i’m agnostic.
ps i see yakoneko has a review written for foley arxiv.org/abs/1204.6483v1
Very nice post. In practice, does the chosen distribution of theta matter even though it is integrated out via P(theta|M)?
@ishi temperature is like the width of the distribution
@cheng The prior distribution for theta doesn’t matter. The formula assumes it is normalized correctly but you could account for it even if it isn’t.
@cheng Actually, if you don’t set ln Z(0)=0 then it doesn’t matter what the prior is as long as it is integrable.
refreshing memory, temperature says how many microstates are accessible–at T=0, just one, at T=infinity every particle can be in any microstate with equal probability. or, its the inverse of the change in entropy with respect to energy. the view of it as an ‘average’ basically says as everyone in lake wobegon rises up above average, they have more opportunities to be anything so long as the exponential form is maintained—- they can’t be anything all the time . (it might be nice to use some non-archimedian, surreal, or other logic however in which you have an expanding universe with evolving temperature in which people can be eveything all the time (multiverse in ‘the bottle’ (gil scott heron))).
Thanks a lot for this post! If a subset of my parameters belong to Dirichlet distribution (being bounded and having dependence on each other), is there any caution I need to take?
@Icnature The MCMC is not foolproof. If the parameters depend on each other strongly, so that they sit on some low-dimensional manifold, then the MCMC will fail to converge. However, if you have a good prior then including that will help.