As I have posted before, I never learned any statistics during my education as a theoretical physicist/applied mathematician. However, it became fairly apparent after I entered biology (although I managed to avoid it for a few years) that fitting models to data and estimating parameters is unavoidable. As I have opined multiple times previously, Bayesian inference with the Markov Chain Monte Carlo (MCMC) method is the best way to do this. I thought I would provide the details on how to implement the algorithm, since most treatments of the subject are couched in statistical language that may be hard to decipher for someone with a physics or dynamical systems background. I'll do this in a series of posts and in a nonstandard order. Instead of building up the entire Bayesian formalism, I thought I would show how the MCMC method can be used to do the job and then later show how it fits into the grand scheme of things to do even more.
Suppose you have data $D(t_i)$, which consists of some measurements (either a scalar or a vector) at discrete time points $t_i$, and a proposed model, which produces a time series $y(t | \theta)$, where $\theta$ represents a set of free parameters that changes the function. This model could be a system of differential equations or just some proposed function like a polynomial. The goal is to find the set of parameters $\theta$ that best fits the data and to evaluate how good the model is. Now, the correct way to do this is to use Bayesian inference and model comparison, which can be computed using the MCMC. However, the MCMC can also be used just to get the parameters in the sense of finding the best fit according to some criterion.
The first thing you must do is choose an error function. The most common is the sum of square errors, given by

$$\chi^2(\theta) = \sum_i \frac{(D(t_i) - y(t_i | \theta))^2}{2\sigma_i^2},$$

where $\sigma_i$ is some estimate of the error of the data. Often, you can use the experimental uncertainty or just make some guess. Generally, the answer doesn't depend too much on $\sigma_i$. Next, you need to construct a likelihood function. To fully explain what this means, I would need to go into some details about statistics, but suffice it to say that usually a Gaussian works perfectly well, so I basically use

$$P(D | \theta) \propto e^{-\chi^2(\theta)}.$$
Hence, the likelihood is maximal when the error is minimal. The method of Maximum Likelihood is simply finding the set of parameters that maximizes the likelihood function.
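As an illustrative sketch (in Python rather than Matlab, with made-up data and a simple linear model, so the numbers and names are all hypothetical), the error function and likelihood ratio might be computed like this:

```python
import numpy as np

# Hypothetical data: measurements D at time points t (made up for illustration)
t = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
D = np.array([0.1, 1.9, 4.2, 5.8, 8.1])
sigma = 0.5  # rough estimate of the measurement error

def model(t, theta):
    # A simple proposed model: a line with slope theta[0] and intercept theta[1]
    return theta[0] * t + theta[1]

def chi_squared(theta):
    # Sum of square errors with the factor of 2 in the denominator,
    # so that the likelihood is proportional to exp(-chi^2)
    return np.sum((D - model(t, theta))**2 / (2 * sigma**2))

theta_a = np.array([2.0, 0.0])  # fits this data well
theta_b = np.array([1.0, 0.0])  # fits this data badly

# Likelihood ratio for theta_b relative to theta_a: exp(chi2_a - chi2_b)
ratio = np.exp(chi_squared(theta_a) - chi_squared(theta_b))
```

Since `theta_b` fits much worse here, its $\chi^2$ is much larger and the ratio comes out far below 1, so a Metropolis move from `theta_a` to `theta_b` would almost always be rejected.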
I’ll now give the simple Metropolis algorithm to do a maximum likelihood estimation. I’ll give it in words first and then write it out in pseudo code. The first thing is to start with some initial guess for the parameter values $\theta_1$ and compute $\chi^2(\theta_1)$. If there is more than one parameter you just make a guess for all of them at once. Next, propose a new set of parameter values $\theta_2$, calculate $\chi^2(\theta_2)$, and compute the likelihood ratio

$$r = \frac{P(D | \theta_2)}{P(D | \theta_1)} = e^{\chi^2(\theta_1) - \chi^2(\theta_2)}.$$
The likelihood ratio $r$ is a positive real number. You now pick a uniform random number $u$ between 0 and 1. If $r$ is bigger than $u$, you replace the current parameter set with the proposed set and then repeat. At each step of the process you store the value of the current set. In some cases, the current set won't change, but you still keep it (this will be useful when we get to Bayesian inference). You then run this for a long time and pick the parameter set that had the maximum likelihood. Generally, you just save the parameters after some burn-in period.
In pseudo Matlab code:
theta1 = zeros(1,p);                  % initial guess for p parameters
y = solve(theta1);                    % solve gives the vector y at the desired
                                      % time points for parameters theta1
chi1 = sum((D - y).^2);               % chi-squared error (taking sigma = 1)

for n = 1:total
    theta2 = theta1 + a.*randn(1,p);  % p is the number of parameters and
                                      % a is the guess standard deviation
    y = solve(theta2);
    chi2 = sum((D - y).^2);
    ratio = exp(chi1 - chi2);         % likelihood ratio

    if rand < ratio
        theta1 = theta2;
        chi1 = chi2;
    end

    if n > burntime
        thetasave(n-burntime,:) = theta1;
        chisave(n-burntime) = chi1;
    end
end

Pick the row of thetasave for which chisave is minimal.
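For readers who prefer something directly runnable, here is a sketch of the same loop in Python. The model, data, step size, and run lengths are all made up for illustration; the structure follows the pseudocode above, with a fit of a straight line to synthetic noisy data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data for illustration: a line with slope 2 and intercept 1 plus noise
t = np.linspace(0, 5, 20)
sigma = 0.3
D = 2.0 * t + 1.0 + sigma * rng.normal(size=t.size)

def solve(theta):
    # The "model": here just a line; in general this could integrate an ODE
    return theta[0] * t + theta[1]

def chi_squared(theta):
    # Error function with the factor of 2 so the likelihood is exp(-chi^2)
    return np.sum((D - solve(theta))**2 / (2 * sigma**2))

p = 2              # number of parameters
a = 0.1            # guess standard deviation
total = 20000      # number of MCMC steps
burntime = 2000    # burn-in period

theta1 = np.zeros(p)          # initial guess
chi1 = chi_squared(theta1)

thetasave, chisave = [], []
for n in range(total):
    theta2 = theta1 + a * rng.normal(size=p)   # symmetric Gaussian proposal
    chi2 = chi_squared(theta2)
    # Metropolis acceptance; the exponent is capped at 0 so the ratio is
    # capped at 1, which gives identical decisions but avoids overflow
    if rng.random() < np.exp(min(chi1 - chi2, 0.0)):
        theta1, chi1 = theta2, chi2
    if n >= burntime:
        thetasave.append(theta1)
        chisave.append(chi1)

# Maximum-likelihood estimate: saved parameter set with minimal chi-squared
best = thetasave[int(np.argmin(chisave))]
```

With these settings the chain drifts toward the true values (slope near 2, intercept near 1), since downhill moves in $\chi^2$ are always accepted and uphill moves only occasionally.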
Two questions you will have are how long to run and how to choose the guesses. Unfortunately, there are no firm answers. A simple thing to do is to pick a Gaussian random variable around the current value as a guess. However, if you have a bound on the parameter, such as it has to be positive, then you need to use the Metropolis-Hastings algorithm, which generalizes Metropolis to non-symmetric guesses. As for how long to run, I usually just run the MCMC, increase the time, run again, and see how much the answer changes. If the parameters start to converge then you can be confident that you have run long enough. In principle, if you wait long enough the MCMC will sample all of guess space and find the global optimum. In practice, it can get stuck in local minima for a long time before getting out.
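As one illustration of the Metropolis-Hastings idea (a sketch with a made-up one-parameter decay model, and certainly not the only way to handle bounds): to keep a parameter positive, you can propose multiplicatively, $k_2 = k_1 e^{s\xi}$ with $\xi$ a standard normal. This proposal is not symmetric in $k$, so the acceptance ratio picks up the Hastings correction factor $k_2/k_1$:

```python
import numpy as np

rng = np.random.default_rng(1)

# Made-up example: fit a positive decay rate k in the model y(t) = exp(-k t)
t = np.linspace(0, 4, 15)
sigma = 0.05
D = np.exp(-0.7 * t) + sigma * rng.normal(size=t.size)  # synthetic data, true k = 0.7

def chi_squared(k):
    return np.sum((D - np.exp(-k * t))**2 / (2 * sigma**2))

s = 0.1              # step size in log space
k1 = 1.0             # initial guess; must be positive
chi1 = chi_squared(k1)

ksave, chisave = [], []
for n in range(20000):
    k2 = k1 * np.exp(s * rng.normal())   # multiplicative proposal keeps k2 > 0
    chi2 = chi_squared(k2)
    # Metropolis-Hastings acceptance: likelihood ratio times the proposal
    # correction q(k1|k2)/q(k2|k1), which equals k2/k1 for this log-normal proposal
    if rng.random() < np.exp(chi1 - chi2) * (k2 / k1):
        k1, chi1 = k2, chi2
    ksave.append(k1)
    chisave.append(chi1)

k_best = ksave[int(np.argmin(chisave))]
```

The $k_2/k_1$ factor matters when you later want the chain to sample the right distribution for Bayesian inference; for a pure maximum likelihood search it mainly affects how efficiently the chain explores.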
Jun 29, 2015 clarification: My definition of $\chi^2$ is not the same as what is generally used in statistics. For convenience, I defined $\chi^2$ with a 2 in the denominator so that the likelihood would be given by $e^{-\chi^2}$. However, in statistics one would not include the factor of 2 in the denominator, and thus the likelihood would be $e^{-\chi^2/2}$. This doesn't really make a difference for the algorithm but could be an issue if you wanted to compare to other results.