MCMC and fitting models to data

As I have posted before, I never learned any statistics during my education as a theoretical physicist/applied mathematician. However, it became fairly apparent after I entered biology (although I managed to avoid it for a few years) that fitting models to data and estimating parameters is unavoidable. As I have opined multiple times previously, Bayesian inference and the Markov Chain Monte Carlo (MCMC) method are the best way to do this. I thought I would provide the details of how to implement the algorithm, since most treatments of the subject are couched in statistical language that may be hard to decipher for someone with a physics or dynamical systems background. I'll do this in a series of posts and in a nonstandard order. Instead of building up the entire Bayesian formalism, I will first show how the MCMC method can be used to do the job and then later show how it fits into the grand scheme of things to do even more.

Suppose you have data D(t_i), which consists of some measurements (either a scalar or a vector) at discrete time points t_i, and a proposed model that produces a time series y(t | \theta), where \theta represents a set of free parameters that determines the shape of the function. This model could be a system of differential equations or just some proposed function like a polynomial. The goal is to find the set of parameters that best fits the data and to evaluate how good the model is. The correct way to do this is to use Bayesian inference and model comparison, which can be computed using MCMC. However, MCMC can also be used simply to find the parameters that give the best fit according to some criterion.
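To make this concrete before the formalism, here is a minimal sketch of the Metropolis version of MCMC used purely as a fitting routine. Everything specific in it is an illustrative assumption on my part, not from the discussion above: a straight-line model, synthetic Gaussian-noise data, a known noise level, and a sum-of-squared-errors cost.

```python
# Minimal Metropolis MCMC fit of a straight line to synthetic data.
# The model, noise level, proposal width, and step counts are all
# illustrative assumptions; a real application would tune these.
import math
import random

random.seed(0)

# Synthetic data D(t_i): a line y = a*t + b plus Gaussian noise.
a_true, b_true, sigma = 2.0, 1.0, 0.5
ts = [0.1 * i for i in range(50)]
data = [a_true * t + b_true + random.gauss(0.0, sigma) for t in ts]

def model(t, theta):
    a, b = theta
    return a * t + b

def energy(theta):
    # Sum of squared errors scaled by the noise variance; for Gaussian
    # noise this is the negative log-likelihood up to a constant.
    return sum((d - model(t, theta)) ** 2 for t, d in zip(ts, data)) / (2 * sigma ** 2)

def metropolis(n_steps=5000, step=0.05):
    theta = [0.0, 0.0]                 # arbitrary starting guess
    e = energy(theta)
    samples = []
    for _ in range(n_steps):
        # Propose a random Gaussian perturbation of the parameters.
        prop = [x + random.gauss(0.0, step) for x in theta]
        e_prop = energy(prop)
        # Always accept downhill moves; accept uphill moves with
        # probability exp(e - e_prop).
        if e_prop < e or random.random() < math.exp(e - e_prop):
            theta, e = prop, e_prop
        samples.append(list(theta))
    return samples

samples = metropolis()
burn = samples[len(samples) // 2:]     # discard the first half as burn-in
a_est = sum(s[0] for s in burn) / len(burn)
b_est = sum(s[1] for s in burn) / len(burn)
```

After burn-in the chain wanders around the best-fit parameters, so the sample means give the estimates; the spread of the samples gives an error bar for free, which is one of the main attractions of the method.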



Progress in robotics

The blog Skippy Records has an interesting post on the recent progress of robotics, with incredible videos of robots demonstrating amazing capabilities. I've always felt that the robotics community seemed to be way ahead of the computational neuroscience community in terms of presenting deliverables. I remember attending a talk by Rodney Brooks almost a decade ago that left me astounded. It made me wonder if reverse engineering the brain is much more difficult than simply engineering a new one. Just trying to make stuff work, and not worrying about anything else, is a great advantage. Computational neuroscience is more of a diffusion-driven field, where people try out different ideas rather than build on previous work.

The increasing availability of high-quality data is both a blessing and a curse. It is a blessing because it can help constrain and validate models. It can be a curse because theorists may become less bold and creative. I've suggested before, on my old blog, that we may already know all the biological mechanisms necessary to understand how the brain works. That is not to say that new experimental results are unimportant. Biological details are critical when it comes to understanding the genetic basis of brain function or curing diseases. However, discovering the main principles of how the brain performs computations may not require knowing all the biological mechanisms. The trend in neuroscience, and in biology generally, is towards more and more high-throughput data acquisition. The consequence may be that while data-driven systems biology thrives, classical computational neuroscience could actually slow down.

Talk at Gatsby

I visited the Gatsby Computational Neuroscience Unit in London on Friday. I talked about how the dynamics of many observed neural responses to visual stimuli can be explained by varying just two parameters in a “micro-cortical circuit” at the sub-millimetre level. The circuit consists of recurrent excitation, lateral inhibition, and fatigue mechanisms like synaptic depression. Recurrently connected pools of neurons inhibit other pools, and the competition between pools, combined with fatigue, leads to the full range of observed responses. I also covered my recent paper on autism, in which we describe how perturbing the synaptic balance in the micro-cortical circuit can lead to alterations in performance on simple saccade tasks that seem to match clinical observations. My slides for the talk are here.
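To give a flavour of the competition-with-fatigue idea, here is a toy rate-model sketch of two pools with mutual inhibition and synaptic depression. The equations, the sigmoid gain function, and every parameter value are illustrative guesses on my part, not the circuit from the talk:

```python
# Toy model: two recurrently connected pools that inhibit each other,
# with synaptic depression as the fatigue mechanism.  All parameter
# values are illustrative, not fit to any data.
import math

def f(x):
    # sigmoid firing-rate function
    return 1.0 / (1.0 + math.exp(-x))

w_e, w_i = 2.0, 4.0         # recurrent excitation, lateral inhibition
tau = 10.0                  # firing-rate time constant (ms)
tau_d, beta = 500.0, 0.005  # depression recovery time (ms) and strength
I = 1.0                     # common external drive
dt, steps = 0.5, 8000       # Euler step (ms) and number of steps

u = [0.6, 0.4]              # pool rates, slightly asymmetric start
s = [1.0, 1.0]              # synaptic resources (1 = fully recovered)
trace = []
for _ in range(steps):
    # Each pool excites itself and inhibits the other, weighted by the
    # available synaptic resources of the presynaptic pool.
    inp = [w_e * s[0] * u[0] - w_i * s[1] * u[1] + I,
           w_e * s[1] * u[1] - w_i * s[0] * u[0] + I]
    du = [(-u[i] + f(inp[i])) / tau for i in range(2)]
    # Resources recover slowly and are consumed in proportion to firing.
    ds = [(1.0 - s[i]) / tau_d - beta * s[i] * u[i] for i in range(2)]
    u = [u[i] + dt * du[i] for i in range(2)]
    s = [s[i] + dt * ds[i] for i in range(2)]
    trace.append(tuple(u))
```

With these numbers the pool that starts slightly ahead suppresses the other (winner-take-all), and the winner's synapses then slowly depress, which can shift the balance back; moving a couple of weights around changes which of these regimes the circuit sits in.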

Slides for Talk

I’m currently in England at the Dendrites, Neurones, and Networks workshop. The talks have really impressed me; the field of computational neuroscience has reached a critical mass where truly excellent work is being done in multiple directions. I gave a talk on finite-system-size effects in neural networks. I mostly covered the work on the Kuramoto model, with a little bit on synaptically coupled phase neurons at the end. My slides are here.

Some numbers for the BP leak

The Deepwater Horizon well is situated 1500 m below the surface of the Gulf of Mexico. The hydrostatic pressure at that depth is given approximately by the simple formula P_a + \rho g h, where P_a = 100 \ kPa is the pressure of the atmosphere, \rho = 1 \ g/ml = 1000 \ kg/m^3 is the density of water, and g = 10 \ m/s^2 is the gravitational acceleration. Putting the numbers together gives about 1.5\times 10^7 \ kg/m \ s^2, which is 15,000 \ kPa, or about 150 times atmospheric pressure. Hence, the oil and natural gas must be under tremendous pressure to be able to leak out of the well at all. It’s no wonder the Top Kill operation, where mud was pumped in at high pressure, did not work.
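The arithmetic can be checked in a few lines, using the same round numbers as above:

```python
# Hydrostatic pressure at the wellhead, with the post's round numbers.
P_a = 100e3      # atmospheric pressure, Pa (100 kPa)
rho = 1000.0     # density of water, kg/m^3
g = 10.0         # gravitational acceleration, m/s^2
h = 1500.0       # depth of the well below the surface, m

P = P_a + rho * g * h     # total pressure at depth, Pa
ratio = P / P_a           # how many atmospheres that is
print(P, ratio)           # 15100000.0 Pa, 151 atmospheres
```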

Currently, it is estimated that the leak rate is somewhere between 10,000 and 100,000 barrels of oil per day. A barrel of oil is 159 litres, or 0.159 cubic metres, so roughly 1,600 to 16,000 cubic metres of oil is leaking each day. This amounts to a cube with sides of about 11 metres for the lower value and 25 metres for the upper one, which is about the length of a basketball court. However, assuming that the oil forms a layer on the surface of the ocean that is 0.001 mm thick, each day's leakage corresponds to a slick with an area between 1,600 and 16,000 square kilometres. Given that the leak has been going on for almost two months and the Gulf of Mexico covers about 1.6 million square kilometres, this implies that either the slick is very thick, oil has started to wash up on shore, or a lot of the oil is still under the surface.
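The back-of-the-envelope numbers above can be reproduced in a few lines; the 0.001 mm slick thickness is, as above, an assumption rather than a measurement:

```python
# Daily leak volume, equivalent cube, and slick area for the BP leak.
barrel = 0.159                  # cubic metres per barrel of oil
low, high = 10_000, 100_000     # estimated leak rate, barrels per day

vol_low = low * barrel          # ~1,590 m^3 per day
vol_high = high * barrel        # ~15,900 m^3 per day

side_low = vol_low ** (1 / 3)   # cube side, ~11.7 m
side_high = vol_high ** (1 / 3) # cube side, ~25.1 m

thickness = 1e-6                # assumed slick thickness: 0.001 mm in metres
area_low = vol_low / thickness / 1e6    # ~1,590 km^2 per day
area_high = vol_high / thickness / 1e6  # ~15,900 km^2 per day
```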