New paper on finite size effects in spiking neural networks

Michael Buice and I have finally published our paper entitled “Dynamic finite size effects in spiking neural networks” in PLoS Computational Biology (link here). Finishing this paper seemed like a Sisyphean ordeal and it is only the first of a series of papers that we hope to eventually publish. This paper outlines a systematic perturbative formalism to compute fluctuations and correlations in a coupled network of a finite but large number of spiking neurons. The formalism borrows heavily from the kinetic theory of plasmas and statistical field theory and is similar to what we used in our previous work on the Kuramoto model (see here and here) and the “Spike model” (see here). Our heuristic paper on path integral methods is here. Some recent talks and summaries can be found here and here.

The main goal that this paper is heading towards is to systematically derive a generalization of Wilson-Cowan type mean field rate equations (see here for a review) to include the effects of fluctuations and correlations and even higher order cumulants. The germ of the idea for this quest came to me about a decade ago when I sat in an IBSC centre meeting headed by Jay McClelland. The mission of the grant was to build “biologically informed” models of cognition. Jay had gathered a worldwide group of his colleagues to work on a number of projects that tried to better connect connectionist models to biology. He invited me to join the group because of my work on binocular rivalry. What Jay and others had done was to show how models consisting of coupled rate neurons with simple learning rules like Backpropagation and Contrastive Hebbian Learning could impressively reproduce many human cognitive functions like classifying objects. However, it was not clear what neural mechanisms could implement such learning rules in the brain. Backpropagation is notoriously nonbiological because it requires that the weights be updated in a “backwards” direction to how the signal is propagating. Contrastive Hebbian learning seems more biological in that it alternates episodes of learning and “unlearning” where weights are increased and decreased respectively. Spike-timing dependent plasticity (STDP), where weights increase or decrease depending on the precise timing of pre- and post-synaptic spikes, had only recently been discovered and seemed like a good candidate learning mechanism. However, STDP required information about spike times that rate models didn’t have by design. In order to apply STDP, one had to simulate an entire spiking network. The connectionist rate models already maxed out the computational power available at that time and spiking networks would require even more power.
What was needed, I thought, was a rate model that carried correlation information – a set of generalized activity equations. I imagined a self-consistent system where correlations would change the rate and the rate would influence correlations. It didn’t take me very long to realize that this was a difficult task and I needed to develop new machinery to do it.

At about the same time I was sitting in these meetings, I was also helping Eric Hildebrand finish his PhD. Eric was JD Crawford’s graduate student when JD sadly died before his time. He had already done some interesting work looking at some bifurcations in the Kuramoto model and I suggested that he finish his thesis by applying kinetic theory to the Kuramoto model to calculate finite size fluctuations, which was an open problem posed by Steve Strogatz and Rennie Mirollo. I came up with the kinetic theory idea when I visited Wulfram Gerstner at EPFL in 1997 and realized that plasma physics and neural dynamics had a lot in common. In the back of my mind, I knew that the kinetic theory technology Eric and I were developing would be useful for the generalized activity equations. However, the calculations were very unwieldy so I put the project aside.

In 2005, I was invited by Steve Coombes and Gabriel Lord to attend the Mathematical Neuroscience meeting in Edinburgh. Jack Cowan was also invited and he gave a talk on applying quantum field theory methods to neural dynamics. Jack had been on his own quest to generalize his Wilson-Cowan equations to a full probabilistic theory for decades. Michael Buice was his graduate student and was able to make Jack’s ideas more concrete by applying concepts from the theory of critical phenomena and in particular those of John Cardy. One of the great realizations in modern theoretical physics was that quantum field theory and statistical mechanics were actually very similar and that the Feynman path integral and Feynman diagram methods used to calculate quantities in quantum field theory were equally adept at calculating quantities in statistical mechanics. In fact the problems of renormalization in quantum field theory were only fully understood after Ken Wilson formalized the renormalization group method for critical phenomena using field theory methods. Both theories are about representing probability distributions in terms of “sums over states or histories” known as the path integral. Perturbation theory can then be performed using asymptotic methods such as the method of steepest descents. The method works because the Gaussian integral can be generalized to an arbitrary (including uncountably infinite) number of dimensions. The terms of the perturbative expansion also obey systematic rules that can be encapsulated in terms of diagrams. When I saw Jack’s talk, I realized right then that this was the formalism I needed. I then asked when Michael was graduating and recruited him to come to NIH.

When Michael arrived in the fall of 2005, it took a while for us to learn how to communicate with each other. I was mostly steeped in dynamical systems culture and he was a hard-core field theorist using a formalism that I wasn’t acquainted with. So I thought a good warm up problem would be for him to turn Eric’s thesis into a paper. Eric had left for a job in industry before he finished his thesis and didn’t have the time to write a paper. Also, the calculations didn’t fit the simulations very well. Michael immediately understood what we were doing and found that we missed a factor of 2π, which made the fits beautiful. This led to our paper in Phys Rev Letters. Michael also realized that the kinetic theory formalism we were using could be recast in terms of path integrals. (In fact, any Klimontovich equation can be converted into a path integral using this formalism.) This made the calculations much more tractable. Using this formalism, he was able to show in our Phys Rev E paper that the asynchronous stationary state in the Kuramoto model, which is marginally stable in mean field theory, is stabilized by finite size fluctuations.
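To give a feel for what “finite size fluctuations” means here, the following is a minimal sketch, not the calculation in our papers. It assumes completely uncoupled oscillators with independent, uniformly distributed phases, which is the simplest caricature of the asynchronous state. In that case the squared order parameter r² = |(1/N) Σ exp(iθ_j)|² has mean exactly 1/N, so N times its average should be close to 1 for any N. The function name `mean_r_squared` and the choices of N and trial count are made up for illustration.

```python
import cmath
import math
import random

def mean_r_squared(n_oscillators, n_trials, rng):
    """Average of r^2 = |(1/N) sum_j exp(i theta_j)|^2 over random phase draws."""
    total = 0.0
    for _ in range(n_trials):
        # Sum of N unit phasors with independent uniform phases (assumed: no coupling).
        z = sum(cmath.exp(1j * rng.uniform(0.0, 2.0 * math.pi))
                for _ in range(n_oscillators))
        total += abs(z / n_oscillators) ** 2
    return total / n_trials

rng = random.Random(0)  # fixed seed so the run is repeatable
# N * <r^2> should hover around 1 for every N if fluctuations scale as 1/N.
scaled = {n: n * mean_r_squared(n, 2000, rng) for n in (50, 200)}
for n, value in scaled.items():
    print(n, round(value, 3))
```

In mean field theory (N going to infinity) the order parameter of this state is exactly zero, so the 1/N term is the leading correction. The coupled problem is where the kinetic theory and path integral machinery is needed.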

After the resounding success of the Kuramoto model, we turned our attention to neural networks. We wanted to apply our method to integrate-and-fire models but ran into some technical hurdles (that are probably not insurmountable – interested anyone?) so we switched our attention elsewhere. One of the things that I thought we could do was to explicitly write down a set of generalized activity equations for the “Spike model” that he and Jack worked on. This is a simple model where a “neuron” spikes with a rate that depends on inputs through a gain function and then decays with a fixed rate. The full probabilistic model can be written down as a giant intractable Master equation. In his PhD thesis, Michael wrote down the equivalent Path Integral representation for the Master equation. Our goal was to extract a set of self-consistent equations for the first two cumulants (i.e. the mean and covariance). Vipul Periwal, my colleague in the LBM who is a former string theorist, pointed us to the Two Particle Irreducible effective action (2PI) approach, which does exactly what we needed. The result is our paper in Neural Computation. I also suggested that he derive the equations two ways. The first was to use kinetic theory directly and the second was using the 2PI field theory methods. That way, people could better understand the formalism. This turned out to be a daunting task and required heroic calculations on Michael’s part. One of the issues was that Michael had defined mean field theory for his system to be one where the probability distribution of spike counts was exactly Poisson, which means that all the cumulants are equal to the mean. In order to expand around the Poisson state, one needs to construct the expansion in terms of factorial moments, which we called “normal ordered” cumulants in our paper. This was very painful. The field theory automatically does this through what is called a Doi-Peliti-Janssen transformation.
The paper is nice but in some sense it side-tracked us from our main goal, and that was deriving the 2PI for a spiking neural network.
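A small numerical check may help explain why factorial moments are the natural variables for expanding around a Poisson state. For a Poisson count N with mean λ, the k-th factorial moment E[N(N-1)...(N-k+1)] equals λ^k, so the second factorial cumulant E[N(N-1)] - E[N]² vanishes even though the ordinary variance equals λ. The sketch below verifies this by summing directly over a truncated distribution. The function names, the value of λ and the truncation point are my own illustrative choices, not anything taken from the paper.

```python
import math

def poisson_pmf(n, lam):
    """P(N = n) for a Poisson distribution with mean lam, computed in log space."""
    return math.exp(-lam + n * math.log(lam) - math.lgamma(n + 1))

def factorial_moment(k, lam, n_max=200):
    """E[N (N-1) ... (N-k+1)] by direct summation over a truncated pmf."""
    total = 0.0
    for n in range(k, n_max):
        falling = math.perm(n, k)  # falling factorial n! / (n-k)!
        total += falling * poisson_pmf(n, lam)
    return total

lam = 3.0  # assumed mean count, chosen only for illustration
m1 = factorial_moment(1, lam)                # should be close to lam
m2 = factorial_moment(2, lam)                # should be close to lam**2
variance = m2 + m1 - m1 ** 2                 # Var[N] = E[N(N-1)] + E[N] - E[N]^2
second_factorial_cumulant = m2 - m1 ** 2     # vanishes for an exact Poisson state

print(m1, m2, variance, second_factorial_cumulant)
```

Deviations from Poisson statistics then show up as nonzero factorial cumulants, which is what one wants the perturbation expansion to be organized around.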

I don’t remember what happened after that exactly but after a few failed attempts at other projects, we finally sat down in earnest to tackle the generalized activity equations for spiking neurons. We realized that instead of using integrate-and-fire neurons, which have a discontinuity, we should use the Theta neuron model with synaptic coupling. One of the side benefits of our project was that we fully worked out the correct mean field theory for this model. It involves two “fields”, one for the density of neurons at a particular phase and the other for the synaptic drive to the neurons. The resulting path integral representation also has two fields. Since our theory perturbs around mean field theory, a requirement to calculate quantities is that we must first solve the mean field equations. This is actually quite difficult since it involves an integro-partial differential equation. Closed form solutions can be obtained for the asynchronous state and numerical solutions can be obtained in other regimes. Our PLoS Computational Biology paper considers the Theta model and an even simpler “phase” model. We explicitly compute the complete time-dependent second moment of the firing rate and synaptic drive including transients. The equations in the paper are not rendered perfectly unfortunately but they should be understandable. Don’t hesitate to ask me or Michael if you have any questions. This paper does not produce the 2PI for the theta model but it shows how that can be achieved.
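For readers who have not met the Theta neuron, here is a minimal sketch of a single uncoupled neuron in the standard Ermentrout-Kopell form, dθ/dt = (1 - cos θ) + (1 + cos θ) I, with a spike counted each time θ crosses π. This is an assumption on my part for illustration. The network in the paper adds synaptic coupling and may be parametrized differently. For constant I > 0 the substitution x = tan(θ/2) gives dx/dt = x² + I, so the period is π/√I and the firing rate is √I/π, which the simple Euler integration below should reproduce. The function name, step size and input value are illustrative choices.

```python
import math

def theta_neuron_rate(I, t_end=500.0, dt=1e-3):
    """Euler-integrate one theta neuron with constant input I; return spikes per unit time."""
    theta = -math.pi  # start just after a spike
    spikes = 0
    for _ in range(int(t_end / dt)):
        theta += dt * ((1.0 - math.cos(theta)) + (1.0 + math.cos(theta)) * I)
        if theta >= math.pi:        # crossing pi counts as a spike
            theta -= 2.0 * math.pi  # wrap the phase back onto the circle
            spikes += 1
    return spikes / t_end

I = 0.25  # assumed constant drive, in the oscillatory regime I > 0
simulated = theta_neuron_rate(I)
predicted = math.sqrt(I) / math.pi

print(simulated, predicted)
```

Because the phase is continuous across a spike, the dynamics have no reset discontinuity, which is the property that made this model easier for us to handle than integrate-and-fire.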


9 thoughts on “New paper on finite size effects in spiking neural networks”

  1. off by 2pi?

    i love the finite size studies (used to work on the flory-stockmayer-goldberg applications of max entropy to the sol-gel transition and immunology (antibody-antigen models). julius gibbs had a finite size model for the phase transition for water (he died when president of amherst college) which we adapted (with alan perelson of the santa fe institute—his colleague (delisi) worked at nih). (some of gibbs papers are online–i think they were published in j physical chemistry or pra).

    good job—good post though i’ll have to go through it again.

    neil barton of population genetics i think is at edinburgh—he had a paper in plos / arxiv pointing out he had shown the boltzmann distribution was the same as the fisher-wright solution (in an appendix to a paper with rouani (sic) of iran in ‘genetics’ journal).

    anyway—cold out here.

    mcclelland is the ‘sh-t’—-i think he was at carnegie mellon, part of the whole anti-innatist (anti-chomskyian) group for language acquisition (UCSD is also big there—-ellen bates and elman—‘rethinking innateness’ book; throw in andy clark too (arxiv)—showing why ‘gold’s theorem’ is not really applicable. (herbert simon (economics) was also at CMU; also now shalizi—-3 toed sloth blog)
    i think its cold out here.


  2. I’m a little out of my league here, so I’m hopeful that you could shed a little insight in regards to your paper. Can you now take these methods and create a spiking network that learns through STDP?
    Or, is this just a method to calculate some network statistics of a randomly fluctuating process, but really cannot be utilized to facilitate encoding (or decoding) information in spiking networks?


  3. @ishi So are you for or against the innate theory?

    @Tom That is the goal although the paper is not quite there yet. However, it shows how to get to a self-consistent network that can learn through correlations. It does not assume any randomness in the firing patterns. It computes the correlations ab initio in a globally coupled network of spiking neurons.


  4. i’m basically against the innate theory. one can compare the visual system with language. in both cases some things are genetically determined like the ‘language organ’ chomsky describes as analogous to a heart, or the fact people have five fingers however they are raised.
    however, even vision is programmed to an extent—old experiments raising cats where there were only vertical or horizontal bars shows they could then not see the other kind.
    language to me is more like ‘games’ children can learn games really fast (peekaboo or tag) just as they can learn words/grammar really fast but i think that is due to a general purpose learning device, rather than evidence for poverty of the stimulus/language aquisition device/organ.
    (since this idea arose with chomsky in syntactical structures around 1959 i could add this idea i see as absurd as his view that language did not evolve for communication, but rather represents some sort of innate knowledge of a platonic universe. he once explained to me at a conference that the reason language could not have evolved for communication was because people mostly talk to themselves, so its innate and only later became involved with the external world by some ‘spandrel’ effect (lewontin and gould). i thought most talking to oneself was rehearsing pick up lines, however—in other words, practicing communication. he also says animals don’t have language; i see bears marking trees with their claws as a form of language though they don’t have our vocal abilities)

    by the way its ‘alexander clark’ (on —phd thesis) not andy, and he writes with shalom lappin on various kinds of computational language acquisition a la mcclelland or elman and bates. shalom lappin has many good general articles on this. there are also the other approaches (agent based modeling)—tons of papers.

    what i find equally interesting is the ‘sociology’ of this—-how the two camps more or less ignore each other (i knew someone who worked for AAAS and he said he had to split the 2 groups into separate meetings; sortuh like obama versus romney). there are also all kinds of permutations—-eg people (marc hauser who resigned from harvard) who find a ‘moral organ’ etc. i guess it pays the bills.


  5. @Tom Given the recent advances in machine learning it’s hard to see why language acquisition cannot be done statistically. We may have physiology optimized for language but it isn’t necessary for the task.

