# The gigabit machine

There are plenty of amazing things about the brain, but one of the most mind-boggling is that it can be coded by a genome of only 3 billion base pairs, or 6 gigabits. If we consider a brain with about $10^{11}$ neurons, each receiving something like $10^4$ inputs (i.e. synapses), then we’re talking about something on the order of $10^{15}$ parameters to set. Hence the genome does not carry enough information to set every connection. I had an amusing senior moment on the train ride from Edinburgh to London with Peter Latham after the Mathematical Neuroscience workshop, where I took the log of $10^{15}$ and ended up with about 50 bits of information to specify the brain. Peter had a good laugh at my expense and we’ve now dubbed the double log of the entropy as the number of “super bits”. My confusion stemmed from the fact that I had only accounted for the amount of information needed to identify a neuron, which amounts to about 50 to 100 bits. For example, if each neuron could secrete any combination of 50 different neurotrophic factors and express any combination of 50 different receptors, then each neuron would have enough information to connect to any other specific neuron. However, that would not say which combination of genes is expressed in a given neuron, which is what is necessary to specify all the connections. Thus, one would only need about 100 bits to be able to connect up a random brain but would need $10^{15}$ bits to specify every connection.
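The arithmetic behind this argument is easy to check. Here is a rough sketch in Python that uses only the round numbers quoted above; the variable names are mine, and counting one bit per parameter is a simplifying assumption (a real synaptic weight would presumably need more).

```python
import math

# Round numbers quoted in the post
BASE_PAIRS = 3e9            # genome length in base pairs
BITS_PER_BASE = 2           # four possible bases -> log2(4) = 2 bits
NEURONS = 1e11
SYNAPSES_PER_NEURON = 1e4

genome_bits = BASE_PAIRS * BITS_PER_BASE       # about 6 gigabits
n_parameters = NEURONS * SYNAPSES_PER_NEURON   # about 10^15 connections

# Assumption: one bit per parameter, the most generous case for the genome
bits_for_every_connection = n_parameters

# The "senior moment": taking the log gives the bits needed to label
# ONE item out of 10^15, not the bits needed to set all of them
bits_to_label_one = math.log2(n_parameters)

# How many parameters each bit of genome would have to account for
parameters_per_genome_bit = bits_for_every_connection / genome_bits

print(f"genome capacity:       {genome_bits:.1e} bits")
print(f"connections to set:    {n_parameters:.1e}")
print(f"log2 of that number:   {bits_to_label_one:.1f} bits")
print(f"parameters/genome bit: {parameters_per_genome_bit:.0f}")
```

Even under the one-bit-per-connection assumption, each bit of genome would have to account for more than a hundred thousand connections.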

# Systematic fluctuation expansion for neural networks slides

I gave a talk today at the Mathematical Neuroscience workshop on my recent work with Michael Buice and Jack Cowan on deriving generalized activity equations for neural networks. My slides are here. The talk is based on the paper that we recently uploaded to the arXiv and that I summarized here.

# Path integral methods for stochastic equations

I’m currently in Edinburgh for a Mathematical Neuroscience workshop. I gave a tutorial today on using field theoretic methods to solve stochastic differential equations (SDEs). The slides are here. The methods I presented have been around for decades but, as far as I know, they haven’t been collated into a pedagogical review for nonexperts. Also, there is an entire community of theorists and mathematicians that is unaware of path integral methods. In particular, I apply the response function formalism stemming from the work of Martin, Siggia, and Rose. Field theory and diagrammatic methods are a nice way to organize perturbation expansions for nonlinear SDEs. I plan to write a review paper on this topic in the next few months and will post it here.

Addendum: Jan 20, 2011.  The review paper can be found here.

# Why can’t economics be more like physics?

In recent months, physicists have been lamenting that economics needs to be completely revamped to be more like physics (for example see here).  Meanwhile, some nonphysicists are blaming physicists for our current mess (for example see here).  I will argue in this post that economics is not more like physics mostly because it is not like physics.  I also think that everyone on Wall Street from the CEO to the quants must share in the blame for the credit crisis.  However, I don’t blame the models they have used.  All models are based on a set of assumptions and it is up to the modeler to decide when they hold.

The first reason why economics, and especially macroeconomics, is not like physics is that it is almost impossible to verify theories. The problem is even worse than in biology because in biology you can at least try to average over cells or organisms. But in economics you only have one sample. The analogy to physics would be to ask when a specific spin in a magnet would flip. The theories can only predict what the distribution will be. For example, we can never really know if the stimulus package recently passed in the US will actually work because we can’t do the controlled experiment where we see what would have happened without the stimulus. Even economists agree on this point (for example see what Harvard economist Greg Mankiw says). So you can have prominent economists vehemently disagreeing on basic points like whether or not more government spending will help, and they can always point to reasons why they are correct. That is one of the reasons why economics is so mathematical. Without any strong empirical evidence, the best you can do is to prove theorems.

# The “Welcome to your Brain” Team on Wild Side

Neuroscientists Sam Wang of Princeton and Sandra Aamodt of Nature Neuroscience are guest bloggers at the Wild Side (Olivia Judson’s blog) on the New York Times today. They have written about how working memory training can improve IQ. Here’s an excerpt, but you should read the whole thing, especially the caveats.

A recent paper reported that training on a particularly fiendish version of the n-back task improves I.Q. scores. Instead of seeing a single series of items like the one above, test-takers saw two different sequences, one of single letters and one of spatial locations. They had to report n-back repetitions of both letters and locations, a task that required them to simultaneously keep track of both sequences. As the trainees got better, n was increased to make the task harder. If their performance dropped, the task was made easier until they recovered.

Each day, test-takers trained for 25 minutes. On the first day, the average participant could handle the 3-back condition. By the 19th day, average performance reached the 5-back level, and participants showed a four-point gain in their I.Q. scores.

The I.Q. improvement was larger in people who’d had more days of practice, suggesting that the effect was a direct result of training. People benefited across the board, regardless of their starting levels of working memory or I.Q. scores (though the results hint that those with lower I.Q.s may have shown larger gains). Simply practicing an I.Q. test can lead to some improvement on the test, but control subjects who took the same two I.Q. tests without training improved only slightly. Also, increasing I.Q. scores by practice doesn’t necessarily increase other measures of reasoning ability (Ackerman, 1987).

Since the gains accumulated over a period of weeks, training is likely to have drawn upon brain mechanisms for learning that can potentially outlast the training. But this is not certain. If continual practice is necessary to maintain I.Q. gains, then this finding looks like a laboratory curiosity. But if the gains last for months (or longer), working memory training may become as popular as — and more effective than — games like sudoku among people who worry about maintaining their cognitive abilities.

# Mutual information and GPCRs

A new paper has just been published in PLoS One: “Computing Highly Correlated Positions Using Mutual Information and Graph Theory for G Protein-Coupled Receptors” by Sarosh Fatakia, Stefano Costanzi and myself. G Protein-Coupled Receptors (GPCRs) are a very large family of cell surface receptors that are ubiquitous in biological systems. Examples include olfactory receptors, receptors for neuromodulators like dopamine, and rhodopsin in the eye. Most drugs in use today target GPCRs. There are thousands of different types in humans alone. What we do in this paper is to look for amino acid positions along the GPCR sequence that may be important for structure and function. Presumably, GPCRs all evolved from a single ancestor protein, so important positions may have coevolved, i.e. a mutation at one position would be compensated by mutations at other positions.

The way we looked for these positions was to consider an alignment that was previously computed for three classes of GPCRs. A GPCR sequence is given by a string of letters corresponding to the 20 amino acids. An alignment is an arrangement of the strings into a matrix, where the rows of the matrix correspond to strings that are arranged so that the columns can be considered to be equivalent positions. We only considered the transmembrane regions of the receptor so we could assume there were no insertions and deletions. We then computed the mutual information between each pair of positions (i.e. columns of the matrix) j and k. The mutual information (MI) is given by the expression

$MI(j,k) = \sum_{x,y} p_{j,k}(x,y)\log \frac{p_{j,k}(x,y)}{p_{j}(x)p_{k}(y)}$,

where $p_j(x)$ is the probability of amino acid x appearing at position j, $p_{j,k}(x,y)$ is the joint probability of amino acids x and y appearing at positions j and k respectively, and the sum over x and y is over all the amino acids. Basically, MI is a measure of the “excess” probability of the occurrence of amino acid x at position j and amino acid y at position k, over what would have occurred if they were statistically independent. One of the problems with mutual information is that you need a lot of data to compute it accurately. Given that we only had a finite number of sequences in each class, error in the MI estimate was expected. So what we did was to set a threshold value for significance compared to the null hypothesis of a set of random sequences.
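To make the formula concrete, here is a minimal sketch of a plug-in MI estimate for two columns of an alignment, where the probabilities are simply replaced by observed frequencies. This is an illustration and not the code used in the paper: the function name, the choice of log base 2 (so MI comes out in bits), and the four-sequence toy alignment are all my own assumptions.

```python
from collections import Counter
import math

def mutual_information(alignment, j, k):
    """Plug-in MI estimate (in bits) between columns j and k.

    `alignment` is a list of equal-length strings, one per sequence.
    Probabilities are estimated by observed frequencies.
    """
    n = len(alignment)
    joint = Counter((seq[j], seq[k]) for seq in alignment)
    marg_j = Counter(seq[j] for seq in alignment)
    marg_k = Counter(seq[k] for seq in alignment)

    mi = 0.0
    # Pairs that never occur contribute zero, so only observed pairs are summed
    for (x, y), count in joint.items():
        p_xy = count / n
        p_x = marg_j[x] / n
        p_y = marg_k[y] / n
        mi += p_xy * math.log2(p_xy / (p_x * p_y))
    return mi

# Hypothetical toy alignment: columns 0 and 1 always change together,
# while column 2 varies independently of column 0
toy = ["AKL", "AKV", "GRL", "GRV"]

mi_coupled = mutual_information(toy, 0, 1)
mi_independent = mutual_information(toy, 0, 2)
print(mi_coupled, mi_independent)
```

With only a handful of sequences the plug-in estimate is biased upward for real data, which is why a significance threshold against random sequences is needed.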

To test our hypothesis that important positions would co-evolve as a network, we constructed a graph out of the MI matrix where the vertices were the positions and an edge was drawn between two vertices only if the MI was significant. We then looked for fully interconnected subgraphs, or cliques. Finding the largest clique in a graph is an NP-hard problem (the decision version is NP-complete), so as a surrogate we looked for high degree (connectivity) positions and ranked the positions according to degree. We then assessed the degree significance by comparing our MI graph to a random graph. It turned out that the top 10 significant positions formed a clique and also corresponded to the binding cavity for ligands in the three GPCR structures that have been solved thus far. The method also did not find a binding cavity in one class of GPCRs for which no cavity has been observed experimentally. The method could be used on any protein family to search for important positions.
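The graph step can be sketched in a few lines of Python. Again, this is only an illustration under stated assumptions: the function names, the single fixed threshold, and the 4-by-4 toy MI matrix are hypothetical, and the comparison against a random graph that we used to assess degree significance is left out.

```python
from itertools import combinations

def significant_graph(mi, threshold):
    """Adjacency sets with an edge j-k whenever mi[j][k] exceeds the threshold."""
    n = len(mi)
    adj = {j: set() for j in range(n)}
    for j, k in combinations(range(n), 2):
        if mi[j][k] > threshold:
            adj[j].add(k)
            adj[k].add(j)
    return adj

def rank_by_degree(adj):
    """Positions sorted by decreasing degree (ties broken by index)."""
    return sorted(adj, key=lambda v: (-len(adj[v]), v))

def is_clique(adj, vertices):
    """True if every pair of the given vertices is joined by an edge."""
    return all(b in adj[a] for a, b in combinations(vertices, 2))

# Hypothetical symmetric MI matrix for four positions
toy_mi = [
    [0.0, 0.9, 0.8, 0.1],
    [0.9, 0.0, 0.7, 0.1],
    [0.8, 0.7, 0.0, 0.6],
    [0.1, 0.1, 0.6, 0.0],
]

adj = significant_graph(toy_mi, threshold=0.5)
ranking = rank_by_degree(adj)
top3_is_clique = is_clique(adj, ranking[:3])
print(ranking, top3_is_clique)
```

Checking whether a given set of vertices is a clique is cheap; it is the search for the largest one that is hard, which is why ranking by degree first and then verifying the top positions is a practical shortcut.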

note: updated Mar 7 to correct MI formula

erratum, Dec 13, 2011: The number of human GPCRs is now thought to be fewer than a thousand.