The Scientific Worldview

An article that has been making the rounds on Twitter and the blogosphere is The Science of Why We Don’t Believe Science by Chris Mooney in Mother Jones. The article asks why people cling to old beliefs even in the face of overwhelming data against them. It argues that we basically use values to evaluate scientific facts. Thus, if the facts go against a value system built over a lifetime, we will find ways to rationalize the facts away. This is particularly true for beliefs about climate change and vaccines causing autism. The scientific evidence is quite strong that our climate is changing and that vaccines don’t cause autism, but adherents of the contrary beliefs simply will not change their minds.

I mostly agree with the article, but I would add that the idea that the scientific belief system is somehow more compelling than an alternative belief system may not be on as solid ground as scientists think. The concept of rationality and the scientific method was a great invention that has improved the human condition dramatically. However, I think one of the things that people trained in science forget is how much we trust the scientific process and other scientists. Often when I watch a science show like NOVA on paleontology, I am simply amazed that paleontologists can determine that a piece of bone, which looks like some random rock to me, is a fragment of a finger bone of a primate that lived two million years ago. However, I trust them because they are scientists and I presume that they have undergone the same rigorous training and constant scrutiny that I have. I know that their conclusions are based on empirical evidence and a line of thought that I could follow if I took the time. But if I grew up in a tradition where a community elder prescribed truths from a pulpit, why would I take the word of a scientist over someone I know and trust? To someone not trained in or exposed to science, it would just be the word of one person over another.

Thus, I think it would be prudent for scientists to realize that they possess a belief system that in many ways is no more self-evident than any other system. Sure, our system has proven to be more useful over the years, but ancient cultures managed to build massive architectural structures like the pyramids and invented agriculture without the help of modern science and engineering. What science prizes is parsimony of explanation, but, at the risk of being called a post-modern relativist, I would say this is mostly an aesthetic judgement. The worldview that everything is the way it is because a creator insisted on it is as self-consistent as the scientific view. The rational scientific worldview takes a lot of hard work and time to master. Some (many?) people are just not willing to put in the effort it takes to learn it. We may need to accept that a scientific worldview may not be palatable to everyone. Understanding this truth may help us devise better strategies for conveying scientific ideas.

Does the cosmos know you exist?

After a year-long public battle with cancer, the writer and cultural critic Christopher Hitchens died this Thursday. Commenting on his early death, Hitchens reportedly told  NPR that he was “dealt a pretty good hand by the cosmos, which doesn’t know I’m here and won’t know when I’m gone.”  Hitchens made this comment because he was a fervid atheist.  However, the statement could be valid even if the universe has a creator.  It all depends on whether you think the universe is computable or not.  (By all accounts, it is at least well approximated by a computable universe.)  If the universe is computable, then in principle it is equivalent to one (or many) of the countably infinite number of possible computer programs.  This implies that it is possible that someone wrote the program that generated our universe and this person would in fact be the Creator.  However, depending on the cardinality of the Creator (by cardinality I mean the size of a set and not a reference to Catholicism), the Creator may or may not know that you or any thing at all exists in her universe.

Let’s take a specific example to make this more concrete.  It has been shown that simple cellular automata (CA) like  Rule 110 are universal computers.  A CA is a discrete dynamical system on a grid where each grid point can be either 1 or 0 (i.e. bits) and there is an update rule where the bits stay the same or flip on the next time step depending on the current state of the bits.  (Rule 110 is a one-dimensional CA where a bit is updated depending on the state of its two nearest neighbours and itself.)  Thus every single possible computation can be generated by simply using every bit string as an initial state of Rule 110.  So the entire history of our universe is encoded by a single string of binary digits together with the bits that encode Rule 110. Note that it doesn’t matter if our universe is quantum mechanical since any quantum mechanical system can be simulated on a classical computer.  Thus, all the Creator needed to do was to write down some string of digits and let the CA run.
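To make this concrete, here is a toy Python sketch of Rule 110 (my own illustrative code; the encoding of the update rule as the bits of the number 110 is standard, but the fixed zero boundaries are a simplifying assumption of mine, since the true CA lives on an infinite grid):

```python
# Rule 110: the new state of a cell is bit (4*left + 2*center + right)
# of the number 110 (binary 01101110).

RULE = 110

def step(cells):
    """Advance one time step; cells beyond the edges are treated as 0."""
    n = len(cells)
    new = []
    for i in range(n):
        left = cells[i - 1] if i > 0 else 0
        right = cells[i + 1] if i < n - 1 else 0
        new.append((RULE >> (4 * left + 2 * cells[i] + right)) & 1)
    return new

def run(cells, steps):
    history = [list(cells)]
    for _ in range(steps):
        cells = step(cells)
        history.append(cells)
    return history

# A single 1 on the right edge already produces the characteristic
# left-growing Rule 110 triangles.
if __name__ == "__main__":
    for row in run([0] * 31 + [1], 15):
        print("".join(".#"[c] for c in row))
```

Feeding every possible bit string into `run` enumerates every possible computation, which is the sense in which a single initial string plus the rule encodes an entire history.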

Now, what constitutes “you” and any macroscopic thing in the universe is a collection of bits. These bits need not be contiguous since nothing says that you have to be local at the level of the bits. Thus, you would be one of all the possible subsets of the bits of a binary string. The set of all these subsets is called the power set. Since any bit can either be in a subset or not, there are 2^N sets in the power set of a string of N bits. Thus, you are one of an exponential number of possible bit combinations for a finite universe, and if the universe is infinitely large then you are one of an uncountably infinite number of possible combinations. Hence, in order for the Creator to know you exist she has to a) know which subset corresponds to you and be able to find you and b) know when that subset will appear in the universe. Thanks to the brilliance of Georg Cantor and Alan Turing, we can prove that even if a Creator can solve a) (which is no easy task), she cannot solve b) unless she is more powerful than a classical computer. The reason is that in order to solve b), she has to predict when a given set of symbols will appear in the computation, and this is equivalent to solving the Halting Problem (see here for a recent post I wrote introducing the concepts of computability). Hence, knowing whether “you” will exist is undecidable. In a completely self-consistent world where every being is computable, no being can systematically determine if another being exists in their own creation. In such a universe, Hitchens is right. Conversely, if there is a universe with a Creator who knows about “you”, then that Creator must be computationally more powerful than you.


Two talks

Last week I gave a talk on obesity at Georgia State University in Atlanta, GA. Tomorrow, I will be giving a talk on the kinetic theory of coupled oscillators at George Mason University in Fairfax, VA. Both of these talks are variations of ones I have given before so instead of uploading my slides, I’ll just point to links to previous talks, papers, and posts on the topics.  For obesity, see here and for kinetic theory, see here, here and here.

In the Times

The New York Times has some interesting articles online right now. There is a series of interesting essays on the Future of Computing in the Science section, and the philosophy blog The Stone has a very nice post by Alva Noe on Art and Neuroscience. I think Noe’s piece eloquently phrases several ideas that I have tried to get across recently, namely that while mind may arise exclusively from brain, this doesn’t mean that looking at the brain alone will explain everything that the mind does. Neuroscience will not make psychology or art history obsolete. The reason is simply a matter of computational complexity, or even more simply, combinatorics. It goes back to Philip Anderson’s famous article More is Different (e.g. see here), where he argued that each field has its own set of fundamental laws and rules and thinking at a lower level isn’t always useful.

For example, suppose that what makes me enjoy or like a piece of art is set by a hundred or so on-off neural switches.  Then there are 2^{100} different ways I could appreciate art.  Now, I have no idea if a hundred is correct but suffice it to say that anything above 50 or so makes the number of combinations so large that it will take Moore’s law a long time to catch up and anything above 300 makes it virtually impossible to handle computationally in our universe with a classical computer.  Thus, if art appreciation is sufficiently complex, meaning that it involves a few hundred or more neural parameters, then Big Data on the brain alone will not be sufficient to obtain insight into what makes a piece of art special. Some sort of reduced description would be necessary and that already exists in the form of art history.  That is not to say that data mining how people respond to art may not provide some statistical information on what would constitute a masterpiece.  After all, Netflix is pretty successful in predicting what movies you will like based on what you have liked before and what other people like.  However, there will always be room for the art critic.
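As a back-of-envelope check on these numbers, here is a small Python sketch (the figure of 10^18 evaluations per second is a generous exascale assumption I’m making for illustration, not a measured rate):

```python
# If "liking a piece of art" is set by n binary switches, how long would it
# take to enumerate all 2^n states at an assumed exascale rate of 1e18/s?

def years_to_enumerate(n_switches, evals_per_sec=1e18):
    seconds = 2 ** n_switches / evals_per_sec
    return seconds / (3600 * 24 * 365)

for n in (50, 100, 300):
    print(f"n = {n:3d}: 2^n = {2 ** n:.3e} states, "
          f"about {years_to_enumerate(n):.1e} years to enumerate")
```

At n = 50 the enumeration finishes in a fraction of a second, at n = 100 it takes tens of thousands of years, and at n = 300 the time dwarfs the age of the universe, which is the point about a reduced description being necessary.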

New paper on GPCRs

New paper in PloS One:

Fatakia SN, Costanzi S, Chow CC (2011) Molecular Evolution of the Transmembrane Domains of G Protein-Coupled Receptors. PLoS ONE 6(11): e27813. doi:10.1371/journal.pone.0027813


G protein-coupled receptors (GPCRs) are a superfamily of integral membrane proteins vital for signaling and are important targets for pharmaceutical intervention in humans. Previously, we identified a group of ten amino acid positions (called key positions), within the seven transmembrane domain (7TM) interhelical region, which had high mutual information with each other and many other positions in the 7TM. Here, we estimated the evolutionary selection pressure at those key positions. We found that the key positions of receptors for small molecule natural ligands were under strong negative selection. Receptors naturally activated by lipids had weaker negative selection in general when compared to small molecule-activated receptors. Selection pressure varied widely in peptide-activated receptors. We used this observation to predict that a subgroup of orphan GPCRs not under strong selection may not possess a natural small-molecule ligand. In the subgroup of MRGX1-type GPCRs, we identified a key position, along with two non-key positions, under statistically significant positive selection.

The pitfalls of obesity and cancer drugs

The big medical news last week was that the US FDA revoked the use of the drug Avastin for the treatment of breast cancer. The reason was that any potential efficacy did not outweigh the side effects. Avastin is an anti-angiogenesis drug that blocks the formation of blood vessels by inhibiting the vascular endothelial growth factor VEGF-A. This class of drugs is a big money-maker for the biotechnology firm Genentech and has been used in cancer treatments and for macular degeneration, where a closely related molecule is marketed as Lucentis. Avastin will still be allowed for colorectal and lung cancer, and physicians can still prescribe it off-label for breast cancer. The strategy of targeting blood delivery as an anti-tumour strategy was pioneered by Judah Folkman. He and collaborators also showed that adipose tissue mass (i.e. fat cells) can be regulated by controlling blood vessel growth (Rupnick et al., 2002), and this has been proposed as a potential therapy for obesity (e.g. Kolonin et al., 2004; Barnhart et al., 2011). However, the idea will probably not go very far because of potential severe side effects.

I think this episode illustrates a major problem in developing any type of drug for obesity and, to some degree, cancer. I’ve posted on the basic physiology and physics of weight change multiple times before (see here) so I won’t go into details here, but suffice it to say that we get fat because we eat more than we burn. Consider this silly analogy: suppose we have a car with an expandable gas tank and we seem to be overfilling it all the time so that it’s getting really big and heavy. What should we do to lighten the car? Well, there are three basic strategies: 1) we can put a hole in the gas tank so that as we fill the tank, gas just leaks out; 2) we can make the engine more inefficient so it burns gas faster; or 3) we can put less gas in the car. If you look at it this way, the first two strategies seem completely absurd, but they are pursued all the time in obesity research. The drug Orlistat blocks absorption of fat in the intestines, which basically makes the gas tank (and your bowels) leaky. One of the most celebrated recent findings in obesity research was the discovery that human adults have brown fat. This is a type of adipocyte that converts food energy directly into heat. It is abundant in small mammals like rodents and in babies (that’s why your newborn is nice and warm) but was thought to disappear in adults. Now various labs are trying to develop drugs that activate brown fat. In essence, they want to make us less efficient and turn us into heaters. The third strategy of reducing input has also been tried and has failed various times. Stimulants such as methamphetamines were found very early on to suppress appetite, but turning people into speed addicts wasn’t a viable strategy. A recent grand failure was the cannabinoid receptor CB-1 blocker Rimonabant. It worked on the principle that since cannabis seems to enhance appetite, blocking the receptor suppresses appetite. It does work, but it also caused severe depression and suicidal thoughts. Also, given that CB-1 is important in governing synaptic strengths, I’m sure there would have been bad long-term effects as well. I won’t bother telling the story of fen-phen.

It’s easy to see why almost all obesity drug therapies will fail: they must target some important component of metabolism or neural function. While we seem to have some unconscious controls of appetite and satiety, we can also easily override them (as I plan to do tomorrow for Thanksgiving). Hence, any drug that targets some mechanism will likely either cause bad side effects or be compensated for by other mechanisms. This also applies to some degree to cancer drugs, which must kill cancer cells while sparing healthy cells. This is why I tend not to get overly excited whenever another new discovery in obesity research is announced.

New paper

A new paper in Physical Review E is now available online here. In this paper, Michael Buice and I show how you can derive an effective stochastic differential (Langevin) equation for a single element (e.g. a neuron) embedded in a network by averaging over the unknown dynamics of the other elements. This implies that given measurements from a single neuron, one might be able to infer properties of the network in which it lives. We hope to show this in the future. In this paper, we perform the calculation explicitly for the Kuramoto model of coupled oscillators (e.g. see here), but it can be generalized to any network of coupled elements. The calculation relies on the path or functional integral formalism Michael developed in his thesis and generalized at the NIH. It is a nice application of what is called “effective field theory”, where new dynamics (i.e. an action) are obtained by marginalizing or integrating out unwanted degrees of freedom. The path integral formalism gives a nice platform on which to perform this averaging. The resulting Langevin equation has a noise term that is non-white, non-Gaussian, and multiplicative. It is probably not something you would have guessed a priori.

Michael A. Buice (1,2) and Carson C. Chow (1)
(1) Laboratory of Biological Modeling, NIDDK, NIH, Bethesda, Maryland 20892, USA
(2) Center for Learning and Memory, University of Texas at Austin, Austin, Texas, USA

Received 25 July 2011; revised 12 September 2011; published 17 November 2011

Complex systems are generally analytically intractable and difficult to simulate. We introduce a method for deriving an effective stochastic equation for a high-dimensional deterministic dynamical system for which some portion of the configuration is not precisely specified. We use a response function path integral to construct an equivalent distribution for the stochastic dynamics from the distribution of the incomplete information. We apply this method to the Kuramoto model of coupled oscillators to derive an effective stochastic equation for a single oscillator interacting with a bath of oscillators and also outline the procedure for other systems.

Published by the American Physical Society

DOI: 10.1103/PhysRevE.84.051120
PACS: 05.40.-a, 05.45.Xt, 05.20.Dd, 05.70.Ln

St. Petersburg Paradox

The St. Petersburg Paradox is a problem in economics first proposed by Nicolas Bernoulli in a letter in 1713. It involves a lottery where you buy a ticket to play a game in which a coin is flipped until heads comes up. If heads comes up on the nth toss, you get 2^{n-1} dollars. So if heads comes up on the first toss you get one dollar, and if it comes up on the fourth you get 8 dollars. The question is how much you would pay for a ticket to play this game. In economic theory, the idea is that you would play if the expectation value of the payout minus the ticket price is positive. The paradox is that the expectation value of the payout is infinite, but most people would pay no more than ten dollars. The solution to the paradox has been debated for the past three centuries. Now, physicist Ole Peters argues that everyone before has missed a crucial point and provides a new resolution to the paradox. Peters also shows that a famous 1934 paper by Karl Menger about this problem contains two critical errors that nullify Menger’s results. I’ll give a summary of the mathematical analysis below, including my own even simpler resolution.

The reason the expectation value of the payout is infinite is that the terms in the expectation sum never decay: while the probability that the first heads comes up on toss n decreases exponentially as p(n)=(1/2)^n, the payout increases exponentially as S(n)=2^{n-1}. The product is always 1/2. The expectation value is thus

E[S]=\sum_{n=1}^\infty p(n)S(n) = 1/2+1/2 + \cdots

and diverges.  The first proposed resolution of the paradox was by Daniel Bernoulli in a 1738  paper submitted to the Commentaries of the Imperial Academy of Science of St. Petersburg, from which the paradox received its name.  Bernoulli’s suggestion was that people don’t really value money linearly and proposed a utility function U(S) = \log S, so the utility of money decreases with wealth. Given that this now grows sub-exponentially, the expectation value of U(S) is thus finite and resolves the paradox.  People have always puzzled over this solution because it seems ad hoc.  Why should my utility function be the same as someone else’s?  Menger suggested that he could always come up with an even faster growing payout function to make the expectation value still divergent and declared that all utility functions must be bounded.  According to Peters, this has affected the course of economics for the twentieth century and may have led to more risk taking than warranted mathematically.
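To see the divergence in action, here is a quick Monte Carlo sketch of the game (illustrative code of my own; the function names are made up):

```python
import random

# One round of the St. Petersburg game: flip until the first heads;
# a first head on toss n pays 2^(n-1) dollars.

def play(rng):
    n = 1
    while rng.random() < 0.5:  # tails, keep flipping
        n += 1
    return 2 ** (n - 1)

def sample_mean(trials, seed=0):
    rng = random.Random(seed)
    return sum(play(rng) for _ in range(trials)) / trials

# The sample mean keeps creeping upward (roughly like log2(trials)/2) and
# never settles down, reflecting the divergent expectation, even though
# the typical single payout is only a dollar or two.
if __name__ == "__main__":
    for trials in (100, 10_000, 1_000_000):
        print(trials, sample_mean(trials))
```

The gap between the ever-growing sample mean and the small typical payout is exactly the intuition behind both the Bernoulli and the Peters resolutions below.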

Peters’ resolution is that the expectation value of the Bernoulli utility function is actually the time average of the growth rate in wealth of a person who plays repeatedly. Hence, if they pay more than a certain price they will almost surely go bankrupt. The proof is quite simple. The factor by which a person’s wealth changes at round i is given by the expression

r_i = \frac{W_i-C+S_i}{W_i}

where W_i is the wealth, C is the cost to play, and S_i is the payout at round i. The total fractional change after T rounds is thus \bar{r}_T=(\prod_{i=1}^T r_i)^{1/T}. Now transform from rounds of the game played to n, the number of tosses until the first heads. This brings in k_n, the number of rounds in which the first heads occurred on toss n, to yield

\bar{r}_T= \prod_{n=1}^{\infty} r_n^{k_n/T}\to\prod_{n=1}^{\infty} r_n^{p_n},

where p_n is the probability of n tosses; the second product follows because k_n/T \to p_n as T\to\infty. The average growth rate is given by taking the log, which gives the expression

\sum_{n=1}^\infty \left(\frac{1}{2}\right)^n (\ln(W-C+2^{n-1})-\ln W)

which is equivalent to the Bernoulli solution without the need for a utility function.
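Here is a numerical sketch of this growth-rate criterion (my own illustrative code; the sum is truncated at 200 tosses, which is far more than enough since the terms decay geometrically, and the break-even ticket price is found by bisection):

```python
import math

# Expected log growth rate per round for wealth W and ticket price C,
# following Peters' time-average argument.

def growth_rate(W, C, n_max=200):
    g = 0.0
    for n in range(1, n_max + 1):
        payout = 2 ** (n - 1)
        g += 0.5 ** n * (math.log(W - C + payout) - math.log(W))
    return g

def break_even_price(W, tol=1e-9):
    """Largest ticket price with non-negative growth rate, by bisection.

    growth_rate is decreasing in C, positive at C=0 and (for W > 2)
    negative at C=W, so bisection on [0, W] finds the zero crossing.
    """
    lo, hi = 0.0, float(W)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if growth_rate(W, mid) >= 0:
            lo = mid
        else:
            hi = mid
    return lo

# The acceptable price is a few dollars and grows only slowly with wealth.
if __name__ == "__main__":
    for W in (10, 100, 10_000, 1_000_000):
        print(W, round(break_even_price(W), 3))
```

For modest wealth the break-even price lands in the single digits, consistent with the observation that most people would pay no more than ten dollars.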

Now my solution, which has probably been proposed previously, is that we don’t really evaluate the expectation value of the payout but rather take the payout at the expected number of tosses, which is finite. Thus we replace E(S(n)) with S(E(n)), where

E(n) =\sum_{n=1}^\infty n \left(\frac{1}{2}\right)^n=2,

which means the payout at the expected number of tosses is S(2)=2^{2-1}=2, so we wouldn’t really want to pay more than 2 dollars. This might be a little conservative but it’s what I would do.
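A quick check of that sum (illustrative code):

```python
# Partial sum of E[n] = sum_{n>=1} n * (1/2)^n; the terms decay
# geometrically, so 60 terms is plenty.

def expected_tosses(n_max=60):
    return sum(n * 0.5 ** n for n in range(1, n_max + 1))

# E[n] = 2, so the payout at the expected round is S(2) = 2^(2-1) = 2 dollars.
print(expected_tosses())  # approximately 2.0
```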

Another talk in Marseille

I’m in beautiful Marseille again for a workshop on spike-timing dependent plasticity (STDP). My slides are here. The paper on which this talk is based can be obtained here. This paper greatly shaped how I think about neuroscience. I’ll give a summary of the paper and of STDP for the uninitiated later.

Erratum: In my talk I said that I had reduced the models to disjunctive normal form. Actually, I had it backwards. I reduced it to conjunctive normal form. I’ll attribute this mixup to jet lag and lack of sleep.

Guest editorial

I have a guest editorial in the SIAM Dynamical System online magazine DSWeb this month.  The full text of the editorial is below and the link is here.  I actually had several ideas circulating in my head and didn’t really know what would come out until I started to write.   This is how my weekly blog posts often go.  The process of writing itself helps to solidify inchoate ideas.  I think too often, young people want to wait until everything is under control before they write. I try to tell them to never just sit there and stare at a screen.  Just start writing and something will come.

Math in the Twenty First Century

I was a graduate student at MIT in the late eighties. When I started, I wrote Fortran code in the EMACS editor at a VT100 terminal to run simulations on a Digital Equipment Corporation VAX computer. When I didn’t understand something, I would try to find a book or paper on the topic. I spent hours in the library reading and photocopying. Somehow, I managed to find and read everything related to my thesis topic. Email had just become widespread at that time. I took to it immediately and found it to be an indispensable tool for keeping in touch with my friends at other universities and even for making dinner plans. Then I got a desktop workstation running X Windows. I loved it. I could read email and run my code simultaneously. I could also log onto the mainframe computer if necessary. As I was finishing up, my advisor got a fax machine for the office (hard to believe that they are that recent and now obsolete) and used it almost every day.

I think that immediate integration of technology has been the theme of the past twenty-five years. Each new innovation – email, the desktop computer, the fax machine, the laser printer, the world wide web, mobile phones, digital cameras, PowerPoint slides, iPods, and so forth – becomes so quickly enmeshed in our lives that we can’t imagine what life would be like without it. Today, if I want to know what some mathematical term means, I can just type it into Google and there will usually be a Wikipedia or Scholarpedia page on it. If I need a reference, I can download it immediately. If I have a question, I can post it on Math Overflow and someone, possibly a Fields medalist, will answer it quickly (I actually haven’t done this myself yet but have watched it in action). Instead of walking over to the auditorium, I can now sit in my office and watch lectures online. My life was not like this fifteen years ago.

Yet, despite this rapid technological change, we still publish in the same old way. Sure, we can now submit our papers on the web and there are online journals, but there has been surprisingly little innovation otherwise. In particular, many of the journals we publish in are not freely available to the public. The journal publishing industry is a monopoly with surprising staying power. If you are not behind the cozy firewall of an academic institution, much of the knowledge we produce is inaccessible to you. Math is much better off than most other sciences since people post their papers to the arXiv. This is a great thing, but it is not the same as a refereed journal. Perhaps now is the time for us to come up with a new model for presenting our work – something that is refereed and public. Something new. I don’t know what it should look like, but I do know that when it comes around I’ll wonder how I ever got along without it.

Philosophical confusion about free will

The New York Times has a blog series called the Stone on philosophy.  Unlike some of my colleagues, I actually like philosophy and spend time studying and thinking about it.  However, in some instances I think that philosophers just like to create problems that  don’t exist.  This week’s Stone post is by Notre Dame philosopher Gary Gutting on free will.  It’s in reference to a recent Nature news article on neuroscience versus philosophy on the same topic.  The Nature article tells of some recent research that shows that fMRI scans can predict what people will do before they  “know” it themselves.  This is not new knowledge.  We’ve known for decades that neurons in the brain “know” what an animal will do before the animal does.  Gutting’s post is about how these “new” results don’t settle the question of free will and that a joint effort between philosophers and neuroscientists could get us closer to the truth.

I think this kind of thinking is completely misguided. When it comes to free will, there are only two questions one needs to answer. The first is whether you think mind comes from brain, and the second is whether you think brain comes from physics. (There is a third implicit question, which is whether you believe that physics is complete, meaning that it is fully defined by a given set of physical laws.) If you answer yes to these questions, then there cannot be free will: all of our actions are determined by processes in our brains, which are governed by a set of prescribed physical laws. There is no need for any more philosophical or biological inquiry, as Gutting suggests. I would venture that almost all neuroscientists think that brain is responsible for mind. You could argue that physics is not completely understood and that there is some mysterious connection between quantum mechanics and consciousness, but this doesn’t involve neuroscience and probably not philosophy either. It is a question of physics.

There are many unsolved problems in biology and neuroscience. We really have little idea of how the brain works and particularly how genes affect behaviour and cognition. However, whether or not we have free will is no longer a question of neuroscience or biology. That is not to say that philosophers and neuroscientists should stop thinking about free will. They should simply stop worrying about whether it exists and start thinking about what we should do in a post-free-will world (see previous blog post). How does this impact how we think about ethics and crime? What sort of society is fair given that people have no free will? Although we may not have free will, we certainly have the perception of free will, and we also experience real joy and pain. There are consequences to our actions, and there could be ways to modify them so that they cause less suffering in the world.

Approaches to theoretical biology

I have recently thought about how to classify what theorists actually do, and I came up with three broad approaches: 1) Model analysis, 2) Constraint-driven modeling, and 3) Data-driven modeling. By model, I mean a set of equations (and inequalities) proposed to govern or mimic the behavior of some biological system. Often, a given research project will involve more than one category. Model analysis is trying to understand what the equations do. For example, there could exist some set of differential equations, and the goal is to figure out what the solutions of these equations are or look like. Constraint-driven modeling is trying to explain a phenomenon starting from another set of phenomena. For example, trying to explain the rhythms in EEG measurements by exploring networks of coupled spiking neurons. Finally, data-driven modeling is looking directly for patterns in the data itself without worrying about where the data may have come from. An example would be trying to find systematic differences in DNA sequence between people with and without a certain disease.

I have spent most of my scientific career on Approach 1). What I have done a lot in the past is construct approximate solutions to dynamical systems and then compare them to numerical simulations. Thus, I never had to worry too much about data and statistics or even about real phenomena themselves. In fact, even when I first moved into biology in the early nineties, I still did mostly the same thing. (The lone exception was my work on posture control, which did involve paying attention to data.) Computational neuroscience is a mature enough field that one can focus exclusively on analyzing existing models. I started moving more towards Approach 2) when I began studying localized persistent neural activity, or bumps. My first few papers on the subject mostly analyzed models, but there was a more exploratory nature to them than to my previous work. Instead of trying to explicitly compute a quantity, it was more about exploring what networks of neurons can do. The work on binocular rivalry and visual competition was an attempt to explain a cognitive phenomenon using the constraints imposed by the properties of neurons and synapses. However, I was still only trying to explain the data qualitatively.

That changed when I started my work on modeling the acute inflammatory response. Now I was just given data with very few biological constraints. I basically took what the immunologists told me and constructed the simplest model possible that could account for the data. Given that my knowledge of statistics was minimal, I simply used the “eye test” as the basis for deciding whether or not the model worked. The model somehow fit the data and did bring insight to the phenomenon, but it was not done in a systematic way. When I arrived at NIH, I was introduced to Bayesian inference and this really opened my eyes. I realized that when one doesn’t have strong biological or physical constraints, Approach 2) is not that useful. It is easy to cobble together a system of differential equations to explain any data. This is how I ended up moving more towards Approach 3). Instead of just coming up with some set of ODEs that could explain the data, we explored classes of models that could explain a given set of data and used Bayesian model comparison to decide which was better. This approach was used in the work on quantifying insulin’s effect on free fatty acid dynamics. While that work involved some elements of Approach 2) in that we utilized some constraints, my work on protein sequences is almost entirely within Approach 3). The work on obesity and body weight change involves all three approaches. The conservation of energy and the vast separation of time scales put strong constraints on the dynamics, so one can get surprisingly far using Approaches 1) and 2).

When I was younger, some of my fellow graduate students would lament that they missed out on the glory days of the 1920s when quantum mechanics was discovered. It is true that when a field matures, it starts to move from Approach 3) to 2) and 1). Theoretical physics is now almost exclusively in 1) and 2). Even string theory is basically all in Approaches 1) and 2): string theorists are trying to explain all the known forces using the constraints of quantum mechanics and general relativity. The romantic periods of physics involved Approach 3). There were Galileo, Kepler and Newton inventing classical mechanics; Lavoisier, Carnot, Thompson and others coming up with conservation laws and thermodynamics; Faraday and Maxwell defining electrodynamics. Einstein invented the “thought experiment” version of Approach 3) to dream up Special and General Relativity. The last true romantic period in physics was the invention of quantum mechanics. Progress since then has basically been in Approaches 1) and 2). However, Approach 3) is alive and well in biology and data mining. The theoretical glory days of these fields might be now.

Talk in Marseille

I just returned from an excellent meeting in Marseille. I was quite impressed by the quality of talks, both in content and exposition. My talk may have been the least effective in that it provoked no questions. Although I don’t think it was a bad talk per se, I did fail to connect with the audience. I kind of made the classic mistake of not knowing my audience. My talk was about how to extend a previous formalism that much of the audience was unfamiliar with. Hence, they had no idea why it was interesting or useful. The workshop was on mean field methods in neuroscience and my talk was on how to make finite size corrections to classical mean field results. The problem is that many of the participants of the workshop don’t use or know these methods. The field has basically moved on.

In the classical view, the mean field limit is one where the discreteness of the system has been averaged away and thus there are no fluctuations or correlations. I have been struggling over the past decade to figure out how to estimate finite system size corrections to mean field. This led to my work on the Kuramoto model with Eric Hildebrand and particularly Michael Buice. Michael and I have now extended the method to synaptically coupled neuron models.

However, to this audience, mean field pertains more to what is known as the “balanced state”. This is the idea put forth by Carl van Vreeswijk and Haim Sompolinsky to explain why the brain seems so noisy. In classical mean field theory, the interactions are scaled by 1/N, where N is the number of neurons, so in the limit of N going to infinity the effect of any single neuron on the population is zero. Thus, there are no fluctuations or correlations. However, in the balanced state the interactions are scaled by 1/\sqrt{N}, so in the mean field limit the fluctuations do not disappear. The brilliant stroke of insight by Carl and Haim was that a self-consistent solution to such a situation is one where excitation and inhibition balance exactly, so the net mean input in the network is zero but the fluctuations are not. In some sense, this is the inverse of the classical notion. Maybe it should have been called “variance field theory”. The nice thing about the balanced state is that it is a stable fixed point and no further tuning of parameters is required. Of course the scaling choice is still a form of tuning, but it is not detailed tuning.
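The scaling difference can be made concrete with a toy simulation (my own sketch, not any particular model from the workshop): half the presynaptic neurons are excitatory, half inhibitory, each active at random, and we compare the fluctuations of the summed input under the two scalings.

```python
import numpy as np

def net_input_std(N, scaling, trials=2000, p=0.5, seed=0):
    """Standard deviation of the summed input to one neuron from N/2
    excitatory (+J) and N/2 inhibitory (-J) presynaptic neurons, each
    active with probability p, under the given synaptic scaling."""
    rng = np.random.default_rng(seed)
    J = 1.0 / N if scaling == "classical" else 1.0 / np.sqrt(N)
    exc = rng.binomial(N // 2, p, size=trials)  # active excitatory counts
    inh = rng.binomial(N // 2, p, size=trials)  # active inhibitory counts
    return np.std(J * (exc - inh))

N = 10000
print(net_input_std(N, "classical"))  # ~1/(2 sqrt(N)): fluctuations vanish
print(net_input_std(N, "balanced"))   # ~1/2: fluctuations survive as N grows
```

The mean input is zero in both cases; only under the 1/\sqrt{N} scaling do the fluctuations stay order one in the large N limit.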

Hence, to the younger generation of theorists in the audience, mean field theory already has fluctuations, so finite size corrections don’t seem that important. This may actually indicate the success of the field: in the past most computational neuroscientists were trained in either physics or mathematics, and mean field theory would have the meaning it has in statistical mechanics. The current generation has been trained entirely in computational neuroscience, with its own canon of common knowledge. I should say that my talk wasn’t a complete failure. It did seem to stir up interest in learning the field theory methods we have developed, as people recognized that they provide a very useful tool for the problems they are interested in.

Addendum 2011-11-11

Here are some links to previous posts that pertain to the comments above.

Action on a whim

One of the big news stories last week was the publication in Science of the genomic sequence of a hundred-year-old Aboriginal Australian. The analysis finds that Aboriginal Australians are descendants of an early migration to Asia between 62,000 and 75,000 years ago, and that this migration is distinct from the one that gave rise to modern Asians 25,000 to 38,000 years ago. I have often been amazed that humans were able to traverse harsh terrain and open water into the complete unknown. However, I briefly watched a documentary on CNBC last night about Apocalypse 2012 that made me understand this much better. Evidently, there is a fairly large group of people who believe the world will end in 2012. (This is independent of the group that thought the world would end earlier this year.) The prediction is based on the fact that a large cycle in the Mayan calendar supposedly ends in 2012. According to some of the believers, the earth’s rotation will reverse and that will cause massive earthquakes and tsunamis. These believers have thus managed to recruit followers and start building colonies in the mountains to try to survive. People are taking this extremely seriously. I think this ability to change the course of one’s entire life on the flimsiest of evidence is what led our ancestors to leave Africa and head into the unknown. People will get ideas in their heads and nothing will stop them from pursuing them. It’s what led us to populate every corner of the world and reshape much of the surface of the earth. It also suggests that the best optimization algorithms for seeking a global maximum may be ones that have some ‘momentum’, so that they can leave local maxima and head downhill to find higher peaks elsewhere.


Infinite growth on finite resources

At this past summer’s Lindau meeting of Nobel Laureates, Christian René de Duve, who is over 90 years old, gave a talk on population growth and its potential dire effects on the world. Part of his talk was broadcast on the Science Show. His talk prompted me to think more about growth. The problem is not that the population is growing per se. Even if the population were stable, we would still eventually run out of fossil fuels if we consume energy at the same rate. The crucial thing is that we must progressively get more efficient. For example, consider a steady population that consumes some finite resource at rate t^\alpha at time t. Then as long as \alpha < -1, we can make that resource last forever, since \int_1^\infty t^\alpha \, dt is finite. If instead the population is growing exponentially, then we would have to become exponentially more efficient with time to make the resource last. However, making the world more efficient will take good ideas and skilled people to execute them, and that will scale with the population. So there might be some optimal growth rate at which the idea generation rate is sufficient to increase efficiency fast enough to sustain us forever.
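A quick numerical illustration of the convergence condition (the exponents here are my own examples): partial sums of t^\alpha approach a finite limit for \alpha < -1 but keep growing for \alpha > -1.

```python
import numpy as np

t = np.arange(1, 100001, dtype=float)

# alpha = -2 < -1: cumulative consumption approaches a finite limit
finite_total = np.sum(t ** -2.0)   # partial sums tend to pi^2/6 ~ 1.645

# alpha = -0.5 > -1: cumulative consumption grows without bound (~ 2*sqrt(T))
growing_total = np.sum(t ** -0.5)

print(finite_total, growing_total)
```

So a resource consumed at rate t^{-2} lasts forever on a finite stock, while one consumed at rate t^{-1/2} is exhausted no matter how large the stock.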

Globalization and income distribution

In the past two decades we have seen increases in both globalization and income inequality. The question is whether the two are directly or indirectly related. The GDP of the United States is about 14 trillion dollars, which works out to about 45 thousand dollars per person. However, the median household income, which is about 50 thousand dollars per household, has not increased over this time period and has even dropped a little this year. The world GDP is approximately 75 trillion dollars (in terms of purchasing power parity), which is an amazingly high 11 thousand dollars per person per year given that over a billion people live on under two dollars a day. Thus one way to explain the decline in median income is that the US worker is now competing in a world where per capita GDP has effectively been reduced by a factor of four. However, does this also explain the concurrent increase in wealth at the top of the income distribution?

I thought I would address this question with an extremely simple income distribution model called the Pareto distribution. It simply assumes that incomes are distributed according to a power law with a lower cutoff: P(I) = \alpha A I^{-1-\alpha}, for I>L, where A is a normalization constant. Let’s say the population size is N and the GDP is G. Hence, we have the conditions \int_L^\infty P(I) dI = N and \int_L^\infty I P(I) dI = G. Inserting the Pareto distribution gives the conditions N=AL^{-\alpha} and G = \alpha N L/(\alpha-1), or A = NL^\alpha and L=(\alpha-1)/\alpha (G/N). The Pareto distribution is thought to be valid mostly for the tail of the income distribution, so L should only be thought of as an effective minimum income. We can now calculate the income threshold for, say, the top 1%. This is given by the condition F(H) = N-\int_L^H P(I) dI = 0.01 N, which results in (L/H)^\alpha=0.01, or H = L/0.01^{1/\alpha}. For \alpha = 2, the 99th percentile income threshold is about two hundred thousand dollars, which is a little low, implying that \alpha is less than two. However, the crucial point is that H scales with the average income G/N. The median income would have the same scaling, which clearly goes against recent trends where median incomes have stagnated while top incomes have soared. What this implies is that the top end obeys a different income distribution from the rest of us.
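The algebra above is easy to mistype, so here is a small script that reproduces it, plugging in the numbers from the text (\alpha = 2 and a per capita GDP of $45,000):

```python
def pareto_thresholds(alpha, gdp_per_capita, top_fraction=0.01):
    """Effective minimum income L and top-percentile threshold H for a
    Pareto income distribution P(I) = alpha*A*I^(-1-alpha) for I > L."""
    L = (alpha - 1) / alpha * gdp_per_capita  # from the two integral conditions
    H = L / top_fraction ** (1 / alpha)       # from (L/H)^alpha = top_fraction
    return L, H

L, H = pareto_thresholds(alpha=2, gdp_per_capita=45000)
print(L, H)  # 22500.0 and 225000.0: the ~$200k figure quoted above
```

Note that H is proportional to L, and hence to G/N, for any fixed \alpha, which is the scaling argument in the text.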


If I had to compress everything that ails us today into one word, it would be correlations. Basically, everything bad that has happened recently, from the financial crisis to political gridlock, is due to undesired correlations. That is not to say that all correlations are bad. Obviously, a system without any correlations is simply noise, and you would certainly want the activity on an assembly line in a factory to be correlated. Useful correlations are usually serial in nature, as when an invention leads to a new company. Bad correlations are mostly parallel, like all the members of Congress voting exclusively along party lines, which reduces an assembly of hundreds of people to just two. A recession is caused when everyone in the economy suddenly decides to decrease spending all at once. In a healthy economy, people would be uncorrelated, so some would spend more when others spend less and the aggregate demand would be about constant. When people’s spending habits are tightly correlated and everyone decides to save more at the same time, there is less demand for goods and services in the economy, so companies must lay people off, resulting in even less demand and leading to a vicious cycle.

The financial crisis that triggered the recession was due to the collapse of the housing bubble, another unwanted correlated event. This was exacerbated by collateralized debt obligations (CDOs), financial instruments that were doomed by unwanted correlations. In case you haven’t followed the crisis, here’s a simple explanation. Say you have a set of loans where you think the default rate is 50%. Hence, given a hundred mortgages, you expect fifty to fail but you don’t know which. The way to make a triple-A bond out of these risky mortgages is to lump them together and divide the lump into tranches that have different seniority (i.e. get paid off sequentially). The most senior tranche will be paid off first and have the highest bond rating. If fifty of the hundred loans go bad, the senior tranche will still get paid. This is great as long as the mortgages are only weakly correlated and you know what that correlation is. However, if the mortgages fail together then all the tranches will go bad. This is what happened when the bubble collapsed. Correlations in how people responded to the collapse made it even worse. When some CDOs started to fail, people panicked collectively and didn’t trust any CDOs, even though some of them were still okay. The market for CDOs became frozen, so people who had them and wanted to sell them couldn’t, even at a discount. This is why the federal government stepped in. The bailout was deemed necessary because of bad correlations. Just between you and me, I would have let all the banks fail.
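A toy Monte Carlo (my own illustration, not anyone’s actual pricing model) shows how correlation alone dooms the senior tranche. Here I give the senior tranche a cushion beyond the expected fifty defaults, so it fails only if more than 60 of the 100 loans go bad, and I compare the two extreme cases of fully independent and fully correlated defaults:

```python
import numpy as np

def senior_failure_rate(correlated, n_loans=100, p_default=0.5,
                        buffer=60, trials=20000, seed=1):
    """Fraction of trials in which defaults exceed the cushion the
    senior tranche can absorb (here 60 of 100 loans)."""
    rng = np.random.default_rng(seed)
    if correlated:
        # perfectly correlated: all loans default together or not at all
        all_fail = rng.random(trials) < p_default
        n_defaults = np.where(all_fail, n_loans, 0)
    else:
        # independent defaults: counts are binomial and concentrate near 50
        n_defaults = rng.binomial(n_loans, p_default, size=trials)
    return np.mean(n_defaults > buffer)

print(senior_failure_rate(correlated=False))  # ~0.02: tranche almost always pays
print(senior_failure_rate(correlated=True))   # ~0.5: protection is wiped out
```

The same average default rate produces a near-riskless senior bond when defaults are independent and a coin flip when they are perfectly correlated, which is the whole point.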

We can quantify the effect of correlations in a simple example, which will also show the difference between the sample mean and the population mean. Let’s say you have some variable x that estimates some quantity. The expectation value (population mean) is \langle x \rangle = \mu. The variance of x, \langle x^2 \rangle - \langle x \rangle^2=\sigma^2, gives an estimate of the square of the error. If you want to decrease the error of the estimate, you can take more measurements. So let’s consider a sample of n measurements. The sample mean is (1/n)\sum_{i=1}^n x_i. The expectation value of the sample mean is (1/n)\sum_i \langle x_i \rangle = (n/n)\langle x \rangle = \mu. The variance of the sample mean is

\langle [(1/n)\sum_i x_i]^2 \rangle - \langle x \rangle ^2 = (1/n^2)\sum_i \langle x_i^2\rangle + (1/n^2) \sum_{j\ne k} \langle x_j x_k \rangle - \langle x \rangle^2

Let C=\langle (x_j-\mu)(x_k-\mu)\rangle, j \ne k, be the correlation between two measurements. Hence, \langle x_j x_k \rangle = C +\mu^2. The variance of the sample mean is thus \frac{1}{n} \sigma^2 + \frac{n-1}{n} C. If the measurements are uncorrelated (C=0) then the variance is \sigma^2/n, i.e. the standard deviation or error is decreased by the square root of the number of samples. However, if there are nonzero correlations then, no matter how many measurements you take, the variance can only be reduced to C. Thus, correlations put a lower bound on the error of any estimate. Another way to think about this is that correlations reduce entropy, and less entropy means less information. One way to cure our current problems is to destroy parallel correlations.
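The formula is easy to check numerically. In this sketch (parameter values are mine) each measurement is a shared component with variance C plus independent noise, which gives exactly \mathrm{Var}(x_i)=\sigma^2 and \mathrm{Cov}(x_j,x_k)=C, and the empirical variance of the sample mean lands on \sigma^2/n + (n-1)C/n:

```python
import numpy as np

def sample_mean_variance(n=100, sigma2=1.0, C=0.5, trials=20000, seed=2):
    """Empirical variance of the mean of n correlated measurements,
    each a shared component (variance C) plus independent noise."""
    rng = np.random.default_rng(seed)
    common = rng.normal(0.0, np.sqrt(C), size=(trials, 1))
    private = rng.normal(0.0, np.sqrt(sigma2 - C), size=(trials, n))
    x = common + private            # Var(x) = sigma2, Cov(x_j, x_k) = C
    return np.var(x.mean(axis=1))

n, sigma2, C = 100, 1.0, 0.5
predicted = sigma2 / n + (n - 1) / n * C  # = 0.505 here, dominated by C
print(sample_mean_variance(n, sigma2, C), predicted)
```

With C=0 the same computation would give about 0.01, i.e. \sigma^2/n; with C=0.5 a hundred measurements barely help, which is the lower bound in action.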



Many of my recent posts centre around the concept of computability, so I thought I would step back today and give a review of the topic for the uninitiated. Obviously, there are multiple textbooks on the topic, so I won’t be bringing anything new. However, I would like to focus on one small aspect that is generally glossed over in books. The main thing to take away from computability, for me, is that it involves functions of integers. Essentially, a computation is something that maps an integer to another integer. The Church-Turing thesis states that all forms of computation are equivalent to a Turing machine. Hence, the lambda calculus, certain cellular automata, and your MacBook have the same computational capabilities as a Turing machine (actually your MacBook is finite so it has less capability, but it could be extended arbitrarily to be Turing complete). The thesis cannot be formally proven but it seems to hold.

The fact that computation is about the manipulation of integers has profound consequences, the main one being that it cannot deal directly with real numbers. Or to put it another way, computation is constrained to countable processes. If anything requires an uncountable number of operations then it is uncomputable. However, uncomputability, or undecidability as it is often called, is generally not presented in such a simple way. In many popular books like Gödel, Escher, Bach, the emphasis is on the magical aspect of it. The reason is that the proof of uncomputability, which is similar to Gödel’s proof of the First Incompleteness Theorem, relies on demonstrating that a certain self-referential function or program cannot exist, by use of Cantor’s diagonal argument that the reals are uncountable. In very simple nonrigorous terms, the proof works by considering a list of all possible computable functions f_i(j) on all the integers j. This is the same as saying you have a list of all possible Turing machines i on all possible initial states j. Now suppose that one of the functions f_a(j) takes the output of function j given input j and puts a negative sign in front, so f_a(j)= -f_j(j). The problem comes if you let the function act on itself, because then f_a(a)=-f_a(a), which is a contradiction whenever f_a(a) \ne 0, and thus such a computable function f_a cannot exist.
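The diagonal construction can be mimicked on a finite list of functions. In this sketch I use the f_j(j)+1 variant rather than the sign flip, since -f_j(j)=f_j(j) has the loophole f_j(j)=0; the point is the same: the diagonal function disagrees with every function on the list, so it cannot appear anywhere on the list.

```python
# A finite caricature of the diagonal argument: whatever (total) functions
# appear on the list, the diagonal function differs from the j-th function
# at input j, so it cannot equal any function on the list.
fs = [lambda j: 0, lambda j: j, lambda j: j * j, lambda j: 2 ** j]

def diag(j):
    return fs[j](j) + 1  # change the "diagonal" entry

print([fs[j](j) for j in range(len(fs))])  # diagonal entries: [0, 1, 4, 8]
print([diag(j) for j in range(len(fs))])   # [1, 2, 5, 9]: differs everywhere
```

For a finite list this just shows diag is a new function; the force of the real argument is that the list is supposed to contain *all* computable functions, yet diag is computable, which is the contradiction.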


New paper in The Lancet

The Lancet has just published a series of articles on obesity.  They can be found here.  I am an author on the third paper, which covers the work Kevin Hall and I have been doing for the past seven years.  There was a press conference in London yesterday that Kevin attended and there is a symposium today.  The announcement has since been picked up in the popular press.  Here are some samples:  Science Daily, Mirror, The Australian, and The Chart at CNN.