Is irrationality necessary?

Much has been made lately of the anti-science stance of a large segment of the US population. (See for example Chris Mooney’s book). The acceptance of anthropogenic climate change or the theory of evolution is starkly divided by political inclinations. However, as I have argued in the past, seemingly irrational behavior can actually make sense from an evolutionary perspective. As I have posted on before, one of the best ways to find an optimal solution to a problem is to search randomly, the Markov Chain Monte Carlo method being the quintessential example. Randomness is useful for searching in places you wouldn’t normally go and for overcoming the unwanted correlations to which I recently attributed most of our current problems (see here). Thus, we may have been evolutionarily selected to have diverse viewpoints and degrees of rational thinking. In any given situation there is only one rationally optimal response, and under incomplete information, which is almost always the case, that response could be wrong. Thus, when a group of individuals is presented with a challenge, it may be better for the group if multiple strategies, including irrational ones, are tried rather than putting all the eggs into one rational basket. I truly doubt that Australia could have been discovered 60 thousand years ago without some irrationally risky decisions. Even within science, people pursue ideas based on tenuous hunches all the time. Many great discoveries were made because people ignored conventional rational wisdom and did something irrational. Many have failed as a result as well. However, society as a whole is arguably better off, since success generally goes global while failure stays local.
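To make the random-search point concrete, here is a minimal sketch (my own toy example, not from the post) comparing a purely greedy search, which only ever accepts improvements, with a Metropolis-style random search on a bumpy landscape. The greedy searcher typically gets trapped in the first local minimum it finds, while the random one occasionally accepts bad moves and can escape.

```python
import math
import random

def energy(x):
    # A bumpy one-dimensional landscape with many local minima.
    return 0.1 * x**2 + 10 * math.sin(3 * x)

def greedy_search(steps=20000, max_step=0.4):
    """Purely 'rational' local search: only accept strict improvements."""
    x = 10.0  # start far from the global minimum
    for _ in range(steps):
        proposal = x + random.uniform(-max_step, max_step)
        if energy(proposal) < energy(x):
            x = proposal
    return x

def metropolis_search(steps=20000, max_step=0.4, temperature=10.0):
    """MCMC-style search: sometimes accept uphill moves, remember the best point."""
    x = best = 10.0
    for _ in range(steps):
        proposal = x + random.uniform(-max_step, max_step)
        delta = energy(proposal) - energy(x)
        if delta < 0 or random.random() < math.exp(-delta / temperature):
            x = proposal
        if energy(x) < energy(best):
            best = x
    return best

if __name__ == "__main__":
    g, m = greedy_search(), metropolis_search()
    print(f"greedy:     x = {g:6.2f}, energy = {energy(g):7.2f}")
    print(f"metropolis: x = {m:6.2f}, energy = {energy(m):7.2f}")
```

The price of randomness is that any individual run can wander badly; the benefit only shows up on average, which is the group-level argument made above.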

It is not even necessary to have great differences in cognitive abilities to produce a wide range in rationality. One only needs a reward system that is stimulated by a wide range of signals. So while some children are strongly rewarded for finding self-consistent explanations to questions, others are rewarded for acting rashly. Small initial differences would then amplify over time as the children seek environments that maximize their rewards. Sam Wang and Sandra Aamodt covered this in their book, Welcome to Your Brain. Thus you would end up with a society spanning a wide range of rationality.

 

 

Talk today at Johns Hopkins

I’m giving a computational neuroscience lunch seminar today at Johns Hopkins. I will be talking about my work with Michael Buice, now at the Allen Institute, on how to go beyond mean field theory in neural networks. Technically, I will present our recent work on systematically computing correlations in a network of coupled neurons with a controlled perturbation expansion in the inverse network size. The method uses ideas from kinetic theory with a path integral construction borrowed and adapted by Michael from nonequilibrium statistical mechanics. The talk is similar to the one I gave at MBI in October. Our paper on this topic will appear soon in PLoS Computational Biology. The slides can be found here.

Von Neumann’s response

Here’s Von Neumann’s response to straying from pure mathematics:

“[M]athematical ideas originate in empirics, although the genealogy is sometimes long and obscure. But, once they are so conceived, the subject begins to live a peculiar life of its own and is better compared to a creative one, governed by almost entirely aesthetic considerations, than to anything else, and, in particular, to an empirical science. There is, however, a further point which, I believe, needs stressing. As a mathematical discipline travels far from its empirical source, or still more, if it is a second and third generation only indirectly inspired by ideas coming from ‘reality’, it is beset with very grave dangers. It becomes more and more purely aestheticising, more and more purely l’art pour l’art. This need not be bad, if the field is surrounded by correlated subjects, which still have closer empirical connections, or if the discipline is under the influence of men with an exceptionally well-developed taste. But there is a grave danger that the subject will develop along the line of least resistance, that the stream, so far from its source, will separate into a multitude of insignificant branches, and that the discipline will become a disorganised mass of details and complexities. In other words, at a great distance from its empirical source, or after much ‘abstract’ inbreeding, a mathematical subject is in danger of degeneration.”

Thanks to James Lee for pointing this out.

Complexity is the narrowing of possibilities

Complexity is often described as a situation where the whole is greater than the sum of its parts. While this description is true on the surface, it actually misses the whole point about complexity. Complexity is really about the whole being much less than the sum of its parts. Let me explain. Consider a television screen with 100 pixels that can be either black or white. The number of possible images the screen can show is 2^{100}. That’s a really big number. Most of those images would look like random white noise. However, a small set of them would look like things you recognize, like dogs and trees and salmon tartare coronets. This narrowing of possibilities, or a reduction in entropy to be more technical, increases information content and complexity. However, too much reduction of entropy, such as restricting the screen to be entirely black or white, would also be considered to have low complexity. Hence, what we call complexity is when the possibilities are restricted but not completely restricted.
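To put a number on the “narrowing of possibilities,” here is a small sketch (my own illustration, with a made-up count of recognizable images) comparing the Shannon entropy of the unrestricted screen, a screen restricted to a modest set of meaningful images, and a screen forced to show a single image.

```python
import math

def entropy_bits(num_equally_likely_states):
    """Shannon entropy, in bits, of a uniform distribution over the given states."""
    return math.log2(num_equally_likely_states)

pixels = 100
unrestricted = 2 ** pixels   # any black/white image allowed
recognizable = 10_000        # hypothetical count of images that look like something
single_image = 1             # screen forced to show one fixed image

for label, states in [("unrestricted", unrestricted),
                      ("recognizable only", recognizable),
                      ("single image", single_image)]:
    print(f"{label:>17}: {entropy_bits(states):6.1f} bits")

# Maximum entropy (100 bits) is featureless noise and zero entropy is trivial;
# what we call complexity lives in the restricted-but-not-trivial middle.
```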

Another way to think about it is to consider a very high dimensional system, like a billion particles moving around. A complex system would be one in which the attractor of this six-billion-dimensional system (three dimensions for the position and three for the velocity of each particle) is a lower dimensional surface or manifold. The flow of the particles would then be constrained to this attractor. The important thing to understand about the system would then not be the individual motions of the particles but the shape and structure of the attractor. In fact, if I gave you a list of the positions and velocities of each particle as a function of time, you would be hard pressed to discover that there even was a low dimensional attractor. Suppose the particles lived in a box and they moved according to Newton’s laws and only interacted through brief elastic collisions. This is an ideal gas, and what would happen is that the positions of the particles would be uniformly distributed throughout the box while the velocities would obey a normal distribution, called the Maxwell-Boltzmann distribution in physics. The variance of this distribution is proportional to the temperature. The pressure, volume, particle number and temperature are related by the ideal gas law, PV=NkT, where k is the Boltzmann constant, whose value is set by Nature. An ideal gas at equilibrium would not be considered complex because the attractor is a simple fixed point. However, it would be really difficult to discover the ideal gas law or even the notion of temperature if one only focused on the individual particles. The ideal gas law and all of thermodynamics were discovered empirically and only later justified microscopically through statistical mechanics and kinetic theory. However, knowledge of thermodynamics is sufficient for most engineering applications, like designing a refrigerator. If you make the interactions longer range you can turn the ideal gas into a liquid, and if you start to stir the liquid you can end up with turbulence, which is a paradigm of complexity in applied mathematics. However, the main difference between an ideal gas and turbulent flow is the dimension of the attractor. In both cases, the attractor dimension is still much smaller than the full range of possibilities.
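As a small numerical check of the kinetic picture (my own sketch, with hypothetical parameters roughly appropriate for argon at room temperature), sampling velocities from the Maxwell-Boltzmann distribution and using the kinetic-theory expression for pressure recovers PV = NkT:

```python
import numpy as np

# Hypothetical illustrative parameters (SI units).
N = 1_000_000        # number of particles
T = 300.0            # temperature in kelvin
m = 6.6e-26          # mass of one particle (roughly an argon atom)
V = 1.0              # volume in cubic meters
k = 1.380649e-23     # Boltzmann constant

rng = np.random.default_rng(0)
# Maxwell-Boltzmann: each velocity component is normal with variance kT/m.
vx = rng.normal(0.0, np.sqrt(k * T / m), size=N)

# Kinetic-theory pressure: P = (N/V) * m * <vx^2>, so P*V should equal N*k*T.
PV_kinetic = N * m * np.mean(vx**2)
PV_ideal = N * k * T
print(f"PV from sampled velocities: {PV_kinetic:.4e}")
print(f"PV from ideal gas law:      {PV_ideal:.4e}")
```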

The crucial point is that focusing on the individual motions can make you miss the big picture. You will literally miss the forest for the trees. What is interesting and important about a complex system is not what the individual constituents are doing but how they are related to each other. The restriction to a lower dimensional attractor is manifested in the subtle correlations of the entire system. The dynamics on the attractor can also often be represented by an “effective theory”. Here the use of the word “effective” is not to mean that it works but rather that the underlying microscopic theory is superseded by a macroscopic one. Thermodynamics is an effective theory of the interaction of many particles. The recent trend in biology and economics has been to focus on the detailed microscopic interactions (there is push back in economics in what has been dubbed the macro-wars). As I will relate in future posts, it is sometimes much more effective (in the works-better sense) to use the effective (in the macroscopic sense) theory than a detailed microscopic one. In other words, there is no single “theory” of a given system but rather sets of effective theories that are to be selected based on the questions being asked.

Von Neumann

Steve Hsu has a link to a fascinating documentary on John Von Neumann. It’s definitely worth watching.  Von Neumann is probably the last great polymath. Mathematician Paul Halmos laments that Von Neumann perhaps wasted his mathematical gifts by spreading himself too thin. He worries that Von Neumann will only be considered a minor figure in pure mathematics several hundred years hence. Edward Teller believes that Von Neumann simply enjoyed thinking above all else.

News as entertainment

This is an obvious observation, but it struck me one day while watching the evening national news on multiple channels that what differentiates the programs cannot be the news itself, because by definition true news is reported by everyone. Half of each program is identical across the channels and the other half covers random human interest stories. Given that the true news will likely have taken place earlier in the day, the actual news stories will not even be novel. This clearly indicates that the national nightly news is doomed to extinction. Based on the commercials aired during the programs, the target demographic is senior citizens. Once they are gone, so goes the nightly news, “and that’s the way it is”.

Economic growth and reversible computing

In my previous post on the debt, a commenter made the important point that there are limits to economic growth. UCSD physicist Tom Murphy has some thoughtful posts on the topic (see here and here). If energy use scales with economic activity then there will be a limit to economic growth because at some point we will use so much energy that the earth will boil, to use Murphy’s metaphor. Even if we become more energy efficient, if the rate of increase in efficiency is slower than the rate of economic growth, then we will still end up boiling. While I agree that this is true given the current state of economic activity and for the near future, I do wish to point out that it is possible to have indefinite economic growth without using any more energy. As pointed out by Rick Bookstaber (e.g. see here), we are limited in how much we can consume because we are finite creatures. Thus, as we become richer, much of our excess wealth goes not towards increased consumption but towards the quality of that consumption. For example, the energy expenditure of an expensive meal prepared by a celebrity chef is not much more than that of a meal from the local diner. A college education today is much more expensive than it was forty years ago without a concomitant increase in energy use. In some sense, much of modern real economic growth is effective inflation. Mobile phones have not gotten cheaper over the past decade because manufacturers keep adding more features to justify the price. We basically pay more for augmented versions of the same thing. So while energy use will increase for the foreseeable future, especially as the developing world catches up, it may not increase as fast as current trends suggest.

However, the main reason that economic growth could continue without energy growth is that our lives are becoming more virtual. One could conceivably imagine a future world in which we spend almost all of our day in an online virtual environment. In such a case, beyond a certain baseline of fulfilling basic physical needs of nutrition and shelter, all economic activity could be digital. Currently computers are quite inefficient. All the large internet firms like Google, Amazon, and Facebook require huge energy-intensive server farms. However, there is nothing in principle that says computers must dissipate energy. In fact, all computation can be done reversibly. Landauer’s principle tells us that only the erasure of information carries an unavoidable thermodynamic cost, and a reversible computer never erases anything, so in the idealized limit it creates no entropy and uses no energy. If we lived completely or partially in a virtual world housed on a reversible computer then economic activity could increase indefinitely without using more energy. However, there could still be limits to this growth because computing power could be limited by other things such as storage capacity and relativistic effects. At some point the computer may need to be so large that information cannot be moved fast enough to keep up, or the density of bits could be so high that it creates a black hole.
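To make “reversible” concrete, here is a minimal sketch (my own, not from the post) of the Toffoli gate, a standard reversible logic gate: it is its own inverse, so no input information is ever erased, and with one input pinned it reproduces NAND, so any ordinary Boolean circuit can be embedded in reversible logic.

```python
from itertools import product

def toffoli(a, b, c):
    """Toffoli (controlled-controlled-NOT) gate: flip c iff both a and b are 1."""
    return a, b, c ^ (a & b)

# Reversibility: the gate is its own inverse, so nothing is erased.
for bits in product([0, 1], repeat=3):
    assert toffoli(*toffoli(*bits)) == bits

# Universality: with the target bit c pinned to 1, the third output is NAND(a, b).
for a, b in product([0, 1], repeat=2):
    assert toffoli(a, b, 1)[2] == 1 - (a & b)

print("Toffoli is self-inverse and can emulate NAND.")
```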

Debt is relative

One of the memes dominating American political discourse is that the US national debt is too high. The debt is currently around 16 trillion dollars, which exceeds the current GDP of 15 trillion. This may seem large, but keep in mind that the debt-to-GDP ratio in Japan is over two. The federal government will bring in about 2.6 trillion dollars in revenues this year but spend about 3.7 trillion, giving an annual deficit of about a trillion dollars. However, our borrowing costs are also very low. The yield on a 10 year Treasury bond is under 2% and the 30 year yield is under 3%. The general fear of carrying a large debt is that it may cause interest rates to rise because people fear a default. This then leads to higher borrowing costs until you reach a point where you can never repay the debt. This is where Greece is now and where Spain, Portugal, Ireland, and Italy may be heading.

However, an important fact that should be kept in mind when thinking about debt is that the absolute amount is irrelevant. This is because economics, like biology, is about flux and growth. As long as the nominal GDP growth rate (real GDP growth plus inflation) exceeds the borrowing rate, the debt ratio will shrink in the future. In fact, the power of exponential growth shows that you can run a deficit forever and the debt-to-GDP ratio can still shrink. We can see this in a simple calculation. Let D be the debt, I be the annual deficit, and y the borrowing rate. The debt then grows as \dot{D}=I+y D, which has the solution D(t)=(D(0)+I/y)e^{yt}-I/y. Now suppose that the nominal GDP G grows with rate r so G(t) = G(0)e^{rt}. The debt-to-GDP ratio then scales like e^{(y-r)t} for large t. So in the short term deficits do matter, but as long as r > y, the debt-to-GDP ratio will always shrink in the long run. In fact, this is what happened after World War II. The debt was never officially retired. It just faded away into insignificance because of growth and inflation. Given the low interest rates, there is an arbitrage opportunity to borrow as much as possible and invest the money in infrastructure to promote future growth.
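A quick numerical sketch of this argument (my own, with made-up but plausible numbers: a constant one-trillion-dollar deficit, a 2% borrowing rate and 4% nominal growth) shows the debt-to-GDP ratio rising for a couple of decades and then shrinking:

```python
import math

# Hypothetical illustrative numbers, not a forecast.
D0 = 16.0   # initial debt, trillions of dollars
G0 = 15.0   # initial nominal GDP, trillions of dollars
I = 1.0     # constant annual deficit, trillions of dollars
y = 0.02    # borrowing rate (10-year Treasury scale)
r = 0.04    # nominal GDP growth rate (real growth plus inflation)

def debt(t):
    """Solution of dD/dt = I + y*D with D(0) = D0."""
    return (D0 + I / y) * math.exp(y * t) - I / y

def gdp(t):
    return G0 * math.exp(r * t)

for t in [0, 10, 25, 50, 100]:
    print(f"year {t:3d}: debt/GDP = {debt(t) / gdp(t):.2f}")
```

With these numbers the ratio peaks around 1.45 roughly two decades out and then decays toward zero; let the borrowing rate exceed the growth rate and the conclusion flips, which is just the r > y condition above.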

Prediction requires data

Nate Silver has been hailed in the media as a vindicated genius for correctly predicting the election. He was savagely attacked before the election for predicting that Obama would win handily. Kudos also go to Sam Wang, Pollster.com, electoral-vote.com, and all others who simply took the data obtained from the polls seriously. Hence, the real credit should go to all of the polling organizations for collectively not being statistically biased. It didn’t matter if individual organizations were biased one way or the other as long as their biases were not correlated. The true power of prediction in this election came from the errors of the various pollsters being independently distributed. However, even if you didn’t take the data at face value, you could still reasonably predict the election. Obama had an inherent advantage because he had more paths to winning 270 electoral votes. Suppose there were 8 battleground states and Romney needed to win at least 6 of them. Hence, Romney had 28 ways to win while Obama had 228 ways to win. If the win probability was approximately a half in each of these states, which is what a lot of people claimed, then Romney had slightly more than a one in ten chance of winning, which is close to the odds given by Sam. The only way Romney’s odds would increase is if the state results were correlated in his favour. However, it would take a lot of correlated bias to make Romney the favourite.

 

Erratum, Nov 9 2012:  Romney actually has 37 ways and Obama 219 in my example.  The total must add up to 2^8=256.  I forgot to include the fact that Romney could also win 7 of 8 states or all states in his paths to winning.
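A minimal sketch (my own) checking the corrected counts from the erratum and the corresponding coin-flip win probability:

```python
from math import comb

states = 8
need = 6  # Romney needs at least 6 of the 8 battleground states

romney_ways = sum(comb(states, k) for k in range(need, states + 1))
obama_ways = 2 ** states - romney_ways
print(romney_ways, obama_ways)   # 37 and 219, as in the erratum

# If each state were an independent fair coin flip:
p_romney = romney_ways / 2 ** states
print(f"{p_romney:.3f}")         # about 0.145
```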

Predicting the election

The US presidential election on Nov. 6 is expected to be particularly close. The polling has been vigorous and there are many statistical prediction web sites. One of them, the Princeton Election Consortium, is run by neuroscientist Sam Wang at Princeton University. For any non-American readers, the president is not elected directly by the citizens but through what is called the electoral college. This is a set of 538 electors selected by the individual states. The electoral votes are allotted according to the number of congressional districts per state plus two. Hence, low population states are over-represented. Almost all states award all of their electoral votes to the candidate who takes a plurality of the votes in that state. Maine and Nebraska are the two exceptions; they allot electoral votes according to who wins each congressional district. Thus in order to predict who will win, one must predict who will get at least 270 electoral votes. Most of the states are not competitive, so the focus of the candidates (and the media) is on a handful of so-called battleground states like Ohio and Colorado. Currently, Sam Wang predicts that President Obama will win the election with a median of 319 electoral votes. Sam estimates the Bayesian probability of Obama’s re-election to be 99.6%. Nate Silver at another popular website (FiveThirtyEight) predicts that Obama will win 305 electoral votes and has a re-election probability of 83.7%.

These estimates are made by combining polling data with a statistical model. Nate Silver uses national and state polls along with some economic indicators, although the precise model is not public. Sam Wang uses only state polls. I’ll describe his method here. The goal is to estimate the probability distribution for the number of electoral votes a specific candidate will receive. The state space consists of 2^{51} possibilities (50 states plus the District of Columbia). I will assume that Maine and Nebraska do not split their votes along congressional districts, although it is a simple task to include that possibility. Sam assumes that the individual states are statistically independent so that the joint probability distribution factorizes completely. He then takes the median of the polls for each state over some time window as the representative margin for that state. The polling data consist of the voting preferences of a sample for a given candidate, and the margin is converted into a win probability using a normal distribution. He then computes the probability for all 2^{51} combinations. Suppose that there are just two states with win probabilities for your candidate of p_1 and p_2. The probability of your candidate winning both states is p_1 p_2, of winning state 1 but not state 2 is p_1(1-p_2), and so forth. If the states have EV_1 and EV_2 electoral votes respectively, then winning both states yields EV_1+EV_2 votes, and so forth. To keep the bookkeeping simple, Sam uses the trick of expressing the probability distribution as a polynomial in a dummy variable x. So the probability distribution is

(p_1 x^{EV_1} + 1-p_1)(p_2 x^{EV_2} + 1-p_2)

= p_1 p_2 x^{EV_1+EV_2} + p_1(1-p_2) x^{EV_1} + (1-p_1)p_2 x^{EV_2} + (1-p_1)(1-p_2)

Hence, the coefficient of each term is the probability of winning the number of electoral votes given by the exponent of x. The expression for 51 “states” is \prod_{i=1}^{51} (p_i x^{EV_i} + 1-p_i) and this can be evaluated quickly on a desktop computer. One can then take the median or mean of the distribution as the predicted number of electoral votes. The sum of the probabilities for electoral votes greater than 269 gives the winning probability, although Sam uses a more sophisticated method for his predicted probabilities. The model does assume that the states are independent. Sam tries to account for possible correlations by using what he calls a meta-margin, in which he calculates how much the probabilities (in terms of preference) need to move for the leading candidate to lose. Also, the state polls themselves will likely pick up any correlations as the election gets closer.
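Here is a minimal sketch (my own, with made-up win probabilities for three hypothetical battleground states) of the polynomial trick, implemented as repeated convolution of the coefficient array; the same loop over all 51 “states” yields the full distribution over 0–538 electoral votes essentially instantly.

```python
import numpy as np

# Hypothetical battleground states: (win probability, electoral votes).
states = [(0.7, 18), (0.5, 29), (0.4, 13)]

# Coefficient array: dist[k] is the probability of winning exactly k electoral votes.
dist = np.array([1.0])
for p, ev in states:
    term = np.zeros(ev + 1)
    term[0] = 1 - p      # lose the state: contribute 0 electoral votes
    term[ev] = p         # win the state: contribute ev electoral votes
    dist = np.convolve(dist, term)   # polynomial multiplication

total_ev = len(dist) - 1
print("P(exactly k EV):",
      {k: round(float(dist[k]), 3) for k in range(total_ev + 1) if dist[k] > 0})
print("P(more than 30 EV):", round(float(dist[31:].sum()), 3))
```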

Most statistical models predict that Obama will be re-elected with fairly high probability, but the national polls are showing that the race is almost tied. This discrepancy is a puzzle. Silver’s hypothesis for why is here and Sam’s is here. One of the sources of error in polls is that they must predict who will actually vote. The 2008 election had a voter turnout of a little less than 62%. That means that an election can easily be won or lost on turnout alone, which makes one wonder about democracy.

 

Nov 4: dead link is updated

Weather prediction

I think it was pretty impressive how accurate the predictions for Superstorm Sandy were up to a week ahead. The hurricane made the left hand turn from the Atlantic into New Jersey just as predicted. I don’t think the storm could have been hyped any more than it was. The east coast was completely devastated but at least we did have time to prepare. The weather models have gotten much better than they were even ten years ago. The storm also shows just how vulnerable the east coast is to a 14 foot storm surge. I can’t imagine what a 20 foot surge would do to New York.

Time constants in the economy

I was struck recently by a figure that economist Paul Krugman posted on his blog showing the recovery times from recessions.

The interesting point to me was that there seems to be some consistency in the rate of recovery from different types of recessions. In other words, there are fixed time constants in the economy. A recession is defined as a period with negative GDP growth rate, i.e. it is a time when the economy shrinks.  Generally, the US has been growing a few percentage points a year in real terms for the past several decades. This is interrupted by occasional recessions such as a severe one in 1980-81 and the most recent Great Recession of 2008-2009.

However, not all recessions are created equal, and economists distinguish between those caused by disinflation and those caused by financial crises. An example scenario for a disinflationary recession is that the economy is initially overheated, so while there may be lots of growth there is also lots of inflation. The causes of inflation are complicated but they are sometimes linked to the interest rate. When rates are low, it is cheap to borrow, so people can acquire more money to spend, and too much money chasing too few goods leads to inflation. A recession can then be induced by interest rates increasing, either through direct action by the Federal Reserve or some exogenous factor, which makes it more expensive to borrow money and also incentivizes saving. This reduces the money supply. This is what happened in the 1980-81 recession. Inflation was extremely high in the 1970’s, so Fed chairman Paul Volcker dramatically increased interest rates. This induced a recession and also curbed inflation. How the Fed controls interest rates is extremely interesting and something I may post about in the future. A recession can also be caused by no apparent external event if people simply decide to decrease spending all at once. A beautiful example is given by the famous story of a babysitting co-op (see here). In a disinflationary recession, the economy can start growing again if you can get people to start spending. This can be done by lowering the interest rate or through a fiscal stimulus plan where the government spends more. The time constant for recovery will be about the time it takes for people who lost jobs to find new ones, which is usually less than two years.

Recessions due to financial crises are generally preceded by a financial bubble in which some asset, such as real estate, increases dramatically in price and then people, companies and banks take on more and more debt to speculate on this asset. This is what happened in the run up to the Great Recession and the Japanese financial crisis in the 1990’s. When the bubble finally bursts, people are left with lots of debt and little money to spend, thereby inducing a recession. In the case of real estate, the debt is in the form of mortgages, which are usually long term. The time constant will be the average duration of the mortgage or the time it takes to refinance. In both cases, this will take longer than two years. Thus, the recession will persist until people can pay off or unload their debts, and both are difficult when the economy is depressed. This also shows why monetary policy may have little effect. Lowering interest rates can’t directly help the people trapped in long-term mortgages. However, if interest rates can be kept low enough for long enough to induce some inflation, then house prices increase while the effective debt decreases, so people can sell their homes or refinance and get more money to spend. This is basically what current Fed chairman Ben Bernanke is trying to do. Another option would be for the federal government to take advantage of low interest rates and buy property. It could start a buy-and-lease program where underwater homeowners sell their homes to the government and then rent them back. This would keep people in their homes, bolster the economy and also ensure that people who made bad decisions during the bubble do not profit from their mistakes. When house prices rise again, the government would pocket the profits.

Using formal logic in biology

The 2012 Nobel Prize in Physiology or Medicine went to John Gurdon and Shinya Yamanaka for turning mature cells into stem cells. Yamanaka shook the world just six years ago in a Cell paper (it can be obtained here) that showed how to reprogram adult fibroblast cells into induced pluripotent stem cells (iPS cells) by simply inducing four genes – Oct3/4, Sox2, c-Myc, and Klf4. Although he may not frame it this way, Yamanaka arrived at these four genes by applying a simple theorem of formal logic (De Morgan’s law): a conjunction of conditions is equivalent to the negation of the disjunction of their negations. For example, the statement A AND B is True is the same as Not A OR Not B is False. In formal logic notation you would write A \wedge B = \neg(\neg A \vee \neg B). The problem then is: given that we have about 20,000 genes, which subset of them will turn an adult cell into an embryonic-like stem cell? Yamanaka first chose 24 genes that are known to be expressed in stem cells and inserted them into an adult cell. He found that this made the cell pluripotent. He then wanted to find a smaller subset that would do the same. This is where knowing a little formal logic goes a long way. There are 2^{24} possible subsets that can be made out of 24 genes, so trying all combinations is impractical. What he did instead was to run 24 experiments in which each gene was removed in turn, and then check whether pluripotent cells still arose; the genes whose removal destroyed pluripotency are the necessary ones. He found that pluripotent stem cells never arose when any one of Oct3/4, Sox2, c-Myc or Klf4 was missing. Hence, a pluripotent cell needed all four genes, and when he induced just those four, it worked. It was a positively brilliant idea and although I have spoken out against the Nobel Prize (see here), this one is surely deserved.
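A toy sketch (my own, with a hypothetical ground truth of four required genes out of 24) of the leave-one-out logic: rather than screening all 2^{24} subsets, drop one candidate at a time and flag as necessary the ones whose removal destroys the phenotype.

```python
candidates = [f"gene{i}" for i in range(1, 25)]           # 24 candidate factors
truly_required = {"gene3", "gene7", "gene12", "gene20"}   # hypothetical ground truth

def is_pluripotent(gene_set):
    """Toy assay: the phenotype appears only if all required genes are present."""
    return truly_required <= set(gene_set)

# 24 leave-one-out experiments instead of 2**24 subset experiments.
necessary = [g for g in candidates
             if not is_pluripotent(set(candidates) - {g})]
print(necessary)   # recovers the four required genes

# Necessity alone does not prove sufficiency; that has to be tested directly,
# as Yamanaka did by inducing just the four genes together.
assert is_pluripotent(necessary)
```

Note that the logic works in this toy model only because the phenotype requires every necessary gene simultaneously; if some factors were redundant with each other, leave-one-out alone would miss them.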

2016-1-20:  typo corrected.

Revised SDE and path integral paper

At the MBI last week, I gave a tutorial on using path integrals to compute moments of stochastic differential equations perturbatively.  The slides are the same as the tutorial I gave a few years ago (see here).  I slightly modified the review paper that goes with the talk. I added the explicit computation for the generating functional of the complex Gaussian PDF. The new version can be found here.

More on health care costs

I posted previously that the rising cost of health care may not be a bad thing if it ends up providing jobs for the bulk of the population. The Economist magazine blog Free Exchange had an interesting piece on how health care can become both more expensive and more affordable simultaneously. The argument comes from William Baumol of Baumol’s cost disease (which I posted on previously here). In simple terms, Baumol’s argument is that as society gets more productive and richer, everyone’s salary goes up, including those in professions, like art and health care, where productivity does not increase. Now, given that the bulk of the costs in most sectors are salaries, productivity increases generally imply decreases in the number of people in that economic sector. At current rates of growth, health care expenditures will be 60% of US GDP by 2105. However, as long as the economy as a whole keeps growing, we will still have plenty left over to buy more of everything else, even as health care’s share rises. If we make the simple assumption that contribution to GDP is proportional to population, then an increase in health care’s share of GDP simply means that the share of the population working in health care is also increasing. Basically, at current rates of growth, we will all become health care workers. I don’t think there is anything intrinsically wrong with this. How a nation’s wealth is distributed among its population is more important than how it is distributed among sectors.
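A toy compound-growth sketch (my own, with made-up rates: health care spending growing at 6.5% nominally and the rest of the economy at 4%) illustrates how the health care share can climb steadily while the rest of the economy still grows many-fold in absolute terms:

```python
# Toy numbers for illustration only, not a forecast.
health = 18.0          # health care sector, arbitrary units (about its current US GDP share)
rest = 82.0            # everything else
health_growth = 0.065  # nominal growth rate of health care spending
rest_growth = 0.04     # nominal growth rate of the rest of the economy

for year in range(0, 101, 25):
    gdp = health + rest
    print(f"year {year:3d}: health share = {health / gdp:5.1%}, "
          f"rest of economy = {rest:7.1f}")
    health *= (1 + health_growth) ** 25
    rest *= (1 + rest_growth) ** 25
```

The share of GDP going to health care rises decade after decade, yet the non-health economy still ends up far larger than it started, which is the sense in which health care can become more expensive and more affordable at the same time.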

 

Complete solutions to life’s little problems

One of the nice consequences of the finiteness of human existence is that there can exist complete solutions to some of our problems. For example, I used to leave the gasoline (petrol for non-Americans) cap of my car on top of the gas pump every once in a while. This problem has now been completely solved by the ludicrously simple solution of tethering the cap to the car. I could still drive off with the gas cap dangling but I wouldn’t lose it. The same goes for locking myself out of my car; the advent of remote control locks has eliminated this problem. Because human reaction time is finite, there is also an absolute threshold for internet bandwidth above which the web browser will seem instantaneous for loading pages and simple computations. Given our finite lifespan, there is also a threshold for the amount of disk space required to store every document, video, and photo we will ever want. The converse is that there are also more books in existence than we can possibly read in a lifetime, although there will always be just a finite number of books by the specific authors we enjoy. I think one strategy for life is to make as many things finite as possible, because then there is a chance for a complete solution.