The probability of extraterrestrial life

Since the discovery of exoplanets nearly three decades ago, most astronomers, at least the public-facing ones, seem to agree that it is just a matter of time before they find signs of life, such as atmospheric gases associated with life like methane or oxygen. I’m an agnostic on the existence of life outside of earth because we don’t have any clue as to how easy or hard it is for life to form. To me, it is equally possible that the visible universe is teeming with life or that we are alone. We simply do not know.

But what would happen if we found life on another planet? How would that change our expected probability for life in the universe? MIT astronomer Sara Seager once made an offhand remark in a podcast that finding another planet with life would make it very likely there were many more. But is this true? Does the existence of another planet with life mean a dramatic increase in the probability of life in the universe? We can find out by doing the calculation.

Suppose you believe that the probability of life on a planet is $f$ (i.e. the fraction of planets with life) and that this probability is uniform across the universe. Then if you search $n$ planets, the number of planets with life you will find follows a Binomial distribution. The probability of finding $x$ planets with life is given by the expression $P(x | f) = C(x,n) f^x(1-f)^{n-x}$, where $C$ is a factor (the binomial coefficient) such that the sum over $x$ from zero to $n$ is 1. By Bayes’ theorem, the posterior probability for $f$ (yes, that would be the probability of a probability) is given by

$P(f | x) = \frac{ P(x | f) P(f)}{P(x)}$

where $P(x) = \int_0^1 P(x | f) P(f) df$. As expected, the posterior depends strongly on the prior. A convenient way to express the prior probability is to use a Beta distribution

$P(f |\alpha, \beta) = B(\alpha,\beta)^{-1} f^{\alpha-1} (1-f)^{\beta-1}$ (*)

where $B$ is again a normalization constant (the Beta function). The mean of a Beta distribution is given by $E(f) = \alpha/(\alpha + \beta)$ and the variance, which is a measure of uncertainty, is given by $Var(f) = \frac{\alpha \beta}{(\alpha + \beta)^2 (\alpha + \beta + 1)}$. The posterior distribution for $f$ after observing $x$ planets with life out of $n$ will be

$P(f | x) = D f^{\alpha + x -1} (1-f)^{n+\beta - x -1}$

where $D$ is a normalization factor. This is again a Beta distribution. The Beta distribution is called the conjugate prior for the Binomial because its form is preserved in the posterior.

Applying Bayes’ theorem with the prior in equation (*), we see that the mean and variance of the posterior become $\frac{\alpha+x}{\alpha + \beta +n}$ and $\frac{(\alpha+x)( \beta+n-x)}{(\alpha + \beta + n)^2 (\alpha + \beta + n + 1)}$, respectively. Now let’s consider how our priors have updated. Suppose our prior was $\alpha = \beta = 1$, which gives a uniform distribution for $f$ on the range 0 to 1. It has a mean of 1/2 and a variance of 1/12. If we find one planet with life after checking 10,000 planets then our expected $f$ becomes 2/10002 with variance $2\times 10^{-8}$. The observation of a single planet has greatly reduced our uncertainty and we now expect about 1 in 5000 planets to have life. Now what happens if we find no planets with life? Then, our expected $f$ only drops to about 1 in 10000 and the variance, about $10^{-8}$, is of the same order. So, the difference between finding a planet and not finding one only changes our posterior mean by a factor of two if we had no prior bias. But suppose we are really skeptical and have a prior with $\alpha =0$ and $\beta = 1$ so our expected probability is zero with zero variance. The observation of a single planet increases our posterior mean to 1 in 10001 with about the same small variance. However, if we find a single planet out of far fewer observations, say 100, then our expected probability for life would be even higher but with more uncertainty. In any case, Sara Seager’s intuition is correct – finding a planet would be a game changer and not finding one shouldn’t really discourage us that much.
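The numbers quoted above are easy to check. Here is a minimal sketch in Python that just evaluates the posterior mean and variance formulas for a Beta prior using exact fractions; the function name `beta_posterior` is made up for this illustration.

```python
from fractions import Fraction

def beta_posterior(alpha, beta, x, n):
    """Mean and variance of the Beta(alpha + x, beta + n - x) posterior."""
    a = Fraction(alpha) + x
    b = Fraction(beta) + n - x
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, var

# Uniform prior (alpha = beta = 1), 10,000 planets searched.
for x in (1, 0):
    mean, var = beta_posterior(1, 1, x, 10_000)
    print(f"x={x}: mean={float(mean):.3e}, variance={float(var):.3e}")

# Skeptical prior (alpha = 0, beta = 1), one planet with life found.
mean, var = beta_posterior(0, 1, 1, 10_000)
print(f"skeptic: mean={float(mean):.3e}, variance={float(var):.3e}")
```

Changing the last argument from 10,000 to 100 shows the other case mentioned above: a larger expected $f$ with a larger variance.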

Why middle school science should not exist

My 8th grade daughter had her final (distance learning) science quiz this week on work, or as it is called in her class, the scientific definition of work. I usually have no idea what she does in her science class since she rarely talks to me about school but she so happened to mention this one tidbit because she was proud that she didn’t get fooled by what she thought was a trick question. I’ve always believed that work, as in force times displacement (not the one where you produce economic value), is one of the most useless concepts in physics and should not be taught to anyone until they reach graduate school, if then. It is a concept that has long outlived its usefulness and all it does now is to convince students that science is just a bunch of concepts invented to confuse you. The problem with science education in general is that it is taught as a set of facts and definitions when the only thing that kids need to learn is that science is about trying to show something is true using empirical evidence. My daughter’s experience is evidence that science education in the US has room for improvement.

Work, as defined in science class, is just another form of energy, and the only physics that should be taught to middle school kids is that there are these quantities in the universe called energy and momentum and they are conserved. Work is just the change in energy of a system due to a force moving something. For example, the work required to lift a mass against gravity is the distance the mass was lifted multiplied by the force used to move it. This is where it starts to get a little confusing because there are actually two reasons you need force to move something. The first is Newton’s first law, the law of inertia: things at rest like to stay at rest and things in motion like to stay in motion. In order to move something from rest you need to accelerate it, which requires a force, and from Newton’s second law, force equals mass times acceleration, or $F = ma$. The second is that if you move something upwards against the force of gravity then even to move at a constant velocity you need to use a force that is equal to the gravitational force pulling the thing downwards, which from Newton’s law of gravitation is given by $F = G M m/r^2$, where $G$ is the universal gravitational constant, $M$ is the mass of the earth, $m$ is the mass of the object and $r$ is the distance between the centers of the two objects. By a very deep property of the universe, the mass in Newton’s law of gravitation is the exact same mass as that in Newton’s second law, called inertial mass. So that means if we let $GM/r^2 = g$, then we get $F = mg$, and $g = 9.8 m/s^2$ is the gravitational acceleration constant if we set $r$ to be the radius of the earth, which is much bigger than the height of things we usually deal with in our daily lives. All things dropped near the earth will accelerate to the ground at $9.8 m/s^2$. If gravitational mass and inertial mass were not the same, then objects of different masses would not fall with the same acceleration. Many people know that Galileo showed this fact in his famous experiment where he dropped a big and a small object from the Leaning Tower of Pisa. However, many probably cannot explain why, including my grade 7 (or was it 8) science teacher, who thought it was because the earth’s mass was much bigger than the two objects so the difference was not noticeable. The equivalence of gravitational and inertial mass was what led Einstein to his General Theory of Relativity.
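As a quick check of the $g = GM/r^2$ argument, here is a small Python sketch. The values of $G$, the earth’s mass and the earth’s radius are standard textbook figures that I am plugging in as assumptions, and the function names are made up for the example.

```python
G = 6.674e-11        # gravitational constant, m^3 kg^-1 s^-2 (assumed value)
M_EARTH = 5.972e24   # mass of the earth, kg (assumed value)
R_EARTH = 6.371e6    # mean radius of the earth, m (assumed value)

def gravitational_acceleration(height=0.0):
    """g = G M / r^2 at a given height above the surface, in m/s^2."""
    r = R_EARTH + height
    return G * M_EARTH / r**2

def weight(mass, height=0.0):
    """Gravitational force F = m g on a mass in kg, in newtons."""
    return mass * gravitational_acceleration(height)

g_surface = gravitational_acceleration()
g_rooftop = gravitational_acceleration(100.0)  # 100 m above the ground
print(f"g at the surface: {g_surface:.3f} m/s^2")
print(f"g at 100 m:       {g_rooftop:.3f} m/s^2")

# The acceleration F/m does not depend on the mass being dropped.
print(weight(1.0) / 1.0, weight(1000.0) / 1000.0)
```

The mass cancels in $F/m$ only because the same $m$ appears in both the law of gravitation and the second law, which is the point about inertial and gravitational mass.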

In the first part of my daughter’s quiz, she was asked to calculate the energy consumed by several appliances in her house for one week. She had to look up how much power was consumed by the refrigerator, computer, television and so forth on the internet. Power is energy per unit time so she computed the amount of energy used by multiplying the power used by the total time the device is on per week. In the second part of the quiz she was asked to calculate how far she must move to power those devices. This is actually a question about conservation of energy and to answer the question she had to equate the energy used with the work definition of force times distance traveled. The question told her to use gravitational force, which implies she had to be moving upwards against the force of gravity, or accelerating at g if moving horizontally, although this was not specifically mentioned. So, my daughter took the energy used to power all her appliances and divided it by the force, i.e. her mass times g, and got a distance. The next question was, and I don’t recall exactly how it was phrased but something to the effect of: “Did you do scientifically defined work when you moved?”
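The arithmetic in the two parts of the quiz can be sketched in a few lines of Python. The appliance wattages, hours and the body mass below are hypothetical numbers chosen only for illustration, not the ones my daughter used.

```python
G_ACCEL = 9.8  # gravitational acceleration near the earth, m/s^2

# Hypothetical appliances: (power in watts, hours switched on per week).
appliances = {
    "refrigerator": (150.0, 168.0),
    "computer": (100.0, 20.0),
    "television": (100.0, 10.0),
}

def weekly_energy_joules(devices):
    """Energy = power x time, summed over devices (1 hour = 3600 s)."""
    return sum(watts * hours * 3600.0 for watts, hours in devices.values())

def climb_distance(energy_joules, mass_kg):
    """Height to climb so that m * g * d equals the given energy."""
    return energy_joules / (mass_kg * G_ACCEL)

energy = weekly_energy_joules(appliances)
distance = climb_distance(energy, mass_kg=50.0)  # hypothetical body mass
print(f"energy used: {energy:.3e} J, climb needed: {distance / 1000:.0f} km")
```

Even with modest appliances the required climb comes out in the hundreds of kilometers, which gives a feel for how much energy a household uses.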

Now, in her class, she probably spent a lot of time examining situations to distinguish work from non-work. Lifting a weight is work, a cat riding a Roomba is not work. She learned that you did no work when you walked because the force was perpendicular to your direction of motion. I find these types of gotcha exercises to be useless at best and in my daughter’s case completely detrimental. If you were to walk by gliding along completely horizontally with absolutely no vertical motion at a constant speed then yes you are technically not doing mechanical work. But your muscles are contracting and expanding and you are consuming energy. It’s not your weight times the distance you moved but some very complicated combination of metabolic rate, muscle biochemistry, energy losses in your shoes, etc. Instead of looking at examples and identifying which are work and which are not, it would be so much more informative if they were asked to deduce how much energy would be consumed in doing these things. The cat on the Roomba is not doing work but the Roomba is using energy to turn an electric motor that has to turn the wheel to move the cat. It has to accelerate from standing still and also gets warm, which means some of the energy is wasted to heat. A microwave oven uses energy because it must generate radio waves. Boiling water takes energy because you need to impart random kinetic energy to the water molecules. A computer uses energy because it needs to send electrons through transistors. Refrigerators work by using work energy to pump the heat energy from the inside to the outside. You can’t cool a room by leaving the refrigerator door open because you will just pump heat around in a circle and some of the energy will be wasted as extra heat.

My daughter’s answer to the question of whether work was done was that no work was done because she interpreted movement to be walking horizontally and she knew from all the gotcha examples that walking was not work. She read to me her very legalistically parsed paragraph explaining her reasoning, which made me think that while science may not be in her future, law might be. I tried to convince her that in order for the appliances to run, energy had to come from somewhere so she must have done some work at some point in her travels but she would have no part of it. She said it must be a trick question so the answer has to not make sense. She proudly submitted the quiz convinced more than ever that her so-called scientist Dad is a complete and utter idiot.

Audio of SIAM talk

Here is an audio recording synchronized to slides of my talk a week and a half ago in Pittsburgh. I noticed some places where I said the wrong thing, such as conflating neuron with synapse. I also did not explain the learning part very well. I should point out that we are not applying a control to the network. We train a set of weights so that given some initial condition, the neuron firing rates follow a specified target pattern. I also made a joke that implied that the Recursive Least Squares algorithm dates to 1972. That is not correct. It goes back much further than that. I also take a pot shot at physicists. It was meant as a joke of course and describes many of my own papers.

Talk at Maryland

I gave a talk at the Center for Scientific Computing and Mathematical Modeling at the University of Maryland today.  My slides are here.  I apologize for the excessive number of pages but I had to render each build in my slides, otherwise many would be unreadable.  A summary of the work and links to other talks and papers can be found here.

Gravitational waves detected

You have probably heard the news that gravitational waves have finally been directly detected. The Times has a nice summary. Also see Steve Hsu. Here is an interview with one of the founders, Rainer Weiss. I remember Weiss talking about LIGO back when I was a graduate student at MIT. It’s nice to see that he finally succeeded.

The problem with sci fi movies

I, like many people, enjoy science fiction films. The biggest problem I find in these fictional universes is not that sounds can propagate through space, people can travel at the speed of light with no relativistic effects then decelerate to a stop in a few seconds and not even be knocked to the floor, be able to generate artificial gravity everywhere, have power sources that rarely need refueling, and so forth. I accept that these are convenient plot devices that keep the story moving forward. Although I do have to say that successful films like 2001: A Space Odyssey and more recently Interstellar and The Martian show that trying to be faithful to science can often provide an even better plot device. I am still impressed by the special effects in 2001 and the amazing attention to detail of director Stanley Kubrick, e.g. near the beginning of the movie when they are on the rotating space station you can see the subtle curvature of the floor inside the rim. I hope the success of these movies leads to more realistic science fiction and even realistic action movies where the violence is realistically portrayed – people can’t be hit by a brick and then get up.

No, the thing that most irks me about science fiction movies is that the film makers either refuse or are too lazy to make their universes self-consistent. This list is in no particular order and is by no means exhaustive.

1. Why do storm troopers in Star Wars movies wear plastic suits if they don’t protect them from anything?
2. In an age with extremely powerful computers and communication devices, why should various control systems only be accessible at specific locations in a building or spacecraft? Do you really need to go to the engine room to fix the engine? Haven’t they progressed beyond a WWII aircraft carrier?
3. Why are weapons in the future so bad? Why do people ever miss? There is self-aiming, self-guided bullet technology now and in a future universe with flying cars no one has thought of making this? This also goes for spacecraft still engaging in dog fights like the Battle of Britain in 1940.
4. In the Avengers movies, Iron Man Tony Stark invents a fusion reactor that can fit in his chest and power a flying suit for at least the duration of the movie without ever refueling. Shouldn’t this have transformed the world? This could solve global warming if not end global poverty. Even if he is not making the invention public shouldn’t the rest of the world be working on this?
5. In the Hunger Games series they have technology to make mutant animals and plants so why is there hunger? They have a ban on GMOs for food? Why do they still need coal mining or at least need people to do it?
6. My very first blog post was about the thermodynamic impossibility of the premise of the Matrix movies. Stupid premises seem to be a major problem with the Wachowski siblings’ films that I have seen. They have this pretense for being intellectual and try to infuse their films with a social consciousness but unfortunately fail. The theme in both the Matrix and the more recent film Jupiter Ascending (JA) is that there is an evil future society that treats humans as commodities – as energy in the Matrix and as a source for an immortal elixir in JA. That could be fine if in JA there was something mystical about humans that could not be reproduced elsewhere but what the Wachowskis do instead is try to infuse some science in it so it is not magic. There is a proto-human race that caused the dinosaurs on earth to go extinct so that humans could arise and then waited 65 million years before they could harvest them for the elixir. That was the easiest way to create a farm for humans? A second premise is that the heroine of the movie is an exact genetic replica of a former Queen who owns earth and who bequeathed her wealth to anyone who is a genetic replica. Again, the Wachowskis forgot to do their math. The probability of an exact genetic replica coming from chance, which is what they insisted on, would be at most 1 in $2^{10,000,000}$ (if differences are only biallelic common variants), which is unimaginably small. The proto-humans are also billions of years old but have not evolved in any way over that time even though squirrel-like creatures turned into humans in 65 million years on earth.
7. Even in the movie Interstellar, there is a future race of humans that have the technology to tame a black hole and send messages to the past but they can’t send back instructions for making crops that will grow on earth?

I appreciate that some of these movies are not about science or the future but remakes of old western, adventure, or war movies. However, some are really trying to portray a possible future. If that is the case then some amount of self-consistency is necessary to make the story compelling. One very possible future that I don’t see being explored in popular movies is that unlike dystopian futures where there is a return to feudalism and people are exploited by evil overlords or capitalists, a real problem we may face is that people will become obsolete. People should make movies about what a world where machines can replace almost everything people do would look like. In fact a better premise for the Matrix is that we chose to live in a big simulacrum and a subset of us rebelled. Now that would be an interesting movie.

Are we in a fusion renaissance?

Fusion is a potentially unlimited source of non-carbon emitting energy. It requires the mashing together of small nuclei such as deuterium and tritium to make another nucleus and a lot of leftover energy. The problem is that nuclei do not want to be mashed together and thus to achieve fusion you need something to confine high energy nuclei for a long enough time. Currently, there are only two methods that have successfully demonstrated fusion: 1) gravitational confinement as in the center of a star, and 2) inertial confinement as in a nuclear bomb. In order to get nuclei at high enough energy to overcome the energy barrier for a fusion reaction, electrons can no longer be bound to nuclei to form atoms. A gas of quasi-neutral hot nuclei and electrons is called a plasma and has often been dubbed the fourth state of matter. Hence, the physics of fusion is mostly the physics of plasmas.

My PhD work was in plasma physics and although my thesis ultimately dealt with chaos in nonlinear partial differential equations, my early projects were tangentially related to fusion. At that time there were two approaches to attaining fusion: one was to try to do controlled inertial confinement by using massive lasers to implode a tiny pellet of fuel, and the second was to use magnetic confinement in a tokamak reactor. Government-sponsored research has been focused almost exclusively on these two approaches for the past forty years. There is a huge laser fusion lab at Livermore and an even bigger global project for magnetic confinement fusion in Cadarache, France, called ITER. As of today, neither has proven that it will ever be a viable source of energy although there is evidence of break even, where the reactors produce more energy than is put in.

However, these approaches may not ultimately be viable and there really has not been much research funding to pursue alternative strategies. This recent New York Times article reports on a set of privately funded efforts to achieve fusion backed by some big names in technology including Paul Allen, Jeff Bezos and Peter Thiel. Although there is well deserved skepticism for the success of these companies (I’m sure my thesis advisor Abe Bers would have had some insightful things to say about them), the time may be ripe for new approaches. In an impressive talk I heard many years ago, roboticist Rodney Brooks remarked that Moore’s Law has allowed robotics to finally be widely available because you could use software to compensate for hardware. Instead of requiring cost prohibitive high precision motors, you could use cheap ones and use software to control them. The hybrid car is only possible because of the software to decide when to use the electric motor and when to use the gas engine. The same idea may also apply to fusion. Fusion is so difficult because plasmas are inherently unstable. Most of the past effort has been geared towards designing physical systems to contain them. However, I can now imagine using software instead.

Finally, government attempts have mostly focused on using a Deuterium-Tritium fusion reaction because it has the highest yield. The problem with this reaction is that it produces a neutron, which then destroys the reactor. However, there are reactions that do not produce neutrons (see here). Abe used to joke that we could mine the moon for Helium 3 to use in a Deuterium-Helium 3 reactor. So, although we may never have viable fusion on earth, it could be a source of energy on Elon Musk’s moon base, although solar would probably be a lot cheaper.

Abraham Bers, 1930 – 2015

I was saddened to hear that my PhD thesis advisor at MIT, Professor Abraham Bers, passed away last week at the age of 85. Abe was a fantastic physicist and mentor. He will be dearly missed by his many students. I showed up at MIT in the fall of 1986 with the intent of doing experimental particle physics. I took Abe’s plasma physics course as a breadth requirement for my degree. When I began, I didn’t know what a plasma was but by the end of the term I had joined his group. Abe was one of the best teachers I have ever had. His lectures exemplified his extremely clear and insightful mind. I still consult the notes from his classes from time to time.

Abe also had a great skill in finding the right problem for students. I struggled to get started doing research but one day Abe came to my desk with this old Russian book and showed me a figure. He said that it didn’t make sense according to the current theory and asked me to see if I could understand it. Somehow, this lit a spark in me and pursuing that little puzzle resulted in my first three papers. However, Abe also realized, even before I did I think, that I actually liked applied math better than physics. Thus, after finishing these papers and building some command in the field, he suggested that I completely switch my focus to nonlinear dynamics and chaos, which was very hot at the time. This turned out to be the perfect thing for me and it also made me realize that I could always change fields. I have never been afraid of going outside of my comfort zone since. I am always thankful for the excellent training I received at MIT.

The most eventful experience of those days was our weekly group meetings. These were famous no-holds-barred affairs where the job of the audience was to try to tear down everything the presenter said. I would prepare for a week to get ready when it was my turn. I couldn’t even get through the first slide my first time but by the time I graduated, nothing could faze me. Although the arguments could get quite heated at times, Abe never lost his cool. He would also come to my office after a particularly bad presentation to cheer me up. I don’t ever have any stress when giving talks or speaking in public now because I know that there could never be a sharper or tougher audience than Abe.

To me, Abe will always represent the gentleman scholar to which I’ve always aspired. He was always impeccably dressed with his tweed jacket, Burberry trench coat, and trademark bow tie. Well before good coffee became de rigueur in the US, Abe was a connoisseur and kept his coffee in a freezer in his office. He led a balanced life. He took work very seriously but also made sure to have time for his family and other pursuits. I visited him at MIT a few years ago and he was just as excited about what he was doing then as he was when I was a graduate student. Although he is gone, he will not be forgotten. The book he had been working on, Plasma Waves and Fusion, will be published this fall. I will be sure to get a copy as soon as it comes out.

2015-9-16: Here is a link to his MIT obituary.

Hopfield on the difference between physics and biology

Here is a short essay by theoretical physicist John Hopfield of the Hopfield net and kinetic proofreading fame among many other things (hat tip to Steve Hsu). I think much of the hostility of biologists towards physicists and mathematicians that Hopfield talks about has dissipated over the past 40 years, especially amongst the younger set. In fact these days, a good share of Cell, Science, and Nature papers have some computational or mathematical component. However, the trend is towards brute force big data type analysis rather than the simple elegant conceptual advances that Hopfield was famous for. In the essay, Hopfield gives several anecdotes and summarizes them with pithy words of advice. The one that everyone should really heed and one I try to always follow is “Do your best to make falsifiable predictions. They are the distinction between physics and ‘Just So Stories.’”

New paper on path integrals

Carson C. Chow and Michael A. Buice. Path Integral Methods for Stochastic Differential Equations. The Journal of Mathematical Neuroscience,  5:8 2015.

Abstract: Stochastic differential equations (SDEs) have multiple applications in mathematical neuroscience and are notoriously difficult. Here, we give a self-contained pedagogical review of perturbative field theoretic and path integral methods to calculate moments of the probability density function of SDEs. The methods can be extended to high dimensional systems such as networks of coupled neurons and even deterministic systems with quenched disorder.

This paper is a modified version of our arXiv paper of the same title.  We added an example of the stochastically forced FitzHugh-Nagumo equation and fixed the typos.

Talk at Jackfest

I’m currently in Banff, Alberta for a Festschrift for Jack Cowan (webpage here). Jack is one of the founders of theoretical neuroscience and has infused many important ideas into the field. The Wilson-Cowan equations that he and Hugh Wilson developed in the early seventies form a foundation for both modeling neural systems and machine learning. My talk will summarize my work on deriving “generalized Wilson-Cowan equations” that include both neural activity and correlations. The slides can be found here. References and a summary of the work can be found here. All videos of the talks can be found here.

Addendum: 17:44. Some typos in the talk were fixed.

Addendum: 18:25. I just realized I said something silly in my talk. The Legendre transform is an involution because it is its own inverse: the transform of the transform gives back the original function. I said something completely inane instead.

Analytic continuation continued

As I promised in my previous post, here is a derivation of the analytic continuation of the Riemann zeta function to negative integer values. There are several ways of doing this but a particularly simple way is given by Graham Everest, Christian Rottger, and Tom Ward at this link. It starts with the observation that you can write

$\int_1^\infty x^{-s} dx = \frac{1}{s-1}$

if the real part of $s$ is greater than 1. You can then break the integral into pieces with

$\frac{1}{s-1}=\int_1^\infty x^{-s} dx =\sum_{n=1}^\infty\int_n^{n+1} x^{-s} dx$

$=\sum_{n=1}^\infty \int_0^1(n+x)^{-s} dx=\sum_{n=1}^\infty\int_0^1 \frac{1}{n^s}\left(1+\frac{x}{n}\right)^{-s} dx$      (1)

For $x\in [0,1]$, you can expand the integrand in a binomial expansion

$\left(1+\frac{x}{n}\right)^{-s} = 1 -\frac{sx}{n}+sO\left(\frac{1}{n^2}\right)$   (2)

Now substitute (2) into (1) to obtain

$\frac{1}{s-1}=\zeta(s) -\frac{s}{2}\zeta(s+1) - sR(s)$  (3)

or

$\zeta(s) =\frac{1}{s-1}+\frac{s}{2}\zeta(s+1) +sR(s)$   (3′)

where the remainder $R$ is an analytic function when $Re s > -1$ because the resulting series is absolutely convergent. Since the zeta function is analytic for $Re s >1$, the right hand side of (3′) is a new definition of $\zeta$ that is analytic for $Re s >0$ aside from a simple pole at $s=1$. Now multiply (3) by $s-1$ and take the limit as $s\rightarrow 1$ to obtain

$\lim_{s\rightarrow 1} (s-1)\zeta(s)=1$

which implies that

$\lim_{s\rightarrow 0} s\zeta(s+1)=1$     (4)

Taking the limit of $s$ going to zero from the right of (3′) gives

$\zeta(0^+)=-1+\frac{1}{2}=-\frac{1}{2}$

Hence, the analytic continuation of the zeta function to zero is -1/2.

The analytic domain of $\zeta$ can be pushed further into the left hand plane by extending the binomial expansion in (2) to

$\left(1+\frac{x}{n}\right)^{-s} = \sum_{r=0}^{k+1} \left(\begin{array}{c} -s\\r\end{array}\right)\left(\frac{x}{n}\right)^r + (s+k)O\left(\frac{1}{n^{k+2}}\right)$

Inserting into (1) yields

$\frac{1}{s-1}=\zeta(s)+\sum_{r=1}^{k+1} \left(\begin{array}{c} -s\\r\end{array}\right)\frac{1}{r+1}\zeta(r+s) + (s+k)R_{k+1}(s)$

where $R_{k+1}(s)$ is analytic for $Re s>-(k+1)$.  Now let $s\rightarrow -k^+$ and extract out the last term of the sum with (4) to obtain

$\frac{1}{-k-1}=\zeta(-k)+\sum_{r=1}^{k} \left(\begin{array}{c} k\\r\end{array}\right)\frac{1}{r+1}\zeta(r-k) - \frac{1}{(k+1)(k+2)}$    (5)

Rearranging (5) gives

$\zeta(-k)=-\sum_{r=1}^{k} \left(\begin{array}{c} k\\r\end{array}\right)\frac{1}{r+1}\zeta(r-k) -\frac{1}{k+2}$     (6)

where I have used

$\left( \begin{array}{c} -s\\r\end{array}\right) = (-1)^r \left(\begin{array}{c} s+r -1\\r\end{array}\right)$

The righthand side of (6) is now defined for $Re s > -k$.  Rewrite (6) as

$\zeta(-k)=-\sum_{r=1}^{k} \frac{k!}{r!(k-r)!} \frac{\zeta(r-k)(k-r+1)}{(r+1)(k-r+1)}-\frac{1}{k+2}$

$=-\sum_{r=1}^{k} \left(\begin{array}{c} k+2\\ k-r+1\end{array}\right) \frac{\zeta(r-k)(k-r+1)}{(k+1)(k+2)}-\frac{1}{k+2}$

$=-\sum_{r=1}^{k-1} \left(\begin{array}{c} k+2\\ k-r+1\end{array}\right) \frac{\zeta(r-k)(k-r+1)}{(k+1)(k+2)}-\frac{1}{k+2} - \frac{\zeta(0)}{k+1}$

Collecting terms, substituting for $\zeta(0)$ and multiplying by $(k+1)(k+2)$  gives

$(k+1)(k+2)\zeta(-k)=-\sum_{r=1}^{k-1} \left(\begin{array}{c} k+2\\ k-r+1\end{array}\right) \zeta(r-k)(k-r+1) - \frac{k}{2}$

Reindexing gives

$(k+1)(k+2)\zeta(-k)=-\sum_{r'=2}^{k} \left(\begin{array}{c} k+2\\ r'\end{array}\right) \zeta(-r'+1)r'-\frac{k}{2}$

Now, note that the Bernoulli numbers satisfy the condition $\sum_{r=0}^{N-1} \left(\begin{array}{c} N\\ r\end{array}\right) B_r = 0$ for $N\ge 2$.  Hence,  let $\zeta(-r'+1)=-\frac{B_{r'}}{r'}$

and obtain

$(k+1)(k+2)\zeta(-k)=\sum_{r'=0}^{k+1} \left(\begin{array}{c} k+2\\ r'\end{array}\right) B_{r'}-B_0-(k+2)B_1-(k+2)B_{k+1}-\frac{k}{2}$

which using $B_0=1$ and $B_1=-1/2$ gives the self-consistent condition

$\zeta(-k)=-\frac{B_{k+1}}{k+1}$,

which is the analytic continuation of the zeta function for integers $k\ge 1$.
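As a sanity check, the recursion (6) can be iterated numerically starting from $\zeta(0)=-1/2$ and compared against the Bernoulli numbers. The Python sketch below is my own illustration (the function names are made up); it uses exact fractions and generates the Bernoulli numbers from $B_0=1$ and the condition $\sum_{r=0}^{N-1} \left(\begin{array}{c} N\\ r\end{array}\right) B_r = 0$ for $N\ge 2$.

```python
from fractions import Fraction
from math import comb

def zeta_at_nonpositive_integers(k_max):
    """Return [zeta(0), zeta(-1), ..., zeta(-k_max)] using recursion (6)."""
    z = [Fraction(-1, 2)]  # zeta(0) = -1/2
    for k in range(1, k_max + 1):
        # z[k - r] holds zeta(r - k), which is already known for r >= 1.
        total = sum(Fraction(comb(k, r), r + 1) * z[k - r]
                    for r in range(1, k + 1))
        z.append(-total - Fraction(1, k + 2))
    return z

def bernoulli_numbers(m_max):
    """Return [B_0, ..., B_m_max] with the convention B_1 = -1/2."""
    B = [Fraction(1)]
    for m in range(1, m_max + 1):
        N = m + 1
        # Solve sum_{r=0}^{N-1} C(N, r) B_r = 0 for B_{N-1} = B_m.
        B.append(-sum(comb(N, r) * B[r] for r in range(m)) / N)
    return B

z = zeta_at_nonpositive_integers(10)
B = bernoulli_numbers(11)
for k in range(1, 11):
    print(k, z[k], -B[k + 1] / (k + 1))
```

The two columns agree for every $k\ge 1$; in particular $\zeta(-1)=-1/12$.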

Analytic continuation

I have received some skepticism about my claim that there may be other ways of assigning to the sum of the natural numbers a value other than -1/12, so I will try to be more precise. I thought it would also be useful to derive the analytic continuation of the zeta function, which I will do in a future post.  I will first give a simpler example to motivate the notion of analytic continuation. Consider the geometric series $1+s+s^2+s^3+\dots$. If $|s| < 1$ then we know that this series is equal to

$\frac{1}{1-s}$                (1)

Now, while the geometric series is only convergent and thus analytic inside the unit circle, (1) is defined everywhere in the complex plane except at $s=1$. So even though the sum doesn’t really exist outside of the domain of convergence, we can assign a number to it based on (1). For example, if we set $s=2$ we can make the assignment of $1 + 2 + 4 + 8 + \dots = -1$. So again, the sum of the powers of two doesn’t really equal -1, only (1) is defined at $s=2$. It’s just that the geometric series and (1) are the same function inside the domain of convergence. Now, it is true that the analytic continuation of a function is unique. However, although the value of -1 for $s=2$ is the only value for the analytic continuation of the geometric series, that doesn’t mean that the sum of the powers of 2 needs to be uniquely assigned to negative one because the sum of the powers of 2 is not an analytic function. So if you could find some other series that is a function of some parameter $z$ that is analytic in some domain of convergence and happens to look like the sum of the powers of two for some $z$ value, and you can analytically continue the series to that value, then you would have another assignment.
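To make the distinction concrete, here is a small sketch (the function names are mine) that compares partial sums of the geometric series with expression (1). Inside the unit circle the two agree; at $s=2$ the partial sums simply grow while (1) returns -1.

```python
def geometric_partial_sum(s, n_terms):
    """Sum of the first n_terms terms of 1 + s + s**2 + ..."""
    total, power = 0, 1
    for _ in range(n_terms):
        total += power
        power *= s
    return total


def closed_form(s):
    """Expression (1), defined for every s except s = 1."""
    return 1 / (1 - s)


# Inside the domain of convergence the series and (1) agree.
inside = geometric_partial_sum(0.5, 60)

# Outside it, the partial sums are 2**n - 1 and never approach closed_form(2).
outside = [geometric_partial_sum(2, n) for n in (5, 10, 20)]
assigned = closed_form(2)
```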

Now consider my example from the previous post. Consider the series

$\sum_{n=1}^\infty \frac{n-1}{n^{s+1}}$  (2)

This series is absolutely convergent for $s>1$.  Also note that if I set s=-1, I get

$\sum_{n=1}^\infty (n-1) = 0 +\sum_{n'=1}^\infty n' = 1 + 2 + 3 + \dots$

which is the sum of the natural numbers. Now, I can write (2) as

$\sum_{n=1}^\infty\left( \frac{1}{n^s}-\frac{1}{n^{s+1}}\right)$

and when the real part of s is greater than 1,  I can further write this as

$\sum_{n=1}^\infty\frac{1}{n^s}-\sum_{n=1}^\infty\frac{1}{n^{s+1}}=\zeta(s)-\zeta(s+1)$  (3)

All of these operations are perfectly fine as long as I’m in the domain of absolute convergence.  Now, as I will show in the next post, the analytic continuation of the zeta function to the negative integers is given by

$\zeta (-k) = -\frac{B_{k+1}}{k+1}$

where $B_k$ are the Bernoulli numbers, which are given by the Taylor expansion of

$\frac{x}{e^x-1} = \sum B_n \frac{x^n}{n!}$   (4)

The first few Bernoulli numbers are $B_0=1, B_1=-1/2, B_2 = 1/6$. Thus setting $k=1$ in the formula for $\zeta(-k)$ gives $\zeta(-1)=-B_2/2=-1/12$. A similar proof will give $\zeta(0)=-1/2$.  Using these values in (3) at $s=-1$ then gives $\zeta(-1)-\zeta(0)=-1/12+1/2=5/12$, the desired result that the sum of the natural numbers is (also) 5/12.

Now this is not to say that all assignments have the same physical value. I don’t know the details of how -1/12 is used in bosonic string theory but it is likely that the zeta function is crucial to the calculation.

Nonuniqueness of -1/12

I’ve been asked in the comments to my previous post to give an example of how the sum of the natural numbers could lead to another value, so I thought it may be of general interest to more people. Consider again $S=1+2+3+4\dots$ to be the sum of the natural numbers.  The video in the previous post gives a simple proof by combining divergent sums. In essence, the manipulation is doing renormalization by subtracting away infinities and the leftover of this renormalization is -1/12. There is another video that gives the proof through analytic continuation of the Riemann zeta function

$\zeta(s)=\sum_{n=1}^\infty \frac{1}{n^s}$

The zeta function is only strictly convergent when the real part of $s$ is greater than 1. However, you can use analytic continuation to extract values of the zeta function at points where the sum is divergent. What this means is that the zeta function is no longer the “same sum” per se, but a version of the sum taken to a domain where it was not originally defined but smoothly (analytically) connected to the sum. Hence, the sum of the natural numbers is given by $\zeta(-1)$ and $\zeta(0)=\sum_{n=1}^\infty 1$ (the infinite sum over ones). By analytic continuation, we obtain the values $\zeta(-1)=-1/12$ and $\zeta(0)=-1/2$.

Now notice that if I subtract the sum over ones from the sum over the natural numbers I still get the sum over the natural numbers, e.g.

$1+2+3+4\dots - (1+1+1+1\dots)=0+1+2+3+4\dots$.

Now, let me define a new function $\xi(s)=\zeta(s)-\zeta(s+1)$ so $\xi(-1)$ is the sum over the natural numbers and by analytic continuation $\xi(-1)=-1/12+1/2=5/12$ and thus the sum over the natural numbers is now 5/12. Again, if you try to do arithmetic with infinity, you can get almost anything. A fun exercise is to create some other examples.
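The arithmetic behind this assignment is short enough to check by hand, but here is a minimal sketch anyway. The continued values $\zeta(-1)=-1/12$ and $\zeta(0)=-1/2$ are simply quoted from the text rather than computed, and the names `ZETA_AT`, `xi` and `shifted_partial_sum` are my own.

```python
from fractions import Fraction

# Analytically continued values quoted in the text (not derived here).
ZETA_AT = {-1: Fraction(-1, 12), 0: Fraction(-1, 2)}


def xi(s):
    """xi(s) = zeta(s) - zeta(s + 1), using only the quoted values."""
    return ZETA_AT[s] - ZETA_AT[s + 1]


def shifted_partial_sum(n_terms):
    """First n_terms terms of (1 - 1) + (2 - 1) + (3 - 1) + ..."""
    return sum(n - 1 for n in range(1, n_terms + 1))


def natural_partial_sum(n_terms):
    """First n_terms terms of 1 + 2 + 3 + ..."""
    return sum(range(1, n_terms + 1))


value = xi(-1)  # -1/12 - (-1/2)
```

The two partial-sum helpers show the term-by-term statement in finite form: `shifted_partial_sum(N)` equals `natural_partial_sum(N - 1)`, so the shifted series lists the same numbers with an extra leading zero.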

The sum of the natural numbers is -1/12?

This wonderfully entertaining video giving a proof for why the sum of the natural numbers is -1/12 has been viewed over 1.5 million times. It just shows that there is a hunger for interesting and well explained math and science content out there. Now, we all know that the sum of all the natural numbers is infinite but the beauty (insidiousness) of infinite sums is that they can be assigned to virtually anything. The proof for this particular assignment considers the subtraction of the divergent oscillating sum $S_1=1-2+3-4+5 \dots$ from the divergent sum of the natural numbers $S = 1 + 2 + 3+4+5\dots$ to obtain $4S$.  Then by similar trickery it assigns $S_1=1/4$. Solving $S - S_1 = 4S$ for $S$ gives you the result $S = -1/12$.  Hence, what you are essentially doing is dividing infinity by infinity and that, as any school child should know, can be anything you want. The most astounding thing to me about the video was learning that this assignment was used in string theory, which makes me wonder if the calculations would differ if I chose a different assignment.
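One way to see where the sleight of hand enters is to do the subtraction on finite partial sums. The sketch below (function names are mine) shows that the first $2m$ terms of $S - S_1$ add up to four times the first $m$ natural numbers, so the identification with $4S$ matches $2m$ terms on one side against only $m$ on the other.

```python
def natural_partial_sum(m):
    """First m terms of 1 + 2 + 3 + ..."""
    return sum(range(1, m + 1))


def alternating_partial_sum(m):
    """First m terms of 1 - 2 + 3 - 4 + ..."""
    return sum(n if n % 2 else -n for n in range(1, m + 1))


def difference_first_terms(m):
    """Term-by-term difference S - S_1 over the first m terms."""
    return natural_partial_sum(m) - alternating_partial_sum(m)


# Odd terms cancel and even terms double: 4 + 8 + 12 + ... = 4 * (1 + 2 + 3 + ...)
for j in range(1, 20):
    assert difference_first_terms(2 * j) == 4 * natural_partial_sum(j)
```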

Addendum: Terence Tao has a nice blog post on evaluating such sums.  In a “smoothed” version of the sum, it can be thought of as the “constant” in front of an asymptotic divergent term.  This constant is equivalent to the analytic continuation of the Riemann zeta function. Anyway, the -1/12 seems to be a natural way to assign a value to the divergent sum of the natural numbers.
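Here is a rough numerical sketch of the smoothed-sum idea. As an assumption of mine, I use an exponential cutoff $e^{-n/N}$ for convenience, which is not necessarily the class of cutoff used in Tao's post. For this cutoff the smoothed sum is $\sum_n n e^{-n/N} = N^2 - 1/12 + O(1/N^2)$, so subtracting the divergent $N^2$ piece leaves a constant close to -1/12.

```python
import math


def smoothed_sum(N, cutoff_multiple=60):
    """sum_{n>=1} n * exp(-n/N), truncated once the terms are negligible."""
    n_max = cutoff_multiple * N  # terms beyond this are below ~exp(-60)
    return math.fsum(n * math.exp(-n / N) for n in range(1, n_max + 1))


def constant_term(N):
    """Subtract the divergent N**2 piece to expose the constant."""
    return smoothed_sum(N) - N ** 2


# The leftover constant approaches -1/12 as N grows.
estimate = constant_term(100)
```

The truncation at `cutoff_multiple * N` and the choice `N = 100` are arbitrary; larger `N` brings the estimate closer to -1/12 until floating point cancellation against `N ** 2` starts to dominate.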

Talk in Taiwan

I’m currently at the National Center for Theoretical Sciences, Math Division, on the campus of the National Tsing Hua University, Hsinchu for the 2013 Conference on Mathematical Physiology.  The NCTS is perhaps the best run institution I’ve ever visited. They have made my stay extremely comfortable and convenient.

Here are the slides for my talk on Correlations, Fluctuations, and Finite Size Effects in Neural Networks.  Here is a list of references that go with the talk

E. Hildebrand, M.A. Buice, and C.C. Chow, ‘Kinetic theory of coupled oscillators’, Physical Review Letters 98, 054101 (2007) [PRL Online] [PDF]

M.A. Buice and C.C. Chow, ‘Correlations, fluctuations and stability of a finite-size network of coupled oscillators’, Phys. Rev. E 76, 031118 (2007) [PDF]

M.A. Buice, J.D. Cowan, and C.C. Chow, ‘Systematic Fluctuation Expansion for Neural Network Activity Equations’, Neural Comp., 22:377-426 (2010) [PDF]

C.C. Chow and M.A. Buice, ‘Path integral methods for stochastic differential equations’, arXiv:1009.5966 (2010).

M.A. Buice and C.C. Chow, ‘Effective stochastic behavior in dynamical systems with incomplete information’, Phys. Rev. E 84:051120 (2011).

MA Buice and CC Chow. Dynamic finite size effects in spiking neural networks. PLoS Comp Bio 9:e1002872 (2013).

MA Buice and CC Chow. Generalized activity equations for spiking neural networks. Front. Comput. Neurosci. 7:162. doi: 10.3389/fncom.2013.00162, arXiv:1310.6934.

Here is the link to relevant posts on the topic.

New paper on neural networks

Michael Buice and I have a new paper in Frontiers in Computational Neuroscience as well as on the arXiv (the arXiv version has fewer typos at this point). This paper partially completes the series of papers Michael and I have written about developing generalized activity equations that include the effects of correlations for spiking neural networks. It combines two separate formalisms we have pursued over the past several years. The first was a way to compute finite size effects in a network of coupled deterministic oscillators (e.g. see here, here, here and here).  The second was to derive a set of generalized Wilson-Cowan equations that includes correlation dynamics (e.g. see here, here, and here). Although both formalisms utilize path integrals, they are actually conceptually quite different. The first formalism adapted kinetic theory of plasmas to coupled dynamical systems. The second used ideas from field theory (i.e. a two-particle irreducible effective action) to compute self-consistent moment hierarchies for a stochastic system. This paper merges the two ideas to generate generalized activity equations for a set of deterministic spiking neurons.

Richard Azuma, 1930 – 2013

I was saddened to learn that Richard “Dick” Azuma, who was a professor in the University of Toronto Physics department from 1961 to 1994 and emeritus after that, passed yesterday. He was a nuclear physicist par excellence and chair of the department when I was there as an undergraduate in the early 80’s. I was in the Engineering Science (physics option) program, which was an enriched engineering program at UofT. I took a class in nuclear physics with Professor Azuma during my third year. He brought great energy and intuition to the topic. He was one of the few professors I would talk to outside of class and one day I asked if he had any open summer jobs. He went out of his way to secure a position for me at the nuclear physics laboratory TRIUMF in Vancouver in 1984. That was the best summer of my life. The lab was full of students from all over Canada and I remain good friends with many of them today. I worked on a meson scattering experiment and although I wasn’t of much use to the experiment I did get to see first hand what happens in a lab. I wrote a 4th year thesis on some of the results from that experiment. I last saw Dick in 2010 when I went to Toronto to give a physics colloquium. He was still very energetic and as engaged in physics as ever. We will all miss him greatly.

New paper on neural networks

Michael Buice and I have just published a review paper of our work on how to go beyond mean field theory for systems of coupled neurons. The paper can be obtained here. Michael and I actually pursued two lines of thought on how to go beyond mean field theory and we show how the two are related in this review. The first line started in trying to understand how to create a dynamic statistical theory of a high dimensional fully deterministic system. We first applied the method to the Kuramoto system of coupled oscillators but the formalism could apply to any system. Our recent paper in PLoS Computational Biology was an application for a network of synaptically coupled spiking neurons. I’ve written about this work multiple times (e.g. here,  here, and here). In this series of papers, we looked at how you can compute fluctuations around the infinite system size limit, which defines mean field theory for the system, when you have a finite number of neurons. We used the inverse number of neurons as a perturbative expansion parameter but the formalism could be generalized to expand in any small parameter, such as the inverse of a slow time scale.

The second line of thought concerned the question of how to generalize the Wilson-Cowan equation, which is a phenomenological population activity equation for a set of neurons, which I summarized here. That paper built upon the work that Michael had started in his PhD thesis with Jack Cowan. The Wilson-Cowan equation is a mean field theory of some system but it does not specify what that system is. Michael considered the variable in the Wilson-Cowan equation to be the rate (stochastic intensity) of a Poisson process and prescribed a microscopic stochastic system, dubbed the spike model, that was consistent with the Wilson-Cowan equation. He then considered deviations away from pure Poisson statistics. The expansion parameter in this case was more obscure. Away from a bifurcation (i.e. critical point) the statistics of firing would be pure Poisson but they would deviate near the critical point, so the small parameter was the inverse distance to criticality. Michael, Jack and I then derived a self-consistent set of equations for the mean rate and rate correlations that generalized the Wilson-Cowan equation.

The unifying theme of both approaches is that these systems can be described by either a hierarchy of moment equations or equivalently as a functional or path integral. This all boils down to the fact that any stochastic system is equivalently described by a distribution function or the moments of the distribution. Generally, it is impossible to explicitly calculate or compute these quantities but one can apply perturbation theory to extract meaningful quantities. For a path integral, this involves using Laplace’s method or the method of steepest descents to approximate an integral and in the moment hierarchy method it involves finding ways to truncate or close the system. These methods are also directly related to WKB expansion, but I’ll leave that connection to another post.