The low carb war continues

Last month, a paper in BMJ (formerly the British Medical Journal) on the effect of low carb diets on energy expenditure, with senior author David Ludwig, made a big splash in the popular press and also instigated a mini-Twitter war. The study, which cost somewhere in the neighborhood of 12 million dollars, addressed the general question of whether a person will burn more energy on a low carbohydrate diet compared to an average or high carb diet. In particular, the study looked at the time period after weight loss, when people are susceptible to regaining weight. The argument is that it will be easier to maintain weight loss on a low carb diet because you will be burning more energy. Recent intensive studies by my colleague Kevin Hall and others have found that low carb diets have little if any effect on energy expenditure, so this paper was somewhat of a surprise and gave hope to low carb aficionados. However, Kevin found some possible flaws, which he points out in an official response to BMJ and a bioRxiv paper, which then prompted a none-too-pleased response from Ludwig, which you can follow on Twitter.

The bottom line is that the low carb effect size depends on the baseline point you compare to. In the original study plan, the baseline point was chosen to be energy expenditure prior to the weight loss phase of the study. In the publication, the baseline point was changed to after the weight loss but before the weight loss maintenance phase. If the original baseline is used, the low carb effect is no longer significant. The authors claim that they were blinded to the data and changed the baseline for technical reasons, so this did not represent a case of p-hacking, where one tries multiple combinations until something significant turns up. It seems pretty clear to me that low carbs do not have much of a metabolic effect, but that is not to say that low carb diets are not effective. The elephant in the room is still appetite.
It is possible that you are simply less hungry on a low carb diet and thus you eat less. Also, when you eliminate a whole category of food, there is just less food to eat. That could be the biggest effect of all.

The tragedy of low probability events

We live in an age of fear, and yet life (in the US at least) is the safest it has ever been. Megan McArdle blames coddling parents and the media in a Washington Post column. She argues that cars and swimming pools are much more dangerous than school shootings and kidnappings, yet we mostly ignore the former and obsess about the latter. However, to me dying from an improbable event is just so much more tragic than dying from an expected one. I would be much less despondent meeting St. Peter at the Pearly Gates if I happened to expire from cancer or heart disease than if I were to be hit by an asteroid while weeding my garden. We are so scared now because we have never been safer. We would fear terrorist attacks less if they were more frequent. This is the reason that I would never want a major increase in lifespan. I most certainly would like to last long enough to see my children become independent, but anything beyond that is bonus time. Nothing could be worse to me than immortality. The pain of any tragedy would be unbearable. Life would consist of an endless accumulation of sad memories. The way out is to forget, but that to me is no different from death. What would be the point of living forever if you were to erase much of it? What would a life be if you forgot the people and things that you loved? To me that is no life at all.

Harvard and Asian Americans

The current trial regarding Harvard’s admissions policies seems to indicate clearly that Harvard discriminates against Asian Americans. I had always assumed this to be the case. My take is that the problem is not so much that Harvard is non-transparent and unfair in how it selects students but rather that Harvard and the other top universities have too much influence on the rest of society. Each justice on the US Supreme Court has a degree from either Harvard or Yale. That is positively feudalistic. So here is my solution. All universities have a choice. They can 1) choose students any way they wish but lose their tax-exempt status or 2) retain tax-exempt status but adhere to strict non-discrimination and affirmative action rules. The top schools already have massive endowments and hurt the locales they are in by buying property and then not paying property taxes. I say let them do what they want but tax them heavily for the right to do so. The government should also not subsidize loans for students who attend such schools.

New paper on learning in spiking neural networks

Chris Kim and I recently published a paper in eLife:

Learning recurrent dynamics in spiking networks.


Spiking activity of neurons engaged in learning and performing a task show complex spatiotemporal dynamics. While the output of recurrent network models can learn to perform various tasks, the possible range of recurrent dynamics that emerge after learning remains unknown. Here we show that modifying the recurrent connectivity with a recursive least squares algorithm provides sufficient flexibility for synaptic and spiking rate dynamics of spiking networks to produce a wide range of spatiotemporal activity. We apply the training method to learn arbitrary firing patterns, stabilize irregular spiking activity in a network of excitatory and inhibitory neurons respecting Dale’s law, and reproduce the heterogeneous spiking rate patterns of cortical neurons engaged in motor planning and movement. We identify sufficient conditions for successful learning, characterize two types of learning errors, and assess the network capacity. Our findings show that synaptically-coupled recurrent spiking networks possess a vast computational capability that can support the diverse activity patterns in the brain.


The ideas that eventually led to this paper were seeded by two events. The first was about five years ago when I heard Dean Buonomano talk about his work with Rodrigo Laje on how to tame chaos in a network of rate neurons. Dean and Rodrigo expanded on the work of Larry Abbott and David Sussillo. The guiding idea in these two influential works stems from the “echo state network” or “reservoir computing”. Basically, this idea exploits the inherent chaotic dynamics of a recurrent neural network to project inputs onto diverse trajectories, from which a simple learning rule can be deployed to extract a desired output.

To explain the details of this idea and our work, I need to go back to Minsky and Papert, whose iconic 1969 book on feedforward neural networks (called perceptrons) divided learning problems into two types. The first type is linearly separable, which means that if you want to learn a classifier on some inputs, a single linear plane can be drawn to separate the two input classes in the space of all inputs. The classic example is the OR function. When given inputs (x_1,x_2) = (0,1), (1,0), or (1,1), it outputs 1, and when given (0,0) it outputs 0. If we plot the inputs on the x-y plane, then we can easily draw a line separating the point (0,0) from the rest. The classic linearly non-separable problem is exclusive OR, or XOR, where (0,0) and (1,1) map to 0, while (0,1) and (1,0) map to 1. In this case, no single straight line can separate the points. Minsky and Papert showed that a single layer perceptron, where the only thing you learn is the connection strengths from input to output, can never learn a linearly inseparable problem. Most interesting and nontrivial problems are inseparable.
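
The separability distinction is easy to check numerically. The sketch below (an illustrative script, not from the paper) trains a single layer perceptron with the classic Rosenblatt learning rule on both OR and XOR: it converges on OR but can never classify all four XOR points correctly.

```python
def train_perceptron(data, epochs=100, lr=0.1):
    """Single layer perceptron trained with the Rosenblatt rule.
    Returns the fraction of points classified correctly after training."""
    w = [0.0, 0.0]
    b = 0.0
    predict = lambda x1, x2: 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
    for _ in range(epochs):
        for (x1, x2), target in data:
            err = target - predict(x1, x2)
            w[0] += lr * err * x1
            w[1] += lr * err * x2
            b += lr * err
    return sum(predict(x1, x2) == t for (x1, x2), t in data) / len(data)

OR_data  = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
XOR_data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

print(train_perceptron(OR_data))   # 1.0: OR is linearly separable
print(train_perceptron(XOR_data))  # < 1.0: no plane gets all four XOR points
```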

Mathematically, we can write a perceptron as x_i^{\alpha+1} = \sum_j w_{ij}^{\alpha}f^{\alpha}(x_j^{\alpha}), where x_i^{\alpha} is the value of neuron i in layer \alpha and f is a connection or gain function. The inputs are x_i^{0} and the outputs are x_i^{L}. The perceptron problem is to find a set of w‘s such that the output layer gives you the right value for a task posed in the input layer, e.g. perform XOR. A single layer perceptron is then simply x_i^{1} = \sum_j w_{ij}^{0}f^{0}(x_j^{0}). Now of course we could always design f to do what we ask, but since we are not training f, it needs to be general enough for all problems and is usually chosen to be a monotonically increasing function with a threshold. Minsky and Papert showed that the single layer problem is equivalent to solving a linear matrix equation, whose decision boundary is a plane, and this can never solve a linearly inseparable problem. If f is a linear function, then a multiple layer problem reduces to a single layer problem, so what makes perceptron learning and deep learning possible is that there are multiple layers and f is a nonlinear function. Minsky and Papert also claimed that there was no efficient way to train a multi-layered network, and this killed the perceptron for more than a decade until backpropagation was discovered and rediscovered in the 1980s. Backprop rekindled a flurry of neural network activity and then died down because other machine learning methods proved better at that time. The recent ascendancy of deep learning is the third wave of perceptron interest and was spurred by the confluence of 1) more computing power via the GPU, 2) more data, and 3) finding the right parameter tweaks to make perceptron learning work much, much better. Perceptrons still can’t solve everything, e.g. NP-complete problems are still NP-complete, they are still far from optimal, and they do not preclude the resurgence or invention of other methods.
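
To see concretely how a second layer plus a nonlinear f fixes XOR, here is a minimal hand-wired two-layer network. The threshold nonlinearity and the specific weights are illustrative choices (one of many possible solutions), not anything from the paper.

```python
def step(x):
    """Threshold nonlinearity: the nonlinear f that makes depth matter."""
    return 1 if x > 0 else 0

def xor_two_layer(x1, x2):
    # Hidden layer: h1 computes OR of the inputs, h2 computes AND
    h1 = step(x1 + x2 - 0.5)
    h2 = step(x1 + x2 - 1.5)
    # Output layer: OR-but-not-AND, i.e. XOR
    return step(h1 - h2 - 0.5)

for pair in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(pair, xor_two_layer(*pair))   # outputs 0, 1, 1, 0
```

With a linear f, the two layers would collapse into one matrix and the network would be back to a single plane; the threshold is what breaks that collapse.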

The idea of reservoir computing is to make a linearly inseparable problem separable by processing the inputs. The antecedent is the support vector machine, or kernel method, which projects the data to a higher dimension such that an inseparable problem becomes separable. In the XOR example, if we add a dimension and map (0,0) and (1,1) to (0,0,0) and (1,1,0), and map (1,0) and (0,1) to (1,0,1) and (0,1,1), then the problem is separable. The hard part is finding the mapping, or kernel, to do this. Reservoir computing uses the orbit of a chaotic system as a kernel. Chaos, by definition, causes nearby initial conditions to diverge exponentially, and by following a trajectory for as long as you want, you can make as high dimensional a space as you want; in high enough dimensions, all points are linearly separable if they are far enough apart. However, the defining feature of chaos is also a bug, because any slight error in your input will also diverge exponentially, and thus the kernel is inherently unstable. The Sussillo and Abbott breakthrough was to show that you could have your cake and eat it too. They stabilized the chaos using feedback and/or learning while still preserving the separating property. This then allowed training of the output layer to work extremely efficiently. Laje and Buonomano took this one step further by showing that you could directly train the recurrent network to stabilize chaos. My thought at that time was: why are chaotic patterns so special? Why can’t you learn any pattern?
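
The dimension-lifting trick can be made concrete with a sketch. The lift below, (x_1, x_2) → (x_1, x_2, x_1 x_2), is an illustrative choice analogous to the mapping described above: after appending the product feature, a single plane separates XOR.

```python
def lift(x1, x2):
    """Kernel-style lift from 2-D to 3-D by appending a product feature."""
    return (x1, x2, x1 * x2)

def linear_classify(z, w=(1, 1, -2), b=-0.5):
    """One plane in the lifted space: classify by the sign of w.z + b."""
    return 1 if sum(wi * zi for wi, zi in zip(w, z)) + b > 0 else 0

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, linear_classify(lift(x1, x2)))   # XOR: 0, 1, 1, 0
```

A reservoir plays the same role as lift(), except the extra dimensions come from the trajectory of a dynamical system rather than a hand-picked feature.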

The second pivotal event came in a conversation with the ever so insightful Kechen Zhang when I gave a talk at Hopkins. In that conversation, we discussed how perhaps it was possible that any internal neuron mechanism, such as nonlinear dendrites, could be reproduced by adding more neurons to the network and thus from an operational point of view it didn’t matter if you had the biology correct. There would always exist a recurrent network that could do your job. The problem was to find the properties that make a network “universal” in that it could reproduce the dynamics of any other network or any dynamical system. After this conversation, I was certain that this was true and began spouting this idea to anyone who would listen.

One of the people I mentioned this to was Chris Kim when he contacted me for a job in my lab in 2015. Later Chris told me that he thought my idea was crazy or impossible to prove but he took the job anyway because he wanted to be in Maryland where his family lived. So, upon his arrival in the fall of 2016, I tasked him with training a recurrent neural network to follow arbitrary patterns. I also told him that we should do it on a network of spiking neurons. I thought that doing this on a set of rate neurons would be too easy or already done so we should move to spiking neurons. Michael Buice and I had just recently published our paper on computing finite size corrections to a spiking network of coupled theta neurons with linear synapses. Since we had good control of the dynamics of this network, I thought it would be the ideal system. The network has the form

\dot\theta_i = f(\theta_i, I_i, u_i)

\tau_s \dot u_i = -u_i + 2 \sum_j w_{ij}\delta(\theta_j - \pi)

Whenever neuron j crosses the angle \pi it gives an impulse to neuron i with weight scaled by w_{ij}, which can be positive or negative. The idea is to train the synaptic drive u_i(t) or the firing rate of neuron i to follow an arbitrary temporal pattern. Despite his initial skepticism, Chris actually got this to work in less than six months. It took us another year or so to understand how and why.
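
For readers who want to play with this system, here is a minimal Euler simulation of two pulse-coupled neurons. The form assumed for f is the standard theta-neuron phase equation, and the weights, drive, and time constants are made-up values for illustration, not the parameters used in the paper.

```python
import math

def simulate(T=50.0, dt=0.001, tau_s=0.1, I=0.5):
    """Euler integration of two pulse-coupled theta neurons.
    Assumed phase equation: dtheta/dt = (1 - cos theta) + (1 + cos theta)(I + u).
    A crossing of pi is a spike; it kicks u_i by 2*w_ij/tau_s (the delta impulse)."""
    w = [[0.0, 1.0],    # hypothetical weights: neuron 0 excited by neuron 1,
         [-0.5, 0.0]]   # neuron 1 inhibited by neuron 0 (weights can be +/-)
    theta = [0.0, 1.0]
    u = [0.0, 0.0]
    spikes = [0, 0]
    for _ in range(int(T / dt)):
        crossed, new_theta = [], []
        for i in range(2):
            rate = (1 - math.cos(theta[i])) + (1 + math.cos(theta[i])) * (I + u[i])
            th = theta[i] + dt * rate
            crossed.append(theta[i] < math.pi <= th)   # spike detection
            new_theta.append(th - 2 * math.pi if th >= 2 * math.pi else th)
        for i in range(2):
            u[i] += dt * (-u[i] / tau_s)               # exponential decay
            for j in range(2):
                if crossed[j]:
                    u[i] += 2 * w[i][j] / tau_s        # delta-function impulse
        theta = new_theta
        for j in range(2):
            spikes[j] += crossed[j]
    return spikes

print(simulate())   # spike counts of the two neurons over the run
```

With positive drive I, both neurons fire intrinsically, and the synaptic pulses modulate their timing; this is the raw dynamical system on which the training in the paper operates.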

In our paper, we show that if the synapses are fast enough, i.e. \tau_s is small enough, and the patterns are diverse enough, then any set of patterns can be learned. The reason, which is explained in mathematical detail in the paper, is that if the synapses are fast enough, then the synaptic drive acts like a quasi-static function of the inputs and thus the spiking problem reduces to the rate problem

\tau_s \dot u_i = -u_i + \sum_j w_{ij} g(u_j)

where g is the frequency-input curve of a theta neuron. Then the problem is about satisfying the synaptic drive equation, which, given the linearity in the weights, boils down to whether \tau_s \dot u_i + u_i lies in the space spanned by the g(u_j), i.e. whether it can be written as \sum_j w_{ij} g(u_j), which we show is always possible as long as the desired patterns imposed on u_i(t) are uncorrelated or linearly independent enough. However, there is a limit to how long the patterns can be, which is governed by the number of entries in w_{ij}, which in turn is limited by the number of neurons. The diversity-of-patterns limitation can also be circumvented by adding auxiliary neurons: if you want some number of neurons to do the same thing, you just need to include a lot of other neurons that do different things. A similar argument can be applied to the time averaged firing rate (on any time window) of a given neuron. I now think we have shown that a recurrent network of really simple spiking neurons is dynamically universal. All you need are lots of fast neurons.
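
The span argument can be illustrated with a toy least-squares fit. The script below is a sketch, not our training algorithm (the paper uses recursive least squares on a running network): it posits some hypothetical target drives u_i(t), takes g to be the theta-neuron f-I curve \sqrt{I}/\pi, and checks that enlarging the basis with auxiliary neurons can only shrink the residual of fitting \tau_s \dot u_i + u_i.

```python
import numpy as np

tau_s = 0.01
t = np.linspace(0, 1, 1000)

def g(u):
    """Assumed f-I curve of a theta neuron: rate = sqrt(I)/pi for I > 0, else 0."""
    return np.sqrt(np.maximum(u, 0.0)) / np.pi

# Pool of hypothetical target synaptic drives: shifted sinusoids with
# random frequencies and phases (kept positive so g is active)
rng = np.random.default_rng(0)
freqs = rng.uniform(1, 5, 40)
phases = rng.uniform(0, 2 * np.pi, 40)
U = 1.5 + np.sin(2 * np.pi * freqs[:, None] * t + phases[:, None])
G = g(U)  # basis functions g(u_j(t)), one row per neuron

def fit_error(n_basis, n_targets=3):
    """Least-squares solve of tau_s*du/dt + u = sum_j w_ij g(u_j) for the
    first n_targets drives, using the first n_basis neurons as the basis
    (the extra rows play the role of auxiliary neurons)."""
    target = tau_s * np.gradient(U[:n_targets], t, axis=1) + U[:n_targets]
    W, *_ = np.linalg.lstsq(G[:n_basis].T, target.T, rcond=None)
    return float(np.linalg.norm(W.T @ G[:n_basis] - target))

# Enlarging the basis (adding auxiliary neurons) can only reduce the residual
print(fit_error(3), fit_error(40))
```

Because the small basis is a subset of the large one, the second residual is guaranteed to be no larger than the first, which is the auxiliary-neuron point in miniature.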


Addendum: The dates of events may not all be correct. I think my conversation with Kechen came before Dean’s paper but my recollection is foggy. Memories are not reliable.


Optimizing luck

Each week on the NPR podcast How I Built This, host Guy Raz interviews the founder of a successful enterprise, like James Dyson or Ben and Jerry. At the end of most segments, he’ll ask the founder how much of their success they attribute to luck and how much to talent. In most cases, the founder will modestly say that luck played a major role, but some will add that they did take advantage of the luck when it came. One common thread among these successful people is that they are extremely resilient and aren’t afraid to try something new when things don’t work at first.

There are two ways to look at this. On the one hand, there is certainly some selection bias. For each one of these success stories, there are probably hundreds of others who were equally persistent and worked equally hard but did not achieve the same success. It is like the infamous con where you send 1024 people a two-outcome prediction about a stock. The prediction will be correct for 512 of them, so the next week you send those people another prediction, and so on. After 10 weeks, one person will have received the correct prediction 10 times in a row and will think you are infallible. You then charge them a king’s ransom for the next one.
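
The arithmetic of the con in two lines: halving the pool each week leaves exactly one recipient of ten straight correct calls.

```python
# Each week, half the recipients get the correct prediction; drop the rest
recipients = 1024
weeks = 0
while recipients > 1:
    recipients //= 2   # keep only the half that received the correct call
    weeks += 1
print(recipients, weeks)   # 1 10
```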

Yet it may be possible to optimize luck, and you can see this with Jensen’s inequality. Suppose x represents some combination of your strategy and effort level, and \phi(x) is your outcome function. Jensen’s inequality states that the average or expectation value of a convex function (e.g. a function that bends upwards) is greater than or equal to the function of the expectation value. Thus, E(\phi(x)) \ge \phi(E(x)). In other words, if your outcome function is convex, then your average outcome over random actions will be larger than the outcome of your average action, so you do better just by acting in a random fashion. During “convex” times, the people who just keep trying different things will invariably be more successful than those who do nothing. They were lucky (or they recognized) that their outcome was convex, but their persistence and willingness to try anything was instrumental in their success. The flip side is that if they were in a nonconvex era, their random actions would have led to a much worse outcome. So, do you feel lucky?
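
A toy numeric check of Jensen’s inequality, with a hypothetical convex outcome function \phi(x) = x^2 and a made-up set of strategy/effort levels:

```python
# Convex outcome function (bends upward) and hypothetical action samples
phi = lambda x: x ** 2
samples = [0.0, 0.5, 1.0, 1.5, 2.0]

mean_action = sum(samples) / len(samples)                   # E(x) = 1.0
avg_outcome = sum(phi(x) for x in samples) / len(samples)   # E(phi(x)) = 1.5
print(avg_outcome, phi(mean_action))   # 1.5 >= 1.0, as Jensen guarantees
```

Replacing phi with a concave function (say, the square root) flips the inequality, which is the “nonconvex era” caveat above.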

AI and authoritarianism

Much of the discourse on the future of AI, such as this one, has focused on people being displaced by machines. While this is certainly a worthy concern, these analyses sometimes fall into the trap of linear thinking, because the displaced workers are also customers. The revenues of companies like Google and Facebook depend almost entirely on selling advertisements to a consumer base that has disposable income to spend. What happens when this base dwindles to a tiny fraction of the world’s population? The progression forward will also most likely not be monotonic, because as people initially start to be replaced by machines, those left with jobs may actually get increased compensation and thus drive more consumerism. The only thing that is certain is that the end point of a world where no one has work is one where capitalism as we know it will no longer exist.

Historian and author Yuval Harari argues that in the pre-industrial world, to have power is to have land (I would add slaves and I strongly recommend visiting the National Museum of African American History and Culture for a sobering look at how America became so powerful). In the industrial world, the power shifted to those who own the machines (although land won’t hurt) while in the post-industrial world, power falls to those with the data. Harari was extrapolating our current world where large corporations can track us continually and use machine learning to monopolize our attention and get us to do what they desire. However, data on people is only useful as long as they have resources you want. If people truly become irrelevant then their data is also irrelevant.

It’s anyone’s guess as to what will happen in the future. I proposed an optimistic scenario here, but here is a darker one. Henry Ford supposedly wanted to pay his employees a decent wage because he realized that they were also the customers for his product. In the early twentieth century, the factory workers formed the core of the burgeoning middle class that would drive demand for consumer products made in the very factories where they toiled. It was in the interest of industrialists that the general populace be well educated and healthy because they were the source of their wealth. This link began to fray at the end of the twentieth century with the rise of the service economy, globalisation, and automation. After the Second World War, post-secondary education became available to a much larger fraction of the population. These college educated people did not go to work on the factory floor but fed the expanding ranks of middle management and professionals. They became managers and accountants and dentists and lawyers and writers and consultants and doctors and educators and scientists and engineers and administrators. They started new businesses and new industries and helped drive the economy to greater prosperity. They formed an upper middle class that slowly separated from the working class and the rest of the middle class. They also started to become a self-sustaining entity that did not rely so much on the rest of the population. Globalisation and automation made labor plentiful and cheap, so there was less of an incentive to have a healthy educated populace. The wealth of the elite no longer depended on the working class, and thus their desire to invest in them declined. I agree with the thesis that the abandonment of the working class in Western liberal democracies is the main driver of the recent rise of authoritarianism and isolationism around the world.

However, authoritarian populist regimes, such as those in Venezuela and Hungary, stay in power because the disgruntled class that supports them is a larger fraction of the population than the opposing educated upper middle class that wins in a contemporary liberal democracy. In the US, the disgruntled class is still a minority, so thus far it seems that authoritarianism will be held at bay by the majority coalition of immigrants, minorities, and coastal liberals. However, this coalition could be short lived. Up to now, AI and machine learning have not been taking jobs away from the managerial and professional classes. But as I wrote about before, the people most at risk of losing jobs to machines may not be those doing jobs that are simple for humans to master but those doing jobs that are difficult. It may take a while before professionals start to be replaced, but once it starts it could go swiftly. Once a machine learning algorithm is trained, it can be deployed everywhere instantly. As the ranks of the upper middle class dwindle, support for a liberal democracy could weaken and a new authoritarian regime could rise.

Ironically, a transition to a consumer authoritarianism would be smoothed, and possibly quickened, by a stronger welfare state. A possible jobless economy would be one where the state provides a universal basic income that is funded by taxation on existing corporations, which would then compete for those very same dollars. Basically, the future incarnations of Apple, Netflix, Facebook, Amazon, and Google would give money to an idle population and then try to win it back. Although this is not a world I would choose to live in, it would be preferable to a socialistic model where the state would decide on what goods and services to provide. It would actually be in the interest of the corporations and their elite owners to lobby for high taxes, to not form monopolies, and to allow competition to provide better goods and services. The tax rate would not matter much, because in a steady state loop any wealth inequality is stable regardless of the flux. It is definitely in their interest to keep the idle population happy.

Mosquito experiment concluded

It’s hard to see from the photo, but when I checked my bucket after a week away, there were definitely a few mosquito larvae swimming around. There was also an impressive biofilm on the bottom of the bucket. It took less than a month for mosquitoes to breed in a newly formed pool of stagnant water. My son also noticed that a nearby flower pot with water only a few centimeters deep also had larvae. So the claim that mosquitoes will breed in tiny amounts of stagnant water is true.