Here are my slides for the talk I gave today at The 4th workshop on Advanced Methods in Theoretical Neuroscience, Structure and disorder: From random connections to functional circuits, July 10-12 2019, Göttingen, Germany.
I have read two essays in the past month on the brain and consciousness and I think both point to examples of why consciousness per se and the “problem of consciousness” are both so confusing and hard to understand. The first article is by philosopher Galen Strawson in The Stone series of the New York Times. Strawson takes issue with the supposed conventional wisdom that consciousness is extremely mysterious and cannot be easily reconciled with materialism. He argues that the problem isn’t about consciousness, which is certainly real, but rather matter, for which we have no “true” understanding. We know what consciousness is since that is all we experience but physics can only explain how matter behaves. We have no grasp whatsoever of the essence of matter. Hence, it is not clear that consciousness is at odds with matter since we don’t understand matter.
I think Strawson’s argument is mostly sound but he misses on the crucial open question of consciousness. It is true that we don’t have an understanding of the true essence of matter and we probably never will but that is not why consciousness is mysterious. The problem is that we do not know whether the rules that govern matter, be they classical mechanics, quantum mechanics, statistical mechanics, or general relativity, could give rise to a subjective conscious experience. Our understanding of the world is good enough for us to build bridges, cars, computers and launch a spacecraft 4 billion kilometers to Pluto, take photos, and send them back. We can predict the weather with great accuracy for up to a week. We can treat infectious diseases and repair the heart. We can breed super chickens and grow copious amounts of corn. However, we have no idea how these rules can explain consciousness and more importantly we do not know whether these rules are sufficient to understand consciousness or whether we need a different set of rules or reality or whatever. One of the biggest lessons of the twentieth century is that knowing the rules does not mean you can predict the outcome of the rules. Not even taking into account the computability and decidability results of Turing and Gödel, it is still not clear how to go from the microscopic dynamics of molecules to the Navier-Stokes equation for macroscopic fluid flow and how to get from Navier-Stokes to the turbulent flow of a river. Likewise, it is hard to understand how the liver works, much less the brain, starting from molecules or even cells. Thus, it is possible that consciousness is an emergent phenomenon of the rules that we already know, like wetness or a hurricane. We simply do not know and are not even close to knowing. This is the hard problem of consciousness.
The second article is by psychologist Robert Epstein in the online magazine Aeon. In this article, Epstein rails against the use of computers and information processing as a metaphor for how the brain works. He argues that this type of restricted thinking is why we can’t seem to make any progress understanding the brain or consciousness. Unfortunately, Epstein seems to completely misunderstand what computers are and what information processing means.
Firstly, a computation does not necessarily imply a symbolic processing machine like a von Neumann computer with a central processor, memory, inputs and outputs. A computation in the Turing sense is simply about finding or constructing a desired function from one countable set to another. Now, the brain certainly performs computations; any time we identify an object in an image or have a conversation, the brain is performing a computation. You can couch it in whatever language you like but it is a computation. Additionally, the whole point of a universal computer is that it can perform any computation. Computations are not tied to implementations. I can always simulate whatever (computable) system you want on a computer. Neural networks and deep learning are not symbolic computations per se but they can be implemented on a von Neumann computer. We may not know what the brain is doing but it certainly involves computation of some sort. Any thing that can sense the environment and react is making a computation. Bacteria can compute. Molecules compute. However, that is not to say that everything a brain does can be encapsulated by Turing universal computation. For example, Penrose believes that the brain is not computable although as I argued in a previous post, his argument is not very convincing. It is possible that consciousness is beyond the realm of computation and thus would entail very different physics. However, we have yet to find an example of a real physical phenomenon that is not computable.
Secondly, the brain processes information by definition. Information in both the Shannon and Fisher senses is a measure of uncertainty reduction. For example, in order to meet someone for coffee you need at least two pieces of information, where and when. Before you received that information your uncertainty was huge since there were so many possible places and times the meeting could take place. After receiving the information your uncertainty was eliminated. Just knowing it will be on Thursday is already a big decrease in uncertainty and an increase in information. Much of the brain’s job, at least for cognition, is about uncertainty reduction. When you are searching for your friend in the crowded cafe, you are eliminating possibilities and reducing uncertainty. The big mistake that Epstein makes is conflating an example with the phenomenon. Your brain does not need to function like your smartphone to perform computations or information processing. Computation and information theory are two of the most important mathematical tools we have for analyzing cognition.
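The coffee example can be made quantitative with Shannon’s measure. As a minimal sketch, assume (hypothetically) that every combination of day, time slot, and cafe is equally likely beforehand; the counts below are made up purely for illustration. The uncertainty is then the base-2 logarithm of the number of remaining possibilities, and the information gained is the drop in that number of bits.

```python
import math

def uncertainty_bits(n_possibilities):
    """Shannon entropy (in bits) of a uniform distribution over n outcomes."""
    return math.log2(n_possibilities)

# Hypothetical counts, chosen only for illustration.
n_days, n_times, n_places = 7, 8, 16

before = uncertainty_bits(n_days * n_times * n_places)  # nothing known yet
after_day = uncertainty_bits(n_times * n_places)        # told "Thursday"
after_all = uncertainty_bits(1)                         # told where and when

info_from_day = before - after_day  # equals log2(7)
print(f"before: {before:.2f} bits, after day: {after_day:.2f} bits, "
      f"gained: {info_from_day:.2f} bits, left at the end: {after_all:.2f} bits")
```

Under the uniform assumption, learning the day alone removes log2(7) bits no matter how many places and times there are, which is why a single fact like "Thursday" already counts as a real gain in information.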
Shashaank Vattikuti, Phyllis Thangaraj, Hua W. Xie, Stephen J. Gotts, Alex Martin, Carson C. Chow. Canonical Cortical Circuit Model Explains Rivalry, Intermittent Rivalry, and Rivalry Memory. PLoS Computational Biology (2016).
It has been shown that the same canonical cortical circuit model with mutual inhibition and a fatigue process can explain perceptual rivalry and other neurophysiological responses to a range of static stimuli. However, it has been proposed that this model cannot explain responses to dynamic inputs such as found in intermittent rivalry and rivalry memory, where maintenance of a percept when the stimulus is absent is required. This challenges the universality of the basic canonical cortical circuit. Here, we show that by including an overlooked realistic small nonspecific background neural activity, the same basic model can reproduce intermittent rivalry and rivalry memory without compromising static rivalry and other cortical phenomena. The background activity induces a mutual-inhibition mechanism for short-term memory, which is robust to noise and where fine-tuning of recurrent excitation or inclusion of sub-threshold currents or synaptic facilitation is unnecessary. We prove existence conditions for the mechanism and show that it can explain experimental results from the quartet apparent motion illusion, which is a prototypical intermittent rivalry stimulus.
When the brain is presented with an ambiguous stimulus like the Necker cube or what is known as the quartet illusion, the perception will alternate or rival between the possible interpretations. There are neurons in the brain whose activity is correlated with the perception and not the stimulus. Hence, perceptual rivalry provides a unique probe of cortical function and could possibly serve as a diagnostic tool for cognitive disorders such as autism. A mathematical model based on the known biology of the brain has been developed to account for perceptual rivalry when the stimulus is static. The basic model also accounts for other neural responses to stimuli that do not elicit rivalry. However, these models cannot explain illusions where the stimulus is intermittently switched on and off and the same perception returns after an off period because there is no built-in mechanism to hold the memory. Here, we show that the inclusion of experimentally observed low-level background neural activity is sufficient to explain rivalry for static inputs, and rivalry for intermittent inputs. We validate the model with new experiments.
This paper is the latest of a continuing series of papers outlining how a canonical cortical circuit of excitatory and inhibitory cells can explain psychophysical and electrophysiological data of perceptual and cortical dynamics under a wide range of stimuli and conditions. I’ve summarized some of the work before (e.g. see here). In this particular paper, we show how the same circuit previously shown to explain winner-take-all behavior, normalization, and oscillations at various time scales, can also possess memory in the absence of input. Previous work has shown that if you have a circuit with effective mutual inhibition between two pools representing different percepts and include some type of fatigue process such as synaptic depression or spike frequency adaptation, then the circuit exhibits various dynamics depending on the parameters and input conditions. If the inhibition strength is relatively low and the two pools receive equal inputs then the model will have a symmetric fixed point where both pools are equally active. As the inhibition strength (or input strength) increases, there can be a bifurcation to oscillations between the two pools with a frequency that is dependent on the strengths of inhibition, recurrent excitation, input, and the time constant of the fatigue process. A further increase in inhibition leads to a bifurcation to a winner-take-all (WTA) state where one of the pools dominates the other. However, the same circuit would be expected to not possess “rivalry memory”, where the same percept returns after the stimulus is completely removed for a duration that is long compared to the average oscillation period (dominance time). The reason is that during rivalry, the dominant pool is weakened while the suppressed pool is strengthened by the fatigue process. Thus when the stimulus is removed and returned, the suppressed pool would be expected to win the competition and become dominant.
This reasoning had led people, including myself, to believe that rivalry memory could not be explained by this same model.
However, one thing Shashaank observed and that I hadn’t really noticed before was that the winner-take-all state can persist for arbitrarily low input strength. We prove a little theorem in the paper showing that if the gain function (or FI curve) is concave (i.e. does not bend up), then the winner-take-all will persist for arbitrarily low input if the inhibition is strong enough. Most importantly, the input does not need to be tuned and could be provided by the natural background activity known to exist in the brain. Even zero mean noise is sufficient to maintain the WTA state. This low-activity WTA state can then serve as a memory since whatever was active during a state with strong input can remain active when the input is turned off and the neurons just receive low level background activity. It is thus a purely mutual inhibition maintained memory. We dubbed this “topological memory” because it is like a kink in the carpet that never disappears and persists over a wide range of parameter values and input strengths. Although we only consider rivalry memory in this paper, the mechanism could also apply in other contexts such as working memory. In this paper, we also focus on a specific rivalry illusion called the quartet illusion, which makes the model slightly more complicated but we show how it naturally reduces to a two pool model. We are currently finishing a paper quantifying precisely how excitatory and inhibitory strengths affect rivalry and other cortical phenomena so watch this space. We also have submitted an abstract to neuroscience demonstrating how you can get WTA and rivalry in a balanced-state network.
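The memory mechanism can be illustrated with a deliberately stripped-down sketch. This is not the model from the paper: it leaves out the fatigue process, recurrent excitation, and noise, and the square-root gain, the inhibition strength `beta`, and the input values are all assumptions picked for illustration. It only shows that, with a concave gain and strong enough mutual inhibition, whichever pool was dominant under strong input stays dominant when the input drops to a weak background level.

```python
import math

def gain(x):
    """Concave gain function (square root above threshold, zero below)."""
    return math.sqrt(x) if x > 0.0 else 0.0

def simulate(u1, u2, inp, beta=2.0, dt=0.01, t_end=20.0):
    """Euler-integrate two pools coupled only by mutual inhibition.

    du1/dt = -u1 + gain(inp - beta * u2)
    du2/dt = -u2 + gain(inp - beta * u1)
    """
    for _ in range(int(t_end / dt)):
        du1 = -u1 + gain(inp - beta * u2)
        du2 = -u2 + gain(inp - beta * u1)
        u1, u2 = u1 + dt * du1, u2 + dt * du2
    return u1, u2

# Strong stimulus with pool 1 already dominant, then the stimulus is
# switched off, leaving only a weak background input to both pools.
strong_1 = simulate(1.0, 0.0, inp=1.0)
memory_1 = simulate(*strong_1, inp=0.01)

# Same protocol with pool 2 dominant instead.
strong_2 = simulate(0.0, 1.0, inp=1.0)
memory_2 = simulate(*strong_2, inp=0.01)

print("pool 1 was dominant ->", memory_1)
print("pool 2 was dominant ->", memory_2)
```

In this toy version the winner settles at `sqrt(inp)` and the loser stays silent whenever `beta * sqrt(inp) >= inp`, so for a fixed `beta` the winner-take-all state survives as `inp` is made smaller. The identity of the winner is the stored memory.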
Update: link to paper is fixed.
Read what computational neuroscientist Ken Miller of Columbia thinks about brain preservation and emulation. The piece captures Ken’s tragic sense so perfectly.
Definitely read Christof Koch and Michael Buice’s commentary on the Blue Brain Project paper in Cell. They nicely summarize all the important points of the paper and propose a Turing Test for models. The performance of a model can be assessed by how long it would take an experimenter to figure out if the data from proposed neurophysiological experiments was coming from a model or the real thing. I think that this is a nice idea but there is one big difference between the Turing Test for artificial intelligence and brain simulations and that is that everyone has an innate sense of what it means to be human but no one knows what a real brain should be doing. In that sense, it is not really a Turing Test per se but rather the replication of experiments in a more systematic way than is done now. You do an experiment on a real brain then repeat it on the model and see if they get comparable results.
Appearing in this week’s edition of Cell is a paper summarizing the current status of Henry Markram’s Blue Brain Project. You can download the paper for free until Oct 22 here. The paper reports on a morphological and electrophysiological statistically accurate reconstruction of a rat somatosensory cortex. I think it is a pretty impressive piece of work. They first did a survey of cortex (14 thousand recorded and labeled neurons) to get probability distributions for various types of neurons and their connectivities. The neurons are classified according to their morphology (55 m-types), electrophysiology (11 e-types), and synaptic dynamics (6 s-types). The neurons are connected according to an algorithm outlined in a companion paper in Frontiers in Computational Neuroscience that reproduces the measured connectivity distribution. They then created a massive computer simulation of the reconstructed circuit and show that it has interesting dynamics and can reproduce some experimentally observed behaviour.
Although much of the computational neuroscience community has not really rallied behind Markram’s mission, I’m actually more sanguine about it now. Whether the next project to do the same for the human brain is worth a billion dollars, especially if this is a zero sum game, is another question. However, it is definitely a worthwhile pursuit to systematically catalogue and assess what we know now. Just like how IBM’s Watson did not really invent any new algorithms per se, it clearly changed how we perceive machine learning by showing what can be done if enough resources are put into it. One particularly nice thing the project has done is to provide a complete set of calibrated models for all types of cortical neurons. I will certainly be going to their data base to get the equations for spiking neurons in all of my future models. I think one criticism they will face is that their model basically produced what they put in but to me that is a feature not a bug. A true complete description of the brain would be a joint probability distribution for everything in the brain. This is impossible to compute in the near future no matter what scale you choose to coarse grain over. No one really believes that we need all this information and thus the place to start is to assume that the distribution completely factorizes into a product of independent distributions. We should at least see if this is sufficient and this work is a step in that direction.
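In symbols, the factorization assumption reads as follows. The notation is illustrative and not taken from the paper: $x_1, \dots, x_n$ stand for the measured properties of the circuit (cell types, positions, connections, and so on) and $P_i$ for the separately measured distribution of each one.

```latex
P(x_1, x_2, \dots, x_n) \approx \prod_{i=1}^{n} P_i(x_i)
```

Any correlations between properties that matter for function would show up as a failure of the right-hand side to reproduce data generated by the left.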
However, the one glaring omission in the current rendition of this project is an attempt to incorporate genetic and developmental information. A major constraint in how much information is needed to characterize the brain is how much is contained in the genome. How much of what determines a neuron type and its location is genetically coded, determined by external inputs, or is just random? When you see great diversity in something there are two possible answers: 1) the details matter a lot or 2) details do not matter at all. I would want to know the answer to this question first before I tried to reproduce the brain.
First there was Carl Sagan, then there was Neil deGrasse Tyson, now there is Michael Buice:
I asked Rick Gerkin to write a summary of his recent eLife paper commenting on a much hyped Science paper on how many odours we can discriminate.
On Proving Too Much in Scientific Data Analysis
by Richard C. Gerkin
First off, thank you to Carson for inviting me to write about this topic.
Last year, Science published a paper by a group at Rockefeller University claiming that humans can discriminate at least a trillion smells. This was remarkable and exciting because, as the authors noted, there are far fewer than a trillion mutually discriminable colors or pure tones, and yet olfaction has been commonly believed to be much duller than vision or audition, at least in humans. Could it in fact be much sharper than the other senses?
After the paper came out in Science, two rebuttals were published in eLife. The first was by Markus Meister, an olfaction and vision researcher and computational neuroscientist at Cal Tech. My colleague Jason Castro and I had a separate rebuttal. The original authors have also posted a re-rebuttal of our two papers (mostly of Meister’s paper), which has not yet been peer reviewed. Here I’ll discuss the source of the original claim, and the logical underpinnings of the counterclaims that Meister, Castro, and I have made.
How did the original authors support their claim in the Science paper? Proving this claim by brute force would have been impractical, so the authors selected a representative set of 128 odorous molecules and then tested a few hundred random 30-component mixtures of those molecules. Since many mixture stimuli can be constructed in this way but only a small fraction can be practically tested, they tried to extrapolate their experimental results to the larger space of possible mixtures. They relied on a statistical transformation of the data, followed by a theorem from the mathematics of error-correcting codes, to estimate — from the data they collected — a lower bound on the actual number of discriminable olfactory stimuli.
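To get a feel for why brute force is impractical, the number of distinct 30-component mixtures that can be drawn from 128 molecules is a binomial coefficient. The following is a quick sketch, assuming mixtures are unordered sets of distinct components and using 300 as a hypothetical stand-in for “a few hundred” tested mixtures.

```python
import math

n_molecules = 128   # size of the molecule panel (from the text)
n_components = 30   # components per mixture (from the text)
n_tested = 300      # hypothetical stand-in for "a few hundred"

# Number of unordered 30-component mixtures drawn from 128 molecules.
n_mixtures = math.comb(n_molecules, n_components)
fraction_tested = n_tested / n_mixtures

print(f"possible mixtures: {n_mixtures:.3e}")
print(f"fraction tested:   {fraction_tested:.3e}")
```

Whatever the exact count of tested mixtures, the tested fraction is vanishingly small, which is why the conclusion rests almost entirely on the extrapolation step.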
The two rebuttals in eLife are mostly distinct from one another but have a common thread: both effectively identify the Science paper’s analysis framework with the logical fallacy of ‘proving too much’, which can be thought of as a form of reductio ad absurdum. An argument ‘proves too much’ when it (or an argument of parallel construction) can prove things that are known to be false. For example, the 11th century theologian St. Anselm’s ontological argument [ed note: see previous post] for the existence of god states (in abbreviated form): “God is the greatest possible being. A being that exists is greater than one that doesn’t. If God does not exist, we can conceive of an even greater being, that is one that does exist. Therefore God exists”. But this proves too much because the same argument can be used to prove the existence of the greatest island, the greatest donut, etc., by making arguments of parallel construction about those hypothetical items, e.g. “The Lost Island is the greatest possible island…” as shown by Anselm’s contemporary Gaunilo of Marmoutiers. One could investigate further to identify more specific errors in logic in Anselm’s argument, but this can be tricky and time-consuming. Philosophers have spent centuries doing just this, with varying levels of success. But simply showing that the argument proves too much is sufficient to at least call the conclusion into question. This makes ‘proves too much’ a rhetorically powerful approach. In the context of a scientific rebuttal, leading with a demonstration that this fallacy has occurred piques enough reader interest to justify a dissection of more specific technical errors. Both eLife rebuttals use this approach, first showing that the analysis framework proves too much, and then exploring the source(s) of the error in greater detail.
How does one show that a particular detailed mathematical analysis ‘proves too much’ about experimental data? Let me reduce the analysis in the Science paper to the essentials, and abstract away all the other mathematical details. The most basic claim in that paper is based upon what I will call ‘the analysis framework’:
(1) $d = g(\mathrm{data})$
(2) $z^* = f(d)$
(3) $z > z^*$
The authors did three basic things. First, they extracted a critical parameter $d$ from their data set using a statistical procedure I’ll call $g$. $d$ represents an average threshold for discriminability, corresponding to the number of components by which two mixtures must differ to be barely discriminable. Second, they fed this derived value, $d$, into a function, $f$, that produces a number of odorous mixtures $z^*$. Finally, they argued that the number $z^*$ so obtained necessarily underestimates the ‘true’ number of discriminable smells, $z$, owing to the particular form of $f$. Each step and proposition can be investigated:
1) How does the quantity $d$ behave as the data or form of $g$ varies? That is, is $g$ the ‘right thing’ to do to the data?
2) What implicit assumptions does $f$ make about the sense of smell, and are these assumptions reasonable?
3) Is the stated inequality (which says that any number $z^*$ derived using $f$ will always underestimate the true value $z$) really valid?
What are the rebuttals about? Meister’s paper rejects equation 2 on the grounds that $f$ is unjustified for the current problem. Castro and I are also critical of $f$, but focus more on equations 1 and 3, criticizing the robustness of $g$ and demonstrating that the inequality $z > z^*$ should be reversed (the last of which I will not discuss further here). So together we called everything about the analysis framework into question. However, all parties are enthusiastic about the data itself, as well as its importance, so great care should be taken to distinguish the quality of the data from the validity of the interpretation.
In Meister’s paper, he shows that the analysis framework proves too much by using simulations of simple models, using either synthetic data or the actual data from the original paper. These simulations show that the original analysis framework can generate all sorts of values for the estimate $z^*$ which are known to be false by construction. For example, he shows that a synthetic organism constructed to have 3 odor percepts necessarily produces data which, when the analysis framework is applied, yield values of $z^* \gg 3$. Since we know by construction that the correct answer is 3, the analysis framework must be flawed. This kind of demonstration of ‘proving too much’ is also known by the more familiar term ‘positive control’: a control where a specific non-null outcome can be expected in advance if everything is working correctly. When instead of the correct outcome the analysis framework produces an incredible outcome reminiscent of the one reported in the Science paper, then that framework proves too much.
Meister then explores the reason the equations are flawed, and identifies the flaw in the function $f$. Imagine making a map of all odors, wherein similar-smelling odors are near each other on the map, and dissimilar-smelling odors are far apart. Let the distance between odors on the map be highly predictive of their perceptual similarity. How many dimensions must this map have to be accurate? We know the answer for a map of color vision: 3. Using only hue (H), saturation (S), and lightness (L) any perceptible color can be constructed, and any two nearby colors in an HSL map are perceptually similar, while any two distant colors in such a map are perceptually dissimilar. The hue and saturation subspace of that map is familiar as the ‘color wheel’, and has been understood for more than a century. In that map, hue is the angular dimension, saturation is the radial dimension, and lightness (if it were shown) would be perpendicular to the other two.
Meister argues that $f$ must be based upon a corresponding perceptual map. Since no such reliable map exists for olfaction, Meister argues, we cannot even begin to construct an $f$ for the smell problem; in fact, the $f$ actually used in the Science paper assumes a map with 128 dimensions, corresponding to the dimensionality of the stimulus, not the (unknown) dimensionality of the perceptual space. By using such a high dimensional version of $f$, a very large value of the estimate $z^*$ is guaranteed, but unwarranted.
In my paper with Castro, we show that the original paper proves too much in a different way. We show that very similar datasets (differing only in the number of subjects, the number of experiments, or the number of molecules) or very similar analytical choices (differing only in the statistical significance criterion or discriminability thresholds used) produce vastly different estimates for the number of discriminable odors $z^*$, differing over tens of orders of magnitude from the reported value. Even trivial differences produce absurd results such as ‘all possible odors can be discriminated’ or ‘at most 1 odor can be discriminated’. The differences were trivial in the sense that equally reasonable experimental designs and analyses could and have proceeded according to these differences. But the resulting conclusions are obviously false, and therefore the analysis framework has proved too much. This kind of demonstration of ‘proving too much’ differs from that in Meister’s paper. Whereas he showed that the analysis framework produces specific values that are known to be incorrect, we showed that it can produce any value at all under equally reasonable assumptions. For many of those assumptions, we don’t know if the values it produces are correct or not. After all, the true number of discriminable odors could be any of several very different orders of magnitude, and we don’t know which. But if all values are equally justified, the framework proves too much.
We then showed the technical source of the error, which is a very steep dependence of the threshold parameter $d$ on incidental features of the study design, mediated by the statistical procedure $g$, which is then amplified exponentially by a steep nonlinearity in the function $f$. I’ll illustrate with a much more well-known example from gene expression studies. When identifying genes that are thought to be differentially expressed in some disease or phenotype of interest, there is always a statistical significance threshold used for selection. After correcting for multiple comparisons, some number of genes pass the threshold and are identified as candidates for involvement in the phenotype. With a liberal threshold, many candidates will be identified (e.g. 50). With a more moderate threshold, fewer candidates will be identified (e.g. 10). With a stricter threshold, still fewer candidates will be identified (e.g. 2). This sensitivity is well known in gene expression studies. We showed that the function $g$ in the original paper has a similar sensitivity.
Now suppose some researcher went a step further and said, “If there are $N$ candidate genes involved in inflammation, and each has two expression levels, then there are $2^N$ inflammation phenotypes”. Then the estimate for the number of inflammation phenotypes might be:
$2^{2} = 4$ at the strict threshold,
$2^{10} = 1024$ at the moderate threshold, and
$2^{50} \approx 10^{15}$ at the liberal threshold.
Any particular claim about the number of inflammation phenotypes from this approach would be arbitrary, incredibly sensitive to the significance threshold, and not worth considering seriously. One could obtain nearly any number of inflammation phenotypes one wanted, just by setting the significance threshold accordingly (and all of those thresholds, in different contexts, are considered reasonable in experimental science).
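The arithmetic behind this sensitivity is easy to check. The sketch below uses only the illustrative candidate counts from the gene expression example (50, 10, and 2) and the two-levels-per-gene assumption; the threshold labels are just names.

```python
# Illustrative candidate-gene counts from the gene expression example.
candidates_by_threshold = {"liberal": 50, "moderate": 10, "strict": 2}
levels_per_gene = 2

# Number of "phenotypes" if every gene independently takes one of two levels.
phenotypes = {
    name: levels_per_gene ** n for name, n in candidates_by_threshold.items()
}

for name, count in phenotypes.items():
    n_genes = candidates_by_threshold[name]
    print(f"{name:>8} threshold: {n_genes:>2} genes -> {count:.3g} phenotypes")

# Ratio between the largest and smallest estimates.
spread = phenotypes["liberal"] // phenotypes["strict"]
print(f"spread between liberal and strict: {spread:.3g}x")
```

A 25-fold change in the number of candidates becomes a spread of about fourteen orders of magnitude in the final estimate, which is the sense in which the exponential step amplifies an arbitrary analysis choice.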
But this is essentially what the function $f$ does in the original paper. By analogy, $g$ is the thresholding step, $d$ is the number of candidate genes, and $z^*$ is the number of inflammation phenotypes. And while all of the possible values for $z^*$ in the Science paper are arbitrary, a wide range of them would have been unimpressively small, another wide range would have been comically large, and only the ‘goldilocks zone’ produced the impressive but just plausible value reported in the paper. This is something that I think can and does happen to all scientists. If your first set of analysis decisions gives you a really exciting result, you may forget to check whether other reasonable sets of decisions would give you similar results, or whether instead they would give you any and every result under the sun. This robustness check can prevent you from proving too much, which really means proving nothing at all.
2015-08-14: typos fixed
I’m currently in Göttingen, Germany at the Bernstein Sparks Workshop: Beyond mean field theory in the neurosciences, a topic near and dear to my heart. The slides for my talk are here. Of course no trip to Göttingen would be complete without a visit to Gauss’s grave and Max Born’s house. Photos below.
Carson C. Chow and Michael A. Buice. Path Integral Methods for Stochastic Differential Equations. The Journal of Mathematical Neuroscience, 5:8 2015.
Abstract: Stochastic differential equations (SDEs) have multiple applications in mathematical neuroscience and are notoriously difficult. Here, we give a self-contained pedagogical review of perturbative field theoretic and path integral methods to calculate moments of the probability density function of SDEs. The methods can be extended to high dimensional systems such as networks of coupled neurons and even deterministic systems with quenched disorder.
SCIENTIST I – MODELING, ANALYSIS AND THEORY
The Modeling, Analysis and Theory team at the Allen Institute is seeking a candidate with strong mathematical and computational skills who will work closely with both the team as well as experimentalists in order to both maximize the potential of datasets as well as realize that potential via analysis and theory. The successful candidate will be expected to develop analysis for populations of neurons as well as establish theoretical results on cortical computation, object recognition, and related areas in order to aid the Institute in understanding the most complex piece of matter in the universe.
Michael Buice, Scientist II
Everyone in computational neuroscience knows about the McCulloch-Pitts neuron model, which forms the foundation for neural network theory. However, I never knew anything about Warren McCulloch or Walter Pitts until I read this very interesting article in Nautilus. I had no idea that Pitts was a completely self-taught genius that impressed the likes of Bertrand Russell, Norbert Wiener and John von Neumann but was also a self-destructive alcoholic. One thing the article nicely conveys was the camaraderie and joie de vivre that intellectuals experienced in the past. Somehow this spirit seems missing now.
The New York Times Magazine has a nice profile on theoretical neuroscientist Sebastian Seung this week. I’ve known Sebastian since we were graduate students in Boston in the 1980’s. We were both physicists then and both ended up in biology though through completely different paths. The article focuses on his quest to map all the connections in the brain, which he terms the connectome. Near the end of the article, neuroscientist Eve Marder of Brandeis comments on the endeavor with the pithy remark that “If we want to understand the brain, the connectome is absolutely necessary and completely insufficient.” To which the article ends with
Seung agrees but has never seen that as an argument for abandoning the enterprise. Science progresses when its practitioners find answers — this is the way of glory — but also when they make something that future generations rely on, even if they take it for granted. That, for Seung, would be more than good enough. “Necessary,” he said, “is still a pretty strong word, right?”
Personally, I am not sure if the connectome is necessary or sufficient although I do believe it is a worthy task. However, my hesitation is not because of what was proposed in the article, which is that we exist in a fluid world and the connectome is static. Rather, like Sebastian, I do believe that memories are stored in the connectome and I do believe that “your” connectome does capture much of the essence of “you”. Many years ago, the CPU on my computer died. Our IT person swapped out the CPU and when I turned my computer back on, it was like nothing had happened. This made me realize that everything about the computer that was important to me was stored on the hard drive. The CPU didn’t matter even though every thing a computer did relied on the CPU. I think the connectome is like the hard drive and trying to figure out how the brain works from it is like trying to reverse engineer the CPU from the hard drive. You can certainly get clues from it such as information is stored in binary form but I’m not sure if it is necessary or sufficient to figure out how a computer works by recreating an entire hard drive. Likewise, someday we may use the connectome to recover lost memories or treat some diseases but we may not need it to understand how a brain works.
Gary Marcus, Adam Marblestone, and Thomas Dean have an opinion piece in Science this week challenging the notion of the “canonical cortical circuit”. They have a longer and open version here. Their claim is that the cortex is probably doing a variety of different computations, which they list in their longer paper. The piece has prompted responses by a number of people including Terry Sejnowski and Stephen Grossberg on the connectionist listserv (Check the November archive here).
Here is a cute parable in Frontiers in Neuroscience from cognitive scientist Joshua Brown at Indiana University. It mirrors a lot of what I’ve been saying for the past few years:
The tale of the neuroscientists and the computer: Why mechanistic theory matters
A little over a decade ago, a biologist asked the question “Can a biologist fix a radio?” (Lazebnik, 2002). That question framed an amusing yet profound discussion of which methods are most appropriate to understand the inner workings of a system, such as a radio. For the engineer, the answer is straightforward: you trace out the transistors, resistors, capacitors etc., and then draw an electrical circuit diagram. At that point you have understood how the radio works and have sufficient information to reproduce its function. For the biologist, as Lazebnik suggests, the answer is more complicated. You first get a hundred radios, snip out one transistor in each, and observe what happens. Perhaps the radio will make a peculiar buzzing noise that is statistically significant across the population of radios, which indicates that the transistor is necessary to make the sound normal. Or perhaps we should snip out a resistor, and then homogenize it to find out the relative composition of silicon, carbon, etc. We might find that certain compositions correlate with louder volumes, for example, or that if we modify the composition, the radio volume decreases. In the end, we might draw a kind of neat box-and-arrow diagram, in which the antenna feeds to the circuit board, and the circuit board feeds to the speaker, and the microphone feeds to the recording circuit, and so on, based on these empirical studies. The only problem is that this does not actually show how the radio works, at least not in any way that would allow us to reproduce the function of the radio given the diagram. As Lazebnik argues, even though we could multiply experiments to add pieces of the diagram, we still won’t really understand how the radio works. To paraphrase Feynmann, if we cannot recreate it, then perhaps we have not understood it (Eliasmith and Trujillo, 2014; Hawking, 2001).
Lazebnik’s argument should not be construed to disparage biological research in general. There are abundant examples of how molecular biology has led to breakthroughs, including many if not all of the pharmaceuticals currently on the market. Likewise, research in psychology has provided countless insights that have led to useful interventions, for instance in cognitive behavioral therapy (Rothbaum et al., 2000). These are valuable ends in and of themselves. Still, are we missing greater breakthroughs by not asking the right questions that would illuminate the larger picture? Within the fields of systems, cognitive, and behavioral neuroscience in particular, I fear we are in danger of losing the meaning of the Question “how does it work”? As the saying goes, if you have a hammer, everything starts to look like a nail. Having been trained in engineering as well as neuroscience and psychology, I find all of the methods of these disciplines useful. Still, many researchers are especially well-trained in psychology, and so the research questions focus predominantly on understanding which brain regions carry out which psychological or cognitive functions, following the established paradigms of psychological research. This has resulted in the question being often reframed as “what brain regions are active during what psychological processes”, or the more sophisticated “what networks are active”, instead of “what mechanisms are necessary to reproduce the essential cognitive functions and activity patterns in the system.” To illustrate the significance of this difference, consider a computer. How does it work?
Once upon a time, a group of neuroscientists happened upon a computer (Carandini, 2012). Not knowing how it worked, they each decided to find out how it sensed a variety of inputs and generated the sophisticated output seen on its display. The EEG researcher quickly went to work, putting an EEG cap on the motherboard and measuring voltages at various points all over it, including on the outer case for a reference point. She found that when the hard disk was accessed, the disk controller showed higher voltages on average, and especially more power in the higher frequency bands. When there was a lot of computation, a lot of activity was seen around the CPU. Furthermore, the CPU showed increased activity in a way that is time-locked to computational demands. “See here,” the researcher declared, “we now have a fairly temporally precise picture of which regions are active, and with what frequency spectra.” But has she really understood how the computer works?
Next, the enterprising physicist and cognitive neuroscientist came along. “We don’t have enough spatial resolution to see inside the computer,” they said. So they developed a new imaging technique by which activity can be measured, called the Metabolic Radiation Imaging (MRI) camera, which now measures the heat (infrared) given off by each part of the computer in the course of its operations. At first, they found simply that lots of math operations lead to heat given off by certain parts of the CPU, and that memory storage involved the RAM, and that file operations engaged the hard disk. A flurry of papers followed, showing that the CPU and other areas are activated by a variety of applications such as word-processing, speech recognition, game play, display updating, storing new memories, retrieving from memory, etc.
Eventually, the MRI researchers gained a crucial insight, namely that none of these components can be understood properly in isolation; they must understand the network. Now the field shifts, and they begin to look at interactions among regions. Before long, a series of high profile papers emerge showing that file access does not just involve the disks. It involves a network of regions including the CPU, the RAM, the disk controller, and the disk. They know this because when they experimentally increase the file access, all of these regions show correlated increases in activity. Next, they find that the CPU is a kind of hub region, because its activity at various times correlates with activity in other regions, such as the display adapter, the disk controller, the RAM, and the USB ports, depending on what task they require the computer to perform.
Next, one of the MRI researchers has the further insight to study the computer while it is idle. He finds that there is a network involving the CPU, the memory, and the hard disk, as (unbeknownst to them) the idle computer occasionally swaps virtual memory on and off of the disk and monitors its internal temperature. This resting network is slightly different across different computers in a way that correlates with their processor speed, memory capacity, etc., and thus it is possible to predict various capacities and properties of a given computer by measuring its activity pattern when idle. Another flurry of publications results. In this way, the neuroscientists continue to refine their understanding of the network interactions among parts of the computer. They can in fact use these developments to diagnose computer problems. After studying 25 normal computers and comparing them against 25 computers with broken disk controllers, they find that the connectivity between the CPU and the disk controller is reduced in those with broken disk controllers. This allows them to use MRI to diagnose other computers with broken disk controllers. They conclude that the disk controller plays a key role in mediating disk access, and this is confirmed with a statistical mediation analysis. Someone even develops the technique of Directional Trunk Imaging (DTI) to characterize the structure of the ribbon cables (fiber tract) from the disk controller to the hard disk, and the results match the functional correlations between the hard disk and disk controller. But for all this, have they really understood how the computer works?
The neurophysiologist spoke up. “Listen here”, he said. “You have found the larger patterns, but you don’t know what the individual circuits are doing.” He then probes individual circuit points within the computer, measuring the time course of the voltage. After meticulously advancing a very fine electrode in 10 micron increments through the hard material (dura mater) covering the CPU, he finds a voltage. The particular region shows brief “bursts” of positive voltage when the CPU is carrying out math operations. As this is the math co-processor unit (unbeknownst to the neurophysiologist), the particular circuit path is only active when a certain bit of a floating point representation is active. With careful observation, the neurophysiologist identifies this “cell” as responding stochastically when certain numbers are presented for computation. The cell therefore has a relatively broad but weak receptive field for certain numbers. Similar investigations of nearby regions of the CPU yield similar results, while antidromic stimulation reveals inputs from related number-representing regions. In the end, the neurophysiologist concludes that the cells in this particular CPU region have receptive fields that respond to different kinds of numbers, so this must be a number representation area.
Finally the neuropsychologist comes along. She argues (quite reasonably) that despite all of these findings of network interactions and voltage signals, we cannot infer that a given region is necessary without lesion studies. The neuropsychologist then gathers a hundred computers that have had hammer blows to various parts of the motherboard, extension cards, and disks. After testing their abilities extensively, she carefully selects just the few that have a specific problem with the video output. She finds that among computers that don’t display video properly, there is an overlapping area of damage to the video card. This means of course that the video card is necessary for proper video monitor functioning. Other similar discoveries follow regarding the hard disks and the USB ports, and now we have a map of which regions are necessary for various functions. But for all of this, have the neuroscientists really understood how the computer works?
As the above tale illustrates, despite all of our current sophisticated methods, we in neuroscience are still in a kind of early stage of scientific endeavor; we continue to discover many effects but lack a proportionally strong standard model for understanding how they all derive from mechanistic principles. There are nonetheless many individual mathematical and computational neural models. The Hodgkin-Huxley equations (Hodgkin and Huxley, 1952), Integrate-and-fire model (Izhikevich, 2003), Genesis (Bower and Beeman, 1994), SPAUN (Eliasmith et al., 2012), and Blue Brain project (Markram, 2006) are only a few examples of the models, modeling toolkits, and frameworks available, besides many others more focused on particular phenomena. Still, there are many different kinds of neuroscience models, and even many different frameworks for modeling. This means that there is no one theoretical lingua franca against which to evaluate empirical results, or to generate new predictions. Instead, there is a patchwork of models that treat some phenomena, and large gaps where there are no models relevant to existing phenomena. The moral of the story is not that the brain is a computer. The moral of the story is twofold: first, that we sorely need a foundational mechanistic, computational framework to understand how the elements of the brain work together to form functional units and ultimately generate the complex cognitive behaviors we study. Second, it is not enough for models to exist—their premises and implications must be understood by those on the front lines of empirical research.
**The Path Forward**
A more unified model shared by the community is not out of reach for neuroscience. Such exists in physics (e.g. the standard model), engineering (e.g. circuit theory), and chemistry. To move forward, we need to consider placing a similar level of value on theoretical neuroscience as for example the field of physics places on theoretical physics. We need to train neuroscientists and psychologists early in their careers in not just statistics, but also in mathematical and computational modeling, as well as dynamical systems theory and even engineering. Computational theories exist (Marr, 1982), and empirical neuroscience is advancing, but we need to develop the relationships between them. This is not to say that all neuroscientists should spend their time building computational models. Rather, every neuroscientist should at least possess literacy in modeling as no less important than, for example, anatomy. Our graduate programs generally need improvement on this front. For faculty, if one is in a soft money position or on the tenure clock and cannot afford the time to learn or develop theories, then why not collaborate with someone who can? If we really care about the question of how the brain works, we must not delude ourselves into thinking that simply collecting more empirical results will automatically tell us how the brain works any more than measuring the heat coming from computer parts will tell us how the computer works. Instead, our experiments should address the questions of what mechanisms might account for an effect, and how to test and falsify specific mechanistic hypotheses (Platt, 1964).
We’ve been using Julia for a little over a month now and I am quite pleased thus far. It is fast enough for us to get decent results for our MCMC runs; runs for which Python and Brian were too slow. Of course, we probably could have tried to optimize our other codes but Julia installs right out of the box and is very easy to program in. We still have issues with plotting but do have minimal plot functionality using PyCall and importing matplotlib.pyplot. One can also dump the results to a file and have either Python or some other software make the plots. I already find myself reaching for Julia when I want to write a quick program to solve a problem instead of relying on Matlab like I used to. While I would like to try PyDSTool, the lack of a trivial installation is holding us back. For you budding software developers out there, if it takes more than one click on a link to make it work on my computer, I am likely to go to something else. The reason I switched from Linux to Mac a decade ago was that I wanted Unix capability and the ability to print without having to consult with a sys admin.
Solving differential equations numerically is a very mature field with multiple algorithms carefully explained in multiple textbooks. However, when it comes down to actually solving them in practice by nonspecialists, a lot of people, myself included, will often resort to the Euler method. There are two main reasons for this. The first is that it is trivial to remember and code and the second is that Moore’s law has increased the speed of computing sufficiently that we can get away with it. However, I think it’s time to get out of the eighteenth century and at least move into the nineteenth.
Suppose you have a differential equation $\frac{dx}{dt} = f(x,t)$, where $x$ can be a scalar or a vector in an arbitrary number of dimensions. The Euler method simply discretizes the time derivative via
$\frac{dx}{dt} \rightarrow \frac{x(t+h)-x(t)}{h}$
so that $x$ at time step $t+h$ is given by
$x(t+h) = x(t) + h f(x(t),t)$
Now what could be simpler than that? There are two problems with this algorithm. The first is that it is only accurate to order $h$ so you need to have very small steps to simulate accurately. The second is that the Euler method is unstable if the step size is too large. The way around these limitations is to use very small time steps, which is computationally expensive.
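To make both problems concrete, here is a minimal Python sketch of the Euler update applied to the linear decay equation dx/dt = -lam*x. The test equation, the names `euler` and `make_decay`, and the particular step sizes are my own choices for illustration. Halving the step should roughly halve the error (first order accuracy), and for this equation each step multiplies x by (1 - h*lam), so the iteration blows up once h exceeds 2/lam.

```python
import math

def euler(f, x0, h, n_steps):
    """Integrate dx/dt = f(x, t) from t = 0 with fixed step h; return the final x."""
    x, t = x0, 0.0
    for _ in range(n_steps):
        x = x + h * f(x, t)   # x(t+h) = x(t) + h f(x(t), t)
        t += h
    return x

def make_decay(lam):
    """Right-hand side of the illustrative test problem dx/dt = -lam * x."""
    return lambda x, t: -lam * x

def euler_error(h):
    """Absolute error at t = 1 for dx/dt = -x, x(0) = 1 (exact answer exp(-1))."""
    n_steps = round(1.0 / h)
    return abs(euler(make_decay(1.0), 1.0, h, n_steps) - math.exp(-1.0))

# Accuracy: the error ratio on halving h is close to 2 for a first order method.
ratio = euler_error(0.01) / euler_error(0.005)

# Stability: with lam = 100 the per-step factor is (1 - 100*h).
stable = euler(make_decay(100.0), 1.0, 0.005, 200)    # factor 0.5, decays
unstable = euler(make_decay(100.0), 1.0, 0.025, 200)  # factor -1.5, blows up
```

The true solution decays in both stability runs; only the step size differs.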
The second most popular algorithm is probably (4th order) Runge-Kutta, which can generally use larger time steps for the same accuracy. However, a higher order algorithm is not necessarily faster because what really slows down a numerical scheme is the number of function evaluations it needs to make and 4th order Runge-Kutta needs four function evaluations per step. Runge-Kutta is also plagued by the same stability problems as Euler and that can also limit the maximum allowed step size.
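The cost argument can be checked directly. Below is a small Python sketch of one classical 4th order Runge-Kutta step, with a wrapper that counts calls to the right hand side. The wrapper class `CountingRHS` and the test problem dx/dt = -x are hypothetical helpers for illustration only.

```python
import math

def rk4_step(f, x, t, h):
    """One classical 4th order Runge-Kutta step for dx/dt = f(x, t)."""
    k1 = f(x, t)
    k2 = f(x + 0.5 * h * k1, t + 0.5 * h)
    k3 = f(x + 0.5 * h * k2, t + 0.5 * h)
    k4 = f(x + h * k3, t + h)
    return x + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

class CountingRHS:
    """Wraps a right hand side and counts how often it is evaluated."""
    def __init__(self, f):
        self.f = f
        self.calls = 0

    def __call__(self, x, t):
        self.calls += 1
        return self.f(x, t)

rhs = CountingRHS(lambda x, t: -x)   # illustrative test problem dx/dt = -x
n_steps, h = 10, 0.1
x, t = 1.0, 0.0
for _ in range(n_steps):
    x = rk4_step(rhs, x, t, h)
    t += h

rk4_calls = rhs.calls                  # four evaluations per step
rk4_error = abs(x - math.exp(-1.0))    # error at t = 1
```

Ten steps cost forty evaluations here. For this linear test problem, RK4 also has a step size limit of roughly 2.8 divided by the decay rate, so larger steps do not come for free.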
Now, these problems have been known for decades if not centuries and work-arounds do exist. In fact, the most sophisticated solvers such as CVODE tend to use adaptive multi-step methods, like Adams-Bashforth (AB) or its implicit cousin Adams-Moulton and this is what we should all be implementing in our own codes. Below I will give simple pseudo-code to show how a 2nd order Adams-Bashforth scheme is not much more complicated to code than Euler and uses the same number of function evaluations. In short, with not much effort, you can speed up your code immensely by simply switching to AB.
The cleverness of AB is that it uses previous steps to refine the estimate for the next step. In this way, you get a higher order scheme without having to re-evaluate the right hand side of your ODE. You simply reuse your computations. It’s the ultimate green scheme. For the same ODE example above, 2nd order AB is simply
$x(t+2h) = x(t+h) + h\left(\frac{3}{2} f(x(t+h),t+h) - \frac{1}{2} f(x(t),t)\right)$
You just need to store the previous two evaluations to compute the next step. To start the algorithm, you can just use an Euler algorithm to get the first step.
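For those wondering where the weights 3/2 and -1/2 come from, here is a sketch of the standard argument: integrate the ODE over one step and replace $f$ by the straight line through its two most recent values. The shorthand $f_n = f(x(t),t)$ and $f_{n+1} = f(x(t+h),t+h)$ is introduced here only for compactness.

```latex
\begin{align}
x(t+2h) &= x(t+h) + \int_{t+h}^{t+2h} f(x(s),s)\,ds \\
f(x(s),s) &\approx f_{n+1} + \frac{s-(t+h)}{h}\,(f_{n+1}-f_n) \\
\int_{t+h}^{t+2h} \left[ f_{n+1} + \frac{s-(t+h)}{h}\,(f_{n+1}-f_n) \right] ds
  &= h f_{n+1} + \frac{h}{2}\,(f_{n+1}-f_n) \\
\Rightarrow\quad x(t+2h) &\approx x(t+h) + h\left(\tfrac{3}{2} f_{n+1} - \tfrac{1}{2} f_n\right)
\end{align}
```

Since the line is extrapolated from values already computed, the only new evaluation per step is $f_{n+1}$.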
Here is the 2nd order AB algorithm written in pseudo Julia code for a one dimensional version of the ODE above.
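# 2nd order Adams-Bashforth for dx/dt = f(x,t); step n corresponds to time n*h
# x0 (initial condition), h (step size) and Tfinal (number of steps, at least 2) are inputs
function ab2(f, x0, h, Tfinal)
    # make x a two dimensional vector for the two time steps
    x = zeros(2)
    # Set initial condition
    x[1] = x0
    # Store evaluation of function f(x,t) for later use
    fstore = zeros(2)
    fstore[1] = f(x[1], h)
    # Take Euler step
    x[2] = x[1] + h*fstore[1]
    # store function at time = 2h
    fstore[2] = f(x[2], 2*h)
    # Store computed values for output
    xout = zeros(Tfinal)
    xout[1] = x[1]
    xout[2] = x[2]
    # Take AB steps
    for t = 3:Tfinal
        # set update indices for x based on parity of t
        index2 = t%2+1
        index1 = (t-1)%2+1
        # 2nd order AB step: index2 holds the newest step, index1 the one before it
        x[index1] = x[index2] + h*(1.5*fstore[index2] - 0.5*fstore[index1])
        # update stored function
        fstore[index1] = f(x[index1], t*h)
        xout[t] = x[index1]
    end
    return xout
end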
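As a sanity check on the claim that AB costs the same as Euler but is more accurate, here is a self-contained Python sketch that runs both on dx/dt = -x up to t = 2 and looks at how the error shrinks when the step is halved. The function names, the test equation and the step sizes are my own illustrative choices, and unlike the Julia listing this version starts the clock at t = 0. Both integrators make exactly one evaluation of f per step.

```python
import math

def euler(f, x0, h, n_steps):
    """Forward Euler: one evaluation of f per step."""
    x, t = x0, 0.0
    for _ in range(n_steps):
        x += h * f(x, t)
        t += h
    return x

def ab2(f, x0, h, n_steps):
    """2nd order Adams-Bashforth, started with one Euler step.

    Uses n_steps evaluations of f in total, the same as euler().
    """
    t = 0.0
    f_prev = f(x0, t)            # f at the initial point
    x = x0 + h * f_prev          # Euler start-up step
    t += h
    for _ in range(n_steps - 1):
        f_curr = f(x, t)         # the only new evaluation in this step
        x += h * (1.5 * f_curr - 0.5 * f_prev)
        f_prev = f_curr
        t += h
    return x

def decay(x, t):
    return -x                    # illustrative test problem, exact solution exp(-t)

def error(method, h, t_end=2.0):
    n_steps = round(t_end / h)
    return abs(method(decay, 1.0, h, n_steps) - math.exp(-t_end))

euler_ratio = error(euler, 0.01) / error(euler, 0.005)  # about 2: first order
ab2_ratio = error(ab2, 0.01) / error(ab2, 0.005)        # about 4: second order
```

At the same step size and the same number of evaluations, the AB error on this problem is smaller than the Euler error by a couple of orders of magnitude.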
Our foray into Julia was mostly motivated by a desire to have a programming language as fast as C but as easy to program as Matlab. In particular, we need really fast looping speed, which is necessary to simulate dynamical systems. Our particular project involves fitting a conductance-based model (i.e. Hodgkin-Huxley-like equations) to some data using MCMC. We’re not trying to fit the precise spike shapes per se but the firing rates of synaptically coupled pools of neurons in response to various inputs. We thus need to run simulations for some time, compare the averaged rates to previous runs, and iterate. For good fits, we may need to do this a million times. Hence, we need speed and we can’t really parallelize because we need information from one iteration before going to the next.
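To show why the loop structure resists parallelization, here is a toy random-walk Metropolis sketch in Python. Everything in it is a stand-in: the "simulation" is Euler integration of dx/dt = -k*x rather than a conductance-based network, the data are synthetic, and the names `simulate`, `log_likelihood` and `metropolis` as well as the noise scale `sigma` are assumptions made for illustration. The point is only that each proposal is built from the current state and needs a full simulation before the next iteration can start.

```python
import math
import random

def simulate(k, h=0.01, n_steps=100, x0=1.0):
    """Toy stand-in for the network simulation: Euler steps of dx/dt = -k*x."""
    xs = [x0]
    x = x0
    for _ in range(n_steps):
        x += h * (-k * x)
        xs.append(x)
    return xs

def log_likelihood(k, data, sigma=0.05):
    """Gaussian log-likelihood, up to a constant, of the data given rate k."""
    sim = simulate(k)
    sse = sum((s - d) ** 2 for s, d in zip(sim, data))
    return -sse / (2.0 * sigma ** 2)

def metropolis(data, k_init, n_iter, step_sd, rng):
    """Random-walk Metropolis with a flat prior on k."""
    k = k_init
    logp = log_likelihood(k, data)
    chain = []
    for _ in range(n_iter):
        k_new = k + rng.gauss(0.0, step_sd)      # proposal depends on current k
        logp_new = log_likelihood(k_new, data)   # needs a full simulation
        if rng.random() < math.exp(min(0.0, logp_new - logp)):
            k, logp = k_new, logp_new
        chain.append(k)
    return chain

rng = random.Random(0)
K_TRUE = 2.0
data = simulate(K_TRUE)                          # synthetic data
chain = metropolis(data, k_init=1.0, n_iter=2000, step_sd=0.1, rng=rng)
k_estimate = sum(chain[1000:]) / len(chain[1000:])   # mean after burn-in
```

The inner simulation loop is where nearly all the time goes, which is why raw looping speed matters more here than anything else.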
Now, one would think that simulating neural networks is a long solved problem and I thought that too. Hence, I instructed my fellow Wally Xie to look into all the available neural simulators and see what fits the bill. We first tried Neuron but couldn’t get all of it to work on our Linux machine running CentOS (which is what NIH forces us to use). That was the end of Neuron for us. We next tried Nest but found it difficult to specify the neuron model. We finally settled on Brian, which is adaptable, written in Python, and supported by very nice people, but is turning out to be too slow. Native Python is also too slow. Stan would be great but it does not yet support differential equations. Wally is now implementing the code in Julia, which is turning out to be very fast but has a possible bug that is preventing long runs. The developers are extremely helpful though so we believe this will be worked out soon. However, it is hard to get plotting to work in Julia and, as many have pointed out, Julia is not nearly as complete as Python or Matlab. (Yes, Matlab is too slow.) We could use C and will go to it next if Julia does not work out. Thus, even though people have been developing neural simulators for decades, there is still an unmet need for something that is adaptable, fast, easy to use, and runs out of the box. A neuro-Stan would be great.
I’m currently in Banff, Alberta for a Festschrift for Jack Cowan (webpage here). Jack is one of the founders of theoretical neuroscience and has infused many important ideas into the field. The Wilson-Cowan equations that he and Hugh Wilson developed in the early seventies form a foundation for both modeling neural systems and machine learning. My talk will summarize my work on deriving “generalized Wilson-Cowan equations” that include both neural activity and correlations. The slides can be found here. References and a summary of the work can be found here. All videos of the talks can be found here.
Addendum: 17:44. Some typos in the talk were fixed.
Addendum: 18:25. I just realized I said something silly in my talk. The Legendre transform is an involution because it is its own inverse: the transform of the transform returns the original function. I said something completely inane instead.
Thanks for all the comments about the attributes of Python and Julia. It seems to me that the most prudent choice is to learn Python and Julia. However, what I would really like to know is just how fast these languages really are and here is the test. What I want to do is to fit networks of coupled ODEs (and PDEs) to data using MCMC (see here). This means I need a language that loops fast. An example in pseudo-Matlab code would be
for n = 1:N
    for i = 1:T
        y(:,i+1) = M\y(:,i)
    end
    % Compare to data and set new parameters
end
where h is a parameter and M is some matrix (say 1000 dimensional), which is sometimes a Toeplitz matrix but not always. Hence, in each time step I need to invert a matrix, which can depend on time so I can’t always precompute, and do a matrix multiplication. Then in each parameter setting step I need to sum an objective function like the mean square error over all the data points. The code to do this in C or Fortran can be pretty complicated because you have to keep track of all the indices and call linear algebra libraries. I thus want something that has the simple syntax of Matlab but is as fast as C. Python seems to be too slow for our needs but maybe we haven’t optimized the code. Julia seems like the perfect fit but let me know if I am just deluded.
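For concreteness, here is a dependency-free Python sketch of the loop structure I have in mind. Several things in it are assumptions made purely for illustration: M is taken to be I - h*A for a small tridiagonal Toeplitz coupling A (a backward Euler step with time step h), the matrix is 4 dimensional rather than 1000, the "data" are synthetic, and the outer loop is a plain scan over candidate parameters rather than MCMC. The helpers `solve`, `build_matrix`, `run` and `mse` are hypothetical names.

```python
def solve(M, b):
    """Solve M y = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    A = [row[:] + [b[i]] for i, row in enumerate(M)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= factor * A[col][c]
    y = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(A[r][c] * y[c] for c in range(r + 1, n))
        y[r] = (A[r][n] - s) / A[r][r]
    return y

def build_matrix(a, n, h):
    """Assumed form M = I - h*A with A a nearest-neighbour coupling of strength a."""
    M = [[0.0] * n for _ in range(n)]
    for i in range(n):
        M[i][i] = 1.0 + 2.0 * a * h
        if i > 0:
            M[i][i - 1] = -a * h
        if i < n - 1:
            M[i][i + 1] = -a * h
    return M

def run(a, y0, n_steps, h):
    """Inner time loop: each step solves M y(i+1) = y(i)."""
    M = build_matrix(a, len(y0), h)
    traj = [y0]
    for _ in range(n_steps):
        traj.append(solve(M, traj[-1]))
    return traj

def mse(traj, data):
    """Mean square error summed over all time points and components."""
    total = sum((u - v) ** 2 for yu, yv in zip(traj, data) for u, v in zip(yu, yv))
    return total / (len(traj) * len(traj[0]))

y0 = [0.0, 1.0, 0.0, 0.0]
candidates = [0.1 * j for j in range(1, 11)]
A_TRUE = candidates[6]                      # parameter used to make the synthetic data
data = run(A_TRUE, y0, 20, 0.1)

# Outer parameter loop: simulate, compare to data, move to the next setting.
errors = [mse(run(a, y0, 20, 0.1), data) for a in candidates]
best_a = candidates[errors.index(min(errors))]
```

Even in this toy version the bookkeeping for the solve dominates the code, which is exactly the part that Matlab's backslash hides and that I would like to keep hidden without paying for it in speed.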