The myth of the heroic entrepreneur

I sometimes listen to the podcast “How I Built This”, where host Guy Raz interviews successful entrepreneurs like Herb Kelleher, who founded Southwest Airlines, Reid Hoffman of LinkedIn, Stacy Madison of Stacy’s Pita Chips, and so on. The story arc of each interview is similar: some scrappy, undervalued person comes up with a novel idea and then, against all odds, succeeds through hard work, unrelenting drive, and risk taking. The podcast fully embraces the American myth of the hero entrepreneur, although Guy does his best to extend it beyond the stereotypical Silicon Valley version typified by Steve Jobs or Elon Musk. At the end of each interview Guy asks the subject how much of their success was due to luck and how much to their ingenuity and diligence. Most are humble or savvy enough to say that some large fraction of their success was luck. While I have no doubt that each successful entrepreneur is bright, hard working, and possesses unique skills, there are countless others who are equally talented and yet did not succeed. Each success story is an example of survivor bias. We sometimes hear about spectacular failures, like the Edsel, but we rarely hear the story of “How I almost built this”.

There is a stock market scam where you email a block of 1024 prospective marks a prediction of what a stock will do that week. For one half, you say the stock will go up, and for the other half you say it will go down. Then, for the half for which you were correct, you do the same thing, and half of them (one quarter of the original) will again receive a correct prediction. Finally, after ten weeks, one of the original 1024 will have received ten correct predictions in a row and will think that you are either a genius or have inside information, and will be primed to sign up for whatever scam you are selling. The lucky (or unlucky) person is fooled because they do not know that the other 1023 recipients did not receive perfect predictions. Obviously, this also works for sports predictions.
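
Here is a minimal sketch of the arithmetic in Python (the function, its defaults, and the coin-flip stock are my own illustration, not a description of any real scheme):

```python
import random

def run_scam(n_marks=1024, n_weeks=10):
    """Simulate the prediction scam: each week, tell half the remaining
    marks the stock will go up and the other half it will go down,
    then keep only the half that happened to receive the correct 'prediction'."""
    marks = list(range(n_marks))
    for week in range(n_weeks):
        random.shuffle(marks)
        half = len(marks) // 2
        up_group, down_group = marks[:half], marks[half:]
        stock_went_up = random.random() < 0.5  # the scammer never predicts anything
        marks = up_group if stock_went_up else down_group
        print(f"week {week + 1}: {len(marks)} marks have seen only correct predictions")
    return marks

run_scam()  # after ten weeks exactly one mark is left, convinced you are a genius
```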

While I think most success is luck, there do seem to be outliers. Elon Musk seems to be one; he manages to invent new industries and succeed with regularity. Warren Buffett does seem to be able to beat the market. However, it is for us as a society to decide how winners should be rewarded. In many industries there is a winner-take-all dynamic, where the larger you get the easier it is to crush the competition. Mark Zuckerberg is clearly skilled, but Facebook is dominant right now because it is a monopolist; it simply buys up as many competitors as it can. The same goes for Google and Amazon, and it went for AT&T until the government broke it up. Finance works that way too: the bigger a bank or hedge fund gets, the easier it is to succeed. A small fluctuation that propels one firm a little ahead of the rest at the right time will be exponentially amplified. While I do think it is a positive thing to reward success, I don’t think the rewards need to be so disparate. Right now, a very small difference in ability (or none at all) and a lot of luck can be the difference between flying to your house in the Hamptons in a helicopter and selling hot dogs from a cart on Fifth Avenue.

A Covid-19 Manhattan Project

Right now there are hundreds if not thousands of Covid-19 models floating around. Some are better than others, some have much more influence than others, and the most influential are not necessarily the best. There must be a better way of doing this. The world’s greatest minds convened in Los Alamos during WWII and built the first atomic bombs. Metaphors get thrown around with reckless abandon, but if there ever was a time for a Manhattan Project, it is now. Currently, the world’s scientific community has mobilized to come up with models to predict the spread of the virus and the effectiveness of mitigation efforts, to produce new therapeutics, and to develop new vaccines. But this is mostly going on independently.

Would it be better if we were to coordinate all of this activity? Right now at the NIH, there is an intense effort to compile all the research being done in the NIH Intramural Program and to develop a system where people can share reagents and materials. There are other coordination efforts going on worldwide as well. This website contains a list of open-source computational resources. This link gives a list of data scientists who have banded together. But I think we need a worldwide plan if we are ever to return to normal activity. Even if some nation has eliminated the virus completely within its borders, there is always a chance of reinfection from outside.

In terms of models, they seem to have very weak predictive ability. This is probably because they are all overfit. We don’t really understand all the mechanisms of SARS-CoV-2 propagation. The case and death curves are pretty simple, and as von Neumann or Ulam or someone once said, “give me 4 parameters and I can fit an elephant; give me 5 and I can make its tail wiggle.” Almost any model can fit the curve, but to make a projection into the future you need to get the dynamics correct, and this, I claim, we have not done. What I am thinking of proposing is to have the equivalent of FDA approval for predictive models. However, instead of a binary decision of approval or non-approval, people could submit their models for a predictive score based on some cross-validation scheme or prediction on a held-out set. You could also submit as many times as you wish to have your score updated. We could then pool all the models and produce a global Bayesian model averaged prediction and see if that does better. Let me know if you wish to participate or have ideas on how to do this better.
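
To make the pooling idea concrete, here is a rough sketch of one way it could work (the scoring rule, function name, and numbers are all made up for illustration): each submitted model gets a held-out error, and the forecasts are combined with weights that decay exponentially in that error.

```python
import numpy as np

def pooled_prediction(model_predictions, held_out_errors):
    """Combine submitted forecasts with a simple Bayesian-style model average:
    weight each model by exp(-error), so better held-out performance gets
    exponentially more weight.

    model_predictions: array of shape (n_models, n_future_days)
    held_out_errors:   array of shape (n_models,), e.g. mean squared error
                       on a held-out window of case counts
    """
    weights = np.exp(-np.asarray(held_out_errors, dtype=float))
    weights /= weights.sum()
    return weights @ np.asarray(model_predictions, dtype=float)

# Toy example: three hypothetical models forecasting daily cases
preds = [[100, 120, 140],   # model A
         [ 90, 100, 110],   # model B
         [200, 400, 800]]   # model C (badly overfit)
errors = [1.0, 0.5, 5.0]    # held-out errors; model C scores worst
print(pooled_prediction(preds, errors))
```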

The probability of life

Estimates from the Kepler satellite suggest that there could be at least 40 billion exoplanets capable of supporting life in our galaxy alone. Given that there are perhaps 2 trillion observable galaxies, that amounts to a lot of places where life could exist. And this is only counting biochemical life as we know it. There could also be non-biochemical lifeforms that we can’t even imagine. I chatted with mathematician Morris Hirsch long ago in Berkeley and he whimsically suggested that there could be creatures living in astrophysical plasmas, which are highly nonlinear. So let’s be generous and say that there are 10^{12} planets where biochemical life could exist in the Milky Way and 10^{24} total in the observable universe.

Now, does this mean that extraterrestrial life is very likely? If you listen to most astronomers and astrobiologists these days, it would seem that life is guaranteed to be out there and we just need to build a big enough telescope to find it. There are several missions in the works to detect signatures of life, like methane or oxygen, in the atmosphere of an exoplanet. However, the likelihood of life outside of the earth is predicated on the probability of forming life anywhere, and we have no idea what that number is. Although it only took about a billion years for life to form on earth, that does not really give us any information about how likely it is to form elsewhere.

Here is a simple example to illustrate how life could grow exponentially fast after it forms but take an arbitrarily long time to form. Suppose the biomass of life on a planet, x, obeys the simple equation

\frac{dx}{dt} = -x(a-x) + \eta(t)

where \eta is a zero mean stochastic forcing with variance D. The deterministic equation has two fixed points, a stable one at x = 0 and an unstable one at x = a. Thus, as long as x stays below a, life will never form, but as soon as x exceeds a it will grow (super-exponentially), until it is damped by nonlinear processes that I don’t consider here. We can rewrite this problem as

\frac{dx}{dt} = -\partial_x U(x) + \eta(t)

where U = a x^2/2 - x^3/3.


The probability of life is then given by the probability of escape from the well U(x) given the noisy forcing (thermal bath) provided by \eta. By Kramers’ escape rate (which you can look up or I’ll derive in a future post), the rate of escape is given approximately by e^{-E/D}, where E is the well depth, which is given by a^3/6. Thus the probability of life is exponentially damped by a factor of e^{-a^3/6D}. Given that we know nothing about a or D, the probability of life could be anything. For example, if we arbitrarily assign a = 10^{10} and D = 10^{-10}, we get a rate (or probability, if we normalize correctly) for life to form on the order of e^{-10^{40}/6}, which is very small indeed and makes life very unlikely in the universe.
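
As an illustration (not part of the original argument), here is a sketch of a Langevin simulation of the escape. The values of a and D are toy choices so that escape happens quickly, the reflecting barrier at zero follows the addendum below, and I assume the convention \langle \eta(t)\eta(t')\rangle = 2D\delta(t-t'), under which the Kramers rate goes like e^{-E/D}.

```python
import numpy as np

def escape_time(a=1.0, D=0.1, dt=1e-3, seed=0, max_steps=10**7):
    """Euler-Maruyama integration of dx/dt = -x(a - x) + eta(t).
    Returns the first time x climbs over the barrier at x = a."""
    rng = np.random.default_rng(seed)
    x, t = 0.0, 0.0
    for _ in range(max_steps):
        x += -x * (a - x) * dt + np.sqrt(2 * D * dt) * rng.standard_normal()
        x = max(x, 0.0)              # reflecting barrier at zero (see the addendum)
        t += dt
        if x > a:
            return t
    return float("inf")              # never escaped within max_steps

times = [escape_time(seed=s) for s in range(5)]
# The mean escape time grows like exp(a^3/(6D)) up to a prefactor, so even
# modest barriers (large a, small D) make 'life' astronomically rare.
print("escape times:", [round(t, 1) for t in times])
```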

Now, how could there be any life in the universe at all if it had such a low probability of forming? Well, there is no reason that there could not have been lots of universes, which is what string theory and cosmology now predict. Maybe it took 10^{100} universes to exist before life formed. I’m not saying that there is no one out there; I’m only saying that an N of one does not give us much information about how likely that is.

Addendum, 2019-04-07: As was pointed out in the comments, the model as written allows for negative biomass. This can be corrected by adding an infinite barrier at zero (i.e. restricting x to always be positive), and this won’t affect the result. Depending on the barrier height and noise amplitude, it can take an arbitrarily long time to escape.

New paper on learning in spiking neural networks

Chris Kim and I recently published a paper in eLife:

Learning recurrent dynamics in spiking networks.

Abstract

Spiking activity of neurons engaged in learning and performing a task show complex spatiotemporal dynamics. While the output of recurrent network models can learn to perform various tasks, the possible range of recurrent dynamics that emerge after learning remains unknown. Here we show that modifying the recurrent connectivity with a recursive least squares algorithm provides sufficient flexibility for synaptic and spiking rate dynamics of spiking networks to produce a wide range of spatiotemporal activity. We apply the training method to learn arbitrary firing patterns, stabilize irregular spiking activity in a network of excitatory and inhibitory neurons respecting Dale’s law, and reproduce the heterogeneous spiking rate patterns of cortical neurons engaged in motor planning and movement. We identify sufficient conditions for successful learning, characterize two types of learning errors, and assess the network capacity. Our findings show that synaptically-coupled recurrent spiking networks possess a vast computational capability that can support the diverse activity patterns in the brain.


The ideas that eventually led to this paper were seeded by two events. The first was about five years ago when I heard Dean Buonomano talk about his work with Rodrigo Laje on how to tame chaos in a network of rate neurons. Dean and Rodrigo expanded on the work of Larry Abbott and David Sussillo. The guiding idea from these two influential works stems from the “echo state machine” or “reservoir computing”. Basically, this idea exploits the inherent chaotic dynamics of a recurrent neural network to project inputs onto diverse trajectories, from which a simple learning rule can be deployed to extract a desired output.

To explain the details of this idea and our work, I need to go back to Minsky and Papert, who in their iconic 1969 book on feedforward neural networks (called perceptrons) divided learning problems into two types. The first type is linearly separable, which means that if you want to learn a classifier on some inputs, a single linear plane can be drawn to separate the two input classes in the space of all inputs. The classic example is the OR function. When given inputs (x_1,x_2) = (0,1), (1,0), (1,1), it outputs 1, and when given (0,0) it outputs 0. If we consider the inputs on the x-y plane, then we can easily draw a line separating the point (0,0) from the rest. The classic linearly non-separable problem is exclusive OR, or XOR, where (0,0) and (1,1) map to 0, while (0,1) and (1,0) map to 1. In this case, no single straight line can separate the points. Minsky and Papert showed that a single-layer perceptron, where the only thing you learn is the connection strengths from input to output, can never learn a linearly inseparable problem. Most interesting and nontrivial problems are inseparable.

Mathematically, we can write a perceptron as x_i^{\alpha+1} = \sum w_{ij}^{\alpha}f^{\alpha}(x_j^{\alpha}), where x_i^{\alpha} is the value of neuron i in layer \alpha and f is a connection or gain function. The inputs are x_i^{0} and the outputs are x_i^{L}. The perceptron problem is to find a set of w‘s such that the output layer gives you the right value for a task posed in the input layer, e.g. perform XOR. A single-layer perceptron is then simply x_i^{1} = \sum w_{ij}^{0}f^{0}(x_j^{0}). Now of course we could always design f to do what you ask, but since we are not training f, it needs to be general enough for all problems and is usually chosen to be a monotonically increasing function with a threshold. Minsky and Papert showed that the single-layer problem is equivalent to a linear matrix equation, which can never solve a linearly inseparable problem since its decision boundary is a plane. If f is a linear function then a multi-layer problem reduces to a single-layer problem, so what makes perceptron learning and deep learning possible is that there are multiple layers and f is a nonlinear function. Minsky and Papert also claimed that there was no efficient way to train a multi-layered network, and this killed the perceptron for more than a decade until backpropagation was discovered and rediscovered in the 1980s. Backprop rekindled a flurry of neural network activity, which then died down because other machine learning methods proved better at the time. The recent ascendancy of deep learning is the third wave of perceptron interest and was spurred by the confluence of 1) more computing power via the GPU, 2) more data, and 3) finding the right parameter tweaks to make perceptron learning work much, much better. Perceptrons still can’t solve everything (e.g. NP-complete problems are still NP-complete), they are still far from optimal, and they do not rule out a resurgence or the invention of another method.
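
As a quick illustration of the separability point (a toy sketch, nothing from the book or the paper), here is a single-layer perceptron that learns OR but can never get XOR right:

```python
import numpy as np

def train_single_layer(X, y, epochs=100, lr=0.1):
    """Single-layer perceptron: y_hat = step(w . x + b).
    Only the input-to-output weights are learned."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            pred = 1 if xi @ w + b > 0 else 0
            w += lr * (yi - pred) * xi
            b += lr * (yi - pred)
    return np.array([1 if xi @ w + b > 0 else 0 for xi in X])

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
print("OR :", train_single_layer(X, np.array([0, 1, 1, 1])))  # separable: learned perfectly
print("XOR:", train_single_layer(X, np.array([0, 1, 1, 0])))  # inseparable: always gets some wrong
```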

The idea of reservoir computing is to make a linearly inseparable problem separable by processing the inputs. The antecedent is the support vector machine or kernel method, which projects the data to a higher dimension such that an inseparable problem becomes separable. In the XOR example, if we add a dimension and map (0,0) and (1,1) to (0,0,0) and (1,1,0), and map (1,0) and (0,1) to (1,0,1) and (0,1,1), then the problem is separable. The hard part is finding the mapping, or kernels, to do this. Reservoir computing uses the orbit of a chaotic system as a kernel. Chaos, by definition, causes initial conditions to diverge exponentially, and by following a trajectory for as long as you want you can make as high dimensional a space as you want; in high enough dimensions all points are linearly separable if they are far enough apart. However, the defining feature of chaos is also a bug, because any slight error in your input will also diverge exponentially and thus the kernel is inherently unstable. The Sussillo and Abbott breakthrough was that they showed you could have your cake and eat it too. They stabilized the chaos using feedback and/or learning while still preserving the separating property. This then allowed training of the output layer to work extremely efficiently. Laje and Buonomano took this one step further by showing that you could directly train the recurrent network to stabilize chaos. My thought at that time was: why are chaotic patterns so special? Why can’t you learn any pattern?
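
To see the flavor of the kernel trick without any dynamics, here is a sketch in which a fixed, untrained random expansion (a static stand-in for the reservoir; the function and parameters are my own) makes XOR solvable by a linear readout, which is the only part that is trained:

```python
import numpy as np

rng = np.random.default_rng(1)

def random_expansion(X, n_hidden=50):
    """Project the inputs through a fixed, untrained random layer with a tanh
    nonlinearity. In high enough dimensions the XOR points become separable."""
    W_in = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    return np.tanh(X @ W_in + b)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)                  # XOR targets

H = np.c_[random_expansion(X), np.ones(len(X))]          # add a bias column
w_out = np.linalg.lstsq(H, y, rcond=None)[0]             # train only the linear readout
print(np.round(H @ w_out).astype(int))                   # prints [0 1 1 0]
```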

The second pivotal event came in a conversation with the ever so insightful Kechen Zhang when I gave a talk at Hopkins. In that conversation, we discussed how perhaps it was possible that any internal neuron mechanism, such as nonlinear dendrites, could be reproduced by adding more neurons to the network and thus from an operational point of view it didn’t matter if you had the biology correct. There would always exist a recurrent network that could do your job. The problem was to find the properties that make a network “universal” in that it could reproduce the dynamics of any other network or any dynamical system. After this conversation, I was certain that this was true and began spouting this idea to anyone who would listen.

One of the people I mentioned this to was Chris Kim when he contacted me for a job in my lab in 2015. Later Chris told me that he thought my idea was crazy or impossible to prove but he took the job anyway because he wanted to be in Maryland where his family lived. So, upon his arrival in the fall of 2016, I tasked him with training a recurrent neural network to follow arbitrary patterns. I also told him that we should do it on a network of spiking neurons. I thought that doing this on a set of rate neurons would be too easy or already done so we should move to spiking neurons. Michael Buice and I had just recently published our paper on computing finite size corrections to a spiking network of coupled theta neurons with linear synapses. Since we had good control of the dynamics of this network, I thought it would be the ideal system. The network has the form

\dot\theta_i = f(\theta_i, I_i, u_i)

\tau_s \dot u_i= - u_i+ 2 \sum_j w_{ij}\delta(\theta_j-\pi)

Whenever neuron j crosses the angle \pi it gives an impulse to neuron i with weight scaled by w_{ij}, which can be positive or negative. The idea is to train the synaptic drive u_i(t) or the firing rate of neuron i to follow an arbitrary temporal pattern. Despite his initial skepticism, Chris actually got this to work in less than six months. It took us another year or so to understand how and why.
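
For concreteness, here is a sketch of the network dynamics only (the recursive least squares training of w_{ij} is omitted). The post does not specify f, so I use the standard canonical theta-neuron form, and all parameters are arbitrary toy values:

```python
import numpy as np

N, tau_s, dt, T = 100, 0.1, 1e-3, 2.0
rng = np.random.default_rng(0)
w = rng.normal(scale=1.0 / N, size=(N, N))      # recurrent weights w_ij, either sign
I = rng.uniform(0.0, 0.5, size=N)               # constant external drive
theta = rng.uniform(-np.pi, np.pi, size=N)      # phase of each theta neuron
u = np.zeros(N)                                 # synaptic drive u_i

for step in range(int(T / dt)):
    # canonical theta-neuron form: theta' = 1 - cos(theta) + (1 + cos(theta)) * input
    dtheta = (1 - np.cos(theta)) + (1 + np.cos(theta)) * (I + u)
    theta_new = theta + dt * dtheta
    spiked = theta_new >= np.pi                                # crossing pi = a spike
    theta = np.mod(theta_new + np.pi, 2 * np.pi) - np.pi       # wrap back into (-pi, pi]
    # tau_s du/dt = -u + 2 sum_j w_ij delta(theta_j - pi): each spike from neuron j
    # kicks u_i by 2 w_ij / tau_s, and u decays with time constant tau_s
    u += dt * (-u / tau_s) + (2.0 / tau_s) * (w @ spiked.astype(float))

print("mean synaptic drive at the end:", round(float(u.mean()), 4))
```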

In our paper, we show that if the synapses are fast enough, i.e. \tau_s is small enough, and the patterns are diverse enough, then any set of patterns can be learned. The reason, which is explained in mathematical detail in the paper, is that if the synapses are fast enough, then the synaptic drive acts like a quasi-static function of the inputs and thus the spiking problem reduces to the rate problem

\tau_s \dot u_i= - u_i+ \sum_j w_{ij}g(u_j)

where g is the frequency-input curve of a theta neuron. Then the problem is about satisfying the synaptic drive equation, which given the linearity in the weights, boils down to whether \tau_s \dot u_i + u_i is in the space spanned by \sum w_{ij} g(u_j),  which we show is always possible as long as the desired patterns imposed on u_i(t) are uncorrelated or linearly independent enough. However, there is a limit to how long the patterns can be, which is governed by the number of entries in w_{ij}, which is limited by the number of neurons. The diversity of patterns limitation can also be circumvented by adding auxiliary neurons. If you wanted some number of neurons to do the same thing, you just need to include a lot of other neurons that do different things. A similar argument can be applied to the time averaged firing rate (on any time window) of a given neuron. I now think we have shown that a recurrent network of really simple spiking neurons is dynamically universal. All you need are lots of fast neurons.


Addendum: The dates of events may not all be correct. I think my conversation with Kechen came before Dean’s paper but my recollection is foggy. Memories are not reliable.


Mosquito experiment concluded

It’s hard to see from the photo, but when I checked my bucket after a week away, there were definitely a few mosquito larvae swimming around. There was also an impressive biofilm on the bottom of the bucket. It took less than a month for mosquitoes to breed in a newly formed pool of stagnant water. My son also noticed that a nearby flower pot with water only a few centimeters deep had larvae. So the claim that mosquitoes will breed in tiny amounts of stagnant water is true.

Insider trading

I think one of the main things that has fueled a backlash against the global elites is the (correct) perception that they play by different rules. When they make financial mistakes, they get bailed out with taxpayer dollars with no consequences. Gains are privatized and losses are socialized. Another example is insider trading, where people profit from securities transactions using nonpublic information. While there have been several high-profile cases in recent years (e.g. here is a Baltimore example), my guess is that insider trading is rampant since it is so easy to do and so hard to detect. The conventional wisdom for combating insider trading is stronger enforcement and penalties. However, my take is that this will just lead to a situation where small-time insider traders get squeezed out while the sophisticated ones with more resources continue. This is an example where a regulation creates a monopoly or economic rent opportunity.

Aside from the collapse of morality that may come with extreme wealth and power (e.g. listen here), I also think that insider traders rationalize their activities because they don’t think it hurts anyone, even though there is an obvious victim. For example, if someone gets inside information that a highly touted drug has failed to win approval from the FDA, then they can short the stock (or buy put options), which is in effect an agreement or opportunity to sell the stock at the current price at a future date. When the stock decreases in value after the announcement, they buy the stock back at the lower price, having sold it at the higher one, and reap the profits. The victim is the counterparty to the trade, who could be a rich trader but could also be someone’s pension fund or the employees of the company.

Now, the losing party or a regulatory agency could suspect a case of insider trading, but to prove it would require someone confessing or finding an email or phone recording of the information being passed. They could also set up sting operations to catch serial violators. All of these things are difficult and costly. The alternative may seem ridiculous, but I think the best solution may be to make insider trading legal. If it were legal, then several things would happen. More people would do it, which would compete away the profits from such trades; the information would be more likely to leak to the public, since people would not be afraid of sharing it; and people would be more careful in making trades prior to big decisions, because the other party may have more information than they do. Companies would be responsible for policing people in their firms who leak information. By making insider trading legal, the rent created by regulation would be reduced.

Selection of the week

Sorry for the long radio silence. However, I was listening to the radio yesterday when this version of the Double Violin Concerto in D minor, BWV 1043, by JS Bach came on, and I sat in my car in a hot parking lot listening to it. It’s from a forty-year-old EMI recording with violinists Itzhak Perlman and Pinchas Zukerman and Daniel Barenboim conducting the English Chamber Orchestra. I’ve been limiting my posts to videos of live performances, but sometimes classic recordings should be given their due, and this is certainly a classic. Even though I posted a version with Oistrakh and Menuhin before, I just had to share this.

What Uber doesn’t get

You may have heard that the ride-hailing services Uber and Lyft have pulled out of Austin, TX because they refuse to be regulated. You can read about the details here. The city wanted to fingerprint drivers, as it does for taxis, but Uber and Lyft forced a referendum on the city to make them exempt or else they would leave. The city voted against them. I personally use Uber and really like it, but what I like about Uber has nothing to do with Uber per se or regulation. What I like is that 1) no money needs to be exchanged, especially the tip, and 2) the price is essentially fixed, so it is in the driver’s interest to get me to my destination as fast as possible. I have been taken on joy rides far too many times by taxi drivers trying to maximize the fare, and I never know how much to tip. However, these are things that regulated taxis could and should implement. I do think it is extremely unfair that Uber can waltz into a city like New York and compete against highly regulated taxis, whose drivers have paid as much as a million dollars for the right to operate. Uber and Lyft should collaborate with existing taxi companies rather than trying to put them out of business. There were reasons to regulate taxis (e.g. safety, traffic control, fraud protection), and they should apply whether I hail a cab on the street or use a smartphone app.

New review paper on GWAS

Comput Struct Biotechnol J. 2015 Nov 23;14:28-34
Uncovering the Genetic Architectures of Quantitative Traits.
Lee JJ, Vattikuti S, Chow CC.

Abstract
The aim of a genome-wide association study (GWAS) is to identify loci in the human genome affecting a phenotype of interest. This review summarizes some recent work on conceptual and methodological aspects of GWAS. The average effect of gene substitution at a given causal site in the genome is the key estimand in GWAS, and we argue for its fundamental importance. Implicit in the definition of average effect is a linear model relating genotype to phenotype. The fraction of the phenotypic variance ascribable to polymorphic sites with nonzero average effects in this linear model is called the heritability, and we describe methods for estimating this quantity from GWAS data. Finally, we show that the theory of compressed sensing can be used to provide a sharp estimate of the sample size required to identify essentially all sites contributing to the heritability of a given phenotype.
KEYWORDS:
Average effect of gene substitution; Compressed sensing; GWAS; Heritability; Population genetics; Quantitative genetics; Review; Statistical genetics

Phasers on stun

The recent controversy over police shootings of unarmed citizens has again stirred up the debate over gun control. However, Shashaank Vattikuti points out that there is another option: for the police to carry nonlethal weapons like phasers with a stun setting. Although an effective long-range nonlethal weapon does not currently exist (tasers just don’t cut it), a billionaire like Mark Zuckerberg, Peter Thiel, or Elon Musk could start a company to develop one. New York Times columnist Joe Nocera has suggested that Michael Bloomberg buy a gun company. There are so many guns already in existence that, barring an unlikely confiscation scheme, there is probably no way to get rid of them. The only way to reduce gun violence at this point is for a superior technology to make them obsolete. Hobbyists and collectors would still own guns, just as there are sword collectors, but those who own guns for protection would probably slowly switch over. However, the presence of a nonlethal option could lead to more people shooting each other, so strong laws regarding their use would need to accompany their introduction.

Optimizing dynastic succession genetically

The traditional rule for succession in a monarchy is to pass from father to son. Much of King Henry VIII’s spousal folly was over his anxiety about producing an heir. However, if being a successful ruler has a genetic component, then this would be the least optimal way to run an empire. For diploid, sexually reproducing organisms such as humans, the offspring inherits equal numbers of chromosomes from both parents, and classically the coefficient of relatedness between parent and child is assigned the value 1/2. However, there is a crucially important asymmetry in that males are heterogametic, i.e. they inherit an X chromosome from their mothers and a Y from their fathers, while females are homogametic, inheriting an X from each parent. Now, the X is about 100 million base pairs longer than the Y, a difference that accounts for about 2 percent of the (father’s) genome (counting chromosomes separately). Additionally, given that everyone has at least one X while only males have a Y, the Y cannot contain genes that are crucial for survival, and in fact there are far fewer genes on the Y than on the X (roughly 50 versus 800). The Y has been shrinking in mammals over time and there is a debate about its importance and eventual fate (e.g. see here).

We can compute the sex-chromosome-adjusted genetic correlation coefficients between parents and children. Let the father’s genetic content be F=F_S + F_D, where F_S is the genetic content passed to sons (half of the autosomes plus the Y chromosome) and F_D is that passed to daughters (half of the autosomes plus the X), and similarly M=M_S+M_D. The son’s genetic content is then S=F_S+M_S and the daughter’s is D=F_D+M_D. We can treat F and M as strings of random variables with variance 1/(length of the mother’s genome). Assuming that the genetic correlation between fathers and mothers is zero (i.e. no inbreeding and no assortative mating), the correlation coefficient between father and son is

\langle FS\rangle = \frac{ \langle F_S^2\rangle}{\sqrt{\langle F_S^2\rangle+\langle F_D^2\rangle}\sqrt{\langle F_S^2\rangle+\langle M_S^2\rangle}}=\frac{ 1}{\sqrt{1+\langle F_D^2\rangle/\langle F_S^2\rangle}\sqrt{1+\langle M_S^2\rangle/\langle F_S^2\rangle}}

and similarly:

\langle FD\rangle =\frac{ 1}{\sqrt{1+\langle F_S^2\rangle/\langle F_D^2\rangle}\sqrt{1+\langle M_D^2\rangle/\langle F_D^2\rangle}}

\langle MS\rangle =\frac{ 1}{\sqrt{1+\langle M_D^2\rangle/\langle M_S^2\rangle}\sqrt{1+\langle F_S^2\rangle/\langle M_S^2\rangle}}

\langle MD\rangle =\frac{ 1}{\sqrt{1+\langle M_S^2\rangle/\langle M_D^2\rangle}\sqrt{1+\langle F_D^2\rangle/\langle M_D^2\rangle}}

Now, if you assume that genetic content is homogeneous among all chromosomes, then the genetic material that fathers pass on to sons is 0.48 of the total, and thus \langle F_S^2\rangle = 0.48 while \langle F_D^2\rangle = 0.5, \langle M_S^2\rangle = 0.5, and \langle M_D^2\rangle = 0.5, implying that \langle FS\rangle = 0.49, \langle FD\rangle = 0.51, \langle MS\rangle = 0.51, and \langle MD\rangle = 0.5. Hence, parents are more correlated with their children of the opposite sex, and fathers are least correlated with their sons. These numbers also probably underestimate the asymmetry. If genetic relationship is the most important factor for royal succession, then a dynasty based on opposite-sex succession would be more logical than the father-to-son model.
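
The numbers are easy to check by plugging the fractions above into the formulas; here is a short sketch (the helper function and names are just for illustration):

```python
from math import sqrt

# Fractions from the text: fathers pass 0.48 of the (mother-normalized)
# genome to sons and 0.5 to daughters; mothers pass 0.5 to either.
FS2, FD2, MS2, MD2 = 0.48, 0.5, 0.5, 0.5

def corr(shared, parent_other, child_other):
    """<AB> = shared / sqrt((shared + parent_other) * (shared + child_other))."""
    return shared / sqrt((shared + parent_other) * (shared + child_other))

print("father-son     :", round(corr(FS2, FD2, MS2), 2))   # 0.49
print("father-daughter:", round(corr(FD2, FS2, MD2), 2))   # 0.51
print("mother-son     :", round(corr(MS2, MD2, FS2), 2))   # 0.51
print("mother-daughter:", round(corr(MD2, MS2, FD2), 2))   # 0.5
```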


Selection of the week

Here’s a concert I wish I could have attended. A young Glenn Gould (greatest Bach interpreter since Bach although I heard Felix Mendelssohn was pretty good too) with Leonard Bernstein in his prime conducting the New York Philharmonic (I think) playing the first movement of JS Bach’s Keyboard Concerto in D minor, BWV 1052.

There is a famous incident from a Gould performance of the Brahms Piano Concerto No. 1, when he and Bernstein had such a disagreement about the tempo (Gould wanted to play it really slowly) that Bernstein got up on stage beforehand to make a disclaimer. That performance, with the speech, was recorded and someone has uploaded it to YouTube.

Gould gave up performing in 1964 at age 31. Notice how low he likes to sit at the piano. He used to bring his chair with him when he toured. One of my favourite films is “Thirty Two Short Films About Glenn Gould,” which I definitely recommend seeing.