Optimizing luck

Each week on the NPR podcast How I Built This, host Guy Raz interviews founders of successful enterprises, people like James Dyson or Ben and Jerry. At the end of most segments, he’ll ask the founder how much of their success they attribute to luck and how much to talent. In most cases, the founder will modestly say that luck played a major role, but some will add that they did take advantage of the luck when it came. One common thread among these successful people is that they are extremely resilient and aren’t afraid to try something new when things don’t work at first.

There are two ways to look at this. On the one hand, there is certainly some selection bias. For every one of these success stories, there are probably hundreds of others who were equally persistent and worked equally hard but did not achieve the same success. It is like the infamous con where you send 1024 people a binary prediction about a stock. The prediction will be correct for 512 of them, so the next week you send those people another prediction, and so on. After 10 weeks, one person will have received a correct prediction 10 times in a row and will think you are infallible. You then charge them a king’s ransom for the next one.

Yet it may be possible to optimize luck, and you can see this with Jensen’s inequality. Suppose x represents some combination of your strategy and effort level and \phi(x) is your outcome function. Jensen’s inequality states that the average, or expectation value, of a convex function (i.e. a function that bends upwards) is greater than or equal to the function of the expectation value: E(\phi(x)) \ge \phi(E(x)). In other words, if your outcome function is convex, then your average outcome will be larger just by acting in a random fashion. During “convex” times, the people who just keep trying different things will invariably be more successful than those who do nothing. They were lucky that their outcome function was convex (or they recognized that it was), but their persistence and willingness to try anything was instrumental in their success. The flip side is that in a concave era, where the inequality reverses, the same random actions would have led to a much worse outcome. So, do you feel lucky?
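If you want to see the inequality in action, here is a minimal sketch in Python; the convex function and the noise distribution are illustrative choices of mine, not anything specific to the stories above.

```python
# A minimal sketch of Jensen's inequality for a convex outcome function.
# phi(x) = exp(x) is convex; x is drawn at random to mimic trying many
# different strategies. The distribution and function are illustrative.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=0.0, scale=1.0, size=100_000)  # random strategy/effort levels

phi = np.exp  # a convex outcome function

mean_of_outcomes = phi(x).mean()  # E[phi(x)]: average outcome of random trying
outcome_of_mean = phi(x.mean())   # phi(E[x]): outcome of the "average" strategy

print(mean_of_outcomes, outcome_of_mean)
# Jensen: E[phi(x)] >= phi(E[x]), so the first number exceeds the second
# (roughly e^{1/2} ~ 1.65 versus ~1 here). For a concave phi the inequality flips.
```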

Catch-22 of our era

The screen on my wife’s iPhone shattered this week and she had not backed up her photos. The phone still seems to function otherwise, so we plugged it into the computer to try to back it up, but that requires unlocking the phone, and we can’t enter the password on the broken screen. My wife had refused to pay the 99 cents or whatever Apple charges for enough iCloud storage to back up the phone automatically, so I suggested we just pay the ransom money and let the phone back itself up. I currently pay both Apple and Dropbox extortion money. However, since she hadn’t logged onto iCloud in maybe ever, two-factor authentication sent a code to her phone to type into the website, which of course we can’t see on her broken screen, so that idea is done. We called Apple and they said they could try to change the number on her iCloud account to my phone, but that was two days ago and they still haven’t done it. So my wife gave up and tried to order a new phone. Under the new system at her university, which provides her phone, she can get one if she logs onto a site to request it. The site requires VPN, and in order to get VPN she needs to, you guessed it, type in a code sent to her phone. So you need a functioning phone to order a new phone. Basically, tech products are not very good. Software still kind of sucks and is not really improving. My Apple Mac is much worse now than it was 10 years ago. I still have trouble projecting stuff onto a screen. I will never get into a self-driving car made by any tech company. I’ll wait for Toyota to make one; my (Japanese) car always works (my Audi was terrible).

Missing the trend

I have been fortunate to have been born at a time when I had the opportunity to witness the birth of several of the major innovations that shape our world today.  I have also managed to miss out on capitalizing on every single one of them. You might make a lot of money betting against what I think.

I was a postdoctoral fellow in Boulder, Colorado in 1993 when my very tech-savvy advisor John Cary introduced me and his research group to Mosaic, the first widely used web browser, shortly after it was released. The web was the wild west in those days, with just a smattering of primitive personal sites authored by early adopters. The business world had not discovered the internet yet. It was an unexplored world, and people were still figuring out how to use it. I started to make a list of useful sites, but unlike Jerry Yang and David Filo, who immediately thought of doing the same thing and forming a company, it did not remotely occur to me that this activity could be monetized. Even though I struggled to find a job in 1994, was fairly adept at programming, watched the rise of Yahoo! and the rest of the internet startups, and had friends at Stanford and in Silicon Valley, it still did not occur to me that perhaps I could join in too.

Just months before impending unemployment, I managed to talk my way into being the first postdoc of Jim Collins, who had just started as a non-tenure-track research assistant professor at Boston University. Midway through my time with Jim, we had a meeting with Charles Cantor, then a professor at BU, about creating engineered organisms that could eat oil. Jim subsequently recruited graduate student Tim Gardner, now CEO of Riffyn, to work on this idea. I thought we should create a genetic Hopfield network, and I showed Tim how to use XPP to simulate the various models we came up with. However, my idea seemed too complicated to implement biologically, so when I went to Switzerland to visit Wulfram Gerstner at the end of 1997, Tim and Jim, freed from my meddling influence, were able to create the genetic toggle switch, and the field of synthetic biology was born.

I first learned about Bitcoin in 2009 and even thought about mining some. However, I then heard an interview with one of the early developers, Gavin Andresen, in which he failed to understand that because the supply of Bitcoin is finite, prices denominated in it would necessarily deflate over time. I was flabbergasted that he didn’t grasp such basic economics and became convinced that Bitcoin would eventually fail. Still, I could have mined thousands of Bitcoins on a laptop back then, which would be worth tens of millions today. I do think blockchains are an important innovation, and my former post-bac fellow Wally Xie is even the CEO of the blockchain startup QChain. Although I do not know where cryptocurrencies and blockchains will be in a decade, I do know that I most likely won’t have a role.

I was in Pittsburgh during the late nineties and early 2000s, in one of the few places where neural networks and deep learning, then still called connectionism, were king. Geoff Hinton had already left Carnegie Mellon for London by the time I arrived at Pitt, but he was still revered in Pittsburgh, and I met him in London when I visited UCL. I actually thought the field had great promise and even tried to lobby our math department to hire someone in machine learning, for which I was summarily dismissed and mocked. I recruited Michael Buice to work on the path integral formulation for neural networks because I wanted to write down a neural network model that carried both rate and correlation information so I could implement a correlation-based learning rule. Michael even proposed that we work on an algorithm to play Go, but obviously I demurred. Although I missed out on the current wave of AI hype, and probably wouldn’t have made an impact anyway, this is the one area where I may get a second chance in the future.

Technology and inference

In my previous post, I gave an example of how fake news could lead to a scenario of no update of posterior probabilities. However, this situation could arise just from knowledge of the technology itself. When I was a child, fantasy and science fiction movies always had a campy feel because the special effects looked unrealistic. When Godzilla came out of Tokyo Harbour, it looked like little models in a bathtub. The Creature from the Black Lagoon looked like a man in a rubber suit. I think the first science fiction movie that looked astonishingly real was Stanley Kubrick’s 1968 masterpiece 2001: A Space Odyssey, which adhered to physics like no film before and only a handful since. The simulation of weightlessness in space was marvelous, and to me the ultimate attention to detail was the scene in the rotating space station where a mild curvature in the floor could be perceived. The next groundbreaking moment was the 1993 film Jurassic Park, which truly brought dinosaurs to life. The first scene of a giant sauropod eating from a treetop was astonishing. The distinction between fantasy and reality was forever gone.

The effect of this essentially perfect rendering of anything into a realistic image is that we now have a plausible reason to reject any evidence. Photographic evidence can be completely discounted because the technology exists to create entirely fabricated versions. The same is true of audio recordings and anything you read on the Internet. In Bayesian terms, we now have an internal model, or likelihood function, in which any piece of data could be false with some probability. The more cynical you are, the closer that probability is to one. Once the likelihood becomes insensitive to the data, we are in the same situation as before. Technology alone, in the absence of fake news, could lead to a world where no one ever changes their mind. The irony is that this may force people to evaluate truth the way they did before such technology existed, which is to believe people (or machines) they trust through relationships built over long periods of time.
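To see how cynicism can freeze beliefs, here is a small sketch under an assumed model in which each piece of evidence is fabricated with probability c, and fabricated evidence is equally likely under either hypothesis. The parameter names and numbers are mine, purely for illustration.

```python
# Assumed model: with probability c the data is fake and carries no information
# (same likelihood q under both hypotheses); with probability 1-c it is genuine.
def posterior_true(prior_true, like_true, like_false, c, q=0.5):
    lt = c * q + (1 - c) * like_true    # effective P(D|T)
    lf = c * q + (1 - c) * like_false   # effective P(D|F)
    return lt * prior_true / (lt * prior_true + lf * (1 - prior_true))

# Evidence that genuinely favors "true" (likelihood 0.8 vs 0.2), seen by
# observers of increasing cynicism:
for c in (0.0, 0.5, 0.9, 0.99):
    print(c, round(posterior_true(0.5, 0.8, 0.2, c), 3))
# As c approaches 1 the posterior stays pinned at the prior of 0.5: no update.
```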

Fake news and beliefs

Much has been written of the role of fake news in the US presidential election. While we will never know how much it actually contributed to the outcome, as I will show below, it could certainly affect people’s beliefs. Psychology experiments have found that humans often follow Bayesian inference – the probability we assign to an event or action is updated according to Bayes rule. For example, suppose P(T) is the probability we assign to whether climate change is real; P(F) = 1-P(T) is our probability that climate change is false. In the Bayesian interpretation of probability, this would represent our level of belief in climate change. Given new data D (e.g. news), we will update our beliefs according to

P(T|D) = \frac{P(D|T) P(T)}{P(D)}

What this means is that our posterior probability or belief that climate change is true given the new data, P(T|D), is equal to the probability that the new data came from our internal model of a world with climate change (i.e. our likelihood), P(D|T), multiplied by our prior probability that climate change is real, P(T), divided by the probability of obtaining such data in all possible worlds, P(D). According to the rules of probability, the latter is given by P(D) = P(D|T)P(T) + P(D|F)P(F), which is the sum of the probability the data came from a world with climate change and that from one without.
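As a concrete, if toy, illustration, the update rule is only a couple of lines of code; the likelihood values below are made up.

```python
def update(prior_T, like_T, like_F):
    """Bayes rule for a binary hypothesis: returns P(T|D).

    prior_T is P(T), like_T is P(D|T), and like_F is P(D|F).
    """
    evidence = like_T * prior_T + like_F * (1 - prior_T)  # P(D)
    return like_T * prior_T / evidence

print(update(0.5, like_T=0.9, like_F=0.1))  # 0.9: belief moves toward the data
print(update(0.0, like_T=0.9, like_F=0.1))  # 0.0: a prior of zero never moves
```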

This update rule can reveal what will happen in the presence of new data, including fake news. The first thing to notice is that if P(T) is zero, then there is no update. In this binary case, this means that if we believe that climate change is absolutely false or absolutely true, then no data will change our mind. In the case of multiple outcomes, any outcome with zero prior (i.e. no support) will never change. So if our priors are completely certain, fake news has no impact because no news has any impact. If we have nonzero priors for both true and false, then if the data is more likely under our climate-change model, our posterior for true will increase, and vice versa. Our posteriors will tend in the direction of the data, and thus fake news could have a real impact.

For example, suppose we have an internal model in which we expect the mean annual temperature to be 10 degrees Celsius with a standard deviation of 3 degrees if there is no climate change, and a mean of 13 degrees with climate change. If the reported data are mostly centered around 13 degrees, then our belief in climate change will increase, and if they are mostly centered around 10 degrees, it will decrease. However, if we get data spread uniformly over a wide range, then both models could be equally likely and we would get no update. Mathematically, if P(D|T)=P(D|F), then P(D) = P(D|T)(P(T)+P(F)) = P(D|T), and from the Bayesian update rule the posterior will be identical to the prior. In a world full of misleading data, there is no update. Thus, obfuscation and sowing confusion are very good strategies for preventing updates of priors. You don’t need to refute data; just provide fake examples and bury the data in a sea of noise.
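Here is a hedged sketch of this temperature example, with Gaussian likelihoods under both hypotheses and made-up readings; the three cases below show belief rising, falling, and staying put.

```python
# Sketch of the temperature example: standard deviation of 3 degrees under both
# hypotheses, mean 10 C without climate change, 13 C with it. Data are invented.
import numpy as np
from scipy.stats import norm

def posterior_climate(data, prior_T=0.5):
    like_T = np.prod(norm.pdf(data, loc=13, scale=3))  # P(D | climate change)
    like_F = np.prod(norm.pdf(data, loc=10, scale=3))  # P(D | no climate change)
    return like_T * prior_T / (like_T * prior_T + like_F * (1 - prior_T))

print(posterior_climate([12.5, 13.2, 13.8]))            # warm data: belief rises
print(posterior_climate([9.6, 10.3, 10.1]))             # cool data: belief falls
print(posterior_climate([5.0, 8.0, 11.5, 15.0, 18.0]))  # data scattered symmetrically
# about the midpoint: both likelihoods match and the posterior stays at the prior.
```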

Revolution vs incremental change

I think that the dysfunction and animosity we currently see in the US political system and election is partly due to the underlying belief that meaningful change cannot be effected through slow evolution but rather requires an abrupt revolution in which the current system is torn down and rebuilt. There is some merit to this idea. Sometimes the structure of a building is so damaged that it is easier to demolish and rebuild than to repair and renovate. Mathematically, this can be expressed as a system being stuck in a local minimum when the global minimum is what you want. In order to reach the true global optimum, you need to get worse before you can get better. When fitting nonlinear models to data, dealing with local minima is a major problem, and it is the reason that a stochastic MCMC algorithm, which occasionally goes uphill, can work so much better than gradient descent, which only goes downhill.
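As a toy illustration of that difference, here is a sketch on a made-up double-well function; the function, step size, and "temperature" are arbitrary choices for demonstration, not a recipe.

```python
# Gradient descent versus a Metropolis-style stochastic search on a double well.
import numpy as np

rng = np.random.default_rng(1)

def f(x):                              # global minimum near x = -1,
    return (x**2 - 1)**2 + 0.3 * x     # shallower local minimum near x = +1

def grad(x):
    return 4 * x * (x**2 - 1) + 0.3

# Plain gradient descent only goes downhill and gets trapped in the local well.
x = 1.5
for _ in range(2000):
    x -= 0.01 * grad(x)
print("gradient descent ends near", round(x, 2))   # about +0.96, the local minimum

# A Metropolis-style walker sometimes accepts uphill moves and can cross the barrier.
x, T = 1.5, 0.5                        # T controls how readily uphill moves are accepted
for _ in range(20000):
    x_new = x + rng.normal(scale=0.2)
    if rng.random() < np.exp(-(f(x_new) - f(x)) / T):
        x = x_new
print("stochastic search ends near", round(x, 2))  # typically in the deeper well near -1
# (the walker still fluctuates at this temperature, so rerun it a few times)
```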

However, the recent success of deep learning may dispel this notion when the dimension is high enough. Deep learning, which uses multi-layer neural networks that can have millions of parameters, is the quintessence of a high-dimensional model. Yet it seems to work just fine using the backpropagation algorithm, which is a form of gradient descent. The reason could be that in high enough dimensions, local minima are rare and the majority of critical points (places where the slope is zero) are saddle points, from which there is always a way out in some direction. For a critical point to be a local minimum, the matrix of second derivatives in all directions (i.e. the Hessian matrix) must be positive definite (i.e. have all positive eigenvalues). As the dimension of the matrix grows, there are simply more ways for at least one eigenvalue to be negative, and one negative eigenvalue is all you need for an escape hatch. So in a high-dimensional system, gradient descent may work just fine, and there could be an interesting tradeoff between a parsimonious model with few parameters that is hard to fit versus a high-dimensional model that is easy to fit. The usual danger of having too many parameters is that you overfit: you fit the noise at the expense of the signal and lose the ability to generalize. However, deep learning models seem to be able to overcome this limitation.
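Here is a quick numerical sketch of that intuition: draw random symmetric matrices as stand-in Hessians (the Gaussian entries are an arbitrary choice) and count how often every eigenvalue is positive.

```python
# How often is a random symmetric "Hessian" positive definite, i.e. a true
# local minimum? The fraction collapses as the dimension grows.
import numpy as np

rng = np.random.default_rng(2)

def frac_positive_definite(dim, trials=2000):
    hits = 0
    for _ in range(trials):
        a = rng.normal(size=(dim, dim))
        h = (a + a.T) / 2                     # symmetrize to mimic a Hessian
        if np.all(np.linalg.eigvalsh(h) > 0):
            hits += 1
    return hits / trials

for dim in (1, 2, 4, 6, 8):
    print(dim, frac_positive_definite(dim))
# The fraction starts near 1/2 for dim 1 and falls rapidly, essentially hitting
# zero well before dim 8: almost every critical point has an escape direction.
```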

Hence, if the dimension is high enough, evolution can work, while if it is too low, you need a revolution. So the question is: what is the dimensionality of governance and politics? In my opinion, the historical record suggests that revolutions generally do not lead to good outcomes, and even when they do, small incremental changes seem to get you to a similar place. For example, the US and France had bloody revolutions while Canada and England did not, and they have all arrived at similar liberal democratic systems. In fact, one could argue that a constitutional monarchy (like Canada or Denmark), where the head of state is a figurehead, is more stable and benign than a republic like Venezuela or Russia (e.g. see here). This distinction could have pertinence for the current US election if a group of well-meaning people, who believe that the two major parties have no meaningful differences, do not vote or vote for a third party. They should keep in mind that incremental change is possible and that small policy differences can and do make a difference in people’s lives.