Catch-22 of our era

The screen on my wife’s iPhone was shattered this week and she had not backed up the photos. The phone seems to still be functioning otherwise so we plugged it into the computer to try to back it up but it requires us to unlock the phone and we can’t enter the password. My wife refused to pay the 99 cents or whatever Apple charges to increase the disk space for iCloud to automatically back up the phone, so I suggested we just pay the ransom money and then the phone will back up automatically. I currently pay both Apple and Dropbox extortion money. However, since she hadn’t logged onto iCloud in maybe ever, it sent a code to her phone under the two-factor authentication scheme to type into the website, but of course we can’t see it on her broken screen so that idea is done. We called Apple and they said you could try to change the number on her iCloud account to my phone but that was two days ago and they haven’t complied. So my wife gave up and tried to order a new phone. Under the new system of her university, which provides her phone, she can get a phone if she logs onto this site to request it. The site requires VPN and in order to get VPN she needs to, you guessed it, type in a code sent to her phone. So you need a functioning phone to order a new phone. Basically, tech products are not very good. Software still kind of sucks and is not really improving. My Apple Mac is much worse now than it was 10 years ago. I still have trouble projecting stuff on a screen. I will never get into a self-driving car made by any tech company. I’ll wait for Toyota to make one; my (Japanese) car always works (my Audi was terrible).


Missing the trend

I have been fortunate to have been born at a time when I had the opportunity to witness the birth of several of the major innovations that shape our world today.  I have also managed to miss out on capitalizing on every single one of them. You might make a lot of money betting against what I think.

I was a postdoctoral fellow in Boulder, Colorado in 1993 when my very tech savvy advisor John Cary introduced me and his research group to the first web browser Mosaic shortly after it was released. The web was the wild west in those days with just a smattering of primitive personal sites authored by early adopters. The business world had not discovered the internet yet. It was an unexplored world and people were still figuring out how to utilize it. I started to make a list of useful sites but unlike Jerry Yang and David Filo, who immediately thought of doing the same thing and forming a company, it did not remotely occur to me that this activity could be monetized. Even though I struggled to find a job in 1994, was fairly adept at programming, watched the rise of Yahoo! and the rest of the internet startups, and had friends at Stanford and Silicon Valley, it still did not occur to me that perhaps I could join in too.

Just months before impending unemployment, I managed to talk my way into being the first postdoc of Jim Collins, who had just started as a non-tenure track research assistant professor at Boston University. Midway through my time with Jim, we had a meeting with Charles Cantor, who was a professor at BU then, about creating engineered organisms that could eat oil. Jim subsequently recruited graduate student Tim Gardner, now CEO of Riffyn, to work on this idea. I thought we should create a genetic Hopfield network and I showed Tim how to use XPP to simulate the various models we came up with. However, my idea seemed too complicated to implement biologically so when I went to Switzerland to visit Wulfram Gerstner at the end of 1997, Tim and Jim, freed from my meddling influence, were able to create the genetic toggle switch and the field of synthetic biology was born.

I first learned about Bitcoin in 2009 and had even thought about mining some. However, I then heard an interview with one of the early developers, Gavin Andresen, and he failed to understand that because the supply of Bitcoins is finite, prices denominated in it would necessarily deflate over time. I was flabbergasted that he didn’t comprehend the basics of economics and was convinced that Bitcoin would eventually fail. Still, I could have mined thousands of Bitcoins on a laptop back then, which would be worth tens of millions today.  I do think blockchains are an important innovation and my former post-bac fellow Wally Xie is even the CEO of the blockchain startup QChain. Although I do not know where cryptocurrencies and blockchains will be in a decade, I do know that I most likely won’t have a role.

I was in Pittsburgh during the late nineties/early 2000s in one of the few places where neural networks/deep learning, still called connectionism, was king. Geoff Hinton had already left Carnegie Mellon for London by the time I arrived at Pitt but he was still revered in Pittsburgh and I met him in London when I visited UCL. I actually thought the field had great promise and even tried to lobby our math department to hire someone in machine learning, for which I was summarily dismissed and mocked. I recruited Michael Buice to work on the path integral formulation for neural networks because I wanted to write down a neural network model that carried both rate and correlation information so I could implement a correlation-based learning rule. Michael even proposed that we work on an algorithm to play Go but obviously I demurred. Although I missed out on this current wave of AI hype, and probably wouldn’t have made an impact anyway, this is the one area where I may get a second chance in the future.



Technology and inference

In my previous post, I gave an example of how fake news could lead to a scenario of no update of posterior probabilities. However, this situation could occur just from the knowledge of technology. When I was a child, fantasy and science fiction movies always had a campy feel because the special effects were unrealistic-looking. When Godzilla came out of Tokyo Harbour it looked like little models in a bathtub. The Creature from the Black Lagoon looked like a man in a rubber suit. I think the first science fiction movie that looked astonishingly real was Stanley Kubrick’s 1968 masterpiece 2001: A Space Odyssey, which adhered to physics like no others before and only a handful since. The simulation of weightlessness in space was marvelous and to me the ultimate attention to detail was the scene in the rotating space station where a mild curvature in the floor could be perceived. The next groundbreaking moment was the 1993 film Jurassic Park, which truly brought dinosaurs to life. The first scene of a giant sauropod eating from a tree top was astonishing. The distinction between fantasy and reality was forever gone.

The effect of this essentially perfect rendering of anything into a realistic image is that we now have a plausible reason to reject any evidence. Photographic evidence can be completely discounted because the technology exists to create completely fabricated versions. This is equally true of audio tapes and anything you read on the Internet. In Bayesian terms, we now have an internal model or likelihood function that includes a constant probability that any data could be false. The more cynical you are the closer this constant is to one. Once the likelihood becomes insensitive to data then we are in the same situation as before. Technology alone, in the absence of fake news, could lead to a world where no one ever changes their mind. The irony could be that this will force people to evaluate truth the way they did before such technology existed, which is that you believe people (or machines) that you trust through building relationships over long periods of time.
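
One way to make the role of that constant concrete is the minimal sketch below. It rests on an assumption that is mine rather than anything derived above: with probability c the data is fabricated, and fabricated data is equally probable (a hypothetical value q) whether or not the claim is true. The names posterior_with_distrust, c, and q are illustrative only.

```python
def posterior_with_distrust(prior_t, lik_t, lik_f, c, q=0.5):
    """Posterior P(T|D) when data is believed to be fake with probability c.

    Assumption (illustrative): fake data has the same likelihood q under
    both hypotheses, so it carries no information about T versus F.
    """
    # Effective likelihoods mix the "real data" model with the fake-data model.
    eff_t = (1.0 - c) * lik_t + c * q
    eff_f = (1.0 - c) * lik_f + c * q
    evidence = eff_t * prior_t + eff_f * (1.0 - prior_t)
    return eff_t * prior_t / evidence


if __name__ == "__main__":
    # Data that favors T four to one, starting from an even prior.
    for c in (0.0, 0.5, 0.9, 1.0):
        print(c, round(posterior_with_distrust(0.5, 0.8, 0.2, c), 3))
```

Under these assumptions the posterior slides from what the data supports back towards the prior as c approaches one, which is the insensitivity to data described above.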

Fake news and beliefs

Much has been written of the role of fake news in the US presidential election. While we will never know how much it actually contributed to the outcome, as I will show below, it could certainly affect people’s beliefs. Psychology experiments have found that humans often follow Bayesian inference – the probability we assign to an event or action is updated according to Bayes rule. For example, suppose P(T) is the probability we assign to whether climate change is real; P(F) = 1-P(T) is our probability that climate change is false. In the Bayesian interpretation of probability, this would represent our level of belief in climate change. Given new data D (e.g. news), we will update our beliefs according to

P(T|D) = \frac{P(D|T) P(T)}{P(D)}

What this means is that our posterior probability or belief that climate change is true given the new data, P(T|D), is equal to the probability that the new data came from our internal model of a world with climate change (i.e. our likelihood), P(D|T), multiplied by our prior probability that climate change is real, P(T), divided by the probability of obtaining such data in all possible worlds, P(D). According to the rules of probability, the latter is given by P(D) = P(D|T)P(T) + P(D|F)P(F), which is the sum of the probability the data came from a world with climate change and that from one without.

This update rule can reveal what will happen in the presence of new data including fake news. The first thing to notice is that if P(T) is zero, then there is no update; the same holds if P(T) is one, since then P(F) is zero. In this binary case, this means that if we believe that climate change is absolutely false or true then no data will change our mind. In the case of multiple outcomes, any outcome with zero prior (i.e. no support) will never change. So if we have absolute priors, fake news is not having an impact because no news is having an impact. If we have nonzero priors for both true and false, then data that is more likely under our true model will increase our posterior for true, and data that is more likely under our false model will decrease it. Our posteriors will tend towards the direction of the data and thus fake news could have a real impact.
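
The binary update rule is short enough to write out directly. The following is a minimal sketch, with the function name bayes_update and the example likelihood values chosen purely for illustration.

```python
def bayes_update(prior_t, lik_t, lik_f):
    """Return P(T|D) from the prior P(T) and the likelihoods P(D|T), P(D|F)."""
    evidence = lik_t * prior_t + lik_f * (1.0 - prior_t)  # P(D)
    if evidence == 0.0:
        raise ValueError("data has zero probability under both hypotheses")
    return lik_t * prior_t / evidence


if __name__ == "__main__":
    print(bayes_update(0.0, 0.9, 0.1))  # zero prior: the posterior stays at zero
    print(bayes_update(1.0, 0.1, 0.9))  # prior of one: the posterior stays at one
    print(bayes_update(0.5, 0.9, 0.1))  # open mind, data favors T: belief rises
    print(bayes_update(0.5, 0.1, 0.9))  # open mind, data favors F: belief falls
```

The first two calls show the absolute-prior case, where no data moves the belief; the last two show a nonzero prior moving in the direction of the data.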

For example, suppose we have an internal model where we expect the mean annual temperature to be 10 degrees Celsius with a standard deviation of 3 degrees if there is no climate change and a mean of 13 degrees with climate change. Thus if the reported data is mostly centered around 13 degrees then our belief of climate change will increase and if it is mostly centered around 10 degrees then it will decrease. However, if we get data that is spread uniformly over a wide range then both models could be equally likely and we would get no update. Mathematically, if P(D|T) = P(D|F) then P(D) = P(D|T)(P(T)+P(F)) = P(D|T), so from the Bayesian update rule the posterior will be identical to the prior. In a world of lots of misleading data, there is no update. Thus, obfuscation and sowing confusion are very good strategies for preventing updates of priors. You don’t need to refute data, just provide fake examples and bury the data in a sea of noise.
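
The temperature example can be sketched numerically. The code below assumes normal likelihoods, the same 3 degree standard deviation under both hypotheses, and independent observations; none of these is specified above beyond the means and the one standard deviation, and the function names and sample readings are made up for illustration.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def update_on_temperatures(prior_t, temps, sd=3.0, mean_f=10.0, mean_t=13.0):
    """Update P(T) one reading at a time, treating readings as independent."""
    p = prior_t
    for x in temps:
        lik_t = normal_pdf(x, mean_t, sd)  # P(D|T): world with climate change
        lik_f = normal_pdf(x, mean_f, sd)  # P(D|F): world without
        p = lik_t * p / (lik_t * p + lik_f * (1.0 - p))
    return p


if __name__ == "__main__":
    warm = [12.5, 13.0, 13.5, 14.0]          # centered near 13: belief rises
    cool = [9.0, 9.5, 10.0, 10.5]            # centered near 10: belief falls
    noisy = [5.5, 8.5, 11.5, 14.5, 17.5]     # spread evenly around the midpoint 11.5
    for name, data in (("warm", warm), ("cool", cool), ("noisy", noisy)):
        print(name, round(update_on_temperatures(0.5, data), 3))
```

The noisy readings are deliberately symmetric about 11.5, the point where the two likelihoods are equal, so the evidence for and against cancels and the posterior returns to the prior.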


Revolution vs incremental change

I think that the dysfunction and animosity we currently see in the US political system and election is partly due to the underlying belief that meaningful change cannot be effected through slow evolution but rather requires an abrupt revolution where the current system is torn down and rebuilt. There is some merit to this idea. Sometimes the structure of a building can be so damaged that it would be easier to demolish and rebuild rather than repair and renovate. Mathematically, this can be expressed as a system being stuck in a local minimum (where getting to the global minimum is desired). In order to get to the true global optimum, you need to get worse before you can get better. When fitting nonlinear models to data, dealing with local minima is a major problem and the reason that a stochastic MCMC algorithm that does occasionally go uphill works so much better than gradient descent, which only goes downhill.
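
A one-dimensional toy problem illustrates the contrast. The sketch below uses a made-up quartic with a shallow local minimum and a deeper global one; the objective, step sizes, temperature, and function names are all illustrative choices, and the stochastic search is a plain Metropolis sampler standing in for MCMC methods generally.

```python
import math
import random


def f(x):
    """Toy objective: local minimum near x = 1.35, global minimum near x = -1.47."""
    return x**4 - 4.0 * x**2 + x


def grad_f(x):
    """Derivative of f."""
    return 4.0 * x**3 - 8.0 * x + 1.0


def gradient_descent(x0, lr=0.01, steps=2000):
    """Plain gradient descent: every step moves downhill."""
    x = x0
    for _ in range(steps):
        x -= lr * grad_f(x)
    return x


def metropolis_search(x0, temperature=1.0, step=0.5, steps=20000, seed=0):
    """Metropolis sampler that sometimes accepts uphill moves; returns the best point visited."""
    rng = random.Random(seed)
    x, best = x0, x0
    for _ in range(steps):
        proposal = x + rng.gauss(0.0, step)
        delta = f(proposal) - f(x)
        # Downhill moves are always accepted; uphill ones with probability exp(-delta / T).
        if delta <= 0.0 or rng.random() < math.exp(-delta / temperature):
            x = proposal
        if f(x) < f(best):
            best = x
    return best


if __name__ == "__main__":
    x_gd = gradient_descent(2.0)
    x_mc = metropolis_search(2.0)
    print("gradient descent:", round(x_gd, 3), round(f(x_gd), 3))
    print("metropolis best: ", round(x_mc, 3), round(f(x_mc), 3))
```

Starting from x = 2, gradient descent settles into the nearer, shallower well and stays there, while the sampler's occasional uphill moves let it cross the barrier between the wells.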

However, the recent success of deep learning may dispel this notion when the dimension is high enough. Deep learning, which is a multi-layer neural network that can have millions of parameters, is the quintessence of a high dimensional model. Yet, it seems to be able to work just fine using the backpropagation algorithm, which is a form of gradient descent. The reason could be that in high enough dimensions, local minima are rare and the majority of critical points (places where the slope is zero) are high dimensional saddle points, where there is always a way out in some direction. In order to have a local minimum, the matrix of second derivatives in all directions (i.e. Hessian matrix) must be positive definite (i.e. have all positive eigenvalues). As the dimension of the matrix gets larger and larger there are simply more ways for one eigenvalue to be negative and that is all you need to provide an escape hatch. So in a high dimensional system, gradient descent may work just fine and there could be an interesting tradeoff between a parsimonious model with few parameters that is difficult to fit versus a high dimensional model that is easy to fit. Now the usual danger of having too many parameters is that you overfit and thus you fit the noise at the expense of the signal and have no ability to generalize. However, deep learning models seem to be able to overcome this limitation.
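
The eigenvalue argument can be probed with a toy experiment. The sketch below draws random symmetric matrices with independent standard normal entries and counts how often they are positive definite. Treating a Hessian at a critical point as such a random matrix is a strong simplifying assumption of mine, not a property of real loss surfaces, and the function names are illustrative.

```python
import random


def is_positive_definite(a):
    """Return True if the symmetric matrix a (list of lists) is positive definite.

    Attempts a Cholesky factorization, which succeeds exactly when every
    pivot is positive, i.e. when all eigenvalues are positive.
    """
    n = len(a)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    return False
                lower[i][i] = s ** 0.5
            else:
                lower[i][j] = s / lower[j][j]
    return True


def random_symmetric(n, rng):
    """Symmetric matrix with independent standard normal entries on and above the diagonal."""
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            m[i][j] = m[j][i] = rng.gauss(0.0, 1.0)
    return m


def fraction_minima(n, samples=2000, seed=0):
    """Fraction of random symmetric n x n matrices that are positive definite."""
    rng = random.Random(seed)
    hits = sum(is_positive_definite(random_symmetric(n, rng)) for _ in range(samples))
    return hits / samples


if __name__ == "__main__":
    for n in (1, 2, 3, 4, 6):
        print(n, fraction_minima(n))
```

In one dimension about half the draws count as minima; the fraction falls off quickly as the dimension grows, in line with the argument that one negative eigenvalue is enough to turn a would-be minimum into a saddle.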

Hence, if the dimension is high enough, evolution can work, while if it is too low then you need a revolution. So the question is what is the dimensionality of governance and politics. In my opinion, the historical record suggests that revolutions generally do not lead to good outcomes and even when they do, small incremental changes seem to get you to a similar place. For example, the US and France had bloody revolutions while Canada and England did not and they all have arrived at similar liberal democratic systems. In fact, one could argue that a constitutional monarchy (like Canada and Denmark), where the head of state is a figurehead, is more stable and benign than a republic, like Venezuela or Russia (e.g. see here). This distinction could have pertinence for the current US election if a group of well-meaning people, who believe that the two major parties do not have any meaningful difference, do not vote or vote for a third party. They should keep in mind that incremental change is possible and small policy differences can and do make a difference in people’s lives.

AlphaGo and the Future of Work

In March of this year, Google DeepMind’s computer program AlphaGo defeated world Go champion Lee Sedol. This was hailed as a great triumph of artificial intelligence and signaled to many the beginning of the new age when machines take over. I believe this is true but the real lesson of AlphaGo’s win is not how great machine learning algorithms are but how suboptimal human Go players are. Experts believed that machines would not be able to defeat humans at Go for a long time because the number of possible games is astronomically large, \sim 250^{150}, in contrast to chess with a paltry \sim 35^{80}. Additionally, unlike chess, it is not clear what is a good position and who is winning during intermediate stages of a game. Thus, any direct enumeration and evaluation of possible next moves as chess computers do, like IBM’s Deep Blue that defeated Garry Kasparov, seemed to be impossible. It was thought that humans had some sort of inimitable intuition to play Go that machines were decades away from emulating. It turns out that this was wrong. It took remarkably little training for AlphaGo to defeat a human. All the algorithms used were fairly standard: supervised and reinforcement backpropagation learning in multi-layer neural networks [1]. DeepMind just put them together in a clever way and had the (in retrospect appropriate) audacity to try.
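
To get a feel for the gap between the two game-tree estimates, it helps to convert them to powers of ten. The sketch below reads each estimate as a branching factor raised to a typical game length, which is my interpretation of the figures quoted above; the function name is illustrative.

```python
import math


def log10_game_count(branching, depth):
    """Base-10 logarithm of branching**depth, a rough count of possible games."""
    return depth * math.log10(branching)


if __name__ == "__main__":
    go = log10_game_count(250, 150)
    chess = log10_game_count(35, 80)
    print(f"Go:    about 10^{go:.1f}")
    print(f"Chess: about 10^{chess:.1f}")
    print(f"Ratio: about 10^{go - chess:.1f}")
```

On this reading the Go estimate exceeds the chess one by more than two hundred orders of magnitude, which is why brute-force enumeration looked hopeless.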

The take home message of AlphaGo’s success is that humans are very, very far away from being optimal at playing Go. Uncharitably, we simply stink at Go. However, this probably also means that we stink at almost everything we do. Machines are going to take over our jobs not because they are sublimely awesome but because we are stupendously inept. It is like the old joke about two hikers encountering a bear and one starts to put on running shoes. The other hiker says, “Why are you doing that? You can’t outrun a bear,” to which she replies, “I only need to outrun you!” In fact, the more difficult a job seems to be for humans to perform, the easier it will be for a machine to do better. This was noticed a long time ago in AI research and called Moravec’s Paradox. Tasks that require a lot of high-level abstract thinking like chess or predicting what movie you will like are easy for computers to do while seemingly trivial tasks that a child can do like folding laundry or getting a cookie out of a jar on an unreachable shelf are really hard. Thus high paying professions in medicine, accounting, finance, and law could be replaced by machines sooner than lower paying ones in lawn care and house cleaning.

There are those who are not worried about a future of mass unemployment because they believe people will just shift to other professions. They point out that a century ago a majority of Americans worked in agriculture and now the sector comprises less than 2 percent of the population. The jobs that were lost to technology were replaced by ones that didn’t exist before. I think this might be true but in the future not everyone will be a software engineer or a media star or a CEO of her own company of robot employees. The increase in productivity provided by machines ensures this. When the marginal cost of production (i.e. the cost to make one more item) goes to zero, as it has for software or recorded media now, the whole supply-demand curve is upended. There is infinite supply for any amount of demand so the only way to make money is to increase demand.

The rate-limiting step for demand is the attention span of humans. In a single day, a person can at most attend to a few hundred independent tasks such as thinking, reading, writing, walking, cooking, eating, driving, exercising, or consuming entertainment. I can stream any movie I want now and I only watch at most twenty a year, and almost all of them on long-haul flights. My 3-year-old can watch the same Wild Kratts episode (great children’s show about animals) ten times in a row without getting bored. Even though everyone could be a video or music star on YouTube, superstars such as Beyoncé and Adele are viewed much more than anyone else. Even with infinite choice, we tend to do what our peers do. Thus, for a population of ten billion people, I doubt there can be more than a few million that can make a decent living as a media star with our current economic model. The same goes for writers. This will also generalize to manufactured goods. Toasters and coffee makers essentially cost nothing compared to three decades ago, and I will only buy one every few years if that. Robots will only make things cheaper and I doubt there will be a billion brands of TVs or toasters. Most likely, a few companies will dominate the market as they do now. Even if we could optimistically assume that a tenth of the population could be engaged in producing goods and services necessary for keeping the world functioning, that still leaves the rest with little to do.

Even much of what scientists do could eventually be replaced by machines. Biology labs could consist of a principal investigator and robot technicians. Although it seems like science is endless, the amount of new science required for sustaining the modern world could diminish. We could eventually have an understanding of biology sufficient to treat most diseases and injuries and develop truly sustainable energy technologies. In this case, machines could be tasked to keep the modern world up and running with little need of input from us. Science would mostly be devoted to abstract and esoteric concerns.

Thus, I believe the future for humankind is in low productivity occupations – basically a return to pre-industrial endeavors like small plot farming, blacksmithing, carpentry, painting, dancing, and pottery making, with an economic system in place to adequately live off of this labor. Machines can provide us with the necessities of life while we engage in a simulated 18th century world but without the poverty, diseases, and mass famines that made life so harsh back then. We can make candles or bread and sell them to our neighbors for a living wage. We can walk or get in self-driving cars to see live performances of music, drama and dance by local artists. There will be philosophers and poets with their small followings as they have now. However, even when machines can do everything humans can do, there will still be a capacity to sustain as many mathematicians as there are people because mathematics is infinite. As long as P is not NP, theorem proving can never be automated and there will always be unsolved math problems.  That is not to say that machines won’t be able to do mathematics. They will. It’s just that they won’t ever be able to do all of it. Thus, the future of work could also be mathematics.

  1. Silver, D. et al. Mastering the game of Go with deep neural networks and tree search. Nature 529, 484–489 (2016).