Revolution vs incremental change

I think that the dysfunction and animosity we currently see in the US political system and election is partly due to the underlying belief that meaningful change cannot be effected through slow evolution but rather requires an abrupt revolution where the current system is torn down and rebuilt. There is some merit to this idea. Sometimes the structure of a building can be so damaged that it would be easier to demolish and rebuild rather than repair and renovate. Mathematically, this can be expressed as a system being stuck in a local minimum (where getting to the global minimum is desired). In order to get to the true global optimum, you need to get worse before you can get better. When fitting nonlinear models to data, dealing with local minima is a major problem and the reason that a stochastic MCMC algorithm that does occasionally go uphill works so much better than gradient descent, which only goes downhill.

However, the recent success of deep learning may dispel this notion when the dimension is high enough. Deep learning, which is a multi-layer neural network that can have millions of parameters is the quintessence of a high dimensional model. Yet, it seems to be able to work just fine using the back propagation algorithm, which is a form of gradient descent. The reason could be that in high enough dimensions, local minima are rare and the majority of critical points (places where the slope is zero) are high dimensional saddle points, where there is always a way out in some direction. In order to have a local minimum, the matrix of second derivatives in all directions (i.e. Hessian matrix) must be positive definite (i.e. have all positive eigenvalues). As the dimension of the matrix gets larger and larger there are simply more ways for one eigenvalue to be negative and that is all you need to provide an escape hatch. So in a high dimensional system, gradient descent may work just fine and there could be an interesting tradeoff between a parsimonious model with few parameters but difficult to fit versus a high dimensional model that is easy to fit. Now the usual danger of having too many parameters is that you overfit and thus you fit the noise at the expense of the signal and have no ability to generalize. However, deep learning models seem to be able to overcome this limitation.

Hence, if the dimension is high enough evolution can work while if it is too low then you need a revolution. So the question is what is the dimensionality of governance and politics. In my opinion, the historical record suggests that revolutions generally do not lead to good outcomes and even when they do small incremental changes seem to get you to a similar place. For example, the US and France had bloody revolutions while Canada and the England did not and they all have arrived at similar liberal democratic systems. In fact, one could argue that a constitutional monarchy (like Canada and Denmark), where the head of state is a figure head is more stable and benign than a republic, like Venezuela or Russia (e.g. see here). This distinction could have pertinence for the current US election if a group of well-meaning people, who believe that the two major parties do not have any meaningful difference, do not vote or vote for a third party. They should keep in mind that incremental change is possible and small policy differences can and do make a difference in people’s lives.