# Technology and inference

In my previous post, I gave an example of how fake news could lead to a scenario of no update of posterior probabilities. However, this situation could occur just from the knowledge of technology. When I was a child, fantasy and science fiction movies always had a campy feel because the special effects were unrealistic looking. When Godzilla came out of Tokyo Harbour it looked like little models in a bathtub. The Creature from the Black Lagoon looked like a man in a rubber suit. I think the first science fiction movie that looked astonishing real was Stanley Kubrick’s 1968 masterpiece 2001: A Space Odyssey, which adhered to physics like no others before and only a handful since. The simulation of weightlessness in space was marvelous and to me the ultimate attention to detail was the scene in the rotating space station where a mild curvature in the floor could be perceived. The next groundbreaking moment was the 1993 film Jurassic Park, which truly brought dinosaurs to life. The first scene of a giant sauropod eating from a tree top was astonishing. The distinction between fantasy and reality was forever gone.

The effect of this essentially perfect rendering of anything into a realistic image is that we now have a plausible reason to reject any evidence. Photographic evidence can be completely discounted because the technology exists to create completely fabricated versions. This is equally true of audio tapes and anything your read on the Internet. In Bayesian terms, we now have an internal model or likelihood function that any data could be false. The more cynical you are the closer this constant is to one. Once the likelihood becomes insensitive to data then we are in the same situation as before. Technology alone, in the absence of fake news, could lead to a world where no one ever changes their mind. The irony could be that this will force people to evaluate truth the way they did before such technology existed, which is that you believe people (or machines) that you trust through building relationships over long periods of time.

# Fake news and beliefs

Much has been written of the role of fake news in the US presidential election. While we will never know how much it actually contributed to the outcome, as I will show below, it could certainly affect people’s beliefs. Psychology experiments have found that humans often follow Bayesian inference – the probability we assign to an event or action is updated according to Bayes rule. For example, suppose $P(T)$ is the probability we assign to whether climate change is real; $P(F) = 1-P(T)$ is our probability that climate change is false. In the Bayesian interpretation of probability, this would represent our level of belief in climate change. Given new data $D$ (e.g. news), we will update our beliefs according to

$P(T|D) = \frac{P(D|T) P(T)}{P(D)}$

What this means is that our posterior probability or belief that climate change is true given the new data, $P(T|D)$, is equal to the probability that the new data came from our internal model of a world with climate change (i.e. our likelihood), $P(D|T),$ multiplied by our prior probability that climate change is real, $P(T),$ divided by the probability of obtaining such data in all possible worlds, $P(D)$. According to the rules of probability, the latter is given by $P(D) = P(D|T)P(T) + P(D|F)P(F)$, which is the sum of the probability the data came from a world with climate change and that from one without.

This update rule can reveal what will happen in the presence of new data including fake news. The first thing to notice is that if $P(T)$ is zero, then there is no update. In this binary case, this means that if we believe that climate change is absolutely false or true then no data will change our mind. In the case of multiple outcomes, any outcome with zero prior (has no support) will never change. So if we have very specific priors, fake news is not having an impact because no news is having an impact. If we have nonzero priors for both true and false then if the data is more likely from our true model then our posterior for true will increase and vice versa. Our posteriors will tend towards the direction of the data and thus fake news could have a real impact.

For example, suppose we have an internal model where we expect the mean annual temperature to be 10 degrees Celsius with a standard deviation of 3 degrees if there is no climate change and a mean of 13 degrees with climate change. Thus if the reported data is mostly centered around 13 degrees then our belief of climate change will increase and if it is mostly centered around 10 degrees then it will decrease. However, if we get data that is spread uniformly over a wide range then both models could be equally likely and we would get no update. Mathematically, this is expressed as – if $P(D|T)=P(D|F)$ then $P(D) = P(D|T)(P(T)+P(F))= P(D|T)$. From the Bayesian update rule, the posterior will be identical to the prior. In a world of lots of misleading data, there is no update. Thus, obfuscation and sowing confusion is a very good strategy for preventing updates of priors. You don’t need to refute data, just provide fake examples and bury the data in a sea of noise.

# Probability of gun death

The tragedy in Oregon has reignited the gun debate. Gun control advocates argue that fewer guns mean fewer deaths while gun supporters argue that if citizens were armed then shooters could be stopped through vigilante action. These arguments can be quantified in a simple model of the probability of gun death, $p_d$:

$p_d = p_gp_u(1-p_gp_v) + p_gp_a$

where $p_g$ is the probability of having a gun, $p_u$ is the probability of being a criminal or  mentally unstable enough to become a shooter, $p_v$ is the probability of effective vigilante action, and $p_a$ is the probability of accidental death or suicide.  The probability of being killed by a gun is given by the probability of someone having a gun times the probability that they are unstable enough to use it. This is reduced by the probability of a potential victim having a gun times the probability of acting effectively to stop the shooter. Finally, there is also a probability of dying through an accident.

The first derivative of $p_d$ with respect to $p_g$ is $p_u - 2 p_u p_g p_v + p_a$ and the second derivative is negative. Thus, the minimum of $p_d$ cannot be in the interior $0 < p_g < 1$ and must be at the boundary. Given that $p_d = 0$ when $p_g=0$ and $p_d = p_u(1-p_v) + p_a$ when $p_g = 1$, the absolute minimum is found when no one has a gun. Even if vigilante action was 100% effective, there would still be gun deaths due to accidents. Now, some would argue that zero guns is not possible so we can examine if it is better to have fewer guns or more guns. $p_d$ is maximal at $p_g = (p_u + p_a)/(2p_u p_v)$. Thus, unless $p_v$ is greater than one half then even in the absence of accidents there is no situation where increasing the number of guns makes us safer. The bottom line is that if we want to reduce gun deaths we should either reduce the number of guns or make sure everyone is armed and has military training.

# Sampling from a probability distribution

When simulating any system with randomness, sampling from a probability distribution is necessary. Usually, you’ll just need to sample from a normal or uniform distribution and thus can use a built-in random number generator. However, for the time when a built-in function does not exist for your distribution, here’s a simple algorithm.

Let’s say you have some probability density function (PDF) $\rho(x)$ on some domain $[x_{\min},x_{\max}]$ and you want to generate a set of numbers that follows this probability law. This is actually very simple to do although those new to the field may not know how. Generally, any programming language or environment has a built-in random number generator for real numbers (up to machine precision) between 0 and 1. Thus, what you want to do is to find a way to transform these random numbers, which are uniformly distributed on the domain $[0,1]$ to the random numbers you want. The first thing to notice is that the cumulative distribution function (CDF) for your PDF, $P(x) = \int_{x_{\min}}^x \rho(x') dx'$ is a function that ranges over the interval $[0,1]$, since it is a probability. Thus, the algorithm is: 1) generate a random number $r$ between o and 1, and 2) plug it into the inverse of $P(x)$. The numbers you get out satisfy your distribution (i.e. $P(x)=r \Rightarrow x = P^{-1}(r)$).

If you’re lucky enough to have a closed form expression of the inverse of the CDF $P^{-1}(r)$ then you can just plug the random numbers directly into your function, otherwise you have to do a little more work. If you don’t have a closed form expression for the CDF then you can just solve the equation $P(y)-r=0$ for $x$. Newton’s method will generally converge very quickly.  You just loop over the command x = x – (P(x)-r)/rho(x) until you reach the tolerance you want. If you don’t have a closed form expression of the CDF then the simplest way is construct the CDF numerically by computing the N dimensional vector $v[j]=\sum_{i=0}^j \rho(jh-x_{\min}) h$, for some discretization parameter $h$ such that $x_{\max}= Nh$ .  If the PDF is defined on the entire real line then you’ll have to decide on what you want the minimum $x_{\min}$ to be. If the distribution has thin tails, then you just need to go out until the probability is smaller than your desired error tolerance. You then find the index $j$ such that $v[j]$ is closest to $r$ (i.e. in Julia you could use findmin(abs(v[j]-r))) and your random number is $x = hj$.

Here is why the algorithm works. What we are doing is transforming the PDF of $r$, which we can call $u(r)$, to the desired PDF for $x$. The transformation we use is $r = P(x)$. Thus, $u(r) dr = \rho(x) dx$ or $\rho(x) = u(r) dr/dx$. Now, $u(r)$ is just 1 on the interval $[0,1]$ and $dr/dx$ is the PDF of $x$ as expected. Thus, inverting the CDF for uniform random numbers always gives you a new set of random numbers that satisfies the new PDF. Trivial perhaps, but really useful.

# New paper on path integrals

Carson C. Chow and Michael A. Buice. Path Integral Methods for Stochastic Differential Equations. The Journal of Mathematical Neuroscience,  5:8 2015.

Abstract: Stochastic differential equations (SDEs) have multiple applications in mathematical neuroscience and are notoriously difficult. Here, we give a self-contained pedagogical review of perturbative field theoretic and path integral methods to calculate moments of the probability density function of SDEs. The methods can be extended to high dimensional systems such as networks of coupled neurons and even deterministic systems with quenched disorder.

This paper is a modified version of our arXiv paper of the same title.  We added an example of the stochastically forced FitzHugh-Nagumo equation and fixed the typos.

# Code platform update

It’s a week away from 2015, and I have transitioned completely away from Matlab. Julia is winning the platform attention battle. It is very easy to code in and it is very fast. I just haven’t gotten around to learning python much less pyDSTool (sorry Rob). I kind of find the syntax of Python (with all the periods between words) annoying. Wally Xie and I have also been trying to implement some of our MCMC runs in Stan but we have had trouble making it work.  Our model requires integrating ODEs and the ODE solutions from Stan (using our own solver) do not match our Julia code or the gold standard XPP. Maybe we are missing something obvious in the Stan syntax but our code is really very simple. Thus, we are going back to doing our Bayesian posterior estimates in Julia. However, I do plan to revisit Stan if they (or we) can write a debugger for it.

# MCMC for linear models

I’ve been asked in a comment to give a sample of pseudo code for an MCMC algorithm to fit a linear model $ax + b$ to some data. See here for the original post on MCMC. With a linear model, you can write down the answer in closed form (see here), so it is a good model to test your algorithm and code.  Here it is in pseudo-Julia code:

#  initial guess for parameters a and b a=0b=0
# construct chi squared, where D is the data vector and x is the vector of the # independent quantity
chi = norm(D - (a*x +b))^2;
for n = 1 : total;# Make random guesses for new normally distributed a and b with mean old a and b # and standard deviation asig and bsig
at = a + asig * randn()bt = b + bsig * randn()

chit = norm(D - (at*x + bt))^2;
# Take ratio of likelihoods, sigma is the data uncertaintyratio=exp((-chit + chi)/(2*sigma^2));
# Compare the ratio to a uniform random number between 0 and 1, # keep new parameters if ratio exceeds random numberif rand() < ratio
a = at;b = bt;
chi = chit;
endend# Keep running until convergence