The US presidential election on Nov. 6 is expected to be particularly close. The polling has been vigorous and there are many statistical prediction web sites. One of them, the Princeton Election Consortium, is run by neuroscientist Sam Wang at Princeton University. For any non-American readers, the president is not elected directly by the citizens but through what is called the electoral college. This is a set of 538 electoral voters that are selected by the individual states. The electoral votes are allotted according to the number of congressional districts per state plus two. Hence, low population states are over-represented. Almost all states agree that the candidate that takes the plurality of the votes in that state wins all the electoral votes of that state. Maine and Nebraska are the two exceptions that allot electoral votes according to who wins the congressional district. Thus in order to predict who will win, one must predict who will get at least 270 electoral votes. Most of the states are not competitive so the focus of the candidates (and media) are on a handful of so-called battleground states like Ohio and Colorado. Currently, Sam Wang predicts that President Obama will win the election with a median of 319 votes. Sam estimates the Bayesian probability for Obama’s re-election to be 99.6%. Nate Silver at another popular website (Five Thirty Eight), predicts that Obama will win 305 electoral votes and has a re-election probability of 83.7%.
These estimates are made by using polling data with a statistical model. Nate Silver uses national and state polls along with some economic indicators, although the precise model is unknown. Sam Wang uses only state polls. I’ll describe his method here. The goal is to estimate the probability distribution for the number of electoral votes a specific candidate will receive. The state space consists of possibilities (50 states plus the District of Columbia). I will assume that Maine and Nebraska do not split their votes along congressional districts although it is a simple task to include that possibility. Sam assumes that the individual states are statistically independent so that the joint probability distribution factorizes completely. He then takes the median of the polls for each state over some time window to represent the probability of that given state. The polling data is comprised of the voting preferences of a sample for a given candidate. The preferences are converted into probabilities using a normal distribution. He then computes the probability for all combinations. Suppose that there are just two states with win probabilities for your candidate of and . The probability of your candidate winning both states is , state 1 but not state 2 is , and so forth. If the states have and electoral votes respectively then if they win both states they will win votes and so forth. To keep the bookkeeping simple, Sam uses the trick of expressing the probability distribution as a polynomial of a dummy variable . So the probability distribution is
Hence, the coefficient of each term is the probability for the number of electoral votes given by the exponent of The expression for 51 “states” is and this can be evaluated quickly on a desktop computer. One can then take the median or mean of the distribution for the predicted number of electoral votes. The sum of the probabilities for electoral votes greater than 269 gives the winning probability, although Sam uses a more sophisticated method for his predicted probabilities. The model does assume that the probabilities are independent. Sam tries to account for this by using what he calls a meta-margin, in which he calculates how much the probabilities (in terms of preference) need to move for the leading candidate to lose. Also, the state polls will likely pick up any correlations as the election gets closer.
Most statistical models predict that Obama will be re-elected with fairly high probability but the national polls are showing that the race is almost tied. This discrepancy is a puzzle. Silver’s hypothesis for why is here and Sam’s is here. One of the sources for error in polls is that they must predict who will vote. The 2008 election had a voter turnout of a little less than 62%. That means that an election can be easily won or lost based on turnout alone, which makes one wonder about democracy.
Nov 4: dead link is updated