## Archive for the ‘Pedagogy’ Category

### Bayesian model comparison Part 2

May 11, 2013

In a previous post, I summarized the Bayesian approach to model comparison, which requires the calculation of the Bayes factor between two models. Here I will show one computational approach that I use, called thermodynamic integration, which is borrowed from molecular dynamics. Recall that we need to compute the model likelihood function

$P(D|M)=\int P(D|M,\theta)P(\theta|M) d\theta$     (1)

for each model where $P(D|M,\theta)$ is just the parameter dependent likelihood function we used to find the posterior probabilities for the parameters of the model.

The integration over the parameters can be accomplished using Markov Chain Monte Carlo (MCMC), which I summarized previously here. We will start by defining the partition function

$Z(\beta) = \int P(D|M,\theta)^\beta P(\theta| M) d\theta$    (2)

where $\beta$ is an inverse temperature. The derivative of the log of the partition function gives

$\frac{d}{d\beta}\ln Z(\beta)=\frac{\int d\theta \ln[P(D |\theta,M)] P(D | \theta, M)^\beta P(\theta|M)}{\int d\theta \ P(D | \theta, M)^\beta P(\theta | M)}$    (3)

which is equal to the ensemble average of $\ln P(D|\theta,M)$. However, if we assume that the MCMC has reached stationarity then we can replace the ensemble average with a time average $\frac{1}{T}\sum_{i=1}^T \ln P(D|\theta, M)$.  Integrating (3) over $\beta$ from 0 to 1 gives

$\ln Z(1) = \ln Z(0) + \int_0^1 \langle \ln P(D|M,\theta)\rangle d\beta$

From (1) and (2), we see that $Z(1)=P(D|M)$, which is what we want to compute, and $Z(0)=\int P(\theta|M) d\theta=1$.

Hence, to perform Bayesian model comparison, we simply run the MCMC for each model at different temperatures (i.e. use $P(D|M,\theta)^\beta$ as the likelihood in the standard MCMC) and then integrate the average log likelihoods over $\beta$ at the end to obtain $\ln Z(1)$. For a Gaussian likelihood function, changing temperature is equivalent to changing the data “error”. The higher the temperature the larger the presumed error. In practice, I usually run at seven to ten different values of $\beta$ and use a simple trapezoidal rule to integrate over $\beta$.  I can even do parameter inference and model comparison in the same MCMC run.
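As an illustration, here is a minimal sketch of the recipe in Python. Everything in it, the Gaussian model with unknown mean, the prior, the proposal width, and the nine-point temperature ladder, is made up for demonstration; the point is only the structure: run a tempered Metropolis chain at each $\beta$, average the log likelihood after burn-in, and apply the trapezoidal rule.

```python
import math
import random

random.seed(0)

# Hypothetical data: 50 points from a unit-variance Gaussian with unknown mean
data = [random.gauss(1.0, 1.0) for _ in range(50)]

def log_like(mu):
    # ln P(D | M, mu) for the Gaussian model
    return sum(-0.5 * (x - mu) ** 2 - 0.5 * math.log(2 * math.pi) for x in data)

def log_prior(mu):
    # prior mu ~ N(0, 10^2), an arbitrary choice
    return -0.5 * (mu / 10.0) ** 2 - math.log(10.0 * math.sqrt(2 * math.pi))

def mean_log_like(beta, steps=4000):
    # Metropolis chain with tempered likelihood P(D|M,mu)^beta * P(mu|M)
    mu = 0.0
    ll = log_like(mu)
    lp = beta * ll + log_prior(mu)
    samples = []
    for _ in range(steps):
        prop = mu + random.gauss(0, 0.5)
        ll_prop = log_like(prop)
        lp_prop = beta * ll_prop + log_prior(prop)
        if math.log(random.random()) < lp_prop - lp:
            mu, ll, lp = prop, ll_prop, lp_prop
        samples.append(ll)
    return sum(samples[steps // 2:]) / (steps - steps // 2)  # discard burn-in

# trapezoidal rule over an evenly spaced beta ladder from 0 to 1
betas = [i / 8 for i in range(9)]
means = [mean_log_like(b) for b in betas]
log_evidence = sum(0.5 * (means[i] + means[i + 1]) * (betas[i + 1] - betas[i])
                   for i in range(len(betas) - 1))
print(log_evidence)  # ln Z(1) = ln P(D|M)
```

In practice the integrand changes fastest near $\beta = 0$, so an uneven ladder concentrated at small $\beta$ usually works better than the even spacing used here.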

Erratum, May 2013: I just fixed an error in the final formula.

### Mass

February 10, 2013

Since the putative discovery of the Higgs boson this past summer, I have heard and read multiple attempts at explaining what exactly this discovery means. They usually go along the lines of “The Higgs mechanism gives mass to particles by acting like molasses in which particles move around …” More sophisticated accounts will then attempt to explain that the Higgs boson is an excitation in the Higgs field. However, most of the explanations I have encountered assume that most people already know what mass actually is and why particles need to be endowed with it. Given that my seventh grade science teacher didn’t really understand what mass was, I have a feeling that most nonphysicists don’t really have a full appreciation of mass.

To start out, there are actually two kinds of mass. There is inertial mass, which is the resistance to acceleration and is the mass that appears in Newton’s second law, $F = ma$, and then there is gravitational mass, which is like the “charge” of gravity. The more gravitational mass you have the stronger the gravitational force. Although they need not have been, these two masses happen to be the same.  The equivalence of inertial and gravitational mass is one of the deepest facts of the universe and is the reason that all objects fall at the same rate. Galileo’s apocryphal Leaning Tower of Pisa experiment was a demonstration that the two masses are the same. You can see this by noting that the gravitational force is given by

### Revised SDE and path integral paper

October 10, 2012

At the MBI last week, I gave a tutorial on using path integrals to compute moments of stochastic differential equations perturbatively.  The slides are the same as the tutorial I gave a few years ago (see here).  I slightly modified the review paper that goes with the talk. I added the explicit computation for the generating functional of the complex Gaussian PDF. The new version can be found here.

### Strogatz in the Times

September 19, 2012

Don’t miss Steve Strogatz’s new series on math in the New York Times. Once again Steve manages to make math both interesting and understandable.

### A new strategy for the iterated prisoner’s dilemma game

September 4, 2012

The game theory world was stunned recently when Bill Press and Freeman Dyson found a new strategy for the iterated prisoner’s dilemma (IPD) game. They show how you can extort an opponent such that the only way they can maximize their payoff is to give you an even higher payoff. The paper, published in PNAS (link here) with a commentary (link here), is so clever and brilliant that I thought it would be worthwhile to write a pedagogical summary for those who are unfamiliar with some of the methods and concepts they use. This paper shows how knowing a little bit of linear algebra can go a really long way toward exploring deep ideas.

In the classic prisoner’s dilemma, two prisoners are interrogated separately. They have two choices. If they both stay silent (cooperate) they each get a year in prison. If one confesses (defects) while the other stays silent then the defector is released while the cooperator gets 5 years.  If both defect then they both get 3 years in prison. Hence, even though the highest utility for both of them is to both cooperate, the only logical thing to do is to defect. You can watch this played out on the British television show Golden Balls (see example here). Usually the payout is expressed as a reward so if they both cooperate they both get 3 points, if one defects and the other cooperates then the defector gets 5 points and the cooperator gets zero, and if they both defect they both get 1 point each. Thus, the combined reward is higher if they both cooperate but since they can’t trust their opponent it is only logical to defect and get at least 1 point.

The prisoner’s dilemma changes if you play the game repeatedly because you can now adjust to your opponent and it is not immediately obvious what the best strategy is. Robert Axelrod brought the IPD to public attention when he organized a tournament three decades ago. The results are published in his 1984 book The Evolution of Cooperation.  I first learned about the results in Douglas Hofstadter’s Metamagical Themas column in Scientific American in the early 1980s. Axelrod invited a number of game theorists to submit strategies to play IPD and the winning strategy, submitted by Anatol Rapoport, was called tit-for-tat: you always cooperate first and then do whatever your opponent did last. Since this is a cooperative strategy with retribution, people have pointed to it ever since as an example of how cooperation could evolve. Press and Dyson now show that you can win by being nasty. Details of the calculations are below the fold.
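The bookkeeping of the iterated game is easy to simulate. The sketch below uses the reward payoffs quoted above (3/3 for mutual cooperation, 5/0 for defection against cooperation, 1/1 for mutual defection); the two strategy functions are illustrative, not the Press-Dyson extortion strategy itself.

```python
# payoff table: (my move, opponent's move) -> (my points, opponent's points)
PAYOFF = {('C', 'C'): (3, 3), ('C', 'D'): (0, 5),
          ('D', 'C'): (5, 0), ('D', 'D'): (1, 1)}

def tit_for_tat(my_hist, opp_hist):
    # cooperate first, then copy the opponent's last move
    return 'C' if not opp_hist else opp_hist[-1]

def always_defect(my_hist, opp_hist):
    return 'D'

def play(strat1, strat2, rounds=100):
    h1, h2, s1, s2 = [], [], 0, 0
    for _ in range(rounds):
        m1, m2 = strat1(h1, h2), strat2(h2, h1)
        p1, p2 = PAYOFF[(m1, m2)]
        s1, s2 = s1 + p1, s2 + p2
        h1.append(m1)
        h2.append(m2)
    return s1, s2

print(play(tit_for_tat, tit_for_tat))    # (300, 300): sustained cooperation
print(play(tit_for_tat, always_defect))  # (99, 104): TFT loses only round one
```

Against itself, tit-for-tat locks into mutual cooperation; against a pure defector it forfeits only the first round, which is the robustness that won Axelrod's tournament.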

### Nonlinearity in your wallet

May 25, 2012

Many human traits like height, IQ, and 50 metre dash times are very close to being normally distributed. The normal distribution (more technically the normal probability density function) or Gaussian function

$f(x) = \frac{1}{\sqrt{2\pi}\sigma}e^{-(x-\mu)^2/2\sigma^2}$

is the famous bell shaped curve that the histogram of class grades fall on. The shape of the Gaussian is specified by two parameters: the mean $\mu$, which coincides with the peak of the bell, and the standard deviation $\sigma$, which is a measure of how wide the Gaussian is. Let’s take height as an example. There is a 68% chance that any person will be within one standard deviation of the mean and a little more than a 95% chance of being within two standard deviations. The tallest one percent are about 2.3 standard deviations above the mean.
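These percentages can be checked directly from the Gaussian cumulative distribution function, which the Python standard library exposes through the error function:

```python
import math

def Phi(z):
    # standard normal CDF: Phi(z) = (1 + erf(z / sqrt(2))) / 2
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

within_one = Phi(1) - Phi(-1)     # fraction within one standard deviation
within_two = Phi(2) - Phi(-2)     # fraction within two standard deviations
top_one_percent = 1 - Phi(2.326)  # one-sided tail: tallest 1% start ~2.33 sd up
print(round(within_one, 3), round(within_two, 3), round(top_one_percent, 3))
```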

The fact that lots of things are normally distributed is not an accident but a consequence of the central limit theorem (CLT), which may be the most important mathematical law in your life. The theorem says that the probability distribution of a sum of a large number of independent random things will be normal (i.e. a Gaussian). In the example of height, it suggests that there are perhaps hundreds or thousands of genetic and environmental factors that determine your height, each contributing a little amount. When you add them together you get your height and the distribution is normal.
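You can watch the CLT in action with a few lines of code. Here each "person" is a sum of many small independent uniform factors, a stand-in assumption for illustration, not a model of actual genetics:

```python
import random
import statistics

random.seed(1)

# each "person" is the sum of many small independent factors (uniform on [0,1]);
# the CLT says the sums should be Gaussian with mean n/2 and variance n/12
n_factors, n_people = 500, 4000
sums = [sum(random.random() for _ in range(n_factors)) for _ in range(n_people)]

mean = statistics.mean(sums)   # theory: 250
sd = statistics.stdev(sums)    # theory: sqrt(500/12), about 6.45
frac = sum(abs(s - mean) < sd for s in sums) / n_people
print(round(mean, 1), round(sd, 2), round(frac, 2))  # frac should be near 0.68
```

Even though each individual factor is flat, not bell-shaped, the sums land on the familiar Gaussian, with roughly 68% of the population within one standard deviation of the mean.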

Now, the one major thing in your life that bucks the normal trend is income and especially wealth distribution. Incomes are extremely non-normal. They have what are called fat tails, meaning that the incomes of the top earners are much higher than would be expected from a normal distribution. A general rule of thumb called the Pareto Principle is that 20% of the population controls 80% of the wealth. It may even be more skewed these days.

There are many theories as to why income and wealth are distributed the way they are and I won’t go into any of them. What I want to point out is that whatever it is that governs income and wealth, it is definitely nonlinear. The key ingredient in the CLT is that the factors add linearly. If there were some nonlinear combination of the variables then the result need not be normal. It has been argued that some amount of inequality is unavoidable given that we are born with unequal innate traits but the translation of those differences into income inequality is a social choice to some degree. If we rewarded the contributors to income more linearly, then incomes would be distributed more normally (there would be some inherent skew because incomes must be positive). In some sense, the fact that some sectors of the economy seem to have much higher incomes than other sectors implies a market failure.

### Causality and obesity

May 23, 2012

The standard adage for complex systems as seen in biology and economics is that “correlation does not imply causation.”  The question then is how you ever prove that something causes something else. In the example of obesity, I stated in my New York Times interview that the obesity epidemic was caused by an increase in food availability.  What does that mean? If you strictly follow formal logic then this means that a) an increase in food supply will lead to an increase in obesity (i.e. modus ponens) and b) if there were no obesity epidemic then there would not have been an increase in food availability (i.e. modus tollens). It doesn’t mean that if there were not an increase in food availability then there would be no obesity epidemic.  This is where many people seem to be confused.  The obesity epidemic could have been caused by many things.  Some argue that it was a decline in physical activity. Some say that it is due to some unknown environmental agent. Some believe it is caused by an overconsumption of sugar and high fructose corn syrup. They could all be true and that still doesn’t mean that increased food supply was not a causal factor. Our validated model shows that if you feed the US population the extra food then there will be an increase in body weight that more than accounts for the observed rise.  We have thus satisfied a) and thus I can claim that the obesity epidemic was caused by an increase in food supply.

Stating that obesity is a complex phenomenon that involves lots of different factors and that there cannot be a simple explanation is not an argument against my assertion. This is what I called hiding behind complexity. Yes, it is true that obesity is complex but that is not an argument for saying that food is not a causal factor. If you want to disprove my assertion then what you need to do is to find a country that does not have an obesity epidemic but did exhibit an increase in food supply that was sufficient to cause it. My plan is to do this by applying our model to other nations as soon as I am able to get ahold of data of body weights over time. This has proved more difficult than I expected. The US should be commended for having good easily accessible data. Another important point to consider is that even if increased food supply caused the obesity epidemic, this does not mean that reducing food supply will reverse it. There could be other effects that maintain it even in the absence of excess food.  As we all know, it’s complicated.

### Criticality

May 4, 2012

I attended a conference on Criticality in Neural Systems at NIH this week.  I thought I would write a pedagogical post on the history of critical phenomena and phase transitions since it is a long and somewhat convoluted line of thought to link criticality as it was originally defined in physics to neuroscience.  Some of this is a recapitulation of a previous post.

Criticality is about phase transitions, which are changes in the state of matter, such as between gas and liquid. The classic paradigm of phase transitions and critical phenomena is the Ising model of magnetization. In this model, a bunch of spins that can be either up or down (north or south) sit on lattice points. The lattice is said to be magnetized if all the spins are aligned and unmagnetized or disordered if they are randomly oriented. This is a simplification of a magnet where each atom has a magnetic moment which is aligned with a spin degree of freedom of the atom. Bulk magnetism arises when the spins are all aligned.  The lowest energy state of the Ising model is for all the spins to be aligned and hence magnetized. If the only thing the spins had to deal with was the interaction energy then we would be done.  What makes the Ising model interesting, and for that matter all of statistical mechanics, is that the spins are also coupled to a heat bath. This means that the spins are subjected to random noise and the size of this noise is given by the temperature. The noise wants to randomize the spins. The presence of randomness is why there is the word “statistical” in statistical mechanics. What this means is that we can never say for certain what the configuration of a system is but only assign probabilities and compute moments of the probability distribution. Statistical mechanics really should have been called probabilistic mechanics.
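A bare-bones version of this tug-of-war between interaction energy and temperature can be simulated with Metropolis updates of a small two-dimensional Ising lattice. The lattice size, sweep count, and temperatures below are arbitrary choices for illustration; the critical temperature of this model is about 2.27 in units where the coupling and Boltzmann's constant are one.

```python
import math
import random

random.seed(2)

L = 16  # lattice size, arbitrary

def magnetization(T, sweeps=400):
    # Metropolis dynamics for the 2D Ising model (J = 1, no external field),
    # starting from the fully ordered state, with periodic boundaries
    spins = [[1] * L for _ in range(L)]
    for _ in range(sweeps * L * L):
        i, j = random.randrange(L), random.randrange(L)
        nn = (spins[(i + 1) % L][j] + spins[(i - 1) % L][j]
              + spins[i][(j + 1) % L] + spins[i][(j - 1) % L])
        dE = 2 * spins[i][j] * nn  # energy cost of flipping spin (i, j)
        if dE <= 0 or random.random() < math.exp(-dE / T):
            spins[i][j] *= -1
    return abs(sum(map(sum, spins))) / (L * L)

# probe well below and well above the critical temperature
print(magnetization(1.0))  # ordered phase: |m| stays near 1
print(magnetization(5.0))  # disordered phase: |m| near 0
```

At low temperature the noise cannot compete with the interaction energy and the lattice stays magnetized; at high temperature the heat bath wins and the net magnetization melts away.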

### The Mandelbrot set

March 16, 2012

The Mandelbrot set is often held up as an example of how amazing complexity can be generated from a simple dynamical system.  In comments to my previous posts on the information content or Kolmogorov complexity of the brain, it was brought up as an example of how the brain could be very complex yet still be fully specified by the genome.  While I agree with this premise, the Mandelbrot set is not the best example to show this.  Now, the Mandelbrot set is a beautiful example of how you can generate incredibly complex fractal landscapes using a simple algorithm.  However, it takes an uncountably infinite amount of information to specify it.

Let’s be more precise.  Consider the iterative map $z \rightarrow z^2 +C$.  Pick any complex number for $C$ and iterate the map starting at $z=0$.  The ensuing iterates or orbit will either go to infinity or stay bounded.  The Mandelbrot set is the set of all points $C$ for which the orbit stays bounded.  In essence, it consists of all complex numbers $C$ such that the sequence $C$, $C^2 +C$, $(C^2+C)^2 +C, \dots$ stays bounded.  You can immediately rule out some numbers.  You know that zero will always stay bounded and you also know that any number with absolute magnitude greater than 2 will go to infinity.  In fact, to compute the Mandelbrot set, you just have to see if any iterate exceeds 2 in magnitude because after that you know it is gone.  The question then is what happens in between and it turns out that the boundary of the Mandelbrot set is this beautiful fractal shape that looks like sea horses within sea horses and so forth.
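The escape-time test just described takes only a few lines; the iteration cap is exactly what makes any such computation an approximation to the true set.

```python
def in_mandelbrot(C, max_iter=100):
    # iterate z -> z^2 + C starting at z = 0; bail out once |z| exceeds 2
    z = 0
    for _ in range(max_iter):
        z = z * z + C
        if abs(z) > 2:
            return False
    return True  # never escaped: in the set, up to this approximation

print(in_mandelbrot(0))    # True: the orbit stays at 0
print(in_mandelbrot(-1))   # True: period-2 orbit 0, -1, 0, -1, ...
print(in_mandelbrot(1j))   # True: the orbit falls into a cycle
print(in_mandelbrot(0.5))  # False: the orbit escapes to infinity
```

Raising `max_iter` sharpens the picture near the boundary, but no finite cap ever decides the boundary itself, which is the point of the undecidability result discussed below.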

The question then is how much information do you need to construct the Mandelbrot set.  The answer as proved by Blum, Cucker, Shub, and Smale (see their book Complexity and Real Computation), is that the Mandelbrot set is undecidable.  There is no algorithm to obtain the boundaries of the Mandelbrot set.  In other words, you would need an uncountable amount of information to specify it.  The beautiful pictures we see, as shown above, are only approximations to the set.

However, I am not hostile to the idea that simple things can generate complexity.  One could say that my career is based on this idea.  It is what chaos theory is all about.  I use the argument all the time.  I’m just saying that the Mandelbrot set is not a great example.  Perhaps a better example is the logistic map on real numbers, $x \rightarrow rx(1-x)$.  If $r$ is between zero and one, then all orbits will eventually go to zero but as you increase $r$, the nature of the orbits will change and eventually you’ll reach a period-doubling cascade to chaos. If you choose an $r$ that is slightly bigger than 3.57 then you’ll get chaos.  This implies that small changes to the initial conditions will give you completely different results and also that if you just plot the iterates coming out of the map, they will seem to have no apparent pattern.  If you were to naively estimate the complexity or information content of the orbit, you could be led to believe that it has high information content even though the Kolmogorov complexity is actually quite small and is given by the logistic map and the initial condition.  However, this may also not be the greatest example because there are ways to deduce that an orbit came from a low dimensional chaotic system rather than a high dimensional one.
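Here is the contrast in code for the logistic map $x \rightarrow rx(1-x)$: a decaying orbit below $r=1$, and sensitive dependence on initial conditions at $r = 3.9$, a value picked arbitrarily inside the chaotic regime.

```python
def orbit(x0, r, steps=50):
    # iterate the logistic map x -> r x (1 - x)
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1 - xs[-1]))
    return xs

# r below one: every orbit decays to zero
print(orbit(0.2, 0.5)[-1])  # essentially zero after 50 iterates

# r = 3.9, in the chaotic regime: perturb the start in the 9th decimal place
a = orbit(0.2, 3.9)
b = orbit(0.2 + 1e-9, 3.9)
print(max(abs(x - y) for x, y in zip(a[40:], b[40:])))  # order-one separation
```

A billionth-sized change in the initial condition grows until the two orbits bear no resemblance, yet the whole "complicated" trajectory is specified by one line of code and one number.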

### Globalization and income distribution

September 18, 2011

In the past two decades we have seen both an increase in globalization and income inequality. The question is whether the two are directly or indirectly related.  The GDP of the United States is about 14 trillion dollars, which works out to be about 45 thousand per person.  However, the median household income, which is about 50 thousand dollars per household, has not increased over this time period and has even dropped a little this year. The world GDP is approximately 75 trillion dollars (in terms of purchasing power parity), which is an amazingly high 11 thousand per person per year given that over a billion people live on under two dollars a day. Thus one way to explain the decline in median income is that the US worker is now competing in a world where per capita GDP has effectively been reduced by a factor of four.  However, does this also explain the concurrent increase in wealth at the top of the income distribution?

I thought I would address this question with an extremely simple income distribution model called the Pareto distribution.  It simply assumes that incomes are distributed according to a power law with a lower cutoff: $P(I) = \alpha A I^{-1-\alpha}$, for $I>L$, where $A$ is a normalization constant. Let’s say the population size is $N$ and the GDP is $G$. Hence, we have the conditions $\int_L^\infty P(I) dI = N$ and $\int_L^\infty IP(I) dI = G$.  Inserting the Pareto distribution gives the following conditions $N=AL^{-\alpha}$ and $G = \alpha N L/(\alpha-1)$, or $A = NL^\alpha$ and $L=(\alpha-1)/\alpha (G/N)$.   The Pareto distribution is thought to be valid mostly for the tail of the income distribution so $L$ should only be thought of as an effective minimum income.  We can now calculate the income threshold for the top 1%, say.  This is given by the condition $F(H) = N-\int_L^H P(I) dI = 0.01N$, which results in $(L/H)^\alpha=0.01$ or  $H = L/0.01^{1/\alpha}$. For $\alpha = 2$ the 99th percentile income threshold is about two hundred thousand dollars, which is a little low, implying that $\alpha$ is less than two.  However, the crucial point is that $H$ scales with the average income $G/N$.  The median income would have the same scaling, which clearly goes against recent trends where median incomes have stagnated while top incomes have soared.  What this implies is that the top end obeys a different income distribution from the rest of us.
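Plugging in round numbers reproduces the estimate above. The population figure is my own rough assumption, chosen to match the quoted 45 thousand dollars of GDP per person.

```python
alpha = 2.0
G = 14e12   # US GDP in dollars, from the post
N = 310e6   # rough US population, an assumption for illustration
L = (alpha - 1) / alpha * (G / N)  # effective minimum income
H = L / 0.01 ** (1 / alpha)        # 99th-percentile income threshold
print(round(L), round(H))          # roughly 23 thousand and 226 thousand
```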

### Correlations

September 10, 2011

If I had to compress everything that ails us today into one word it would be correlations.  Basically, everything bad that has happened recently from the financial crisis to political gridlock is due to undesired correlations.  That is not to say that all correlations are bad. Obviously, a system without any correlations is simply noise.  You would certainly want the activity on an assembly line in a factory to be correlated. Useful correlations are usually serial in nature, like an invention leading to a new company.  Bad correlations are mostly parallel, like all the members of Congress voting exclusively along party lines, which reduces an assembly of hundreds of people to just two. A recession is caused when everyone in the economy suddenly decides to decrease spending all at once.  In a healthy economy, people would be uncorrelated so some would spend more when others spend less and the aggregate demand would be about constant. When people’s spending habits are tightly correlated and everyone decides to save more at the same time then there would be less demand for goods and services in the economy so companies must lay people off, resulting in even less demand and leading to a vicious cycle.

The financial crisis that triggered the recession was due to the collapse of the housing bubble, another unwanted correlated event.  This was exacerbated by collateralized debt obligations (CDOs), which are financial instruments that were doomed by unwanted correlations.  In case you haven’t followed the crisis, here’s a simple explanation. Say you have a set of loans where you think the default rate is 50%. Hence, given a hundred mortgages, you know fifty will fail but you don’t know which. The way to make a triple A bond out of these risky mortgages is to lump them together and divide the lump into tranches that have different seniority (i.e. get paid off sequentially).  So the most senior tranche will be paid off first and have the highest bond rating.  If fifty of the hundred loans go bad, the senior tranche will still get paid. This is great as long as the mortgages are only weakly correlated and you know what that correlation is. However, if the mortgages fail together then all the tranches will be bad.  This is what happened when the bubble collapsed. Correlations in how people responded to the collapse made it even worse.  When some CDOs started to fail, people panicked collectively and didn’t trust any CDOs even though some of them were still okay. The market for CDOs became frozen so people who had them and wanted to sell them couldn’t, even at a discount. This is why the federal government stepped in.  The bailout was deemed necessary because of bad correlations.  Just between you and me, I would have let all the banks just fail.

We can quantify the effect of correlations in a simple example, which will also show the difference between sample mean and population mean. Let’s say you have some variable $x$ that estimates some quantity. The expectation value (population mean) is $\langle x \rangle = \mu$.  The variance of $x$, $\langle x^2 \rangle - \langle x \rangle^2=\sigma^2$ gives an estimate of the square of the error. If you wanted to decrease the error of the estimate then you can take more measurements. So let’s consider a sample of $n$ measurements.  The sample mean is $(1/n)\sum_i^n x_i$. The expectation value of the sample mean is  $(1/n)\sum_i \langle x_i \rangle = (n/n)\langle x \rangle = \mu$. The variance of the sample mean is

$\langle [(1/n)\sum_i x_i]^2 \rangle - \langle x \rangle ^2 = (1/n^2)\sum_i \langle x_i^2\rangle + (1/n^2) \sum_{j\ne k} \langle x_j x_k \rangle - \langle x \rangle^2$

Let $C=\langle (x_j-\mu)(x_k-\mu)\rangle$ be the correlation between two measurements. Hence, $\langle x_j x_k \rangle = C +\mu^2$. The variance of the sample mean is thus $\frac{1}{n} \sigma^2 + \frac{n-1}{n} C$.  If the measurements are uncorrelated ($C=0$) then the variance is $\sigma^2/n$, i.e. the standard deviation or error is decreased by the square root of the number of samples.  However, if there are nonzero correlations then no matter how many measurements you take, the variance can never be reduced below the correlation $C$.  Thus, correlations set a lower bound on the error of any estimate.  Another way to think about this is that correlations reduce entropy, and lower entropy means less information.  One way to cure our current problems is to destroy parallel correlations.
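A quick simulation confirms the formula. Each measurement below is built as a shared component plus independent noise, which gives pairwise covariance $C$ by construction; the particular numbers are arbitrary.

```python
import random
import statistics

random.seed(3)

def sample_mean(n, sigma2=1.0, C=0.25):
    # each measurement = shared component (variance C) + independent noise,
    # so every pair has covariance C and each measurement has variance sigma2
    shared = random.gauss(0, C ** 0.5)
    return sum(shared + random.gauss(0, (sigma2 - C) ** 0.5)
               for _ in range(n)) / n

n, trials = 100, 5000
correlated = statistics.variance([sample_mean(n) for _ in range(trials)])
uncorrelated = statistics.variance([sample_mean(n, C=0.0) for _ in range(trials)])
# theory: sigma^2/n + (n-1)/n * C = 0.2575 versus sigma^2/n = 0.01
print(round(correlated, 3), round(uncorrelated, 3))
```

With $C=0.25$, a hundred measurements buy you almost nothing: the variance of the sample mean is stuck near $C$, while the uncorrelated case enjoys the full factor-of-$n$ reduction.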

### Computability

September 2, 2011

Many of my recent posts centre around the concept of computability so I thought I would step back today and give a review of the topic for the uninitiated.  Obviously, there are multiple textbooks on the topic so I won’t be bringing anything new.  However, I would like to focus on one small aspect of it that is generally glossed over in books.  The main thing to take away from computability for me is that it involves functions of integers. Essentially, a computation is something that maps an integer to another integer.  The Church-Turing thesis states that all forms of computation are equivalent to a Turing machine.  Hence, the lambda calculus, certain cellular automata, and your MacBook have the same computational capabilities as a Turing machine (actually your MacBook is finite so it has less capability, but it could be extended arbitrarily to be Turing complete). The thesis cannot be formally proven but it seems to hold.

The fact that computation is about the manipulation of integers has profound consequences, the main one being that it cannot deal directly with real numbers.  Or to put it another way, computation is constrained to countable processes.  If anything requires an uncountable number of operations then it is uncomputable. However, uncomputability, or undecidability as it is often called, is generally not presented in such a simple way.  In many popular books like Gödel, Escher, Bach, the emphasis is on the magical aspect of it.  The reason is that the proof of uncomputability, which is similar to Gödel’s proof of the First Incompleteness Theorem, relies on demonstrating that a certain self-referential function or program cannot exist, using the same diagonal argument Cantor used to show that the reals are uncountable.  In very simple nonrigorous terms, the proof works by considering a list of all possible computable functions $f_i(j)$ on all the integers $j$. This is the same as saying you have a list of all possible Turing machines $i$ acting on all possible inputs $j$. Now suppose that one of the functions, $f_a(j)$, takes the output of function $f_j$ given input $j$ and puts a negative sign in front, so $f_a(j)= -f_j(j)$.  The problem comes if you let the function act on itself, because then $f_a(a)=-f_a(a)$, which is a contradiction, and thus such a computable function $f_a$ cannot exist.
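The self-referential trap can even be acted out in code. In this toy version a Python list stands in for the enumeration of computable functions, and the contradiction shows up concretely as a computation that never terminates.

```python
# funcs plays the role of the enumeration of computable functions f_i
funcs = [lambda j: j, lambda j: j * j]

def f_a(j):
    # the diagonal function: negate the output of function j run on input j
    return -funcs[j](j)

funcs.append(f_a)  # f_a itself sits in the enumeration, at index a = 2
a = 2

print(f_a(0), f_a(1))  # well defined on the other functions
try:
    f_a(a)  # f_a(a) = -f_a(a): the recursion never bottoms out
except RecursionError:
    print("f_a(a) never returns")
```

The interpreter cuts the infinite regress off with a `RecursionError`, which is the programmer's view of the fact that no total computable function can satisfy $f_a(a) = -f_a(a)$.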

### Advice to young researchers

June 12, 2011

If I were ever asked, this is what I would tell young researchers embarking on their careers.  The points are in no particular order.  In fact, 8) may be the most important.

1)   Understand your problem as deeply as possible.  You should know everything that there is to know about your topic. Always ask the next question and think hard about how feasible it is to answer it.  Know why it would be hard or easy to do so.

2)   Learn as many tools as possible.  Get into the habit of constantly learning about new methods.  You may not need to implement everything yourself but be aware of what is out there and even more importantly who knows how to use it.

3)   Be known as an expert in something.  You don’t necessarily want to be pigeonholed but it will always serve you well if you are known as the expert in a certain area.

4)   Knowing what you don’t know is as important as knowing what you do know.  This goes with having deep knowledge about your subject.  You should know whether or not the reason you don’t know something is because no one knows or just you don’t know.

5)   Do not slack on scholarship; always do a thorough search of what has been done before.  Never be lazy about checking references.  It is your job to know everything that has been done before.  Also, just because it is not on the web doesn’t mean it doesn’t exist.

6)   Talk to as many people about your ideas as you can.  Getting feedback is extremely important to sharpen your ideas.

7)   Never let the lack of effort be an excuse for not getting something done. Sometimes, research is tedious.  Sometimes one more calculation or simulation will make a huge difference in your result.

8)   Learn to finish.  On the flip side of 7) you also have to know when a project is done.  There will always be unanswered questions and loose ends. Be aware of which are critical to your result and which would represent future projects.  The inability to finish papers is probably the biggest problem young people have.

### Stochastic differential equations

June 5, 2011

One of the things I noticed at the recent Snowbird meeting was an increase in interest in stochastic differential equations (SDEs) or Langevin equations.  They arise wherever noise is involved in a dynamical process.  In some instances, an SDE comes about as the continuum approximation of a discrete stochastic process, like the price of a stock.  In other cases, they arise as a way to reintroduce stochastic effects to mean field differential equations originally obtained by averaging over a large number of stochastic molecules or neurons.  For example, the Hodgkin-Huxley equation describing action potential generation in neurons is a mean field approximation of the stochastic transport of ions (through ion channels) across the cell membrane, which can be modeled as a multi-state Markov process usually simulated with the Gillespie algorithm (for example see here).  This is computationally expensive so adding a noise term to the mean field equations is a more efficient way to account for stochasticity.
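For the uninitiated, the simplest way to simulate an SDE numerically is the Euler-Maruyama scheme. The sketch below integrates the Ornstein-Uhlenbeck process $dx = -x\,dt + \sigma\,dW$, whose known stationary variance $\sigma^2/2$ gives a check on the numerics; the parameters are arbitrary.

```python
import random

random.seed(4)

def simulate(x0=0.0, sigma=0.5, dt=0.01, steps=50000):
    # Euler-Maruyama: x(t + dt) = x - x*dt + sigma*sqrt(dt)*N(0, 1)
    x, xs = x0, []
    for _ in range(steps):
        x += -x * dt + sigma * dt ** 0.5 * random.gauss(0, 1)
        xs.append(x)
    return xs

xs = simulate()
tail = xs[5000:]  # discard the initial transient
var = sum(x * x for x in tail) / len(tail)
print(round(var, 3))  # stationary variance should be near sigma^2 / 2 = 0.125
```

The essential point is the $\sqrt{dt}$ scaling of the noise increment; with plain $dt$ scaling the stochastic term would vanish in the continuum limit and you would recover only the mean field equation.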

### Jump Math

April 22, 2011

David Bornstein wrote a very interesting opinion article in the New York Times this week.  He tells the story of a new way of teaching math called Jump Math.  The basic concept is that you teach math by breaking it down into the smallest steps and getting the students to understand each of these steps.  Here is an excerpt from the article:

New York Times:  Children come into school with differences in background knowledge, confidence, ability to stay on task and, in the case of math, quickness. In school, those advantages can get multiplied rather than evened out. One reason, says Mighton, is that teaching methods are not aligned with what cognitive science tells us about the brain and how learning happens.

In particular, math teachers often fail to make sufficient allowances for the limitations of working memory and the fact that we all need extensive practice to gain mastery in just about anything. Children who struggle in math usually have difficulty remembering math facts, handling word problems and doing multi-step arithmetic (pdf). Despite the widespread support for “problem-based” or “discovery-based” learning, studies indicate that current teaching approaches underestimate the amount of explicit guidance, “scaffolding” and practice children need to consolidate new concepts. Asking children to make their own discoveries before they solidify the basics is like asking them to compose songs on guitar before they can form a C chord.

Mighton, who is also an award-winning playwright and author of a fascinating book called “The Myth of Ability,” developed Jump over more than a decade while working as a math tutor in Toronto, where he gained a reputation as a kind of math miracle worker. Many students were sent to him because they had severe learning disabilities (a number have gone on to do university-level math). Mighton found that to be effective he often had to break things down into minute steps and assess each student’s understanding at each micro-level before moving on.

Take the example of positive and negative integers, which confuse many kids. Given a seemingly straightforward question like, “What is -7 + 5?”, many will end up guessing. One way to break it down, explains Mighton, would be to say: “Imagine you’re playing a game for money and you lost seven dollars and gained five. Don’t give me a number. Just tell me: Is that a good day or a bad day?”

I completely agree.  I’ve always felt that we should teach math the way we teach sports.  If you want to be a good golfer, you should go to the range and hit thousands of golf balls.  Almost everyone accepts that they can improve in golf or any other sport if they practice more.  Well, the same is true for math.  If you want to get better, you should practice.  I’ve always found the idea that we need to make math more pertinent to students’ lives to motivate them to study it completely misguided.  From my experience as a former math professor, most students liked doing math for its own sake and didn’t really care whether it was useful for their lives (even though it is).

### Scientific arbitrage

February 9, 2011

In many of my research projects, I spend a nontrivial amount of my time wondering if I am reinventing the wheel.   I try  to make sure that what I’m trying to do hasn’t already been done but this is not always simple because  a solution from another field may be hidden in another form using unfamiliar jargon and concepts.  Hence, I think that there is a huge opportunity out there for scientific arbitrage, where people can look for open problems that can be easily solved by adapting solutions from other areas.  One could argue that my own research program is a form of arbitrage since I use methods of applied mathematics and theoretical physics to tackle problems in biology. However, generally in my work, the problem comes first and then I look for the best tool to use rather than specifically work on problems that are open to arbitrage.

I’m certain that some fields will be more amenable to arbitrage than others. My guess is that fields that are very vertical, like pure mathematics and theoretical physics, will be less susceptible because many people have thought about the same problems and have tried all of the available techniques. Breakthroughs in these fields will generally require novel ideas that build upon previous ones, as in the recent proofs of the Poincaré Conjecture and the Kepler sphere-packing problem.  In the language of economics, these fields are efficient. Ironically, economics itself may be a field that is less efficient and thus open to arbitrage, since many of its standard models, such as those for supply and demand, are based on reaching an equilibrium.  It seems like a dose of dynamical systems may be in order.

### Linear Regression

January 21, 2011

As someone who was trained in nonlinear dynamics, I never gave much thought to linear regression.  After all, what could be more boring than fitting data with a straight line?  Now I use it all the time and find it rather beautiful.  I’ll start with the simplest example and show how it generalizes easily.  Consider a list of $N$ ordered pairs $(x_i,y_i)$ through which you want to fit a straight line. You want to find parameters such that

$y_i = b_0 + b_1 x_i +\epsilon_i$   (1)

has the smallest errors $\epsilon_i$ where  smallest usually, although not always, means in the least squares sense.
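As a minimal sketch of the least squares case, the two parameters can be found by solving the normal equations numerically. The data here are made up purely for illustration:

```python
import numpy as np

# Hypothetical data: noisy points scattered around y = 2 + 3x
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 20)
y = 2 + 3 * x + 0.1 * rng.standard_normal(20)

# Design matrix with a column of ones for the intercept b_0
A = np.column_stack([np.ones_like(x), x])

# Least-squares solution minimizes sum of squared errors epsilon_i
b, _, _, _ = np.linalg.lstsq(A, y, rcond=None)
b0, b1 = b
print(b0, b1)  # estimates should land near 2 and 3
```

The same design-matrix construction generalizes directly: adding more columns (powers of $x$, other covariates) fits more elaborate models while the algebra stays linear in the parameters.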

### Podcast update

January 4, 2011

Here’s what’s on my iPod these days.  I definitely try to listen to the following three each week.  They are each about an hour long, so they fit into my drive home from work.

Quirks and Quarks:  Canadian Broadcasting Corporation’s weekly radio science show.  I used to listen to it as a child.  The first host was former scientist and current environmentalist David Suzuki.  It is now hosted by Bob McDonald.

The Science Show:  This is Australia’s long-running radio science show hosted by the inimitable Robyn Williams.

Radio Lab:  Possibly the most innovative thing ever on radio.  If you’ve never listened to Radio Lab, you’re missing out on a fantastic experience.

I sometimes listen to these.  The philosophy shows are half an hour or shorter while Econtalk is often longer than an hour so they are not as convenient to listen to on my drive.

Philosopher’s Zone: A show that is probably only viable in Australia, which has a vibrant philosophical community.  Host Alan Saunders is also a food expert.

Philosophy Bites: These are usually quite short but informative

Econtalk: Salon-like conversations between George Mason economist Russ Roberts and a guest, covering a wide range of topics in economics and beyond.  Although Roberts is a self-professed believer in markets, his show is fairly well balanced with different viewpoints.

I used to listen to these shows more but find myself dialing them up less for some reason these days:

All in the mind:  I find this half hour radio show a little too melodramatic at times but it can be interesting

The Naked Scientists:  This is a very popular radio show/podcast out of Cambridge, England.  I find it a little too flip at times and the hosts sometimes make mistakes.

In addition to these regular podcasts, I also listen to university lectures, mostly in philosophy, available on iTunes U.

### Bayesian parameter estimation

November 11, 2010

This is the third post on Bayesian inference.  The other two are here and here. This probably should be the first one to read if you are completely unfamiliar with the topic.  Suppose you are trying to model some system and you have a model that you want to match to some data.  The model could be a function, a set of differential equations, or anything with parameters that can be adjusted.  To make this concrete, consider this classic differential equation model of the response of glucose to insulin in the blood:

$\dot G = -S_I X G - S_G(G-G_b)$

$\dot X = c_X[I(t) - X -I_b]$

where $G$ is glucose concentration, $I$ is insulin, $X$ is the action of insulin in some remote compartment, and there are five free parameters $S_I, S_G, G_b, c_X, I_b$.  If you include the initial values of $G$ and $X$ there are seven free parameters.  The data consist of measurements of glucose and insulin at discrete time points $\{t_1,t_2,\dots\}$.  The goal is to find a set of free parameters so that the model fits the data at those time points.
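Before any fitting, it helps to be able to simulate the model forward for a given parameter set. Here is a minimal sketch using `scipy.integrate.solve_ivp`; the parameter values and the insulin input profile are invented for illustration, not taken from any fit:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Hypothetical parameter values (illustrative only, not fitted)
S_I, S_G, G_b, c_X, I_b = 5e-4, 0.02, 90.0, 0.03, 10.0

def insulin(t):
    # Made-up insulin input I(t): a decaying bolus above baseline
    return I_b + 200.0 * np.exp(-0.1 * t)

def minimal_model(t, state):
    G, X = state
    dG = -S_I * X * G - S_G * (G - G_b)
    dX = c_X * (insulin(t) - X - I_b)
    return [dG, dX]

# Initial conditions G(0), X(0) are themselves free parameters
sol = solve_ivp(minimal_model, (0, 180), [300.0, 0.0],
                t_eval=np.linspace(0, 180, 7))
print(sol.y[0])  # model glucose at the sampled time points
```

In a fitting loop, the solver output evaluated at the measurement times $\{t_1, t_2, \dots\}$ is what gets compared to the glucose data through the likelihood function.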

### Paulos in the Times

October 25, 2010

Mathematician John Allen Paulos, author of Innumeracy and other popular books on math, has a beautifully written column in the New York Times.  He articulates a dichotomy, which most people probably have never thought of,  between stories and statistics.  Here is a small excerpt from the article:

Despite the naturalness of these notions, however, there is a tension between stories and statistics, and one under-appreciated contrast between them is simply the mindset with which we approach them. In listening to stories we tend to suspend disbelief in order to be entertained, whereas in evaluating statistics we generally have an opposite inclination to suspend belief in order not to be beguiled. A drily named distinction from formal statistics is relevant: we’re said to commit a Type I error when we observe something that is not really there and a Type II error when we fail to observe something that is there. There is no way to always avoid both types, and we have different error thresholds in different endeavors, but the type of error people feel more comfortable may be telling. It gives some indication of their intellectual personality type, on which side of the two cultures (or maybe two coutures) divide they’re most comfortable.

People who love to be entertained and beguiled or who particularly wish to avoid making a Type II error might be more apt to prefer stories to statistics. Those who don’t particularly like being entertained or beguiled or who fear the prospect of making a Type I error might be more apt to prefer statistics to stories. The distinction is not unrelated to that between those (61.389% of us) who view numbers in a story as providing rhetorical decoration and those who view them as providing clarifying information.

I highly recommend reading the whole article.