How to be a scientific dilettante

I have worked in a lot of disparate fields, from plasma physics to nonlinear dynamics, posture control, neuroscience, inflammation, obesity, gene transcription, and population genetics. I have had a lot of fun doing this, but I definitely do not recommend it as a career path for a young person. I may have gotten away with it, but I was really lucky and it certainly carried a cost. First of all, by working in so many fields, I definitely lack the deep knowledge that specialists have. Thus, I don't always see the big picture and am often a step behind others. Secondly, while I am pretty well cited, my citations are diluted over multiple fields. Thus, while my total H index (the number of papers whose citation count exceeds their rank) is pretty decent, my H index in each given field is relatively small. I thus do not have much impact in any given area. To be taken seriously as a scientist, one must be a world expert in something. The system is highly nonlinear; being pretty good at a lot of things is much worse than being really good at one thing. There is a threshold for relevance, and if you don't cross it then it is like you don't exist.

However, if you do want to work in a lot of fields, the worst thing to do is to say, "Hey, I really find field X to be interesting, so I'm just going to read some books and papers on it and try to do something." I have reviewed quite a few papers where some mathematician or physicist has read a popular science book or newspaper article on a topic and then tried to publish a paper on a problem mentioned in the book. I then have to tell them to read up on four decades of previous work first, and then resubmit. The way I have managed to meander through multiple fields is that someone will either contact me directly about a specific question or mention something to me in a casual setting or at a conference. I could not possibly have made any progress at all if I didn't have great collaborators who really knew the field and the literature. Still, people constantly ask me if I still work in neuroscience, to which I can only respond, "Just because you don't cite me doesn't mean I don't publish!"


Heritability in twins

Nature Genetics recently published a meta-analysis of virtually all twin studies over the last half century:

Tinca J. C. Polderman, Beben Benyamin, Christiaan A. de Leeuw, Patrick F. Sullivan, Arjen van Bochoven, Peter M. Visscher & Danielle Posthuma. Meta-analysis of the heritability of human traits based on fifty years of twin studies. Nature Genetics 47, 702–709 (2015). doi:10.1038/ng.3285.

One of the authors, Peter Visscher, is perhaps the most influential and innovative thinker in human genetics at this moment, and this paper continues his string of insightful results. The paper examined close to eighteen thousand traits in almost three thousand publications, representing fifteen million twins. The main goal was to use all the available data to recompute the heritability estimates for all of these traits. The first thing they found was that the traits were highly skewed towards psychiatric and cognitive phenotypes; people who study heritability are mostly interested in mental function. They then checked to see if there was any publication bias, where people only publish results with high heritability. They used multiple methods, but these basically amounted to checking whether the estimated effect sizes were correlated with sample size, and they found no such correlation. Their most interesting result, which I will comment on more below, was that the average heritability across all traits was 0.488, which means that on average genes and environment contribute equally. However, heritability does vary widely across domains: eye, ear, nose, and throat function was the most heritable, while social values were the least heritable. The largest influence of shared environmental effects was for bodily functions, infections, and social values. Hence, staying healthy depends on your environment, which is why a child who might be stunted in an impoverished village can thrive if moved to Minnesota. It also shows why attitudes on social issues can and do change. Finally, the paper addressed two important technical issues, which I will expand on below: 1) previous studies may have underestimated heritability, and 2) heritability is mostly additive.

Heritability is the fraction of the variance of a trait due to genetic variance. Here is a link to a previous post explaining heritability, although as my colleague Vipul Periwal points out, it is full of words and has no equations. Briefly, there are two types of heritability: broad sense and narrow sense. Broad sense heritability, H^2 = Var(G)/Var(P), is the total genetic variance divided by the phenotypic variance. Narrow sense heritability, h^2 = Var(A)/Var(P), is the linear or additive genetic variance divided by the phenotypic variance. A linear regression of the standardized trait of the children against the average of the standardized trait of the parents gives an estimate of the narrow sense heritability. It captures the linear part, while the broad sense heritability includes both the linear and nonlinear contributions, the latter comprising dominance and gene-gene effects (epistasis). To estimate (narrow sense) heritability from twins, Polderman et al. used what is called Falconer's formula and took twice the difference in the correlation of a trait between identical (monozygotic) and fraternal (dizygotic) twins, h^2 = 2(r_{MZ} - r_{DZ}). The idea is that any difference between identical twins must be environmental (nongenetic), while dizygotic twins share only half of their additive genetic variance, so twice the difference between the two correlations estimates the additive genetic contribution. They also used another Falconer formula to estimate the shared environmental variance, c^2 = 2 r_{DZ} - r_{MZ}, since this combination "cancels out" the genetic part. Their paper then amounted to a meta-analysis of r_{DZ} and r_{MZ}. Meta-analysis is a nuanced topic, but it boils down to weighting results from different studies by some estimate of how large their errors are. They used the DerSimonian-Laird random-effects approach, which is implemented in R. The Falconer formulas estimate the narrow sense heritability, but many of the previous studies were also interested in nonadditive genetic effects. Typically, those studies used either an ACE (additive, common environmental, unique environmental) or an ADE (additive, dominance, unique environmental) model, and they decided which model to use by looking at the sign of c^2: if it was positive they used ACE, and if it was negative they used ADE. Polderman et al. showed that this decision algorithm biases the heritability estimate downward.
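
As a concrete illustration of the two Falconer formulas, here is a minimal snippet in Julia (the language used elsewhere on this blog); the correlations are made-up numbers for illustration, not values from the paper:

# Falconer estimates from illustrative (made-up) twin correlations
r_mz = 0.70              # trait correlation between monozygotic twins
r_dz = 0.45              # trait correlation between dizygotic twins
h2 = 2*(r_mz - r_dz)     # narrow sense heritability: 2*(0.70 - 0.45) = 0.50
c2 = 2*r_dz - r_mz       # shared environmental variance: 2*0.45 - 0.70 = 0.20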

If the heritability of a trait is mostly additive then you would expect that r_{MZ} = 2 r_{DZ}, and they found that this was observed in 69% of the traits. Of the top 20 traits, eight showed nonadditivity, and these mostly related to behavioral and cognitive functions. Of these eight, seven showed that the correlation between monozygotic twins was smaller than twice that of dizygotic twins, which implies that the nonlinear genetic effects tend to work against each other. This makes sense to me, since it would seem that as you start to accumulate additive variants that increase a phenotype, you will start to hit rate-limiting effects that dampen them. In other words, it seems plausible that the major nonlinearity in genetics is a saturation effect.

The most striking result was that the average heritability across all of the traits was about 0.5. Is an average value of 0.5 obvious or deep? I honestly do not know. When I told theoretical neuroscientist Fred Hall this result, he thought it was obvious and should be expected from maximum entropy considerations, which would assume that the distribution of h^2 is uniform or at least symmetric about 0.5. This sounds plausible, but as I have asserted many times, biology is the result of an exponential amplification of exponentially unlikely events. Traits that are heritable are by definition those that have variation across the population. Some traits, like the number of limbs, have no variance but are entirely genetic. Other traits, like your favourite sports team, are highly variable but not genetic, even though there is a high probability that your favourite team will be the same as your parent's or sibling's favourite team. Traits that are highly heritable include height and cognitive function. Personality, on the other hand, is not highly heritable. One of the biggest puzzles in population genetics is why there is any variability in a trait to start with. Natural selection prunes out variation exponentially fast, so if any gene is selected for, it should be fixed very quickly. Hence, it seems equally plausible that traits with high variability would have low heritability. The studied traits were also biased towards mental function, and different domains have different heritabilities. Thus, if the traits had been sampled differently, the average heritability could easily deviate from 0.5. I therefore think the null hypothesis should be that the h^2 = 0.5 value is a coincidence, but I'm open to a deeper explanation.

A software tool to investigate these results can be found here. An enterprising student could do some subsampling of the traits to see how well the 0.5 average would hold up if our historical interests in phenotypes had been different.
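
A rough sketch of what such a subsampling exercise could look like, in Julia; the sizes, heritability values, and domain labels below are random placeholders standing in for the study's actual estimates, so only the procedure is meaningful here:

using Statistics

# placeholder per-trait heritabilities and domain labels (not the study's data)
ndomains = 28
h2 = rand(18_000)
domain = rand(1:ndomains, length(h2))

# draw random domain weights and recompute the weighted mean heritability
function reweighted_mean(h2, domain, ndomains)
    w = rand(ndomains)                  # a random weight for each domain
    return sum(w[domain] .* h2) / sum(w[domain])
end

spread = [reweighted_mean(h2, domain, ndomains) for _ in 1:1000]
println(extrema(spread))                # how far the average moves under reweighting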

Thanks go to Rick Gerkin for suggesting this topic.

Hopfield on the difference between physics and biology

Here is a short essay by theoretical physicist John Hopfield, of Hopfield net and kinetic proofreading fame among many other things (hat tip to Steve Hsu). I think much of the hostility of biologists towards physicists and mathematicians that Hopfield talks about has dissipated over the past 40 years, especially amongst the younger set. In fact, these days a good share of Cell, Science, and Nature papers have some computational or mathematical component. However, the trend is towards brute-force, big-data analysis rather than the simple, elegant conceptual advances that Hopfield was famous for. In the essay, Hopfield gives several anecdotes and summarizes them with pithy words of advice. The one that everyone should really heed, and the one I always try to follow, is "Do your best to make falsifiable predictions. They are the distinction between physics and 'Just So Stories.'"

Sampling from a probability distribution

When simulating any system with randomness, sampling from a probability distribution is necessary. Usually, you'll just need to sample from a normal or uniform distribution and thus can use a built-in random number generator. However, for the times when a built-in function does not exist for your distribution, here's a simple algorithm.

Let's say you have some probability density function (PDF) \rho(x) on some domain [x_{\min}, x_{\max}] and you want to generate a set of numbers that follows this probability law. This is actually very simple to do, although those new to the field may not know how. Generally, any programming language or environment has a built-in random number generator for real numbers (up to machine precision) between 0 and 1. Thus, what you want to do is find a way to transform these random numbers, which are uniformly distributed on the domain [0,1], into the random numbers you want. The first thing to notice is that the cumulative distribution function (CDF) for your PDF, P(x) = \int_{x_{\min}}^x \rho(x') dx', is a function that ranges over the interval [0,1], since it is a probability. Thus, the algorithm is: 1) generate a random number r between 0 and 1, and 2) plug it into the inverse of P(x). The numbers you get out satisfy your distribution (i.e. P(x) = r \Rightarrow x = P^{-1}(r)).
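
For example, here is a minimal sketch in Julia for a case where the inverse CDF is known in closed form, using the exponential distribution as the example; the rate and sample count are arbitrary choices:

# exponential distribution: rho(x) = lambda*exp(-lambda*x) on [0, Inf),
# CDF P(x) = 1 - exp(-lambda*x), inverse CDF P^{-1}(r) = -log(1 - r)/lambda
lambda = 2.0
samples = [-log(1 - rand())/lambda for _ in 1:10_000]   # follows the exponential law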

If you're lucky enough to have a closed form expression for the inverse of the CDF, P^{-1}(r), then you can just plug the random numbers directly into that function; otherwise you have to do a little more work. If you have a closed form expression for the CDF but not for its inverse, then you can solve the equation P(x) - r = 0 for x numerically. Newton's method will generally converge very quickly: you just iterate x = x - (P(x)-r)/\rho(x) until you reach the tolerance you want. If you don't have a closed form expression for the CDF at all, then the simplest way is to construct the CDF numerically by computing the N dimensional vector v[j] = \sum_{i=0}^{j} \rho(x_{\min} + i h)\, h, for some discretization parameter h such that x_{\max} = x_{\min} + N h. If the PDF is defined on the entire real line then you'll have to decide on what you want x_{\min} and x_{\max} to be. If the distribution has thin tails, then you just need to go out until the probability is smaller than your desired error tolerance. You then find the index j such that v[j] is closest to r (e.g. in Julia you could use findmin(abs.(v .- r))) and your random number is x = x_{\min} + j h.
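
Here is a sketch of the tabulated-CDF version in Julia, using a standard normal PDF as the example; the grid limits and spacing are arbitrary choices for this sketch:

# standard normal PDF; xmin, xmax, and h chosen for illustration
rho(x) = exp(-x^2/2)/sqrt(2*pi)
xmin, xmax, h = -6.0, 6.0, 1e-3
xgrid = xmin:h:xmax
v = cumsum(rho.(xgrid) .* h)            # numerically constructed CDF on the grid

# draw one sample: find the grid index whose CDF value is closest to r
r = rand()
_, j = findmin(abs.(v .- r))
sample = xgrid[j]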

Here is why the algorithm works. What we are doing is transforming the PDF of r, which we can call u(r), into the desired PDF for x. The transformation we use is r = P(x). Thus, u(r) dr = \rho(x) dx, or \rho(x) = u(r) dr/dx. Now, u(r) is just 1 on the interval [0,1], and dr/dx = P'(x) = \rho(x), as required. Thus, plugging uniform random numbers into the inverse of the CDF always gives you a new set of random numbers that satisfies the new PDF. Trivial perhaps, but really useful.

The perils of math pedagogy

It seems that the prevailing wisdom in teaching mathematics is to make abstract concepts as concrete as possible. The thinking is that if math can be related to everyday concepts or pictures then it will be more palatable and understandable. I happen to disagree. I think part of what makes math fun is the abstractness of it. You make up some rules and follow them to their logical conclusions. This is also how I see children play. They like to invent make-believe worlds and then play within them according to the rules of the world.

Usually these attempts at concreteness seem harmless enough, but I have recently come across an example where making things concrete is much more confusing than just teaching a rule. The example is division with remainders. My third grader was asked to "draw" 7 divided by 2 in terms of items and groups. Her instinct was to draw 2 groups with three balls each, with one ball remaining, like this:

(x x x)

(x x x)

x

She was then supposed to write this as a mixed number and, looking at the diagram, she wrote 3 1/3. When she asked me if this was correct, I asked her to multiply it by 2 and see if she got back 7; when she got 6 2/3, she was extremely confused about why she didn't get the right answer. I tried to explain to her that with the way she grouped things, the leftover ball has to be expressed as a fraction of the number of groups (one ball split between two groups is 1/2 per group), not as a fraction of the group size, which is very unintuitive and almost impossible to explain. It would have been even worse if the example had been 8 divided by 3.

I then tried to tell her that a better way to think of division is not to ask how many elements you would get if you divided 7 into 2 groups, because this amounts to begging the question (phrase used the correct way), since you need to know the answer before you can do the operation. Rather, what you really want to ask is how many groups you would have if you divided 7 items into groups of size 2 (which is a local rule), whereupon the diagram would be

(x x)

(x x)

(x x)

x

Now if you write down the mixed number, you get 3 1/2, which is the correct answer. She then argued vehemently with me that this was not what the teacher had taught her, which may or may not be true.

I think even most adults would get confused by this example, and maybe working through it would give them a new appreciation of division. However, if you want children to learn to divide correctly, then teaching them the rule is better. To divide 7 by 2, you find the largest integer that, multiplied by 2, fits into 7, and then divide whatever is left over by 2. Even better, and a way to introduce and motivate fractions, is to write 7 divided by 2 as 7/2, which then becomes 3 1/2. If you learn the rule, you will never end up with 3 1/3.
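
In code, the rule is just division with remainder. A trivial check in Julia (the language used in the later posts here), where divrem returns the quotient and the remainder:

q, r = divrem(7, 2)    # q = 3, r = 1: the largest multiple of 2 in 7, and what is left over
mixed = q + r//2       # 7//2, i.e. 3 1/2, never 3 1/3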

MCMC for linear models

I’ve been asked in a comment to give a sample of pseudo code for an MCMC algorithm to fit a linear model ax + b to some data. See here for the original post on MCMC. With a linear model, you can write down the answer in closed form (see here), so it is a good model to test your algorithm and code.  Here it is in pseudo-Julia code:

using LinearAlgebra   # for norm

# initial guesses for parameters a and b
a = 0.0
b = 0.0
# construct chi squared, where D is the data vector and x is the vector of the
# independent quantity
chi = norm(D - (a*x .+ b))^2
for n = 1:total
    global a, b, chi   # needed when running this as a top-level Julia script
    # propose new values of a and b, normally distributed around the old values
    # with standard deviations asig and bsig
    at = a + asig*randn()
    bt = b + bsig*randn()
    chit = norm(D - (at*x .+ bt))^2
    # ratio of likelihoods, where sigma is the data uncertainty
    ratio = exp((-chit + chi)/(2*sigma^2))
    # compare the ratio to a uniform random number between 0 and 1;
    # keep the new parameters if the ratio exceeds the random number
    if rand() < ratio
        a = at
        b = bt
        chi = chit
    end
end

# Keep running until convergence
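
To actually run the code above you need to supply D, x, total, asig, bsig, and sigma. Here is one way to set them up with synthetic data, with the closed-form least-squares fit included for comparison; the particular values below (slope 2, intercept 1, noise 0.5, proposal widths 0.05) are arbitrary choices for this sketch:

using LinearAlgebra

# synthetic data generated from a known line with Gaussian noise
x = collect(0.0:0.1:10.0)
sigma = 0.5
D = 2.0*x .+ 1.0 .+ sigma*randn(length(x))

# chain settings
asig = bsig = 0.05     # proposal standard deviations
total = 100_000        # number of MCMC steps

# closed-form least-squares estimates to compare the chain against
a_ls, b_ls = [x ones(length(x))] \ D

After the chain has equilibrated, the values of a and b should fluctuate around a_ls and b_ls, with a spread set by sigma and the amount of data.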