Paper on new myopia-associated gene

The prevalence of nearsightedness, or myopia, has almost doubled in the past thirty years, from about 25% to 44%. No one knows why, but it is probably a gene-environment effect, like obesity. This recent paper in PLoS Genetics, “APLP2 Regulates Refractive Error and Myopia Development in Mice and Humans,” sheds light on the subject. It reports that a variant of the APLP2 gene is associated with myopia in people if they read a lot as children. Below is a figure from a GWAS showing the increase in myopia (more negative is more myopic) with age for those with the risk variant (GA) and for time spent reading. The effect size is pretty large and a myopic effect of APLP2 is seen in monkeys, mice, and humans. Thus, I think that this result will hold up. The authors also show that the APLP2 gene is involved in retinal signaling, particularly in amacrine cells. It is thus consistent with the theory that myopia is the result of feedback from the retina during development. Hence, if you are constantly focused on near objects, the eye will develop to accommodate that. So maybe you should send your 7-year-old outside to play instead of sitting inside reading or playing video games.

Brave New World

Read Steve Hsu’s Nautilus article on Super-Intelligence. If so-called IQ-related genetic variants are truly additive then his estimates are probably correct. His postulated being could possibly understand the fine details of any topic in a day or less. Instead of taking several years to learn enough differential geometry to develop General Relativity (which is what it took Einstein), a super-intelligence could perhaps do it in an afternoon or during a coffee break. Personally, I believe that nothing is free and that there will always be tradeoffs. I’m not sure what the cost of super-intelligence will be but there will likely be something. Variability in a population is always good for the population although not so great for each individual. An effective way to make a species go extinct is to remove variability. If pests had no genetic variability then it would be a simple matter to eliminate them with some toxin. Perhaps humans will be able to innovate fast enough to buffer themselves against environmental changes. Maybe cognitive variability can compensate for genetic variability. I really don’t know.

Heritability in twins

Nature Genetics recently published a meta-analysis of virtually all twin studies over the last half century:

Tinca J C Polderman, Beben Benyamin, Christiaan A de Leeuw, Patrick F Sullivan, Arjen van Bochoven, Peter M Visscher & Danielle Posthuma. Meta-analysis of the heritability of human traits based on fifty years of twin studies. Nature Genetics 47, 702–709 (2015). doi:10.1038/ng.3285.

One of the authors, Peter Visscher, is perhaps the most influential and innovative thinker in human genetics at this moment and this paper continues his string of insightful results. The paper examined close to eighteen thousand traits in almost three thousand publications, representing close to fifteen million twin pairs. The main goal was to use all the available data to recompute the heritability estimates for all of these traits. The first thing they found was that the traits were highly skewed towards psychiatric and cognitive phenotypes. People who study heritability are mostly interested in mental function. They then checked to see if there was any publication bias, where people only published results with high heritability. They used multiple methods, but they basically checked whether the estimated effect sizes were correlated with sample size, and they found no such correlation. Their most interesting result, which I will comment on more below, was that the average heritability across all traits was 0.488, which means that on average genes and environment contribute equally. However, heritability does vary widely across domains: eye, ear, nose and throat function were the most heritable, and social values were the least heritable. The largest influence of shared environmental effects was for bodily functions, infections, and social values. Hence, staying healthy depends on your environment, which is why a child who may be stunted in their impoverished village can thrive if moved to Minnesota. It also shows why attitudes on social issues can and do change. Finally, the paper addressed two important technical issues, which I will expand on below: 1) previous studies may be underestimating heritability and 2) heritability is mostly additive.

Heritability is the fraction of the variance of a trait due to genetic variance. Here is a link to a previous post explaining heritability although, as my colleague Vipul Periwal points out, it is full of words and has no equations. Briefly, there are two types of heritability: broad sense and narrow sense. Broad sense heritability, H^2 = Var(G)/Var(P), is the total genetic variance divided by the phenotypic variance. Narrow sense heritability, h^2 = Var(A)/Var(P), is the linear or additive genetic variance divided by the phenotypic variance. A linear regression of the standardized trait of the children against the average of the standardized trait of the parents is an estimate of the narrow sense heritability. It captures the linear part, while the broad sense heritability includes the linear and nonlinear contributions, which include dominance and gene-gene effects (epistasis). To estimate (narrow-sense) heritability from twins, Polderman et al. used what is called Falconer’s formula and took twice the difference in the correlation of a trait between identical (monozygotic) and fraternal (dizygotic) twins, h^2 = 2 (r_{MZ} - r_{DZ}). The idea is that any difference between identical twins must be environmental (nongenetic), while the difference between dizygotic twins is part environmental and part genetic, since they share on average half of their genes. The difference between the two correlations thus captures half of the genetic contribution, and doubling it gives h^2. They also used another Falconer formula to estimate the shared environmental variance, which is c^2 = 2 r_{DZ} - r_{MZ}, since this “cancels out” the genetic part. Their paper then boiled down to doing a meta-analysis of r_{DZ} and r_{MZ}. Meta-analysis is a nuanced topic, but in essence it weights results from different studies by some estimate of how large the errors are. They used the DerSimonian-Laird random-effects approach, which is implemented in R.

The Falconer formulas estimate the narrow sense heritability, but many of the previous studies were interested in nonadditive genetic effects as well. Typically, those studies used either an ACE (additive, common environmental, unique environmental) or an ADE (additive, dominance, unique environmental) model. They decided which model to use by looking at the sign of c^2: if it was positive they used ACE and if it was negative they used ADE. Polderman et al. showed that this decision algorithm biases the heritability estimate downward.
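The pipeline described above (pool the twin correlations across studies, then apply Falconer’s formulas) can be sketched in a few lines of Python. This is only a minimal illustration, not the authors’ code: the function names are made up, the study-level correlations and sample sizes are hypothetical, and pooling on the Fisher z scale with variance 1/(n - 3) is an assumption of this sketch.

```python
import math


def dersimonian_laird(effects, variances):
    """Pool per-study effects with the DerSimonian-Laird random-effects estimator.

    Returns (pooled_effect, standard_error, tau2), where tau2 is the
    estimated between-study variance.
    """
    k = len(effects)
    w = [1.0 / v for v in variances]                       # fixed-effect weights
    fixed = sum(wi * y for wi, y in zip(w, effects)) / sum(w)
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, effects))  # Cochran's Q
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    w_star = [1.0 / (v + tau2) for v in variances]         # random-effects weights
    pooled = sum(wi * y for wi, y in zip(w_star, effects)) / sum(w_star)
    return pooled, math.sqrt(1.0 / sum(w_star)), tau2


def pool_correlations(rs, ns):
    """Pool correlations on the Fisher z scale (an assumption of this sketch)."""
    zs = [math.atanh(r) for r in rs]
    variances = [1.0 / (n - 3) for n in ns]                # sampling variance of z
    z, _, _ = dersimonian_laird(zs, variances)
    return math.tanh(z)


def falconer(r_mz, r_dz):
    """Return (h2, c2) from Falconer's formulas."""
    h2 = 2.0 * (r_mz - r_dz)
    c2 = 2.0 * r_dz - r_mz
    return h2, c2


# Hypothetical study-level correlations and twin-pair counts, for illustration only.
r_mz = pool_correlations([0.78, 0.82, 0.80], [400, 250, 900])
r_dz = pool_correlations([0.48, 0.52, 0.50], [420, 260, 950])
h2, c2 = falconer(r_mz, r_dz)
model = "ACE" if c2 > 0 else "ADE"   # the sign rule used by earlier studies
print(f"r_MZ={r_mz:.3f} r_DZ={r_dz:.3f} h2={h2:.3f} c2={c2:.3f} -> {model}")
```

The random-effects weights 1/(v + tau2) are flatter than the fixed-effect weights 1/v, so when studies disagree more than their sampling error would suggest, large studies dominate the pooled estimate less.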

If the heritability of a trait is mostly additive then you would expect that r_{MZ} = 2 r_{DZ}, and they found that this was observed in 69% of the traits. Of the top 20 traits, 8 showed nonadditivity and these mostly related to behavioral and cognitive functions. Of these 8, 7 showed that the correlation between monozygotic twins was smaller than twice that of dizygotic twins, which implies that nonlinear genetic effects tend to work against each other. This makes sense to me since it would seem that as you start to accumulate additive variants that increase a phenotype you will start to hit rate-limiting effects that will tend to dampen these effects. In other words, it seems plausible that the major nonlinearity in genetics is a saturation effect.
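The 2:1 expectation can be made explicit with the textbook twin model. As a sketch, assume (this decomposition is not spelled out in the post) that the standardized trait variance splits into additive (a^2), dominance (d^2), shared environmental (c^2) and unique environmental (e^2) parts, and that dizygotic twins share half of the additive and a quarter of the dominance variance:

```latex
\begin{align}
r_{MZ} &= a^2 + d^2 + c^2, \\
r_{DZ} &= \tfrac{1}{2}a^2 + \tfrac{1}{4}d^2 + c^2, \\
r_{MZ} - 2\,r_{DZ} &= \tfrac{1}{2}d^2 - c^2 .
\end{align}
```

With d^2 = c^2 = 0 this reduces to r_{MZ} = 2 r_{DZ} = a^2 = h^2, the purely additive case. Within this simple model the Falconer estimate 2 r_{DZ} - r_{MZ} equals c^2 - d^2/2, which is why its sign has been used to choose between the ACE and ADE models.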

The most striking result was that the average heritability across all of the traits was about 0.5. Is an average value of 0.5 obvious or deep? I honestly do not know. When I told theoretical neuroscientist Fred Hall this result, he thought it was obvious and should be expected from maximum entropy considerations, which would assume that the distribution of h^2 would be uniform or at least symmetric about 0.5. This sounds plausible but, as I have asserted many times, biology is the result of an exponential amplification of exponentially unlikely events. Traits that are heritable are by definition those that have variation across the population. Some traits, like the number of limbs, have no variance but are entirely genetic. Other traits, like your favourite sports team, are highly variable but not genetic even though there is a high probability that your favourite team will be the same as your parent’s or sibling’s favourite team. Traits that are highly heritable include height and cognitive function. Personality, on the other hand, is not highly heritable. One of the biggest puzzles in population genetics is why there is any variability in a trait to start with. Natural selection prunes out variation exponentially fast so if any gene is selected for, it should be fixed very quickly. Hence, it seems equally plausible that traits with high variability would have low heritability. The studied traits were also biased towards mental function, and different domains have different heritabilities, so if the traits were sampled differently, the average heritability could easily deviate from 0.5. Thus, I think the null hypothesis should be that the h^2 = 0.5 value is a coincidence, but I’m open to a deeper explanation.

A software tool to investigate these results can be found here. An enterprising student could do some subsampling of the traits to see how well the 0.5 average would hold up if our historical interests in phenotypes were different.
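As a rough idea of what such a subsampling experiment might look like, here is a minimal sketch. Everything in it is an assumption: the domain names, heritabilities and trait counts are invented placeholders rather than values from the paper, and each trait is simply assigned its domain’s heritability.

```python
import random
import statistics

# HYPOTHETICAL domain-level heritabilities and trait counts, invented for
# illustration only; they are NOT values from Polderman et al.
DOMAINS = {
    "cognitive": (0.55, 400),
    "psychiatric": (0.45, 500),
    "metabolic": (0.60, 150),
    "social_values": (0.30, 100),
    "ophthalmological": (0.70, 50),
}


def resampled_means(domains, n_traits, n_draws, weights=None, seed=0):
    """Mean h^2 of n_traits traits drawn by domain, repeated n_draws times.

    weights maps domain -> relative sampling weight. By default domains are
    sampled in proportion to their trait counts (the "historical" interest).
    """
    rng = random.Random(seed)
    names = list(domains)
    if weights is None:
        weights = {name: domains[name][1] for name in names}
    w = [weights[name] for name in names]
    means = []
    for _ in range(n_draws):
        picks = rng.choices(names, weights=w, k=n_traits)
        means.append(statistics.fmean(domains[p][0] for p in picks))
    return means


historical = resampled_means(DOMAINS, n_traits=200, n_draws=1000)
uniform = resampled_means(DOMAINS, 200, 1000, weights={d: 1 for d in DOMAINS})
print("historical sampling:", round(statistics.fmean(historical), 3))
print("uniform over domains:", round(statistics.fmean(uniform), 3))
```

With real data one would replace DOMAINS with the per-trait estimates and try several alternative weightings; the spread of the resulting averages indicates how sensitive the 0.5 figure is to which traits happened to be studied.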

Thanks go to Rick Gerkin for suggesting this topic.

Paper on new version of Plink

The paper describing the updated version of the genome analysis software tool Plink has just been published.

Second-generation PLINK: rising to the challenge of larger and richer datasets
Christopher C Chang, Carson C Chow, Laurent CAM Tellier, Shashaank Vattikuti, Shaun M Purcell, and James J Lee

GigaScience 2015, 4:7  doi:10.1186/s13742-015-0047-8

PLINK 1 is a widely used open-source C/C++ toolset for genome-wide association studies (GWAS) and research in population genetics. However, the steady accumulation of data from imputation and whole-genome sequencing studies has exposed a strong need for faster and scalable implementations of key functions, such as logistic regression, linkage disequilibrium estimation, and genomic distance evaluation. In addition, GWAS and population-genetic data now frequently contain genotype likelihoods, phase information, and/or multiallelic variants, none of which can be represented by PLINK 1’s primary data format.

To address these issues, we are developing a second-generation codebase for PLINK. The first major release from this codebase, PLINK 1.9, introduces extensive use of bit-level parallelism, O(sqrt(n))-time/constant-space Hardy-Weinberg equilibrium and Fisher’s exact tests, and many other algorithmic improvements. In combination, these changes accelerate most operations by 1-4 orders of magnitude, and allow the program to handle datasets too large to fit in RAM. We have also developed an extension to the data format which adds low-overhead support for genotype likelihoods, phase, multiallelic variants, and reference vs. alternate alleles, which is the basis of our planned second release (PLINK 2.0).

The second-generation versions of PLINK will offer dramatic improvements in performance and compatibility. For the first time, users without access to high-end computing resources can perform several essential analyses of the feature-rich and very large genetic datasets coming into use.

Keywords: GWAS; Population genetics; Whole-genome sequencing; High-density SNP genotyping; Computational statistics


This project started out with us trying to do some genomic analysis that involved computing various distance metrics on sequence space. Programming virtuoso Chris Chang stepped in and decided to write some code to speed up the computations. His program, originally called wdist, was so good and fast that we kept asking him to put in more capabilities. Eventually, he had basically replicated the suite of functions that Plink performed, so he contacted Shaun Purcell, the author of Plink, to ask whether he could just call his code Plink too, and Shaun agreed. We then ran a series of tests on various machines to check the speed-ups compared to the original Plink and gcta. If you do any GWAS analysis at all, I highly recommend you check out Plink 1.9.
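To give a flavor of why bit-level parallelism helps with distance computations, here is a toy sketch. It is not PLINK’s code: it assumes a simplified one-bit-per-variant encoding (PLINK’s binary format stores two bits per genotype), and the function names are hypothetical. The point is only that one XOR plus a bit count compares many variants at once instead of looping over them one by one.

```python
def pack_bits(calls):
    """Pack a sequence of 0/1 calls into one integer, first call in the lowest bit."""
    word = 0
    for i, call in enumerate(calls):
        if call:
            word |= 1 << i
    return word


def hamming_distance(a, b):
    """Number of positions at which two packed bit vectors differ."""
    return bin(a ^ b).count("1")   # XOR marks mismatches; count the set bits


# Two hypothetical samples typed at eight variants (1 = carries the minor allele).
sample_a = pack_bits([1, 0, 1, 1, 0, 0, 1, 0])
sample_b = pack_bits([1, 1, 0, 1, 0, 0, 0, 0])
print(hamming_distance(sample_a, sample_b))
```

In a compiled language the same idea maps onto machine words and hardware population-count instructions, which is where large constant-factor speed-ups can come from.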

Journal Club

Here is the paper I’ll be covering in the Laboratory of Biological Modeling, NIDDK, Journal Club tomorrow:

Morphological and population genomic evidence that human faces have evolved to signal individual identity

Michael J. Sheehan & Michael W. Nachman

Abstract: Facial recognition plays a key role in human interactions, and there has been great interest in understanding the evolution of human abilities for individual recognition and tracking social relationships. Individual recognition requires sufficient cognitive abilities and phenotypic diversity within a population for discrimination to be possible. Despite the importance of facial recognition in humans, the evolution of facial identity has received little attention. Here we demonstrate that faces evolved to signal individual identity under negative frequency-dependent selection. Faces show elevated phenotypic variation and lower between-trait correlations compared with other traits. Regions surrounding face-associated single nucleotide polymorphisms show elevated diversity consistent with frequency-dependent selection. Genetic variation maintained by identity signalling tends to be shared across populations and, for some loci, predates the origin of Homo sapiens. Studies of human social evolution tend to emphasize cognitive adaptations, but we show that social evolution has shaped patterns of human phenotypic and genetic diversity as well.

Incompetence is the norm

People have been justly anguished by the recent gross mishandling of the Ebola patients in Texas and Spain and the risible lapse in security at the White House. The conventional wisdom is that these demonstrations of incompetence are a recent phenomenon signifying a breakdown in governmental competence. However, I think that incompetence has always been the norm; any semblance of competence in the past is due mostly to luck and the fact that people do not exploit incompetent governance because of a general tendency towards docile cooperativity (as well as the incompetence of bad actors). In many ways, it is quite amazing how reliably citizens of the US and other OECD members respect traffic laws, pay their bills and service their debts on time. This is a huge boon to an economy since excessive resources do not need to be spent on enforcing rules. This does not hold in some, if not many, developing nations where corruption is a major problem (cf. this op-ed in the Times today). In fact, it is still an evolutionary puzzle as to why agents cooperate for the benefit of the group even though it is an advantage for an individual to defect. Cooperativity is also not likely to be all genetic since immigrants tend to follow the social norms of their adopted country, although there could be a self-selection effect here. However, the social pressure to cooperate could evaporate quickly if there is a perceived lack of enforcement, as evidenced by looting following natural disasters or the abundance of insider trading in the finance industry. Perhaps, as suggested by the work of Karl Sigmund and other evolutionary theorists, cooperativity is a transient phenomenon and will eventually be replaced by the evolutionarily more stable state of noncooperativity. In that sense, perceived incompetence could be rising, not because we are less able but because we are less cooperative.