# Phase transitions may explain why SARS-CoV-2 spreads so fast and why new variants are spreading faster

J.C.Phillips, Marcelo A.Moret, Gilney F.Zebende, Carson C.Chow

## Abstract

The novel coronavirus SARS CoV-2 responsible for the COVID-19 pandemic and SARS CoV-1 responsible for the SARS epidemic of 2002-2003 share an ancestor yet evolved to have much different transmissibility and global impact 1. A previously developed thermodynamic model of protein conformations hypothesized that SARS CoV-2 is very close to a new thermodynamic critical point, which makes it highly infectious but also easily displaced by a spike-based vaccine because there is a tradeoff between transmissibility and robustness 2. The model identified a small cluster of four key mutations of SARS CoV-2 that predicts much stronger viral attachment and viral spreading compared to SARS CoV-1. Here we apply the model to the SARS-CoV-2 variants Alpha (B.1.1.7), Beta (B.1.351), Gamma (P.1) and Delta (B.1.617.2)3 and predict, using no free parameters, how the new mutations will not diminish the effectiveness of current spike based vaccines and may even further enhance infectiousness by augmenting the binding ability of the virus.

https://www.sciencedirect.com/science/article/pii/S0378437122002576?dgcid=author

This paper is based on the ideas of physicist Jim Phillips, (formerly of Bell Labs, a National Academy member, and a developer of the theory behind Gorilla Glass used in iPhones). It was only due to Jim’s dogged persistence and zeal that I’m even on this paper although the persistence and zeal that ensnared me is the very thing that alienates most everyone else he tries to recruit to his cause.

Jim’s goal is to understand and characterize how a protein will fold and behave dynamically by utilizing an amino acid hydrophobicity (hydropathy) scale developed by Moret and Zebende. People have been developing hydropathy scores for many decades as a way to understand proteins with the idea that hydrophobic amino acids (residues) will tend to be on the inside of proteins while hydrophillic residues will be on the outside where the water is. There are several existing scores but Moret and Zebende, who are physicists and not chemists, took a different tack and found how the solvent-accessible surface area (ASA) scales with the size of a protein fragment with a specific residue in the center. The idea being that the smaller the ASA the more hydrophobic the residue. As protein fragments get larger they will tend to fold back on themselves and thus reduce the ASA. They looked at several thousand protein fragments and computed the average ASA with a given amino acid in the center. When they plotted the ASA vs length of fragment they found a power law and each amino acid had its own exponent. The more negative the exponent the smaller the ASA and thus the more hydrophobic the residue. The (negative) exponent could then be used as a hydropathy score. It differs from other scores in that it is not calculated in isolation based on chemical properties but accounts for the background of the amino acid.

M and Z’s score blew Jim’s mind because power laws are indicative of critical phenomena and phase transitions. Jim computed the coarse-grained hydropathy score (over a window of 35 residues) at each residue of a protein for a number of protein families. When COVID came along he naturally applied it to coronaviruses. He found that the coarse-grained hydropathy score profile of the spike protein of SARS-CoV-1 and SARS-CoV-2 had several deep hydrophobic wells. The well depths were nearly equal with SARS-CoV-2 being more equal than SARS-CoV-1. He then hypothesized that there was a selection advantage for well-depth symmetry and evolutionary pressure had pushed the SARS-CoV-2 spike to be near optimal. He argues that the symmetry allows the protein to coordinate activity better much like the way oscillators synchronize easier if their frequencies are more uniform. He predicted that given this optimality the spike was fragile and thus spike vaccines would be highly effective and that spike mutations could not change the spike much without diminishing function.

My contribution was to write some Julia code to automate this computation and apply it to some SARS-CoV-2 variants. I also scanned window sizes and found that the well depths are most equal close to Jim’s original value of 35. Below is Figure 3 from the paper.

What you see is the coarse-grained hydropathy score of the spike protein which is a little under 1300 residues long. Between residue 400 and 1200 there are 6 hydropathic wells. The well depths are more similar for SARS-CoV-2 and variants than SARS-CoV-1. Omicron does not look much different from the wild type, which makes me think that Omicron’s increased infectiousness is probably due to mutations that affect viral growth and transmission rather than spike binding to ACE2 receptors.

Jim is encouraging (strong arming) me into pushing this further, which I probably will given that there are still so many unanswered questions as to how and why it works, if at all. If anyone is interested in this, please let me know.

# New paper in eLife

I never thought this would ever be finished but it’s out. We hedge in the paper but my bet is that MYC is a facilitator of an accelerator essential for gene transcription.

# Dissecting transcriptional amplification by MYC

eLife 2020;9:e52483

Zuqin Nie, Chunhua Guo, Subhendu K Das, Carson C Chow, Eric Batchelor, S Stoney Simons Jr, David Levens

## Abstract

Supraphysiological MYC levels are oncogenic. Originally considered a typical transcription factor recruited to E-boxes (CACGTG), another theory posits MYC a global amplifier increasing output at all active promoters. Both models rest on large-scale genome-wide ”-omics’. Because the assumptions, statistical parameter and model choice dictates the ‘-omic’ results, whether MYC is a general or specific transcription factor remains controversial. Therefore, an orthogonal series of experiments interrogated MYC’s effect on the expression of synthetic reporters. Dose-dependently, MYC increased output at minimal promoters with or without an E-box. Driving minimal promoters with exogenous (glucocorticoid receptor) or synthetic transcription factors made expression more MYC-responsive, effectively increasing MYC-amplifier gain. Mutations of conserved MYC-Box regions I and II impaired amplification, whereas MYC-box III mutations delivered higher reporter output indicating that MBIII limits over-amplification. Kinetic theory and experiments indicate that MYC activates at least two steps in the transcription-cycle to explain the non-linear amplification of transcription that is essential for global, supraphysiological transcription in cancer.

# New paper on Wilson-Cowan Model

I forgot to post that my fellow Yahya and I have recently published a review paper on the history and possible future of the Wilson-Cowan Model in the Journal of Neurophysiology tribute issue to Jack Cowan. Thanks to the patient editors for organizing this.

# Before and beyond the Wilson–Cowan equations

## Abstract

The Wilson–Cowan equations represent a landmark in the history of computational neuroscience. Along with the insights Wilson and Cowan offered for neuroscience, they crystallized an approach to modeling neural dynamics and brain function. Although their iconic equations are used in various guises today, the ideas that led to their formulation and the relationship to other approaches are not well known. Here, we give a little context to some of the biological and theoretical concepts that lead to the Wilson–Cowan equations and discuss how to extend beyond them.

PDF here.

Here is the medrXiv link to our Covid-19 paper.  We actually have more updated versions with nicer graphs, which we will upload shortly.

# COVID-19 Paper

Global prediction of unreported SARS-CoV2 infection from observed COVID-19 cases

Carson C. Chow 1*, Joshua C. Chang 2,3, Richard C. Gerkin 4*, Shashaank Vattikuti 1*

1Mathematical Biology Section, LBM, NIDDK, National Institutes of Health2Epidemiology and Biostatistics Section, Rehabilitation Medicine, Clinical Center, National Institutes of Health 3 mederrata 4 School of Life Sciences, Arizona State University

*For correspondence contact carsonc@nih.gov, josh@mederrata.com, rgerkin@asu.edu, vattikutis@nih.gov

Summary: Estimation of infectiousness and fatality of the SARS-CoV-2 virus in the COVID-19 global pandemic is complicated by ascertainment bias resulting from incomplete and non-representative samples of infected individuals.  We developed a strategy for overcoming this bias to obtain more plausible estimates of the true values of key epidemiological variables.  We fit mechanistic Bayesian latent-variable SIR models to confirmed COVID-19 cases, deaths, and recoveries, for all regions (countries and US states) independently. Bayesian averaging over models, we find that the raw infection incidence rate underestimates the true rate by a factor, the case ascertainment ratio CARt that depends upon region and time. At the regional onset of COVID-19, the predicted global median was 13 infections unreported for each case confirmed (CARt = 0.07 C.I. (0.02, 0.4)). As the infection spread, the median CARt rose to 9 unreported cases for every one diagnosed as of April 15, 2020 (CARt = 0.1 C.I. (0.02, 0.5)).  We also estimate that the median global initial reproduction number R0 is 3.3 (C.I (1.5, 8.3)) and the total infection fatality rate near the onset is 0.17% (C.I. (0.05%, 0.9%)). However the time-dependent reproduction number Rt and infection fatality rate as of April 15 were 1.2 (C.I. (0.6, 2.5)) and 0.8% (C.I. (0.2%,4%)), respectively.  We find that there is great variability between country- and state-level values.  Our estimates are consistent with recent serological estimates of cumulative infections for the state of New York, but inconsistent with claims that very large fractions of the population have already been infected in most other regions.  For most regions, our estimates imply a great deal of uncertainty about the current state and trajectory of the epidemic.

# Response to Oxford paper on covid-19

Here is my response to the paper from Oxford (Lourenco et al.) arguing that novel coronavirus infection may already be widespread in the UK and Italy.  The result is based on fitting a disease spreading model, called an SIR model, to the cumulative number of deaths. SIR models usually consist of ordinary differential equations (ODEs) for the fraction of people in a given population who are susceptible to the infectious agent (S), the number infected (I),  and the number recovered (R). There is one other state in the model, which is the fraction who die from the disease (D).  The SIR model considers transitions between these states.  In the case of ODEs, the states are treated as continuous quantities, which is not a bad approximation for a large population, and each equation in the model describes the rate of change of a state (hence differential equation).  There are parameters in the model governing the rate of different interactions in each  equation, for example there is a parameter for the rate of increase in S whenever an S interacts with an I, and then there is a rate of loss of an I, which transitions into either R or D.  The Oxford group model D somewhat differently.  Instead of a transition from I into D they consider that a fraction of (1-S) will die with some delay between time of infection and death.

They estimate the model parameters by fitting the model to the cumulative number of deaths.  They did this instead of fitting directly to I because that is unreliable as many people who have Covid-19 have not been tested. They also only fit to the first 15 days from the first recorded death since they want to model what happens before social distancing was implemented.  They find that the model is consistent with a scenario where the probability that an infected person gets severe enough to be flagged is low and thus the disease is much more wide spread than expected. I redid the analysis without assuming that the parameters need to have particular values (called priors in Bayesian inference and machine learning) and showed that a wide range of parameters will fit the data. This is because the model is under-constrained by death data alone so even unrealistic parameters can work.  To be fair, the authors only proposed that this is a possibility and thus the population should be tested for anti-bodies to the coronavirus (SARS-CoV-2) to see if indeed there may already be herd immunity in place. However, the press has run with the result and that is why I think it is important to examine the result more closely.

# New paper in Cell

2018 Dec 10. pii: S0092-8674(18)31518-6. doi: 10.1016/j.cell.2018.11.026. [Epub ahead of print]

# Intrinsic Dynamics of a Human Gene Reveal the Basis of Expression Heterogeneity.

### Abstract

Transcriptional regulation in metazoans occurs through long-range genomic contacts between enhancers and promoters, and most genes are transcribed in episodic “bursts” of RNA synthesis. To understand the relationship between these two phenomena and the dynamic regulation of genes in response to upstream signals, we describe the use of live-cell RNA imaging coupled with Hi-C measurements and dissect the endogenous regulation of the estrogen-responsive TFF1 gene. Although TFF1 is highly induced, we observe short active periods and variable inactive periods ranging from minutes to days. The heterogeneity in inactive times gives rise to the widely observed “noise” in human gene expression and explains the distribution of protein levels in human tissue. We derive a mathematical model of regulation that relates transcription, chromosome structure, and the cell’s ability to sense changes in estrogen and predicts that hypervariability is largely dynamic and does not reflect a stable biological state.

#### KEYWORDS:

RNA; chromosome; estrogen; fluorescence; heterogeneity; imaging; live-cell; single-molecule; steroid; transcription

PMID: 30554876

DOI: 10.1016/j.cell.2018.11.026

# New paper on GWAS

2018 Dec;42(8):783-795. doi: 10.1002/gepi.22161. Epub 2018 Sep 24.

# The accuracy of LD Score regression as an estimator of confounding and genetic correlations in genome-wide association studies.

### Author information

1
Department of Psychology, University of Minnesota Twin Cities, Minneapolis, Minnesota.
2
Mathematical Biology Section, Laboratory of Biological Modeling, NIDDK, National Institutes of Health, Bethesda, Maryland.

### Abstract

To infer that a single-nucleotide polymorphism (SNP) either affects a phenotype or is linkage disequilibrium with a causal site, we must have some assurance that any SNP-phenotype correlation is not the result of confounding with environmental variables that also affect the trait. In this study, we study the properties of linkage disequilibrium (LD) Score regression, a recently developed method for using summary statistics from genome-wide association studies to ensure that confounding does not inflate the number of false positives. We do not treat the effects of genetic variation as a random variable and thus are able to obtain results about the unbiasedness of this method. We demonstrate that LD Score regression can produce estimates of confounding at null SNPs that are unbiased or conservative under fairly general conditions. This robustness holds in the case of the parent genotype affecting the offspring phenotype through some environmental mechanism, despite the resulting correlation over SNPs between LD Scores and the degree of confounding. Additionally, we demonstrate that LD Score regression can produce reasonably robust estimates of the genetic correlation, even when its estimates of the genetic covariance and the two univariate heritabilities are substantially biased.

#### KEYWORDS:

causal inference; genetic correlation; heritability; population stratification; quantitative genetics

PMID:

30251275

DOI:

10.1002/gepi.22161

# New Papers

Li, Y., Chow, C. C., Courville, A. B., Sumner, A. E. & Periwal, V. Modeling glucose and free fatty acid kinetics in glucose and meal tolerance test. Theoretical Biology and Medical Modelling 1–20 (2016). doi:10.1186/s12976-016-0036-3

These two papers took painfully long times to be published, which was completely perplexing and frustrating given that they both seemed rather straightforward and noncontroversial. The first is a generalization of our previously developed minimal model of the fatty acid and glucose as a function of insulin to a response to an ingested meal, where the rate of appearance of fat and glucose in the blood was modeled by an empirically determined time dependent function. The second was a reanalysis of the effects of substituting sugar-sweetened beverages with non-sugar ones. We applied our childhood growth model to predict what the children ate to account for their growth. Interestingly, what we found is that the model predicted that children with higher BMI are less able to compensate for a reduction of calories than children with lower BMI. This could imply that children with higher BMI have a less sensitive caloric sensing system and thus could be prone to overeating but on the flip side, can also be “tricked” into eating less.

# New paper in PLoS Comp Bio

Shashaank Vattikuti , Phyllis Thangaraj, Hua W. Xie, Stephen J. Gotts, Alex Martin, Carson C. Chow. Canonical Cortical Circuit Model Explains Rivalry, Intermittent Rivalry, and Rivalry Memory. PLoS Computational Biology (2016).

Abstract

It has been shown that the same canonical cortical circuit model with mutual inhibition and a fatigue process can explain perceptual rivalry and other neurophysiological responses to a range of static stimuli. However, it has been proposed that this model cannot explain responses to dynamic inputs such as found in intermittent rivalry and rivalry memory, where maintenance of a percept when the stimulus is absent is required. This challenges the universality of the basic canonical cortical circuit. Here, we show that by including an overlooked realistic small nonspecific background neural activity, the same basic model can reproduce intermittent rivalry and rivalry memory without compromising static rivalry and other cortical phenomena. The background activity induces a mutual-inhibition mechanism for short-term memory, which is robust to noise and where fine-tuning of recurrent excitation or inclusion of sub-threshold currents or synaptic facilitation is unnecessary. We prove existence conditions for the mechanism and show that it can explain experimental results from the quartet apparent motion illusion, which is a prototypical intermittent rivalry stimulus.

Author Summary

When the brain is presented with an ambiguous stimulus like the Necker cube or what is known as the quartet illusion, the perception will alternate or rival between the possible interpretations. There are neurons in the brain whose activity is correlated with the perception and not the stimulus. Hence, perceptual rivalry provides a unique probe of cortical function and could possibly serve as a diagnostic tool for cognitive disorders such as autism. A mathematical model based on the known biology of the brain has been developed to account for perceptual rivalry when the stimulus is static. The basic model also accounts for other neural responses to stimuli that do not elicit rivalry. However, these models cannot explain illusions where the stimulus is intermittently switched on and off and the same perception returns after an off period because there is no built-in mechanism to hold the memory. Here, we show that the inclusion of experimentally observed low-level background neural activity is sufficient to explain rivalry for static inputs, and rivalry for intermittent inputs. We validate the model with new experiments.

This paper is the latest of a continuing series of papers outlining how a canonical cortical circuit of excitatory and inhibitory cells can explain psychophysical and electrophysiological data of perceptual and cortical dynamics under a wide range of stimuli and conditions. I’ve summarized some of the work before (e.g. see here). In this particular paper, we show how the same circuit previously shown to explain winner-take-all behavior, normalization, and oscillations at various time scales, can also possess memory in the absence of input. Previous work has shown that if you have a circuit with effective mutual inhibition between two pools representing different percepts and include some type of fatigue process such as synaptic depression or spike frequency adaptation, then the circuit exhibits various dynamics depending on the parameters and input conditions. If the inhibition strength is relatively low and the two pools receive equal inputs then the model will have a symmetric fixed point where both pools are equally active. As the inhibition strength (or input strength) increases, then there can be a bifurcation to oscillations between the two pools with a frequency that is dependent on the strengths of inhibition, recurrent excitation, input, and the time constant of the fatigue process. A further increase in inhibition leads to a bifurcation to a winner-take-all (WTA) state where one of the pools dominates the others. However, the same circuit would be expected to not possess “rivalry memory”, where the same percept returns after the stimulus is completely removed for a duration that is long compared to the average oscillation period (dominance time). The reason is that during rivalry, the dominant pool is weakened while the suppressed pool is strengthened by the fatigue process. Thus when the stimulus is removed and returned, the suppressed pool would be expected to win the competition and become dominant. This reasoning had led people, including myself, to believe that rivalry memory could not be explained by this same model.

However, one thing Shashaank observed and that I hadn’t really noticed before was that the winner-take-all state can persist for arbitrarily low input strength. We prove a little theorem in the paper showing that if the gain function (or FI curve) is concave (i.e. does not bend up), then the winner-take-all will persist for arbitrarily low input if the inhibition is strong enough. Most importantly, the input does not need to be tuned and could be provided by the natural background activity known to exist in the brain. Even zero mean noise is sufficient to maintain the WTA state. This low-activity WTA state can then serve as a memory since whatever was active during a state with strong input can remain active when the input is turned off and the neurons just receive low level background activity. It is thus a purely mutual inhibition maintained memory. We dubbed this “topological memory” because it is like a kink in the carpet that never disappears and persists over a wide range of parameter values and input strengths. Although, we only consider rivalry memory in this paper, the mechanism could also apply in other contexts such as working memory. In this paper, we also focus on a specific rivalry illusion called the quartet illusion, which makes the model slightly more complicated but we show how it naturally reduces to a two pool model. We are currently finishing a paper quantifying precisely how excitatory and inhibitory strengths affect rivalry and other cortical phenomena so watch this space. We also have submitted an abstract to neuroscience demonstrating how you can get WTA and rivalry in a balanced-state network.

Update: link to paper is fixed.

# Two new papers

Pradhan MA1, Blackford JA Jr1, Devaiah BN2, Thompson PS2, Chow CC3, Singer DS2, Simons SS Jr4.  Kinetically Defined Mechanisms and Positions of Action of Two New Modulators of Glucocorticoid Receptor-regulated Gene Induction.  J Biol Chem. 2016 Jan 1;291(1):342-54. doi: 10.1074/jbc.M115.683722. Epub 2015 Oct 26.

Abstract: Most of the steps in, and many of the factors contributing to, glucocorticoid receptor (GR)-regulated gene induction are currently unknown. A competition assay, based on a validated chemical kinetic model of steroid hormone action, is now used to identify two new factors (BRD4 and negative elongation factor (NELF)-E) and to define their sites and mechanisms of action. BRD4 is a kinase involved in numerous initial steps of gene induction. Consistent with its complicated biochemistry, BRD4 is shown to alter both the maximal activity (Amax) and the steroid concentration required for half-maximal induction (EC50) of GR-mediated gene expression by acting at a minimum of three different kinetically defined steps. The action at two of these steps is dependent on BRD4 concentration, whereas the third step requires the association of BRD4 with P-TEFb. BRD4 is also found to bind to NELF-E, a component of the NELF complex. Unexpectedly, NELF-E modifies GR induction in a manner that is independent of the NELF complex. Several of the kinetically defined steps of BRD4 in this study are proposed to be related to its known biochemical actions. However, novel actions of BRD4 and of NELF-E in GR-controlled gene induction have been uncovered. The model-based competition assay is also unique in being able to order, for the first time, the sites of action of the various reaction components: GR < Cdk9 < BRD4 ≤ induced gene < NELF-E. This ability to order factor actions will assist efforts to reduce the side effects of steroid treatments.

Li Y, Chow CC, Courville AB, Sumner AE, Periwal V. Modeling glucose and free fatty acid kinetics in glucose and meal tolerance test. Theor Biol Med Model. 2016 Mar 2;13:8. doi: 10.1186/s12976-016-0036-3.

Abstract:
BACKGROUND:
Quantitative evaluation of insulin regulation on plasma glucose and free fatty acid (FFA) in response to external glucose challenge is clinically important to assess the development of insulin resistance (World J Diabetes 1:36-47, 2010). Mathematical minimal models (MMs) based on insulin modified frequently-sampled intravenous glucose tolerance tests (IM-FSIGT) are widely applied to ascertain an insulin sensitivity index (IEEE Rev Biomed Eng 2:54-96, 2009). Furthermore, it is important to investigate insulin regulation on glucose and FFA in postprandial state as a normal physiological condition. A simple way to calculate the appearance rate (Ra) of glucose and FFA would be especially helpful to evaluate glucose and FFA kinetics for clinical applications.
METHODS:
A new MM is developed to simulate the insulin modulation of plasma glucose and FFA, combining IM-FSIGT with a mixed meal tolerance test (MT). A novel simple functional form for the appearance rate (Ra) of glucose or FFA in the MT is developed. Model results are compared with two other models for data obtained from 28 non-diabetic women (13 African American, 15 white).
RESULTS:
The new functional form for Ra of glucose is an acceptable empirical approximation to the experimental Ra for a subset of individuals. When both glucose and FFA are included in FSIGT and MT, the new model is preferred using the Bayes Information Criterion (BIC).
CONCLUSIONS:
Model simulations show that the new MM allows consistent application to both IM-FSIGT and MT data, balancing model complexity and data fitting. While the appearance of glucose in the circulation has an important effect on FFA kinetics in MT, the rate of appearance of FFA can be neglected for the time-period modeled.

# New review paper on GWAS

Comput Struct Biotechnol J. 2015 Nov 23;14:28-34
Uncovering the Genetic Architectures of Quantitative Traits.
Lee JJ, Vattikuti S, Chow CC.

Abstract
The aim of a genome-wide association study (GWAS) is to identify loci in the human genome affecting a phenotype of interest. This review summarizes some recent work on conceptual and methodological aspects of GWAS. The average effect of gene substitution at a given causal site in the genome is the key estimand in GWAS, and we argue for its fundamental importance. Implicit in the definition of average effect is a linear model relating genotype to phenotype. The fraction of the phenotypic variance ascribable to polymorphic sites with nonzero average effects in this linear model is called the heritability, and we describe methods for estimating this quantity from GWAS data. Finally, we show that the theory of compressed sensing can be used to provide a sharp estimate of the sample size required to identify essentially all sites contributing to the heritability of a given phenotype.
KEYWORDS:
Average effect of gene substitution; Compressed sensing; GWAS; Heritability; Population genetics; Quantitative genetics; Review; Statistical genetics

# New Papers

I have been negligent about posting new papers but here is a list.  I’ll expound on them in the future.

Lee, J. J., Vattikuti, S. & Chow, C. C. Uncovering the Genetic Architectures of Quantitative Traits. Computational and Structural Biotechnology Journal 14, 28–34 (2016).

Lenstra, T. L., Coulon, A., Chow, C. C. & Larson, D. R. Single-Molecule Imaging Reveals a Switch between Spurious and Functional ncRNA Transcription. MOLCEL 60, 597–610 (2015).

Stavreva, D. A. et al. Dynamics of chromatin accessibility and long-range interactions in response to glucocorticoid pulsing. Genome Res 25, 845–857 (2015).

# Paper on the effect of food intake fluctuations on body weight

Chow, C. C. & Hall, K. D. Short and long-term energy intake patterns and their implications for human body weight regulation. Physiology & Behavior 134:60–65 (2014). doi:10.1016/j.physbeh.2014.02.044

Abstract: Adults consume millions of kilocalories over the course of a few years, but the typical weight gain amounts to only a few thousand kilocalories of stored energy. Furthermore, food intake is highly variable from day to day and yet body weight is remarkably stable. These facts have been used as evidence to support the hypothesis that human body weight is regulated by active control of food intake operating on both short and long time scales. Here, we demonstrate that active control of human food intake on short time scales is not required for body weight stability and that the current evidence for long term control of food intake is equivocal. To provide more data on this issue, we emphasize the urgent need for developing new methods for accurately measuring energy intake changes over long time scales. We propose that repeated body weight measurements can be used along with mathematical modeling to calculate long-term changes in energy intake and thereby quantify adherence to a diet intervention and provide dynamic feedback to individuals that seek to control their body weight.

# New paper on global obesity

We have a new paper out in the World Health Organization Bulletin looking at the association between an increase in food supply and average weight gain:

Stefanie Vandevijvere, Carson C Chow, Kevin D Hall, Elaine Umali & Boyd A Swinburn. Increased food energy supply as a major driver of the obesity epidemic: a global analysis, Bulletin of the WHO 2015;93:446–456.

This paper extends the analysis we did in our paper on the US food supply to the rest of the world. In the US paper, we showed that an increase in food supply more than explains the increase in average body weight over the duration of the obesity epidemic, as predicted by our experimentally validated body weight model. I had been hoping to do the analysis on the rest of the world and was very happy that my colleagues in Australia and New Zealand were able to collate the global data, which was not a simple undertaking.

What we found was almost completely consistent with the hypothesis that food is the main driver of obesity everywhere. In more than half of the countries (45/83), the increase in food supply more than explains the increase in weight. In other mostly less developed nations (11/83), an increase in food was associated with an increase in body weight although it was not sufficient to explain all of the weight gain. Five countries had a decrease in both food and body weight. Five countries had decreases in food supply and an increase in body weight and finally three countries (Iran, Rwanda, and South Africa) had an increase in food but a decrease in body weight.

Now by formal logic, only one of these observations is inconsistent with the food push hypothesis. Recall that if A implies B then the only logical conclusion you can draw is that not B implies not A. Hence, if we hypothesize that increased food causes increased obesity then that means if we see no obesity then that implies no increase in food. Thus only three countries defied our hypothesis and they were Iran, Rwanda, and South Africa where obtaining accurate data is difficult.

The five countries that had a decrease in food but an increase in body weight do not dispute our hypothesis. They just show that increased food is not necessary, which we know is true. Decreased activity could also lead to increased weight and it is possible that this played a role in these countries and the 11 others where food was not sufficient to explain all of the weight increase.

I was already pretty convinced that food was the main driver of the obesity epidemic and this result puts it to rest for me. This is the main reason that I don’t believe that the obesity epidemic is a health problem per se. It is a social and economic problem.

# New paper on steroid-regulated gene expression

I am extremely pleased that the third leg of our theory on steroid-regulated gene expression is finally published.

Theory of partial agonist activity of steroid hormones
Abstract: The different amounts of residual partial agonist activity (PAA) of antisteroids under assorted conditions have long been useful in clinical applications but remain largely unexplained. Not only does a given antagonist often afford unequal induction for multiple genes in the same cell but also the activity of the same antisteroid with the same gene changes with variations in concentration of numerous cofactors. Using glucocorticoid receptors as a model system, we have recently succeeded in constructing from first principles a theory that accurately describes how cofactors can modulate the ability of agonist steroids to regulate both gene induction and gene repression. We now extend this framework to the actions of antisteroids in gene induction. The theory shows why changes in PAA cannot be explained simply by differences in ligand affinity for receptor and requires action at a second step or site in the overall sequence of reactions. The theory also provides a method for locating the position of this second site, relative to a concentration limited step (CLS), which is a previously identified step in glucocorticoid-regulated transactivation that always occurs at the same position in the overall sequence of events of gene induction. Finally, the theory predicts that classes of antagonist ligands may be grouped on the basis of their maximal PAA with excess added cofactor and that the members of each class differ by how they act at the same step in the overall gene induction process. Thus, this theory now makes it possible to predict how different cofactors modulate antisteroid PAA, which should be invaluable in developing more selective antagonists.

Steroids are crucial hormones in the body, which are involved in development and homeostasis. They regulate gene expression by first binding to nuclear receptors that freely float in the cytosol. The receptor-steroid complex is activated somehow and transported to the nucleus, where it binds to a hormone response element and initiates transcription. Steroids can either induce or repress genes in a dose dependent way and the dose-response function is generally a linear-fractional function. In our work, we modeled the whole sequence of events as a complex-building biochemical reaction sequence and showed that a linear-fractional dose response could only arise under some specific but biophysically plausible conditions. See herehere, and here for more background.

Given the importance of steroids and hormones, several important drugs target these receptors. They include tamoxifen and raloxifene, and RU486. These drugs are partial agonists in that bind to nuclear receptors and either, block, reduce, or even increase gene expression. However, it was not really known how partial agonists or antagonists work. In this paper, we show that they work by altering the affinity of some reaction downstream of receptor-ligand binding and thus they can do this in a gene specific way. We show that the activity of a given partial agonist can be reversed by some other downstream transcription factor provided it act after this reaction. The theory also explains why receptor-ligand binding affinity has no affect on the partial agonist activity. The theory makes specific predictions on the mechanisms of partial agonists based on how the maximal activity and the EC50 of the dose response change as you add various transcription factors.

The big problem with these drugs is that nuclear receptors act all over the body and thus the possibility of side effects is high. I think our theory could be used as a guide for developing new drugs or combinations of drugs that can target specific genes and reduce side effects.

# New paper on gene repression

CC Chow, KK Finn, GB Storchan, X Lu, X Sheng, SS Simons Jr., Kinetically-Defined Component Actions in Gene Repression. PLoS Comp Bio. 11:e1004122, (2015)

Abstract

Gene repression by transcription factors, and glucocorticoid receptors (GR) in particular, is a critical, but poorly understood, physiological response. Among the many unresolved questions is the difference between GR regulated induction and repression, and whether transcription cofactor action is the same in both. Because activity classifications based on changes in gene product level are mechanistically uninformative, we present a theory for gene repression in which the mechanisms of factor action are defined kinetically and are consistent for both gene repression and induction. The theory is generally applicable and amenable to predictions if the dose-response curve for gene repression is non-cooperative with a unit Hill coefficient, which is observed for GR-regulated repression of AP1LUC reporter induction by phorbol myristate acetate. The theory predicts the mechanism of GR and cofactors, and where they act with respect to each other, based on how each cofactor alters the plots of various kinetic parameters vs. cofactor. We show that the kinetically-defined mechanism of action of each of four factors (reporter gene, p160 coactivator TIF2, and two pharmaceuticals [NU6027 and phenanthroline]) is the same in GR-regulated repression and induction. What differs is the position of GR action. This insight should simplify clinical efforts to differentially modulate factor actions in gene induction vs. gene repression.

Author Summary

While the initial steps in steroid-regulated gene induction and repression are known to be identical, the same cannot be said of cofactors that modulate steroid-regulated gene activity. We describe the conditions under which a theoretical model for gene repression reveals the kinetically-defined mechanism and relative position of cofactor action. This theory has been validated by experimental results with glucocorticoid receptors. The mode and position of action of four factors is qualitatively identical in gene repression to that previously found in gene induction. What changes is the position of GR action. Therefore, we predict that the same kinetically-defined mechanism usually will be utilized by cofactors in both induction and repression pathways. This insight and simplification should facilitate clinical efforts to maximize desired outcomes in gene induction or repression.

I am so happy that this paper is finally published.  It was a two-year ordeal from the time I had the idea of what to do until it finally came out. This is the second leg of the three-legged stool for a theory of steroid-regulated gene expression. The first was developing the theory for gene induction (e.g. see here) that started over ten years ago when Stoney and I first talked about trying to understand his data and really took off when Karen Ong turned her summer internship into a two-year baccalaureate fellowship. She’s now finishing up the PhD part of her MD-PhD at the Courant Institute at NYU.

In the first leg, we showed that if the dose-response curve for steroid-regulated gene induction (i.e. gene product as a function of ligand concentration), had the form  $a x/ (c+x)$, (which has been variously called noncooperative, Michaelis-Menten function, Hill function with Hill coefficient equal to 1, hyperbolic, first order Hill dose response curve, to give a few), then the dose-response could be written down in closed form.  The theory considers gene induction to be a sequence of complex forming reactions $Y_{i-1} + X_{i} \leftrightarrow Y_i$ for $i = 1, 2, ..., n$, and the dose-response is given by $[Y_n]$ as a function of $[Y_0]$, which in general is a very high order polynomial which is not Michaelis-Menten. However,  when some biophysically plausible conditions on the parameters are met, the polynomial can be represented by the group of lower triangular matrices and can be solved exactly.  We can then use the formulae to make predictions for the mechanisms of various transcription factors.

However, steroids also repress genes and interestingly enough the repression curve is also noncooperative and is given by the linear fractional function $a + bx/(1 + c x)$. The question then was how does this work. I was puzzled for a while on how to solve this but then thought that if we believe that the transcription machinery after initiation is mostly conserved then the induction theory we had previously derived should still be in place. What is different is that in repression instead of steroids initiating the cascade, there was some other agonist and steroid repressed this. In our induction theory, we included the effects of activators and inhibitors from enzyme kinetics, which we called accelerators and decelerators to avoid confusing with previously used terms. Because of the group property of the reactions, basically any function you are interested in has linear-fractional form. I thus postulated that steroids, after binding to a nuclear steroid receptor, acts like a decelerator. I then had to work out all the possible cases for where the decelerator could act and the large number of them made the calculations rather tedious. As a result, I made lots of mistakes initially and the theory just wouldn’t fit the data. I finally had a breakthrough in the fall of 2013 when I was in Taiwan for a workshop and everything started to come together. It then took another six months to work out the details and write the paper, which was then followed by several back and forth’s with the referees, a major rewriting and a final acceptance a few months ago. In the process of working on this paper, I discovered a lot of properties about the induction system that I didn’t realize. I still didn’t believe it was finished until I saw it posted on the PLoS Comp Bio website this week.

I’m currently putting on the finishing touches for revisions on the third leg of the stool now. We have even reunited the band and convinced Karen to take some time away from her thesis to help finish it. This paper is about how partial agonists or antagonists like tamoxifen work, which could have implications for drug development and avoiding side effects. Steroids are not the only ligand that can activate a steroid-regulated gene. The steroid cream that you use for rashes consists of a highly potent steroid agonist. There are also molecules that block or impede the action of steroids by binding to steroid receptors and these are called partial agonists, antagonists or antisteroids. However, steroid receptors are widely expressed and that is why when you take them they can have severe side effects. Hence, it would be nice to be able to control where they act and by how much. This third leg paper is the theory behind how to do this.

# New paper on path integrals

Carson C. Chow and Michael A. Buice. Path Integral Methods for Stochastic Differential Equations. The Journal of Mathematical Neuroscience,  5:8 2015.

Abstract: Stochastic differential equations (SDEs) have multiple applications in mathematical neuroscience and are notoriously difficult. Here, we give a self-contained pedagogical review of perturbative field theoretic and path integral methods to calculate moments of the probability density function of SDEs. The methods can be extended to high dimensional systems such as networks of coupled neurons and even deterministic systems with quenched disorder.

This paper is a modified version of our arXiv paper of the same title.  We added an example of the stochastically forced FitzHugh-Nagumo equation and fixed the typos.

# New paper on steroid-regulated gene expression

Research Resource: Modulators of glucocorticoid receptor activity identified by a new high-throughput screening assay

John A. Blackford, Jr., Kyle R. Brimacombe, Edward J. Dougherty , Madhumita Pradhan, Min Shen, Zhuyin Li, Douglas S. Auld, Carson C. Chow, Christopher P. Austin, and S. Stoney Simons, Jr.

Abstract: Glucocorticoid steroids affect almost every tissue-type and thus are widely used to treat a variety of human pathologies. However, the severity of numerous side-effects limits the frequency and duration of glucocorticoid treatments. Of the numerous approaches to control off-target responses to glucocorticoids, small molecules and pharmaceuticals offer several advantages. Here we describe a new, extended high throughput screen in intact cells to identify small molecule modulators of dexamethasone-induced glucocorticoid receptor (GR) transcriptional activity. The novelty of this assay is that it monitors changes in both GR maximal activity (Amax) and EC50, or the position of the dexamethasone dose-response curve. Upon screening 1280 chemicals, ten with the greatest change in the absolute value of Amax or EC50 were selected for further examination. Qualitatively identical behaviors for 60 –90% of the chemicals were observed in a completely different system, suggesting that other systems will be similarly affected by these chemicals. Additional analysis of the ten chemicals in a recently described competition assay determined their kinetically-defined mechanism and site of action. Some chemicals had similar mechanisms of action despite divergent effects on the level of GR-induced product. These combined assays offer a straightforward method of identifying numerous new pharmaceuticals that can alter GR transactivation in ways that could be clinically useful.

# Paper on new version of Plink

The paper describing the updated version of the genome analysis software tool Plink has just been published.

Second-generation PLINK: rising to the challenge of larger and richer datasets
Christopher C Chang, Carson C Chow, Laurent CAM Tellier, Shashaank Vattikuti, Shaun M Purcell, and James J Lee

GigaScience 2015, 4:7  doi:10.1186/s13742-015-0047-8

Abstract
Background
PLINK 1 is a widely used open-source C/C++ toolset for genome-wide association studies (GWAS) and research in population genetics. However, the steady accumulation of data from imputation and whole-genome sequencing studies has exposed a strong need for faster and scalable implementations of key functions, such as logistic regression, linkage disequilibrium estimation, and genomic distance evaluation. In addition, GWAS and population-genetic data now frequently contain genotype likelihoods, phase information, and/or multiallelic variants, none of which can be represented by PLINK 1’s primary data format.

Findings
To address these issues, we are developing a second-generation codebase for PLINK. The first major release from this codebase, PLINK 1.9, introduces extensive use of bit-level parallelism, View MathML-time/constant-space Hardy-Weinberg equilibrium and Fisher’s exact tests, and many other algorithmic improvements. In combination, these changes accelerate most operations by 1-4 orders of magnitude, and allow the program to handle datasets too large to fit in RAM. We have also developed an extension to the data format which adds low-overhead support for genotype likelihoods, phase, multiallelic variants, and reference vs. alternate alleles, which is the basis of our planned second release (PLINK 2.0).

Conclusions
The second-generation versions of PLINK will offer dramatic improvements in performance and compatibility. For the first time, users without access to high-end computing resources can perform several essential analyses of the feature-rich and very large genetic datasets coming into use.

Keywords: GWAS; Population genetics; Whole-genome sequencing; High-density SNP genotyping; Computational statistics

This project started out with us trying to do some genomic analysis that involved computing various distance metrics on sequence space. Programming virtuoso Chris Chang stepped in and decided to write some code to speed up the computations. His program, originally called wdist, was so good and fast that we kept asking him to put in more capabilities. Eventually,  he had basically replicated the suite of functions that Plink performed so he contacted Shaun Purcell, the author of Plink, if he could just call his code Plink too and Shaun agreed. We then ran a series of tests on various machines to check the speed-ups compared to the original Plink and gcta. If you do any GWAS analysis at all, I highly recommend you check out Plink 1.9.