# New paper in eLife

Kinetic competition during the transcription cycle results in stochastic RNA processing

Matthew L FergusonValeria de TurrisMurali PalangatCarson C ChowDaniel R Larson

Abstract

Synthesis of mRNA in eukaryotes involves the coordinated action of many enzymatic processes, including initiation, elongation, splicing, and cleavage. Kinetic competition between these processes has been proposed to determine RNA fate, yet such coupling has never been observed in vivo on single transcripts. In this study, we use dual-color single-molecule RNA imaging in living human cells to construct a complete kinetic profile of transcription and splicing of the β-globin gene. We find that kinetic competition results in multiple competing pathways for pre-mRNA splicing. Splicing of the terminal intron occurs stochastically both before and after transcript release, indicating there is not a strict quality control checkpoint. The majority of pre-mRNAs are spliced after release, while diffusing away from the site of transcription. A single missense point mutation (S34F) in the essential splicing factor U2AF1 which occurs in human cancers perturbs this kinetic balance and defers splicing to occur entirely post-release.

# New Papers

Two new papers are now in print:
The first is on applying compressed sensing to genomics is now published in Gigascience. The summary of the paper is here and the link is here.
The second is on steroid-regulated gene induction and can be obtained here.
Biochemistry. 2014 Mar 25;53(11):1753-67. doi: 10.1021/bi5000178. Epub 2014 Mar 11.

### Abstract

A gene induction competition assay has recently uncovered new inhibitory activities of two transcriptional cofactors, NELF-A and NELF-B, in glucocorticoid-regulated transactivation. NELF-A and -B are also components of the NELF complex, which participates in RNA polymerase II pausing shortly after the initiation of gene transcription. We therefore asked if cofactors (Cdk9 and ELL) best known to affect paused polymerase could reverse the effects of NELF-A and -B. Unexpectedly, Cdk9 and ELL augmented, rather than prevented, the effects of NELF-A and -B. Furthermore, Cdk9 actions are not blocked either by Ckd9 inhibitors (DRB or flavopiridol) or by two Cdk9 mutants defective in kinase activity. The mode and site of action of NELF-A and -B mutants with an altered NELF domain are similarly affected by wild-type and kinase-dead Cdk9. We conclude that Cdk9 is a new modulator of GR action, that Ckd9 and ELL have novel activities in GR-regulated gene expression, that NELF-A and -B can act separately from the NELF complex, and that Cdk9 possesses activities that are independent of Cdk9 kinase activity. Finally, the competition assay has succeeded in ordering the site of action of several cofactors of GR transactivation. Extension of this methodology should be helpful in determining the site and mode of action of numerous additional cofactors and in reducing unwanted side effects.

PMID: 24559102 [PubMed – indexed for MEDLINE]
PMCID: PMC3985961 [Available on 2015/2/21]

# New paper on genomics

James Lee and I have a new paper out: Lee and Chow, Conditions for the validity of SNP-based heritability estimation, Human Genetics, 2014. As I summarized earlier (e.g. see here and here), heritability is a measure of the proportion of the variance of some trait (like height or cholesterol levels) due to genetic factors. The classical way to estimate heritability is to regress standardized (mean zero, standard deviation one) phenotypes of close relatives against each other. In 2010, Jian Yang, Peter Visscher and colleagues developed a way to estimate heritability directly from the data obtained in Genome Wide Association Studies (GWAS), sometimes called GREML.  Shashaank Vattikuti and I quickly adopted this method and computed the heritability of metabolic syndrome traits as well as the genetic correlations between the traits (link here). Unfortunately, our methods section has a lot of typos but the corrected Methods with the Matlab code can be found here. However, I was puzzled by the derivation of the method provided by the Yang et al. paper.  This paper is our resolution.  The technical details are below the fold.

# New paper on neural networks

Michael Buice and I have a new paper in Frontiers in Computational Neuroscience as well as on the arXiv (the arXiv version has fewer typos at this point). This paper partially completes the series of papers Michael and I have written about developing generalized activity equations that include the effects of correlations for spiking neural networks. It combines two separate formalisms we have pursued over the past several years. The first was a way to compute finite size effects in a network of coupled deterministic oscillators (e.g. see here, herehere and here).  The second was to derive a set of generalized Wilson-Cowan equations that includes correlation dynamics (e.g. see here, here, and here ). Although both formalisms utilize path integrals, they are actually conceptually quite different. The first formalism adapted kinetic theory of plasmas to coupled dynamical systems. The second used ideas from field theory (i.e. a two-particle irreducible effective action) to compute self-consistent moment hierarchies for a stochastic system. This paper merges the two ideas to generate generalized activity equations for a set of deterministic spiking neurons.

# Paper on compressed sensing and genomics

New paper on the arXiv. The next step after the completion of the Human Genome Project, was the search for genes associated with diseases such as autism or diabetes. However, after spending hundreds of millions of dollars, we find that there are very few common variants of genes with large effects. This doesn’t mean that there aren’t genes with large effect. The growth hormone gene definitely has a large effect on height. It just means that variations of genes that are common among people have small effects on the phenotype. Given the results of Fisher, Wright, Haldane and colleagues, this was probably expected as the most likely scenario and recent results measuring narrow-sense heritability directly from genetic markers (e.g. see this) confirms this view.

Current GWAS microarrays consider about a million or two markers and this is increasing rapidly. Narrow-sense heritability refers to the additive or linear genetic variance, which means the phenotype is given by the linear model $y= Z\beta + \eta$, where $y$ is the phenotype vector, $Z$ is the genotype matrix, $\beta$ are all the genetic effects we want to recover, and $\eta$ are all the nonadditive components including environmental effects. This is a classic linear regression problem. The problem comes when the number of coefficients $\beta$ far exceeds the number of people in your sample, which is the case for genomics. Compressed sensing is a field of high dimensional statistics that addresses this specific problem. People such as David Donoho, Emmanuel Candes and Terence Tao have proven under fairly general conditions that if the number of nonzero coefficients are sparse compared to the number samples, then the effects can be completely recovered using L1 penalized optimization algorithms such as the lasso or approximate message passing. In this paper, we show that these ideas can be applied to genomics.

Here is Steve Hsu’s summary of the paper

Application of compressed sensing to genome wide association studies and genomic selection

(Submitted on 8 Oct 2013)

We show that the signal-processing paradigm known as compressed sensing (CS) is applicable to genome-wide association studies (GWAS) and genomic selection (GS). The aim of GWAS is to isolate trait-associated loci, whereas GS attempts to predict the phenotypic values of new individuals on the basis of training data. CS addresses a problem common to both endeavors, namely that the number of genotyped markers often greatly exceeds the sample size. We show using CS methods and theory that all loci of nonzero effect can be identified (selected) using an efficient algorithm, provided that they are sufficiently few in number (sparse) relative to sample size. For heritability h2 = 1, there is a sharp phase transition to complete selection as the sample size is increased. For heritability values less than one, complete selection can still occur although the transition is smoothed. The transition boundary is only weakly dependent on the total number of genotyped markers. The crossing of a transition boundary provides an objective means to determine when true effects are being recovered. For h2 = 0.5, we find that a sample size that is thirty times the number of nonzero loci is sufficient for good recovery.

 Comments: Main paper (27 pages, 4 figures) and Supplement (5 figures) combined Subjects: Genomics (q-bio.GN); Applications (stat.AP) Cite as: arXiv:1310.2264 [q-bio.GN] (or arXiv:1310.2264v1 [q-bio.GN] for this version)

# New paper on population genetics

James Lee and I just published a paper entitled “The causal meaning of Fisher’s average effect” in the journal Genetics Research. The paper can be obtained here. This paper is the brainchild of James and I just helped him out with some of the proofs.  James’s take on the paper can be read here. The paper resolves a puzzle about the incommensurability of Ronald Fisher’s two definitions of the average effect noted by population geneticist D.S. Falconer three decades ago.

Fisher was well known for both brilliance and obscurity and people have long puzzled over the meaning of some of his work.  The concept of the average effect is extremely important for population genetics but it is not very well understood. The field of population genetics was invented in the early twentieth century by luminaries such as Fisher, Sewall Wright, and JBS Haldane to reconcile Darwin’s theory of evolution with Mendelian genetics. This is a very rich field that has been somewhat forgotten. People in mathematical,  systems, computational, and quantitative biology really should be fully acquainted with the field.

For those who are unacquainted with genetics, here is a quick primer to understand the paper. Certain traits, like eye colour or the ability to roll your tongue, are affected by your genes. Prior to the discovery of the structure of DNA, it was not clear what genes were, except that they were the primary discrete unit of genetic inheritance. These days it usually refers to some region on the genome. Mendel’s great insight was that genes come in pairs, which we now know to correspond to the two copies of each of the 23 chromosomes we have.  A variant of a particular gene is called an allele.  Traits can depend on genes (or more accurately genetic loci) linearly or nonlinearly. Consider a quantitative trait that depends on a single genetic locus that has two alleles, which we will call a and A. This means that a person will have one of three possible genotypes: 1) homozygous in A (i.e. have two A alleles), 2) heterozygous (have one of each), or 3) homozygous in a (i.e. have no A alleles). If the locus is linear then if you plot the measure of the trait (e.g. height) against the number of A alleles, you will get a straight line. For example, suppose allele A contributes a tenth of a centimetre to height. Then people with one A allele will be on average one tenth of a centimetre taller than those with no A alleles and those with two A alleles will be two tenths taller. The familiar notion of dominance is a nonlinear effect. So for example, the ability to roll your tongue is controlled by a single gene. There is a dominant rolling allele and a recessive nonrolling allele. If you have at least one rolling allele, you can roll your tongue.

The average effect of a gene substitution is the average change in a trait if one allele is substituted for another. A crucial part of population genetics is that you always need to consider averages. This is because genes are rarely completely deterministic. They can be influenced by the environment or other genes. Thus, in order to define the effect of the gene, you need to average over these other influences. This then leads to a somewhat ambiguous definition of average effect and Fisher actually came up with two. The first, and as James would argue the primary definition, is a causal one in that we want to measure the average effect of a gene if you experimentally substituted one allele for another prior to development and influence by the environment. A second correlation definition would simply be to plot the trait against the number of alleles as in the example above. The slope would then be the average effect. This second definition looks at the correlation between the gene and the trait but as the old saying goes “correlation does not imply causation”. For example, the genetic loci may not have any effect on the trait but happens to be strongly correlated with a true causal locus (in the population you happen to be examining). Distinguishing between genes that are merely associated with a trait from ones that are actually causal remains an open problem in genome wide association studies.

Our paper goes over some of the history and philosophy of the tension between these two definitions. We wrote the paper because these two definitions do not always agree and we show under what conditions they do agree. The main reason they don’t agree is that averages will depend on the background over which you average. For a biallelic gene, there are 2 alleles but 3 genotypes. The distribution of alleles in a population is governed by two parameters. It’s not enough to specify the frequency of one allele. You also need to know the correlation between alleles. The regression definition matches the causal definition if a particular function representing this correlation is held fixed while the experimental allele substitutions under the causal definition are carried out. We also considered the multi-allele and multi-loci case in as much generality as we could.

# New paper on childhood growth and obesity

Kevin D Hall, Nancy F Butte, Boyd A Swinburn, Carson C Chow. Dynamics of childhood growth and obesity: development and validation of a quantitative mathematical model. Lancet Diabetes and Endocrinology 2013 .

You can read the press release here.

In order to curb childhood obesity, we need a good measure of how much food kids should eat. Although people like Claire Wang have proposed quantitative models in the past that are plausible, Kevin Hall and I have insisted that this is a hard problem because we don’t fully understand childhood growth. Unlike adults, who are more or less in steady state, growing children are a moving target. After a few fits and starts we finally came up with a satisfactory model that modifies our two compartment adult body composition model to incorporate growth. That previous model partitioned excess energy intake into fat and lean compartments according to the Forbes rule, which basically says that the ratio of added fat to lean is proportional to how much fat you have so the more fat you have the more excess Calories go to fat. The odd consequence of that model is that the steady state body weight is not unique but falls on a one dimensional curve. Thus there is a whole continuum of possible body weights for a fixed diet and lifestyle. I actually don’t believe this and have a modification to fix it but that is a future story.

What puzzled me about childhood growth was how do we know how much more to eat as we grow? After some thought, I realized that what we could do is to eat enough to maintain the fraction of body fat at some level, using leptin as a signal perhaps, and then tap off the energy stored in fat when we needed to grow. So just like we know how much gasoline (petrol) to add by simply filling the tank when it’s empty, we simply eat to keep our fat reserves at some level. In terms of the model, this is a symmetry breaking term that transfers energy from the fat compartment to the lean compartment. In my original model, I made this term a constant and had food intake increase to maintain the fat to lean ratio and showed using singular perturbation theory that his would yield growth that was qualitatively similar to the real thing. This then sat languishing until Kevin had the brilliant idea to make the growth term time dependent and fit it to actual data that Nancy Butte and Boyd Swinburn had taken. We could then fit the model to normal weight and obese kids to quantify how much more obese kids eat, which is more than previously believed. Another nice thing is that when the child stops growing the model is automatically the adult model!