Julia vs Python

I was about to start my trek up Python mountain until Bard Ermentrout tipped me off to the Julia language and I saw this speed table from here (lower is faster):

Benchmark        Fortran  Julia  Python       R   Matlab   Octave  Mathematica  JavaScript    Go
Version        gcc 4.8.1    0.2   2.7.3   3.0.2   R2012a    3.6.4          8.0          V8   go1
fib                 0.26   0.91   30.37  411.36  1992.00  3211.81        64.46        2.18  1.03
parse_int           5.03   1.60   13.95   59.40  1463.16  7109.85        29.54        2.43  4.79
quicksort           1.11   1.14   31.98  524.29   101.84  1132.04        35.74        3.51  1.25
mandel              0.86   0.85   14.19  106.97    64.58   316.95         6.07        3.49  2.36
pi_sum              0.80   1.00   16.33   15.42     1.29   237.41         1.32        0.84  1.41
rand_mat_stat       0.64   1.66   13.52   10.84     6.61    14.98         4.52        3.28  8.12
rand_mat_mul        0.96   1.01    3.41    3.98     1.10     3.41         1.16       14.60  8.51

Julia is a dynamic, high-level language like MATLAB and Python that is open source and developed at MIT. The syntax looks fairly simple and it is about as fast as C (Fortran still looks like the Ferrari of scientific computing). MATLAB is fast for vector and matrix operations but deadly slow for loops. I had no idea that Mathematica was so fast. Julia is still relatively new and its ecosystem is not nearly as expansive as Python's, but should I drop Python for Julia?
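To get a feel for what the loop-heavy rows in the table are measuring, here is a minimal Python sketch of two of them. The names fib and pi_sum come from the table, but the function bodies and the time_call helper are my assumptions about what such tests look like, not the actual benchmark code.

```python
import time


def fib(n):
    """Naive doubly recursive Fibonacci, deliberately unoptimized."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)


def pi_sum(terms=10000):
    """Partial sum of 1/k^2 in an explicit loop; the series converges to pi^2/6."""
    total = 0.0
    for k in range(1, terms + 1):
        total += 1.0 / (k * k)
    return total


def time_call(func, *args, repeats=5):
    """Hypothetical helper: best wall-clock time in seconds over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    print("fib(20) =", fib(20), "took", time_call(fib, 20), "s")
    print("pi_sum() =", pi_sum(), "took", time_call(pi_sum), "s")
```

Both functions spend all of their time in interpreted function calls and loop iterations, which is exactly the kind of work where an interpreted language pays the largest penalty.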

The MATLAB handcuff

The first computer language I learned was BASIC back in the stone age, which led directly to Fortran. These are procedural languages that allow the infamous GOTO statement, now shunned by the computer literati. Programming with the GOTO gives you an appreciation for why the Halting problem is undecidable. Much of what I did in those days was to track down infinite loops. I was introduced to structured programming in university, where I learned Pascal. I didn’t really know what structured programming meant except that I no longer could use GOTO and there were data structures like records. I was forced to use APL at a summer job. I have little recollection of the language except that it was extremely terse and symbolic. It was fun to try to construct the shortest program possible to do the task. The ultimate program was the so-called “APL one liner”. APL gave me firsthand experience of the noncomputability of Kolmogorov complexity. In graduate school I went back to Fortran, which was the default language for scientific computing at that time. I also used the computer algebra system called Macsyma, which was much better than Mathematica. I used it to do Taylor expansions and perturbation theory. I was introduced to C and C++ in my first postdoc. That was an eye-opening experience as I never really understood how a computer worked until I programmed in C. Pointer arithmetic was a revelation. I now had such control and power. C++ was the opposite of C for me. Object-oriented programming takes you very far away from the workings of a computer. I basically programmed exclusively in C for a decade, just C and XPP, which was a real game changer. I had no need for anything else until I got to NIH. It was only then that I finally sat down and programmed in MATLAB. I had resisted up to that point and still feel like it is cheating, but I now do almost all of my programming in MATLAB, with a smattering of R and XPP of course.
I’m also biased against MATLAB because it gave a wrong answer in a previous version. At first, I programmed in MATLAB as I would in C or Fortran, but when it came down to writing the code to estimate heritability directly from GWAS (see here), the matrix-manipulating capabilities of MATLAB really became useful. I also learned that statistics is basically applied linear algebra. Now, when I code I think instinctively in matrix terms and it is very hard for me to go back to programming in C. (Although I did learn Objective-C recently to write an iPhone App to predict body weight. But that was mostly point-and-click and programming by trial and error. The App does work though (download it here). I did that because I wanted to get a sense of what real programmers actually do.) My goal is to switch from MATLAB to Python and not rely on proprietary software. I encourage my fellows to use Python instead of MATLAB because it will be a cinch to learn MATLAB later if they already know Python. The really big barrier for me with any language is learning the ancillary stuff: what do you actually type to run programs, how does Python know where programs are, how do you read in data, how do you plot graphs, etc.? In MATLAB, I just click on an icon and everything is there. I keep saying that I will uncuff myself from MATLAB one day and maybe this is the year that I actually do.
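For what it is worth, the ancillary questions have fairly short answers in Python. The following is a minimal sketch using only the standard library. The file name myscript.py, the helper names, and the sample data are all made up for illustration. Plotting is usually done with a third-party package such as matplotlib, which is not shown here.

```python
import csv
import io
import sys


def module_search_path():
    """Directories Python searches, in order, when it sees an import statement."""
    return list(sys.path)


def read_table(text_stream):
    """Read comma-separated rows with a header line into a list of dicts."""
    return list(csv.DictReader(text_stream))


if __name__ == "__main__":
    # Run from a terminal with: python myscript.py  (hypothetical file name)
    for directory in module_search_path()[:3]:
        print("search path entry:", directory)

    # Stand-in for open("data.csv"); the values are invented for this example.
    sample = io.StringIO("day,weight_kg\n1,80.0\n2,79.6\n")
    rows = read_table(sample)
    weights = [float(row["weight_kg"]) for row in rows]
    print("mean weight:", sum(weights) / len(weights))
```

The first entry of the search path is normally the directory of the script being run, which is why a program can import other files sitting next to it without any extra setup.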

New Papers

Two new papers are now in print:
The first, on applying compressed sensing to genomics, is now published in GigaScience. The summary of the paper is here and the link is here.
The second is on steroid-regulated gene induction and can be obtained here.
Biochemistry. 2014 Mar 25;53(11):1753-67. doi: 10.1021/bi5000178. Epub 2014 Mar 11.

A kinase-independent activity of Cdk9 modulates glucocorticoid receptor-mediated gene induction.


A gene induction competition assay has recently uncovered new inhibitory activities of two transcriptional cofactors, NELF-A and NELF-B, in glucocorticoid-regulated transactivation. NELF-A and -B are also components of the NELF complex, which participates in RNA polymerase II pausing shortly after the initiation of gene transcription. We therefore asked if cofactors (Cdk9 and ELL) best known to affect paused polymerase could reverse the effects of NELF-A and -B. Unexpectedly, Cdk9 and ELL augmented, rather than prevented, the effects of NELF-A and -B. Furthermore, Cdk9 actions are not blocked either by Cdk9 inhibitors (DRB or flavopiridol) or by two Cdk9 mutants defective in kinase activity. The mode and site of action of NELF-A and -B mutants with an altered NELF domain are similarly affected by wild-type and kinase-dead Cdk9. We conclude that Cdk9 is a new modulator of GR action, that Cdk9 and ELL have novel activities in GR-regulated gene expression, that NELF-A and -B can act separately from the NELF complex, and that Cdk9 possesses activities that are independent of Cdk9 kinase activity. Finally, the competition assay has succeeded in ordering the site of action of several cofactors of GR transactivation. Extension of this methodology should be helpful in determining the site and mode of action of numerous additional cofactors and in reducing unwanted side effects.

PMID: 24559102 [PubMed – indexed for MEDLINE]
PMCID: PMC3985961 [Available on 2015/2/21]

Marc Andreessen on EconTalk

If you have any interest in technology and the internet then you should definitely listen to this EconTalk podcast with Marc Andreessen, who co-wrote Mosaic, the first widely used web browser, which led to the explosive growth of the internet. He has plenty of insightful things to say. I remember first seeing Mosaic in 1994 as a postdoc in Boulder, Colorado. There I was, doing research that involved programming in C and C++. I was not really happy with what I was doing. I was having a hard time finding the next job. I was one of the first to play around with HTML, and it never occurred to me once that I could pack my bags, move to Silicon Valley, and try to get involved in the burgeoning tech revolution. It just makes me wonder what other obvious things I’m missing right now.

Addendum, 2014-6-5:  Actually, it may have been 1993 that I first saw Mosaic.

Integrated Information Theory

Neuroscientist Giulio Tononi has proposed that consciousness is integrated information and can be measured by a quantity called \phi, which is a measure of the amount of information that involves the entire system as a whole. I have never really found this theory to be entirely compelling. While I think that consciousness probably does require some amount of integrated information, I am skeptical that it is the only relevant measure. See here and here for some of my previous thoughts on the topic. One of the reasons that Tononi has proposed a single measure is that it is a way to sidestep what is known as “the hard problem of consciousness”. Instead of trying to explain how a collection of neurons would be endowed with a sense of self-awareness, he posits that consciousness is a property of information and the more \phi a system has, the more conscious it is. So in this theory, rocks are not conscious but thermostats are minimally conscious.

Theoretical computer scientist Scott Aaronson has now weighed in on the topic (see here and here). In his inimitable style, Aaronson shows essentially that a large grid of XOR gates could have arbitrarily large \phi and hence be even more conscious than you or me. He finds this to be highly implausible. Tononi then produced a 14-page response where he essentially doubles down on IIT and claims that indeed a planar array of XOR gates is conscious and we should not be surprised it is so. Aaronson also proposes that we try to solve the “pretty hard problem of consciousness” (PHPC), which is to come up with a theory or means for deciding when something has consciousness. To me, the fact that we can’t come up with an empirical way to tell whether something is conscious is the best argument for dualism we have. It may even be plausible that the PHPC is undecidable in that solving it would entail the solution of the halting problem. I agree with philosopher David Chalmers (see here) that there are only two possible consistent theories of consciousness. The first is that it is an emergent property of the brain but it has no “causal influence” on events. In other words, consciousness is an epiphenomenon that just allows “us” to be an audience for the dynamical evolution of the universe. The second is that we live in a dualistic world of mind and matter. It is definitely worth reading the posts and the comments, where Chalmers chimes in.