Redundancy and importance, or “is an oyster baby any different from an aircraft?”

A very frequent statement we hear concerning biological systems can be expressed as follows:

The degree of functional redundancy observed in a subsystem reflects the importance of this subsystem within the system it belongs to.

This idea is very anthropomorphic, and based on what an engineer would consider a good design: any critical subsystem should be redundant, so that if one instance fails, the others maintain the system in a functional state. But a biological system has not been designed. It evolved. And the two processes are completely different. If I am an engineer who wants to build a new type of aircraft, I can hardly afford to lose one of them when packed with passengers, or indeed even one with only the test pilot on board. I will therefore make all the critical subsystems redundant, so that the plane at least manages to land before a general failure makes it unflyable.

However, if I am an oyster the situation is quite different. How can I prevent my offspring from transmitting any negative deviations I happen to carry? If the critical subsystems are redundant, my offspring will survive the effects of those minor deleterious variations. As a result, they will transmit those variations to their own offspring. Following the neutral theory of molecular evolution (Motoo Kimura), these variations can invade the population by genetic drift. Now, if my species encounters situations where optimal function of all the redundant subsystems is required, it will be wiped out from the surface of the earth. Unlucky. An alternative scenario: the redundancy of my critical subsystems has been kept to its minimum. If any of my offspring carries a small deviation in one of those subsystems, it will die quickly. But I do not care, because I am an oyster. I spawn millions of eggs. I can afford to lose 90% or more of them.

In fact, it seems very few of the critical subsystems in a cell are redundant. For instance, most of the enzymes producing the energy in the cell are unique. The same goes for the RNA production machinery. If you happen to have a problem in one of those enzymes, you die at a very early age, so you cannot “pollute” the genome of the species. Redundancy appears only for “secondary” subsystems, typically those dealing with feeding, signalling etc. Yes, those subsystems are important for the proper life of higher organisms, and probably provide a selective advantage in a stable environment. But they are not crucial for life itself. Furthermore, the diversity observed is rarely a true redundancy: it allows the organism to feed on a larger variety of substrates, sense more compounds etc. The redundancy only appears when the system is stretched by the disappearance of other subsystems. Truly redundant systems would most probably just be eliminated (for a more informed discussion of gene duplications and losses, read this recent review).

Continuing on the topic of “importance”, one of the most irritating remarks one can still hear too often from hard-core molecular biologists is, mutatis mutandis:

This gene is not important because the knocked-out mice display no phenotype.

A term has even been coined for that: essentiality.

Gene essentiality is a pretty busy domain of research, and I am as far from being an expert as one can possibly be. So I am not going to discuss it. But I resent the notion that equates “important” with “phenotype immediately apparent”. The problem comes from the way we analyse mutant animals (which is designed this way for very good reasons; that is not the issue here).

Consider the case of a car. What is a car supposed to do? To progress on a flat surface, propelled by its own engine. So we set up an experimental environment, with a perfectly flat surface. To eliminate any uncontrolled variables, we place this surface indoors, under constant temperature and illumination. And of course we remove all the other vehicles from the environment. On my right, a control car. On my left, the same car without shock absorbers, without ABS, without any lighting, without a hooter, with all doors but the driver’s one fused to the frame, and with pure water in the cooling system. Let’s start both engines, and drive the cars for 50 metres. Noticed the difference? No? Therefore none of the parts we removed was important. Well, how long do you think you will drive the modified car at night on the London Orbital when it’s -10 degrees Celsius and the surface is covered in ice? I will tell you: that day, you will find the ABS, the lighting, the anti-freeze etc. damned important. Even essential.

I once worked in a team studying a mouse mutant strain “without phenotype” (*). Until someone decided to study aged mice (what a weird idea) and discovered that the brain degenerated quickly in those animals. Hard to find out when, for practical reasons, one only uses young animals. See: Zoli M, Picciotto MR, Ferrari R, Cocchi D, Changeux JP. Increased neurodegeneration during ageing in mice lacking high-affinity nicotine receptors. EMBO J. 1999 Mar 1;18(5):1235-44.

(*) Well, with a very mild phenotype.


Can we simulate a whole cell at the atomistic level? I don’t think so

[disclaimer: Most of this was written in 2008. My main opinion has not changed, but some of the data backing up the arguments might seem dated]

Over the last 15 years, it has become fashionable to launch “Virtual Cell Projects”. Some of those are sensible, and based on sound methods (one of the best recent examples being the fairly complete model of an entire Mycoplasma cell – if we except membrane processes and spatial considerations). However, some call for “whole-cell simulation at atomic resolution”. Is it a reasonable goal to pursue? Can we count on increasing computing power to help us?

I do not think so. Not only are whole-cell simulations at atomic resolution not envisionable in 10 or 15 years, but IMHO they are not envisionable in any foreseeable future. I actually consider such claims damaging because they 1) feed wrong expectations to funders and the public, 2) divert funding from feasible, even if less ambitious, projects and 3) downplay the achievements of real scientific modelling efforts (see my series on modelling success stories).

Two types of problems appear when one wants to model cellular functions at the atomic scale: practical and theoretical. Let’s deal with the practical ones first, because I think they are insurmountable and therefore less interesting. As of spring 2008, the largest molecular dynamics simulation I had heard of involved ~1 million atoms over 50 nanoseconds (molecular dynamics of the tobacco mosaic virus capsid). Even this simulation used massive power (>30 years of a desktop CPU). With much smaller systems (10 000 atoms), people succeeded in reaching half a millisecond (as of 2008). In terms of spatial size, we are very far from even the smallest cells. The simulation of an E. coli-sized cell would require simulating roughly 1 000 000 000 000 atoms, that is 1 million times what we do today. But the problem is that molecular dynamics does not scale linearly. Even with space discretisation, long-range interactions (e.g. electrostatic) mean we would need far more than 1 million times more power, several orders of magnitude more. In addition, we are talking about 50 nanoseconds here. To model a simple cellular behaviour, we need to reach the second timescale. So in summary, we are talking about an increase of several orders of magnitude more than 10 to the power of 14. Even if the corrected Moore law (doubling every 2 years) stayed valid, we would be talking of more than a century here, not a couple of decades!
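The arithmetic above can be sketched in a few lines. All figures are the 2008 numbers quoted in the text, and the naive factor ignores the super-linear cost of long-range interactions, which only makes the conclusion more pessimistic:

```python
import math

# Back-of-the-envelope check of the scaling argument (2008 figures from the text).
atoms_today = 1e6            # ~1 million atoms simulated (TMV capsid)
atoms_ecoli = 1e12           # rough atom count for an E. coli-sized cell
time_today_s = 50e-9         # 50 nanoseconds achieved
time_needed_s = 1.0          # second timescale for a simple cellular behaviour

spatial_factor = atoms_ecoli / atoms_today       # 1e6, before super-linear costs
temporal_factor = time_needed_s / time_today_s   # 2e7
naive_factor = spatial_factor * temporal_factor  # ~2e13, still optimistic

# Years needed to gain that factor if computing power doubles every 2 years:
years = 2 * math.log2(naive_factor)              # ~88 years, before the extra
print(f"needed speed-up: {naive_factor:.0e}, Moore-law years: {years:.0f}")
```

Adding the "several orders of magnitude" of non-linear scaling on top of this pushes the estimate well past a century, as the text says.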

Now, IMHO the real problems are the theoretical ones. The point is that we do not really know how to perform those simulations. The force fields I am aware of (the ones I fiddled with in the past), AMBER, CHARMM and GROMACS, are perfectly fine to describe fine movements of atoms, formation of hydrogen bonds, rotation of side-chains etc. We learnt a lot from such molecular dynamics simulations, and a Nobel prize was granted for them in 2013. But as far as I know, those methods do not adequately describe the large-scale movements of large atomic assemblies such as protein secondary structure elements, let alone the formation of such structural elements. We cannot simulate the opening of an ion channel or the large movements of motor proteins (although we can predict them, for instance using normal modes). Therefore, even if we could simulate milliseconds of biochemistry, the result would most probably be fairly inaccurate.

There are (at least) three ways out of there, and they all require leaving the atomic level. Plus they also all bump into computation problems.

* Coarse-grained simulations: We lump several atoms into one particle. That has worked in many cases, and this is a very promising approach, particularly if the timescales of atoms and atom ensembles are quite different. See for instance the work being done on the tobacco mosaic virus mentioned above. However (in 2008 and according to my limited knowledge), the methods are even more inaccurate than atomic-resolution molecular dynamics. And we are just pushing the computation problem further: even with very coarse models (note that the accuracy decreases with the coarseness) we are only gaining a few orders of magnitude. One severe problem here is that one cannot rely solely on physics principles (Newtonian laws or quantum physics) to design the methods. But we are still at scales that make real-time quantitative experimental measurements very difficult.

* Standard computational systems biology approaches: We model the cellular processes at macroscopic levels, using differential equations to represent reaction-diffusion processes. The big advantage is that we can measure the constants, the concentrations etc. That worked well in the past (think of Hodgkin-Huxley predicting ion channels, Denis Noble predicting the heart pacemaker and Goldbeter and Koshland predicting the MAP kinase cascade), and still works well. But does it work for whole-cell simulation? No, it does not really, because of what we call combinatorial explosion. If a protein possesses several state variables, such as phosphorylation sites, you have to enumerate all the possible states. If you take the example of calcium/calmodulin kinase II, and you decide to model only the main features (binding of ATP and calmodulin, phosphorylation at T286 and T306, activity, and the fact that it is a dodecamer), you need 2 to the power of 60 different states, that is a billion billion ordinary differential equations. In a cell, you would have thousands of such cases (think of the EGF receptor with its 38 phosphorylation sites!).

* Agent-based modelling (aka single-particle simulation or mesoscopic modelling): Here we abstract the molecules to their main features, far above the atomic level, and we represent each molecule as an agent that knows its own state. That avoids the combinatorial explosion described above. But those simulations are still super-heavy. We simulated hundreds of molecules moving and interacting in a 3D block of 1 micrometre for seconds. Those simulations take days to months to run on the cluster (and they spit out terabytes of data, but that is another problem). Moreover, they scale even worse than molecular dynamics. Dominic does not simulate the molecules he is not interested in; if he simulated all the molecules of the dendritic spine, it would take all the CPUs of the planet for years.
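The CaMKII state-count above is easy to verify. The sketch assumes, as in the text, 5 binary features per subunit and 12 distinguishable subunits; exploiting symmetry would reduce the count somewhat, but not enough to change the conclusion:

```python
# Counting the states of the CaMKII example: each subunit is tracked for
# 5 binary features (ATP bound, calmodulin bound, phospho-T286,
# phospho-T306, active), and the holoenzyme is a dodecamer.
features_per_subunit = 5
subunits = 12
states = 2 ** (features_per_subunit * subunits)   # 2**60
print(f"{states:.2e} states")                     # ~1.15e+18, a billion billion ODEs

# For a single protein like the EGF receptor with 38 phosphorylation sites:
egfr_states = 2 ** 38
print(f"{egfr_states:.2e} EGFR states")
```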
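To give a flavour of the agent-based approach described above, here is a toy single-particle simulation: molecules of two species take Brownian steps inside a 1-micrometre box and fuse on contact. All parameters (diffusion coefficient, reaction radius, time step) are illustrative choices of mine, not values from the simulations mentioned in the text:

```python
import random, math

L = 1e-6          # box side (m)
D = 1e-12         # diffusion coefficient (m^2/s), typical for a protein
dt = 1e-6         # time step (s)
sigma = math.sqrt(2 * D * dt)   # per-axis Brownian step size
r_react = 5e-9    # reaction radius (m)

random.seed(0)
A = [[random.uniform(0, L) for _ in range(3)] for _ in range(50)]
B = [[random.uniform(0, L) for _ in range(3)] for _ in range(50)]
n_C = 0

def step(particles):
    """Move each particle by a Gaussian step, reflecting at the walls."""
    for p in particles:
        for k in range(3):
            p[k] = min(L, max(0.0, p[k] + random.gauss(0, sigma)))

for _ in range(1000):                 # 1 ms of simulated time
    step(A); step(B)
    for a in A[:]:                    # naive O(N^2) pairwise contact test
        for b in B[:]:
            if math.dist(a, b) < r_react:
                A.remove(a); B.remove(b); n_C += 1
                break
print(n_C, "C molecules formed")
```

Note that the naive pairwise contact test already scales as the square of the number of particles per time step, which illustrates why these simulations scale so badly when the molecule count grows.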

So where is the solution? The solution is in multiscale simulations. Let’s simulate at the atomic level when we need the atomic level, and at higher levels when we need higher-level descriptions. A simulation should always be done at a level where we can gain useful insights and possess experimental information to set up the model and validate its predictions. The Nobel committee did not miss this when it attributed the 2013 chemistry prize “for the development of multiscale models for complex chemical systems”.

Update 19 December 2013

Here we go again, this time with the magic names of Stanford and Google. What they achieved with their “Exacycle cloud computing system” is of the same order of magnitude as what was done in 2008: 2.5 milliseconds of 60 000 atoms. So no extraordinary feat here. But that does not stop them from launching the “whole-cell-at-atomic-resolution” claim again.

Modelling success stories (3) Goldbeter and Koshland 1981

The third example of this series will be a bit controversial (I have already been told so). Understanding its full impact requires a more subtle than average understanding of biochemistry and enzyme kinetics, and in particular of the difference between zero-order and first-order kinetics. At least it took me a fair bit of reading and thinking; it will perhaps be easier for you, my bright readers. In 1981, Albert Goldbeter (who would become the “Mr oscillation” of modelling, see his recent book “La vie oscillatoire : Au coeur des rythmes du vivant“) and Daniel Koshland, of “induced-fit” fame, proposed that cascades of coupled interconvertible enzymes could generate ultrasensitive responses to an upstream signal. The main body of the work is described in a fairly well cited paper (733 citations according to Google Scholar as of 2 July 2013):

Goldbeter A, Koshland DE Jr. An amplified sensitivity arising from covalent modification in biological systems. Proc Natl Acad Sci USA 1981 78(11): 6840-6844. PDF2

Stadtman and Chock had already shown that cascades of such enzymes could generate amplified responses, and that if the same substrate was consumed at the different levels, “cooperativity” could appear for the consumption of this substrate in the first-order domain (when the substrate is limiting) (Stadtman and Chock (1977) Proc Natl Acad Sci USA, 74: 2761-2765 and 2766-2770). However, Goldbeter and Koshland showed that in the zero-order range (when the converter enzymes are saturated by their substrate), ultrasensitivity could occur without the need for multiple inputs at each level. In my opinion this paper is important for at least two reasons. First, it described how ultrasensitive (“cooperative”, although nothing cooperates here) behaviours in signalling cascades can appear without multimeric allosteric assemblies. Second, it predicted the possible existence of MAPK cascades a decade before their discovery (Gomez and Cohen (1991) Nature 353: 170-173). As with the famous Crick and Watson understatement, Goldbeter and Koshland land their bomb in the discussion:

“Simple extension of the mathematics shows that the sensitivity can be propagated and enhanced in a multicycle network.”

They go on to add:

“It should be emphasized that the data are not yet available to say with certainty that this device for added sensitivity is actually utilized in biological systems […]”

Indeed. Of course, since then it has been shown with clear certainty that this device is actually utilised in biological systems. Below I show a figure from the Goldbeter and Koshland paper, followed by a figure of the first computational model of the MAP kinase cascade by Huang and Ferrell (Huang, Ferrell (1996) Ultrasensitivity in the mitogen-activated protein kinase cascade. Proc Natl Acad Sci USA 93: 10078-10083.)
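The core result can be played with directly. The steady-state fraction of modified protein in a single covalent modification cycle has a closed form, nowadays usually called the Goldbeter-Koshland function; the sketch below uses the notation of later textbook treatments (kinase and phosphatase maximal rates v1, v2, and their Michaelis constants divided by total substrate, J1, J2), not the paper's own symbols:

```python
import math

def goldbeter_koshland(v1, v2, J1, J2):
    """Steady-state modified fraction of a covalent modification cycle."""
    B = v2 - v1 + J1 * v2 + J2 * v1
    return 2 * v1 * J2 / (B + math.sqrt(B * B - 4 * (v2 - v1) * v1 * J2))

# First-order regime (J >> 1): the response to the stimulus v1 is graded.
# Zero-order regime (J << 1): the response switches sharply around v1/v2 = 1.
for J in (10.0, 0.01):
    lo = goldbeter_koshland(0.9, 1.0, J, J)   # stimulus slightly below balance
    hi = goldbeter_koshland(1.1, 1.0, J, J)   # stimulus slightly above balance
    print(f"J={J}: modified fraction goes from {lo:.2f} to {hi:.2f}")
```

With J = 0.01 the same 20% change in stimulus flips the system almost entirely from off to on, which is the ultrasensitivity the paper describes.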



Where are the paradigm shifts in biology?

Marc Robinson-Rechavi followed up in his blog on a post by Dan Graur about the use and abuse of “paradigm shifts” in biology.

Dan is a bit cheeky with the way he uses “paradigm shift”, as if in the “Kuhnian” sense. The articles he mentions that abuse the expression are not about Kuhn-like scientific revolutions. As far as I could understand from my readings, Kuhn’s paradigm shift is not a sudden result or discovery. It is the acceptance by the community of a new set of references. Kuhn’s paradigm shift is very much like Kimura’s substitution. Kimura’s substitution rate is not the rate of a molecular event at the level of DNA, but the rate at which an allele invades the population. Similarly, the paradigm shift is not itself a discovery or an article, although it often includes the common acceptance of a few discoveries as key discrete events in a more continuous scientific revolution. The difference between Popper and Kuhn is that Popper draws a single continuous line of progress, while Kuhn draws a set of sigmoids, each one replacing the previous one and hopefully explaining more of the world. This could be dispiriting, because it means that whatever research you do, there will always come a time when it is no longer built on by future generations. However, it is important to note that past research is not erased. It is re-analysed within the context of the new paradigm. Newtonian mechanics is a limit case of Einsteinian mechanics, Darwin’s natural selection is a new explanation of the earlier “transformism”, etc.


Schematic view of scientific progress according to the Popper and Kuhn paradigms (ha ha …). The blue curves represent the increase of understanding of the world over time (by the entire population of scientists, i.e. they include both the pure increase of knowledge and the increase of acceptance). The red dots are crucial discoveries/experiments/reinterpretations. Note that absolutely nothing in these drawings is random, from the slope of the curves (which indicates the rate of new knowledge acquisition) to the position of the dots.

There were indeed several paradigm shifts in modern biology. Marc cited a few in the second part of his blog post (I believe the first part made the same mistake as Graur about the definition of paradigm shift).

One can also mention the shift from physiology – where one tries to explain black boxes (organs and cells) quantitatively – to molecular biology – where one breaks the box and describes its components. Molecular biology really started between the two world wars, with the purification of proteins (Kossel), nucleic acids (Avery) etc., and became famous in the 1950s and 1960s with the description of the central dogma. However, we are only talking about a few handfuls of scientists there. Of course, many of them got Nobel prizes (like a mutation hot spot, to continue the genetic analogy). However, as a frame of reference to interpret the bulk of biological and biomedical data, we had to wait until the 1970s. In some fields, such as neuroscience, physiological thinking was still the orthodox point of view well into the 80s. Molecular biologists were then not considered “real” neuroscientists, and their work was deemed useless for explaining the function of the nervous system (interestingly, an important factor in the shift came from dysfunctions, because treating patients with molecules that bind to pharmacological receptors actually worked).

I believe we just witnessed another paradigm shift, from molecular biology – where one isolates a molecule, describes its structure and properties – to systems biology – where one puts back together all those molecules and tries to understand how the behaviour of a system emerges from their interactions. Systems biology is of course not new. Systems theory originates from the beginning of the XXth century and was used in physics soon after (cybernetics). In biology, interest started to arise in the 1960s, mostly in the field of metabolic networks. However, the “systems allele” did not provide a better fitness than molecular biology and could therefore not invade the population (very little neutral evolution in science …). High-throughput data generation, together with vast computational power and storage, provided an altered environment that dramatically changed the fitnesses. Systems biology was “rediscovered” in the 90s and became the new frame of reference at the end of the last decade (under various guises: genomics, network biology, modelling etc.). A short personal view of systems biology history is given in one of my EBI courses.

Modelling success stories (2) Monod-Wyman-Changeux 1965

For the second model of this series, I will break my own rule limiting the topic to “systems-biology-like” models, i.e. models that are simulated with a computer to predict the behaviours of systems. However, a fair number of MWC models have resulted in the instantiation of kinetic simulations, so I do not feel too bad about this breach. The reason to include the MWC model here is mainly that I think the work is one of the early examples where a model shed light on biochemical processes and led to a mechanism, rather than merely fitting the results.

The model itself is described in a highly cited paper (5776 times according to Google Scholar on March 14th 2013):

Monod J, Wyman J, Changeux JP. On the nature of allosteric transitions: A plausible model. J Mol Biol 1965, 12: 88-118. PDF2

Contrary to the Hodgkin-Huxley model, described elsewhere in this series, the main body of the work is located on a single page, the fourth of the paper. The rest of the paper is certainly interesting, and several theses (or even careers) have been devoted to the analysis of a formula or a figure found in the other pages (several papers have even focused on the various footnotes, the discussions still going on after 50 years). However, the magic is entirely contained in this fourth page.

Cooperativity of binding had been known for a long time, ever since the work of Christian Bohr (the father of Niels Bohr, the quantum physicist) on the binding of oxygen to hemoglobin. For a historical account see this article, to be published in PLoS Computational Biology and then on Wikipedia. Around 1960, it was discovered that enzymes also exhibited this kind of ultrasensitive behaviour. In particular the multimeric “allosteric” enzymes, where regulators bind to sites sterically distinct from the substrate site, displayed positive cooperativity for the regulation. At that time, the explanations of cooperativity relied on the Adair-Klotz paradigm, which postulated a progressive increase of affinity as the ligand bound more sites, or on Pauling’s, based on a single microscopic affinity and an energy component coming from subunit interactions. In both cases, the mechanisms are instructionist, the ligand “instructing” the protein to change its binding site affinities or its inter-subunit interactions. In addition, the state function (the fraction of active proteins) and the binding function (the fraction of protein bound to the ligand) were identical (more exactly, there was not even the notion that two different functions existed), something that was shown to be wrong for the enzymes.

The model developed by Monod and Changeux (Jeffrey Wyman always referred to the paper as “the Monod and Changeux paper”) relied on brutally simple, physically based assumptions:

  1. thermodynamic equilibrium: the proteins whose activities are regulated by the binding of ligands exist in different interconvertible conformations, in thermodynamic equilibrium, even in the absence of ligand. This assumption is opposed to the induced-fit mechanism, whereby the protein always exists in one conformation in the absence of ligand, and is always in the other conformation when bound to the ligand.
  2. different affinities for the two states: the two conformations display different affinities for the ligand. Consequently, the ligand shifts the equilibrium towards the state with the highest affinity (that is, the lowest free energy). This is a selectionist mechanism rather than an instructionist one. The binding of a ligand no longer provokes the switch of conformation. Proteins flicker, with or without the ligand bound. However, the time spent in any given conformation (or the probability of being in a given conformation) depends on the presence of ligand.
  3. all monomers of a multimer are in the same conformation: this assumption was, and still is, the most controversial. It is opposed to the notion of sequential transitions, whereby the monomers switch conformation progressively, as the ligands bind to them.

The rest followed from simple thermodynamics, explained by the two figures below.

MWC reaction scheme

Reaction scheme showing the binding of ligands to an allosteric dimer. c=KR/KT.

MWC energy diagram

Energy diagram showing the stabilisation effect of successive binding events.
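The distinction between the state function and the binding function, central to the discussion above, can be made concrete. Below is a sketch of the two textbook MWC expressions for an n-mer with allosteric constant L = [T]/[R] and c = KR/KT, where alpha is the ligand concentration normalised by KR; the hemoglobin-like parameter values are illustrative, not fitted:

```python
def mwc_state_function(alpha, n, L, c):
    """Fraction of proteins in the R (active) state."""
    return (1 + alpha) ** n / (L * (1 + c * alpha) ** n + (1 + alpha) ** n)

def mwc_binding_function(alpha, n, L, c):
    """Fraction of binding sites occupied by the ligand."""
    num = alpha * (1 + alpha) ** (n - 1) + L * c * alpha * (1 + c * alpha) ** (n - 1)
    den = (1 + alpha) ** n + L * (1 + c * alpha) ** n
    return num / den

# With tetramer-like parameters, the two curves clearly differ at the same
# ligand concentration, which is exactly the point made in the text:
alpha, n, L, c = 10.0, 4, 1000.0, 0.01
print("state:", mwc_state_function(alpha, n, L, c))
print("binding:", mwc_binding_function(alpha, n, L, c))
```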


The MWC model has been successfully used to explain the behaviour of many proteins, such as hemoglobin or allosteric enzymes, as mentioned above, but also neurotransmitter receptors, transcription factors, intracellular signalling mediators and scaffolding proteins. For an example of how MWC thinking helps to understand signalling cascades, see our work on calcium signalling in synaptic function (Stefan et al. PNAS 2008, 105: 10768-10773; Stefan et al. PLoS ONE 2012, 7(1): e29406; Li et al. PLoS ONE (2012), 7(9): e43810).

As for every useful theory, the MWC framework has since been refined and extended, for instance to encompass the interactions between several regulators, lattices of monomers etc. I’ll finish with a little advertisement for a conference celebrating the 50th anniversary of allosteric regulation.

Modelling success stories (1) Hodgkin-Huxley 1952

The model of Hodgkin and Huxley is one of the most (the most?) brilliant examples of a computational model quantitatively explaining a living process. In addition, since the work involved mathematical modelling, numerical simulation and data-based parametrization, it marks IMHO the starting point of Systems Biology (despite the fact that the name was coined by Bertalanffy in 1928, and the domain really exploded in 1998).

The model provides a mechanistic explanation of the propagation of action potentials in axons, based on the combined behaviours of a system of ionic channels. The model itself is described in a highly cited paper (12635 times according to Google Scholar on Feb 19th 2013; however, we know that GS largely underestimates citations of papers published before the “web”):

Hodgkin AL, Huxley AF. A quantitative description of membrane current and its application to conduction and excitation in nerve. J Physiol. 1952 Aug;117(4):500-544. PDF2

However, this paper is the culmination of a fantastic series of articles published back to back in the Journal of Physiology.

  • Hodgkin AL, Huxley AF, Katz B. Measurement of current-voltage relations in the membrane of the giant axon of Loligo. J Physiol. 1952 116(4):424-448
  • Hodgkin AL, Huxley AF. Currents carried by sodium and potassium ions through the membrane of the giant axon of Loligo. J Physiol. 1952 116(4):449-472
  • Hodgkin AL, Huxley AF. The components of membrane conductance in the giant axon of Loligo. J Physiol. 1952 116(4):473-496
  • Hodgkin AL, Huxley AF. The dual effect of membrane potential on sodium conductance in the giant axon of Loligo. J Physiol. 1952 116(4):497-506

In total, we have 126 pages that changed the way we understand how nervous systems, and ultimately our brain, function. The last article is by far the most extraordinary piece of science I have personally read.

Part I analyses their experimental results and should be given to every student as a model of scientific reasoning. Their conclusions predict the existence of voltage-sensing ionic channels, different for sodium and potassium, and even the existence of their S4 segment, a charged domain sensing the difference of potential and moving accordingly. Note that in 1952, they had absolutely no clue about transmembrane channels, or the nature of the excitable membrane!

In Part II, Hodgkin and Huxley start from the description of an electrical circuit, a natural starting point for them since they recorded the electrical properties of the giant squid axon, and progressively derive a model that mechanistically accounts for each biochemical event, each structural transition of the ionic channels. They even predict the existence of four gates per channel. Using extremely accurate experimental measurements, they fit the model and determine the values of the different parameters regulating the opening and closing of the sodium and potassium channels.


Electrical circuit equivalent of the Hodgkin-Huxley model. The resistors are ionic channels, the capacitance represents the plasma membrane (made of lipids, hence an insulator) and the battery represents the injected current (or voltage, according to the experimental set-up).

Part III puts Humpty Dumpty together again. Hodgkin and Huxley assembled a system of 4 differential equations, 1 for the voltage and 1 for each type of gate, and 6 assignment rules determining the propensity of each gate type to open or close as a function of the voltage. Since the unique electronic computer of Cambridge University was out of order, they then simulated the model using a hand-operated machine! The rest is history, and Hodgkin and Huxley won the 1963 Nobel Prize in Physiology or Medicine. Hodgkin-Huxley models are still used in modern multi-compartment models of neuronal systems, for instance in the Blue Brain Project. More information can be found in the BioModels Database model of the month written by Melanie Stefan, and on the relevant Wikipedia page.
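The structure just described (4 differential equations, 6 voltage-dependent rate rules) fits in a few dozen lines. The sketch below uses the 1952 rate functions in the commonly used “shifted” convention (resting potential at 0 mV, depolarisation positive) and naive forward-Euler integration; the step size and stimulus current are my own illustrative choices, not values from the paper:

```python
import math

# Membrane parameters (uF/cm^2, mS/cm^2, mV), shifted convention.
C_m, g_Na, g_K, g_L = 1.0, 120.0, 36.0, 0.3
E_Na, E_K, E_L = 115.0, -12.0, 10.613

# The six voltage-dependent rate rules (units: 1/ms).
def alpha_m(V): return 0.1 * (25 - V) / (math.exp((25 - V) / 10) - 1)
def beta_m(V):  return 4.0 * math.exp(-V / 18)
def alpha_h(V): return 0.07 * math.exp(-V / 20)
def beta_h(V):  return 1.0 / (math.exp((30 - V) / 10) + 1)
def alpha_n(V): return 0.01 * (10 - V) / (math.exp((10 - V) / 10) - 1)
def beta_n(V):  return 0.125 * math.exp(-V / 80)

def simulate(I_ext=10.0, t_end=50.0, dt=0.01):
    """Forward-Euler integration of the 4 ODEs (V, m, h, n); times in ms."""
    V = 0.0
    m, h, n = 0.053, 0.596, 0.318        # approximate resting steady states
    trace = []
    for _ in range(int(t_end / dt)):
        I_Na = g_Na * m**3 * h * (V - E_Na)
        I_K = g_K * n**4 * (V - E_K)
        I_L = g_L * (V - E_L)
        V += dt * (I_ext - I_Na - I_K - I_L) / C_m
        m += dt * (alpha_m(V) * (1 - m) - beta_m(V) * m)
        h += dt * (alpha_h(V) * (1 - h) - beta_h(V) * h)
        n += dt * (alpha_n(V) * (1 - n) - beta_n(V) * n)
        trace.append(V)
    return trace

trace = simulate()
print(f"peak depolarisation: {max(trace):.1f} mV")  # spikes for a sufficient stimulus
```

Running this with a sustained stimulus produces the characteristic action potentials; Hodgkin and Huxley obtained the same curves by cranking these very equations through a hand-operated calculator.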

Coming out of the closet: I like ENCODE results, and I think the “switches” may be functional

It is fair to say that ENCODE has been the big story of 2012 as far as biological research is concerned. There are many reasons for that, including the ambition of the project, the size of the collaboration, the method of publication, the amount of press coverage, and the surprising findings. The latter two points also started controversies and heated debates that now promise to entertain us in 2013 too. Recently a very peculiarly worded article increased the heat a little, focusing on the criticism of ENCODE’s “80% of the genome is functional” claim by evolutionary scientists. For French readers, a polite discussion of the criticism can be found in Marc Robinson-Rechavi’s blog.

The main criticism can be roughly summarised as: 1) we cannot dissociate biological function from evolution; 2) evolution is due to selection; 3) selection results in sequence conservation; 4) ENCODE’s “80% of the genome” is not conserved, therefore it is not functional. Now, I was not involved in ENCODE, and I am not an expert in genomics. However, I am a systems biologist, I know a bit about evolution, and the way most criticisms of ENCODE are expressed itches me a bit. Assumption 3) in particular is problematic for me.

[NB: Something I find quite funny, in an ironical sort of way, is that the initial criticisms went “we’ve known for years that there was no junk DNA”, see this post by Michael Eisen. And now it goes “they have not shown that DNA was not junk”.]

Let’s leave ENCODE for a moment; we’ll come back to it later. What annoys me the most in some of the vocal criticisms is the sort of ultra-Dawkinsian position on evolution claiming that it acts solely at the level of the genetic atom (nucleotide or gene, according to the context). It may be mostly true for the fundamental bricks of life. If you change the sequence of a core enzyme, the effects will be drastic, and therefore the sequence is very conserved. The more central the process, the higher the conservation (and the lower the redundancy, see my comment on redundancy and evolution, written before the blog era). However, it is not always so true for less central, and more complex (in terms of number of components), systems.

Even if hereditary information is transmitted via discrete pieces of genetic information, evolutionary selection acts at the level of the systems, which are what produce the phenotypes. This is illustrated in the lower panel of the famous Waddington epigenetic landscape.


Selective pressure acts on the surfaces. What is transmitted are the black blocks. The entangled cables between them represent the genotype-phenotype relation (or the genotype-phenotype problem, if we are more pessimistic than curious). I like this picture very much (while the upper panel is most often shown, in the context of canalization) because IMHO it represents the object of physiology (the surfaces), of genetics (the blocks) and of systems biology (the cables).

Let’s take the example of olfactory receptors (see for example this article). We have hundreds of them, encoded by genes distributed in clusters. Roughly, the clusters are pretty conserved, at least in mammals. However, within the clusters, frequent gene duplications and losses of function mean that there is no strong conservation (see fig 3 of this article). Variability is seen even between chimps and humans. One possible explanation lies in the way olfaction works. Although some key receptors are associated with given chemical compounds, olfaction seems to work largely on combinatorics. Because a substance can bind several receptors, and the receptor responses are integrated by the downstream neuronal network, a few hundred receptors allow us to discriminate between a fantastic diversity of chemicals. What is selected is the variety of olfactory receptors rather than the receptors themselves. The fact that the olfactory receptors are not conserved does not mean they are not functional. I am pretty confident that if we took out all the olfactory receptors of mice and replaced them with human olfactory receptors, the mice would still be OK (less well off than wild-type mice, because of the loss of recognition diversity). But if we just take out all the olfactory receptors, the mice would die in nature. Guaranteed.
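A back-of-the-envelope calculation gives a feel for the power of such combinatorial coding. The receptor count and code size below are illustrative assumptions, not measured values:

```python
from math import comb

n_receptors = 400  # order of magnitude of the human olfactory receptor repertoire (assumption)

# A one-receptor-one-odorant scheme could distinguish at most n_receptors odorants.
# If instead each odorant activates a characteristic subset of, say, 5 receptors,
# the number of possible codes explodes:
codes = comb(n_receptors, 5)
print(codes)  # 83218600080, i.e. ~8e10 distinct 5-receptor combinations
```

With combinatorial readout, what matters is the size and diversity of the repertoire, not the identity of any individual receptor, which is exactly why individual receptor genes can come and go without destroying the function.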

Now back to ENCODE. I will focus on the “switches”, the very large set of sites that were found to bind transcription factors. For a discussion of the somewhat related issue of pervasive transcription (ENCODE’s “60%”), see the recent blog post by GENCODE. Millions of binding sites for transcription factors have been uncovered by ENCODE. They are not always directly linked with the regulation of gene expression, in that the binding of the transcription factor to the site does not physically affect the activity of a nearby RNA polymerase. Therefore, opponents criticize the adjective “functional” for those binding sites. This is where I think they are wrong, or at least a bit rigid. We are not talking about non-specific binding events here, noise due to the random stickiness of transcription factors. We are talking about specific binding sites that just happen not to be always associated with direct transcription events. To say that they are not functional is equivalent to saying that the binding of calmodulin to neurogranin is not functional because it does not trigger the opening of a channel or the activation of an enzyme. Of course it does not (or we think it does not), but the buffering effect completely changes the dynamics of activation for the other targets of calmodulin. The “function” of neurogranin is to bind calmodulin, and do nothing with it.

The potential consequences of this pervasive transcription factor binding on the regulation of nuclear dynamics are enormous. A few million transcription factor binding sites mean around a few thousand binding sites per transcription factor. Transcription factors are present in the nucleus in fairly low quantities, from dozens to thousands of molecules. The result of having as many binding sites as ligand molecules (or more) is called ligand depletion. It is well known to pharmacologists, but less so to cell biologists. For striking effects of ligand depletion in the case of cooperative sensors, you can read our paper on the topic. For an example closer to gene regulation, see the paper by Matthieu Louis and Nicolas Buchler. So, if the number of binding sites for a transcription factor affects its dynamics, maybe there is a function associated with the number of binding sites. Maybe each binding site is not conserved. Maybe there is no selection pressure for it to be exactly there. But could we not envision that what is selected is the number of transcription factor binding sites, in a situation similar to that of the olfactory receptors?
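The magnitude of the effect is easy to sketch from mass-action kinetics. For one TF with total amount L, total sites S and dissociation constant Kd, the bound complex C solves the quadratic C² − (L + S + Kd)·C + L·S = 0, instead of the usual approximation that assumes free TF ≈ total TF. A minimal sketch, with molecule counts and Kd that are purely illustrative assumptions (not ENCODE measurements):

```python
import math

def bound_complex(tf_total, sites_total, kd):
    """Equilibrium amount of TF bound to its sites, from the quadratic
    C^2 - (L + S + Kd)*C + L*S = 0 that accounts for ligand depletion.
    All quantities in molecules per nucleus."""
    b = tf_total + sites_total + kd
    return (b - math.sqrt(b * b - 4.0 * tf_total * sites_total)) / 2.0

# Illustrative numbers: ~1000 TF molecules per nucleus,
# Kd equivalent to 100 molecules (assumptions)
tf, kd = 1000.0, 100.0

for sites in (10.0, 1000.0, 10000.0):
    c = bound_complex(tf, sites, kd)
    print(f"{int(sites):>5} sites: TF sequestered = {c / tf:.0%}, "
          f"free TF left = {tf - c:.0f}")
```

With these (assumed) numbers, going from tens to thousands of sites takes the free TF from roughly a thousand molecules down to about ten: the sheer number of sites, regardless of what any individual site does, reshapes the dose of TF available to every promoter.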

We know that the distribution of chromatin in the nucleus is not at all a plate of spaghetti, with a random location of each gene. There are precisely located transcription factories, where specific genes are targeted (see this paper from Peter Fraser’s group and a recent review). What if those millions of transcription factor binding sites were not randomly located but concentrated in nuclear subdomains? Would it not affect the dynamics of the transcription factors in the nucleus?

I have to confess I have not yet read the ENCODE papers dealing with epigenetic marking. However, would it not be cool if epigenetic marking could mask or reveal hundreds of binding sites in one go? As proposed in our paper on ligand depletion, that would be a possible mechanism to quickly change the dynamic range (the range of active concentrations) and the ultrasensitivity of responses to transcription factors.

All that is perhaps science fiction. Perhaps those millions of binding sites are randomly located in 3D. Perhaps their availability is not dynamically regulated. But scientific research is pushed a lot by “what if?” questions. Besides, those binding sites exist.

At the end of the day, what I see on one side is an enormous amount of data put out there by enthusiastic scientists for everyone to pick up and study further. And what I see on the other side is bitterness, and now anger and rudeness. The proper answer to ENCODE’s claims lies in ENCODE’s data. Dear Dan Graur and followers, rather than criticizing ENCODE’s statistics with fuzzy arguments and rude words, run your own. The data is out there for you.