Coming out of the closet: I like ENCODE results, and I think the “switches” may be functional

It is fair to say that ENCODE has been the big story of 2012 as far as biological research is concerned. There are many reasons for that, including the ambition of the project, the size of the collaboration, the method of publication, the amount of press coverage, and the surprising findings. The latter two points also started controversies and heated debates that now promise to entertain us in 2013 too. Recently, a very peculiarly worded article turned up the heat a little, focusing on the criticism of ENCODE’s “80% of the genome is functional” claim by evolutionary scientists. For French readers, a polite discussion of the criticism can be found on Marc Robinson-Rechavi’s blog.

The main criticism can be roughly summarised as: 1) We cannot dissociate biological function from evolution. 2) Evolution is due to selection. 3) Selection results in sequence conservation. 4) ENCODE’s “80% of the genome” is not conserved, therefore it is not functional. Now, I was not involved in ENCODE, and I am not an expert in genomics. However, I am a systems biologist, I know a bit about evolution, and the way most criticisms of ENCODE are expressed itches me a bit. Assumption 3) in particular is problematic for me.

[NB: Something I find quite funny, in an ironic sort of way, is that the initial criticisms went along the lines of “we’ve known for years that there was no junk DNA” (see this post by Michael Eisen), and now they go “they have not shown that the DNA was not junk”.]

Let’s leave ENCODE for a moment; we’ll come back to it later. What annoys me the most in some of the vocal criticisms is the sort of ultra-Dawkinsian position on evolution claiming that it acts solely at the level of the genetic atom (nucleotide or gene, according to the context). This may be mostly true for the fundamental bricks of life. If you change the sequence of a core enzyme, the effects will be drastic, and therefore the sequence is highly conserved. The more central the process, the higher the conservation (and the lower the redundancy; see my comment on redundancy and evolution, written before the blog era). However, it is not always so true for less central, and more complex (in terms of number of components), systems.

Even if hereditary information is transmitted via discrete pieces of genetic information, evolutionary selection acts at the level of systems, which are what produce the phenotypes. This is illustrated in the lower panel of the famous Waddington epigenetic landscape.


Selective pressure acts on the surfaces. What is transmitted are the black blocks. The entangled cables between them represent the genotype-phenotype relation (or the genotype-phenotype problem, if we are more pessimistic than curious). I like this picture very much (while the upper panel is the one most often shown, in the context of canalization) because IMHO it represents the object of physiology (the surfaces), of genetics (the blocks) and of systems biology (the cables).

Let’s take the example of olfactory receptors (see for example this article). We have hundreds of them, encoded by genes distributed in clusters. Roughly, the clusters are pretty conserved, at least in mammals. However, within the clusters, frequent gene duplications and losses of function mean that there is no strong conservation (see fig 3 of this article). Variability is seen even between chimps and humans. One possible explanation is the way olfaction works. Although some key receptors are associated with given chemical compounds, olfaction seems to work largely on combinatorics. Because a substance can bind several receptors, and the receptors’ responses are integrated by the downstream neuronal network, a few hundred receptors allow us to discriminate between a fantastic diversity of chemicals. What is selected is the variety of olfactory receptors rather than the receptors themselves. The fact that the olfactory receptors are not conserved does not mean they are not functional. I am pretty confident that if we took out all the olfactory receptors of mice and replaced them with human olfactory receptors, the mice would still be OK (less well off than wild-type mice, because of the loss of recognition diversity). But if we just take out all the olfactory receptors, the mice would die in nature. Guaranteed.
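To make the combinatorial argument concrete, here is a toy sketch (all numbers invented for illustration; real repertoires have hundreds of receptors). Each odorant is modelled as the subset of receptors it binds, so the information lies in the pattern, not in any individual receptor:

```python
# Toy model of combinatorial odor coding (illustrative numbers only).
import random
from math import comb

random.seed(0)
N_RECEPTORS = 20   # toy repertoire; mammals have hundreds
K_ACTIVE = 4       # receptors bound per odorant (made-up figure)

def odor_pattern():
    """An odorant's signature: the set of receptors it activates."""
    return frozenset(random.sample(range(N_RECEPTORS), K_ACTIVE))

# Even this tiny repertoire supports a large pattern space:
print(comb(N_RECEPTORS, K_ACTIVE))  # 4845 possible 4-receptor patterns

# Draw 100 random odorants and count how many are distinguishable:
odors = [odor_pattern() for _ in range(100)]
print(len(set(odors)))
```

Swapping which particular receptors exist barely changes the size of the pattern space; shrinking the repertoire collapses it. That is the sense in which the diversity, rather than the identity, of the receptors could be what is selected.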

Now back to ENCODE. I will focus on the “switches”, the very large set of sites that were found to bind transcription factors. For a discussion of the somewhat related issue of pervasive transcription (ENCODE’s “60%”), see the recent blog post by GENCODE. Millions of binding sites for transcription factors have been uncovered by ENCODE. They are not always directly linked to the regulation of gene expression, in that the binding of a transcription factor to a site does not necessarily affect the activity of an RNA polymerase nearby. Therefore, opponents criticize the adjective “functional” for those binding sites. This is where I think they are wrong, or at least a bit rigid. We are not talking about non-specific binding events here, noise due to the random stickiness of transcription factors. We are talking about specific binding sites that just happen not to be always associated with direct transcription events. To say that they are not functional is equivalent to saying that the binding of calmodulin to neurogranin is not functional because it does not trigger the opening of a channel or the activation of an enzyme. Of course it does not (or we think it does not), but the buffering effect completely changes the dynamics of activation for the other targets of calmodulin. The “function” of neurogranin is to bind calmodulin, and do nothing with it.

The potential consequences of pervasive transcription factor binding for the regulation of nuclear dynamics are enormous. A few million transcription factor binding sites means around a few thousand binding sites per transcription factor. Transcription factors are present in the nucleus in fairly low quantities, from dozens to thousands of molecules. The result of having as many binding sites as ligands (or more) is called ligand depletion. It is well known to pharmacologists, but less so to cell biologists. For striking effects of ligand depletion in the case of cooperative sensors, you can read our paper on the topic. For an example closer to gene regulation, see the paper by Matthieu Louis and Nicolas Buchler. So, if the number of binding sites for a transcription factor affects its dynamics, maybe there is a function associated with the number of binding sites. Maybe each binding site is not conserved. Maybe there is no selective pressure for it to be exactly there. But could we not envision that what is selected is the number of transcription factor binding sites, in a situation similar to that of the olfactory receptors?
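A back-of-the-envelope equilibrium calculation shows the effect. The sketch below uses plain mass-action binding with invented numbers (copy-number units, one Kd shared by all sites): as the number of sites approaches and then exceeds the number of TF molecules, the free TF pool collapses:

```python
# Illustrative ligand depletion calculation (all numbers made up).
from math import sqrt

def free_tf(t_total, n_sites, kd):
    """Free TF copies at equilibrium, solving the conservation relation
    t_total = t_free + n_sites * t_free / (t_free + kd)  (a quadratic)."""
    b = kd + n_sites - t_total
    return (-b + sqrt(b * b + 4 * kd * t_total)) / 2

T_TOTAL, KD = 1000, 100  # 1000 TF molecules, Kd of 100 copies
for n_sites in (10, 1000, 5000):
    print(n_sites, round(free_tf(T_TOTAL, n_sites, KD), 1))
```

With 10 sites, nearly all of the TF is free; with 5000 sites, only a few percent is. The very same nuclear copy number of a transcription factor thus yields a radically different active concentration depending on how many sites it can bind.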

We know that the distribution of chromatin in the nucleus is not at all a plate of spaghetti, with a random location of each gene. There are precisely located transcription factories, where specific genes are targeted (see this paper from Peter Fraser’s group and a recent review). What if those millions of transcription factor binding sites were not randomly located but concentrated in nuclear subdomains? Would it not affect the dynamics of the transcription factors in the nucleus?

I have to confess I have not yet read the ENCODE papers dealing with epigenetic markings. However, would it not be cool if epigenetic marking could mask or reveal hundreds of binding sites in one go? As proposed in our paper on ligand depletion, that would be a possible mechanism to quickly change the dynamic range (the range of active concentrations) and the ultrasensitivity of responses to transcription factors.
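Here is a hypothetical back-of-the-envelope version of that idea (mass-action binding, invented Kd values in copy-number units; “decoy” is my label for a site that does not act in cis). Half-maximal occupancy of a target site requires a free TF level equal to the target’s Kd, and the total TF needed to reach that level grows with every decoy site the chromatin exposes:

```python
# Hypothetical dynamic-range shift from masking/unmasking decoy sites.

def t_total_for_half_max(n_decoys, kd_decoy=100.0, kd_target=100.0):
    """Total TF copies needed for 50% target occupancy, i.e. free TF
    equal to kd_target, after feeding n_decoys identical decoy sites."""
    free = kd_target
    return free + n_decoys * free / (free + kd_decoy)

print(t_total_for_half_max(0))     # 100.0 : all decoys masked
print(t_total_for_half_max(5000))  # 2600.0 : all decoys exposed
```

In this toy setting, unmasking 5000 decoys moves the half-maximal point 26-fold: exactly the kind of rapid change in dynamic range speculated about above.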

All that is perhaps science fiction. Perhaps those millions of binding sites are randomly located in 3D. Perhaps their availability is not dynamically regulated. But scientific research is pushed a lot by “what if?” questions. Besides, those binding sites exist.

At the end of the day, what I see on one side is an enormous amount of data put out there by enthusiastic scientists for everyone to pick up and study further. And what I see on the other side is bitterness, and now anger and rudeness. The proper answer to ENCODE’s claims lies in ENCODE’s data. Dear Dan Graur and followers, rather than criticizing the statistics of ENCODE with fuzzy arguments and rude words, run your own. The data is out there for you.


2 thoughts on “Coming out of the closet: I like ENCODE results, and I think the “switches” may be functional”

  1. Pingback: Spanking #ENCODE | The OpenHelix Blog

  2. The critique of the ENCODE paper is that they asserted something – that 80% of the genome is functional – without any direct evidence that this was the case. You may believe that these sites somehow contribute to a 3D mass of binding sites that in aggregate has function, but the ENCODE data are silent on this point. The paper simply said “we observed it –> it is functional”. Surely you demand some higher standard of proof, no? Or do you think we should simply assert that anything we ever observe in a living system is functional, and leave it at that?

    I also think your model for the function of these sites doesn’t make a lot of sense. Yes, it is likely that the places bound by certain transcription factors are non-randomly partitioned within the nucleus. And it is possible that somehow the mass of sites that are not involved directly in regulating a gene in cis nonetheless contributes to the overall function by acting as some kind of carrier, enhancing the activity of those sites that are involved in regulation. You could also argue that each and every one of these sites is functional because it titrates the overall level of factor available to bind to “real” sites. By the same logic, every nucleotide in the genome is functional because it influences the rate and/or energy required to replicate DNA, thereby influencing developmental timing, the size of cells, etc…

    But if these are the kinds of functions you are interested in, and if this is what you believe function means, then not only should you not be using ENCODE data, you should have been strongly opposed to the whole concept of ENCODE in the first place. The only reason to map out the precise locations of TF binding sites and other biochemical events across the genome is if you believe that the location of these binding events actually matters. And it is very clear that the ENCODE papers were asserting not that the entire genome is functional in some loose sense; they were asserting that 80% of the genome is functional in the narrow sense that the events they were seeing at a particular location were important because they occurred there.
