Coming out of the closet: I like ENCODE results, and I think the “switches” may be functional

It is fair to say that ENCODE has been the big story of 2012 as far as biological research is concerned. There are many reasons for that, including the ambition of the project, the size of the collaboration, the method of publication, the amount of press coverage, and the surprising findings. The last two points also started controversies and heated debates that now promise to entertain us in 2013 too. Recently a very peculiarly worded article increased the heat a little, focusing on the criticism of ENCODE’s “80% of the genome is functional” claim by evolutionary biologists. For French readers, a polite discussion of the criticism can be found in Marc Robinson-Rechavi’s blog.

The main criticism can be roughly summarised as: 1) We cannot dissociate biological function from evolution. 2) Evolution is due to selection. 3) Selection results in sequence conservation. 4) ENCODE’s “80% of the genome” is not conserved, therefore it is not functional. Now, I was not involved in ENCODE, and I am not an expert in genomics. However, I am a systems biologist, I know a bit about evolution, and the way most criticisms of ENCODE are expressed irks me a bit. Assumption 3) in particular is problematic for me.

[NB: Something I find quite funny in an ironical sort of way is that initial criticisms went as “we’ve known for years that there was no junk DNA”, see this post by Michael Eisen. And now it goes “they have not shown that DNA was not junk”.]

Let’s leave ENCODE for a moment; we’ll come back to it later. What annoys me the most about some of the vocal criticisms is the sort of ultra-Dawkinsian position on evolution claiming that it acts solely at the level of the genetic atom (nucleotide or gene according to the context). It may be mostly true for the fundamental bricks of life. If you change the sequence of a core enzyme, the effects will be drastic, and therefore the sequence is very conserved. The more central the process, the higher the conservation (and the lower the redundancy, see my comment on redundancy and evolution, written before the blog era). However, it is not always true for less central and more complex (in terms of number of components) systems.

Even if hereditary information is transmitted via discrete pieces of genetic information, evolutionary selection acts at the level of the systems, which are what produce the phenotypes. This is illustrated in the lower panel of the famous Waddington epigenetic landscape.

[Figure: lower panel of Waddington’s epigenetic landscape]

Selective pressure acts on the surfaces. What is transmitted are the black blocks. The entangled cables between them represent the genotype-phenotype relation (or genotype-phenotype problem if we are more pessimistic than curious). I like this picture very much (although the upper panel is the one most often shown, in the context of canalization) because IMHO it represents the object of physiology (the surfaces), of genetics (the blocks) and of systems biology (the cables).

Let’s take the example of olfactory receptors (see for example this article). We have hundreds of them, encoded by genes distributed in clusters. Roughly, the clusters are pretty well conserved, at least in mammals. However, within the clusters, frequent gene duplications and losses of function mean that there is no strong conservation (see fig 3 of this article). Variability is even seen between chimps and humans. One possible explanation is the way olfaction works. Although some key receptors are associated with given chemical compounds, olfaction seems to work largely on combinatorics. Because a substance can bind several receptors, and the receptors’ responses are integrated by the downstream neuronal network, a few hundred receptors allow us to discriminate between a fantastic diversity of chemicals. What is selected is the variety of olfactory receptors rather than the receptors themselves. The fact that the olfactory receptors are not conserved does not mean they are not functional. I am pretty confident that if we took out all the olfactory receptors of mice and replaced them with human olfactory receptors, the mice would still be OK (less so than wild-type mice because of the loss of recognition diversity). But if we just took out all the olfactory receptors, the mice would die in nature. Guaranteed.
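To give a rough feel for the combinatorial argument, here is a deliberately simplistic sketch. It assumes a binary model in which an odorant is represented only by which subset of receptors it activates, and it ignores response intensity and downstream integration. The receptor count (300) and the subset sizes are hypothetical values chosen for illustration, not measurements.

```python
import math


def activation_patterns(n_receptors: int, n_active: int) -> int:
    """Number of distinct subsets of n_active receptors among n_receptors.

    Toy binary model (an assumption): each odorant activates exactly
    n_active receptors, and two odorants are distinguishable if they
    activate different subsets.
    """
    return math.comb(n_receptors, n_active)


# Hypothetical repertoire size, in the "few hundred" range mentioned above.
N_RECEPTORS = 300

for k in (1, 2, 3, 5):
    print(f"{k} active receptor(s): {activation_patterns(N_RECEPTORS, k)} patterns")
```

Under these assumptions, the number of distinguishable patterns depends on how many receptors there are and how many respond, not on the identity of any single receptor, which is the sense in which variety rather than individual sequence could be what is selected.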

Now back to ENCODE. I will focus on the “switches”, the very large set of sites that were found to bind transcription factors. For a discussion of the somewhat related issue of pervasive transcription (ENCODE’s “60%”), see the recent blog post by GENCODE. Millions of binding sites for transcription factors have been uncovered by ENCODE. They are not always directly linked with the regulation of gene expression, in that the binding of the transcription factor to the site does not physically affect the activity of an RNA polymerase nearby. Therefore, opponents criticize the adjective “functional” for those binding sites. This is where I think they are wrong, or at least a bit rigid. We are not talking about non-specific binding events here, noise due to random stickiness of transcription factors. We are talking about specific binding sites that just happen not to be always associated with direct transcription events. To say that they are not functional is equivalent to saying that the binding of calmodulin to neurogranin is not functional because it does not trigger the opening of a channel or the activation of an enzyme. Of course it does not (or we think it does not), but the buffering effect completely changes the dynamics of activation for the other targets of calmodulin. The “function” of neurogranin is to bind calmodulin, and do nothing with it.

The potential consequences of the pervasive transcription factor binding on the regulation of nuclear dynamics are enormous. A few million transcription factor binding sites means around a few thousand binding sites per transcription factor. Transcription factors are present in the nucleus in fairly low quantities, from dozens to thousands of molecules. The result of having as many binding sites as ligands (or more) is called ligand depletion: a large fraction of the ligand is sequestered by the sites, so the free concentration can no longer be assumed equal to the total concentration. It is well known to pharmacologists, but less so to cell biologists. For striking effects of ligand depletion in the case of cooperative sensors you can read our paper on the topic. For an example closer to gene regulation, see the paper of Matthieu Louis and Nicolas Buchler. So, if the number of binding sites for a transcription factor affects its dynamics, maybe there is a function associated with the number of binding sites. Maybe each binding site is not conserved. Maybe there is no selection pressure for it to be exactly there. But could we not envision that what is selected is the number of transcription factor binding sites, in a situation similar to that of the olfactory receptors?
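To make the ligand depletion argument concrete, here is a minimal sketch assuming the simplest possible case: one transcription factor binding identical, independent sites with 1:1 stoichiometry at equilibrium. This is not the model from the papers mentioned above. The function names, the dissociation constant, and the molecule counts are all hypothetical, and everything is expressed in the same arbitrary unit (molecules per nucleus).

```python
import math


def bound_complex(tf_total: float, sites_total: float, kd: float) -> float:
    """Equilibrium amount of TF-site complex for simple 1:1 binding.

    Solves C^2 - (T + S + Kd) * C + T * S = 0 for the physically
    meaningful (smaller) root, where T is total TF, S is total sites.
    """
    b = tf_total + sites_total + kd
    return (b - math.sqrt(b * b - 4.0 * tf_total * sites_total)) / 2.0


def free_fraction(tf_total: float, sites_total: float, kd: float) -> float:
    """Fraction of the transcription factor that remains unbound."""
    return 1.0 - bound_complex(tf_total, sites_total, kd) / tf_total


# Hypothetical values: 1000 TF molecules, Kd equivalent to 100 molecules.
TF_TOTAL = 1000.0
KD = 100.0

for sites in (10.0, 1000.0, 5000.0):
    frac = free_fraction(TF_TOTAL, sites, KD)
    print(f"{sites:.0f} sites: free TF fraction = {frac:.3f}")
```

With few sites, almost all the factor stays free and the usual "free equals total" approximation holds. Once the number of sites approaches or exceeds the number of molecules, the free fraction drops, so adding or removing sites changes how much factor is available to every other target.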

We know that the distribution of chromatin in the nucleus is not at all a plate of spaghetti, with a random location of each gene. There are precisely located transcription factories, where specific genes are targeted (see this paper from Peter Fraser’s group and a recent review). What if those millions of transcription factor binding sites were not randomly located but concentrated in nuclear subdomains? Would it not affect the dynamics of the transcription factors in the nucleus?

I have to confess I have not yet read the ENCODE papers dealing with epigenetic markings. However, would it not be cool if epigenetic marking could mask or reveal hundreds of binding sites in one go? As proposed in our paper on ligand depletion, that would be a possible mechanism to quickly change the dynamic range (the range of active concentration) and the ultrasensitivity of responses to transcription factors.

All that is perhaps science fiction. Perhaps those millions of binding sites are randomly located in 3D. Perhaps their availability is not dynamically regulated. But scientific research is pushed a lot by “what if?” questions. Besides, those binding sites exist.

At the end of the day, what I see on one side is an enormous amount of data put out there by enthusiastic scientists for everyone to pick up and study further. And what I see on the other side is bitterness, and now anger and rudeness. The proper answer to ENCODE claims lies in ENCODE data. Dear Dan Graur and followers, rather than criticizing the statistics of ENCODE with fuzzy arguments and rude words, run your own. The data is out there for you.