Redundancy and importance, or “is an oyster baby any different from an aircraft?”

A statement we very frequently hear about biological systems can be expressed as follows:

The degree of functional redundancy observed in a subsystem reflects the importance of this subsystem within the system it belongs to.

This idea is very anthropomorphic, and based on what an engineer would consider a good design: any critical subsystem should be redundant, so that if one instance fails, the others keep the system functional. But a biological system has not been designed. It evolved. And the two processes are completely different.

If I am an engineer who wants to build a new type of aircraft, I can hardly afford to lose one when it is packed with passengers. Or, actually, even one with only the test pilot on board. I will therefore make all the critical subsystems redundant, so that the plane at least manages to land before a general failure makes it unflyable.
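The engineer's reasoning rests on a standard reliability calculation. Here is a minimal sketch, assuming each redundant copy fails independently with the same probability p (an idealisation; real failures are often correlated). The subsystem fails only if all k copies fail, so its failure probability is p^k.

```python
def failure_probability(p: float, k: int) -> float:
    """Probability that a subsystem with k redundant copies stops working,
    assuming each copy fails independently with the same probability p.
    The subsystem only fails if all k copies fail at once."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability between 0 and 1")
    if k < 1:
        raise ValueError("a subsystem needs at least one copy")
    return p ** k


if __name__ == "__main__":
    p_single = 1e-4  # hypothetical failure probability of one copy per flight
    for k in (1, 2, 3):
        print(f"{k} copies: failure probability {failure_probability(p_single, k):.0e}")
```

With the hypothetical p = 10^-4, a second copy brings the failure probability down to 10^-8. The oyster argument turns on this same masking effect, because it also lets small defects pass unnoticed.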

However, if I am an oyster, the situation is quite different. How, then, can I make sure my offspring do not transmit any negative deviations I happen to carry? If my critical subsystems are redundant, my offspring will survive the effects of minor deleterious variations. As a result, they will transmit those variations to their own offspring. According to the neutral theory of molecular evolution (Motoo Kimura), such variations can invade the population by genetic drift. Now, if my species encounters a situation where all the redundant subsystems must function optimally, it will be wiped off the face of the earth. Unlucky. An alternative scenario: the redundancy of my critical subsystems has been kept to a minimum. If any of my offspring carries a small deviation in one of those subsystems, it will die quickly. But I do not care, because I am an oyster. I spawn millions of eggs. I can afford to lose 90% or more of them.
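The drift argument can be illustrated with the classic Wright-Fisher model of population genetics. This is a standard textbook model that the original post does not itself use. Below is a minimal sketch, assuming a haploid population of constant size and a strictly neutral variant, i.e. one fully masked by redundancy. Each generation, every individual inherits the variant with probability equal to its current frequency.

```python
import random


def wright_fisher(pop_size: int, initial_count: int, rng: random.Random) -> tuple[bool, int]:
    """Haploid Wright-Fisher model of a neutral variant (no selection).

    Each generation, every one of pop_size offspring inherits the variant
    with probability equal to its current frequency (binomial sampling).
    Runs until the variant is lost or fixed; returns (fixed, generations).
    """
    count, generations = initial_count, 0
    while 0 < count < pop_size:
        freq = count / pop_size
        count = sum(1 for _ in range(pop_size) if rng.random() < freq)
        generations += 1
    return count == pop_size, generations


def fixation_rate(pop_size: int, initial_count: int, replicates: int, seed: int = 1) -> float:
    """Fraction of independent replicate populations in which the variant fixes."""
    rng = random.Random(seed)
    fixed = sum(wright_fisher(pop_size, initial_count, rng)[0] for _ in range(replicates))
    return fixed / replicates


if __name__ == "__main__":
    # Hypothetical numbers: 2 carriers in a population of 20 individuals.
    print(fixation_rate(pop_size=20, initial_count=2, replicates=2000))
```

Under these assumptions, a neutral variant eventually fixes with a probability equal to its starting frequency (here 2/20 = 0.1). A redundancy-masked defect therefore has a real chance of spreading through a small population without any selection in its favour.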

In fact, it seems that very few of the critical subsystems in a cell are redundant. For instance, most of the enzymes producing energy in the cell are unique. The same goes for the RNA production machinery. If you happen to have a problem in one of those enzymes, you die at a very early age, so you cannot “pollute” the genome of the species. Redundancy appears only in “secondary” subsystems, typically those dealing with feeding, signalling, etc. Yes, those subsystems are important for the proper life of higher organisms, and they probably provide a selective advantage in a stable environment. But they are not crucial for life itself. Furthermore, the diversity observed is rarely true redundancy; rather, it allows the organism to feed on a larger variety of substrates, sense more compounds, etc. The redundancy only appears when the system is stretched by the disappearance of other subsystems. Truly redundant systems would most probably just be eliminated (for a more informed discussion of gene duplications and losses, read this recent review).

Continuing on the topic of “importance”, one of the most irritating remarks one still hears too often from hard-core molecular biologists is, mutatis mutandis:

This gene is not important because the knock-out mice display no phenotype.

A term has even been coined for this: essentiality.

Gene essentiality is a pretty busy domain of research, and I am as far from being an expert as it is possible to be. So I am not going to discuss it. But I resent the notion that equates “important” with “phenotype immediately apparent”. The problem comes from the way we analyse mutant animals (which is designed this way for very good reasons; that is not the issue here).

Consider the case of a car. What is a car supposed to do? To move forward on a flat surface, propelled by its own engine. So we set up an experimental environment with a perfectly flat surface. To eliminate any uncontrolled variables, we place this surface indoors, under constant temperature and illumination. And of course we remove all other vehicles from the environment. On my right, a control car. On my left, the same car without shock absorbers, ABS or any lighting, without a hooter, with all doors but the driver’s fused to the frame, and with pure water in the cooling system. Let’s start both engines and drive the cars for 50 metres. Notice the difference? No? Therefore none of the parts we removed was important. Well, how long do you think you will drive the modified car at night on the London Orbital when it’s -10 degrees Celsius and the surface is covered in ice? I will tell you: that day, you will find the ABS, lighting, anti-freeze etc. damned important. Even essential.

I once worked in a team studying a mouse mutant strain “without phenotype” (*). That was until someone (**) decided to study aged mice (what a weird idea) and discovered that the brain degenerated quickly in those animals. Hard to find out when, for practical reasons, one uses only young animals. See: Zoli M, Picciotto MR, Ferrari R, Cocchi D, Changeux JP. Increased neurodegeneration during ageing in mice lacking high-affinity nicotine receptors. EMBO J. 1999 Mar 1;18(5):1235-44.

(*) Well, with a very mild phenotype.

Can we simulate a whole cell at the atomistic level? I don’t think so

[disclaimer: Most of this was written in 2008. My main opinion has not changed, but some of the data backing up the arguments might seem dated]

Over the last 15 years, it has become fashionable to launch “Virtual Cell Projects”. Some of those are sensible and based on sound methods (one of the best recent examples being the fairly complete model of an entire Mycoplasma cell, if we set aside membrane processes and spatial considerations). However, some call for “whole-cell simulation at atomic resolution”. Is it a reasonable goal to pursue? Can we count on increased computing power to help us?

I do not think so. Not only do I believe that whole-cell simulations at atomic resolution are not envisionable within 10 or 15 years, but IMHO they are not envisionable in any foreseeable future. I actually consider such claims damaging, because they 1) feed false expectations to funders and the public, 2) divert funding from feasible, even if less ambitious, projects, and 3) downplay the achievements of real scientific modelling efforts (see my series on modelling success stories).

Two types of problems appear when one wants to model cellular functions at the atomic scale: practical and theoretical. Let’s get the practical ones out of the way first, because I think they are insurmountable and therefore less interesting. As of spring 2008, the largest molecular dynamics simulation I had heard of involved ~1 million atoms over 50 nanoseconds (molecular dynamics of the tobacco mosaic virus capsid). Even this simulation used massive computing power (>30 years of a desktop CPU). With much smaller systems (10 000 atoms), people succeeded in reaching half a millisecond (as of 2008). In terms of spatial size, we are very far from even the smallest cells. Simulating an E. coli-sized cell would require simulating roughly 1 000 000 000 000 atoms, that is, 1 million times what we do today. But the problem is that molecular dynamics does not scale linearly. Even with space discretisation, long-range interactions (e.g. electrostatics) mean we would need far more than 1 million times more power: several orders of magnitude more. In addition, we are talking about 50 nanoseconds here. To model a simple cellular behaviour, we need to reach the timescale of seconds. So, in summary, we are talking about an increase of several orders of magnitude beyond 10 to the power of 14. Even if the corrected Moore’s law (doubling every 2 years) stayed valid, we would be talking about more than a century here, not a couple of decades!
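The back-of-the-envelope estimate above can be written out explicitly. This minimal Python sketch uses only the figures quoted in the text. It optimistically assumes that cost grows linearly with both the number of atoms and the simulated time. The `extra_orders` parameter is a hypothetical knob for the super-linear penalty the text mentions, whose actual size is not known here.

```python
import math

# Figures quoted in the text (state of the art as of 2008).
ATOMS_SIMULATED = 1e6      # ~1 million atoms (tobacco mosaic virus capsid run)
TIME_SIMULATED_S = 50e-9   # 50 nanoseconds of simulated time
ATOMS_E_COLI = 1e12        # rough atom count for an E. coli-sized cell
TARGET_TIME_S = 1.0        # "the timescale of seconds"


def required_speedup(extra_orders: float = 0.0) -> float:
    """Speed-up over the 2008 run needed to simulate a whole cell for one second.

    Assumes cost grows linearly with atom count and with simulated time, which
    is a lower bound. extra_orders adds powers of ten for the super-linear cost
    of long-range interactions; its value is a hypothetical knob."""
    space_factor = ATOMS_E_COLI / ATOMS_SIMULATED   # 1e6
    time_factor = TARGET_TIME_S / TIME_SIMULATED_S  # 2e7
    return space_factor * time_factor * 10 ** extra_orders


def years_to_reach(speedup: float, doubling_years: float = 2.0) -> float:
    """Years needed if computing power doubles every doubling_years."""
    return doubling_years * math.log2(speedup)


if __name__ == "__main__":
    for extra in (0, 1, 2, 3):
        s = required_speedup(extra)
        print(f"+{extra} extra orders: speed-up {s:.1e}, about {years_to_reach(s):.0f} years")
```

Under the purely linear assumption, the required speed-up is about 2 × 10^13. That already takes roughly 88 years of doubling every 2 years. Each extra order of magnitude of super-linear cost adds about 6.6 years, so a penalty of two orders or more pushes the estimate past a century.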

Now, IMHO the real problems are the theoretical ones. The point is that we do not really know how to perform those simulations. The force fields I am aware of (the ones I fiddled with in the past), AMBER, CHARMM and GROMACS, are perfectly fine for describing fine movements of atoms, formation of hydrogen bonds, rotation of side-chains, etc. We learnt a lot from such molecular dynamics simulations, and a Nobel prize was awarded for them in 2013. But as far as I know, those methods do not allow us to describe (adequately) the large-scale movements of large atomic assemblies such as protein secondary structure elements, and even less the formation of such structural elements. We cannot simulate the opening of an ion channel or the large movements of motor proteins (although we can predict them, for instance using normal modes). Therefore, even if we could simulate milliseconds of biochemistry, the result would most probably be fairly inaccurate.

There are (at least) three ways out of this, and they all require leaving the atomic level. Plus, they all run into computational problems too.

* Coarse-grained simulations: we lump several atoms into one particle. That has worked in many cases, and it is a very promising approach, particularly if the timescales of individual atoms and of atom ensembles are quite different. See for instance the work being done on the tobacco mosaic virus mentioned above. However, (in 2008 and according to my limited knowledge) these methods are even more inaccurate than atomic-resolution molecular dynamics. And we are just pushing the computation problem further: even with very coarse models (note that accuracy decreases with coarseness), we only gain a few orders of magnitude. One severe problem here is that one cannot rely solely on physical principles (Newtonian mechanics or quantum physics) to design the methods, so they have to be calibrated against experiments. But we are still at scales that make real-time quantitative experimental measurements very difficult.

* Standard Computational Systems Biology approaches: we model the cellular processes at the macroscopic level, using differential equations to represent reaction-diffusion processes. The big advantage is that we can measure the constants, the concentrations, etc. That worked well in the past (think about Hodgkin and Huxley predicting ion channels, Denis Noble predicting the heart pacemaker, and Goldbeter and Koshland predicting the ultrasensitivity of covalent modification cycles such as those of the MAP kinase cascade), and it still works well. But does it work for whole-cell simulation? No, not really. It does not because of what is called combinatorial explosion. If you have a protein that possesses several state variables, such as phosphorylation sites, you have to enumerate all its possible states. Take the example of calcium/calmodulin-dependent protein kinase II: if you decide to model only the main features (binding of ATP and calmodulin, phosphorylation on T286 and T306, activity, and the fact that it is a dodecamer), you need 2 to the power of 60 different states, that is, a billion billion ordinary differential equations. In a cell, you would have thousands of such cases (think of the EGF receptor with its 38 phosphorylation sites!).

* Agent-based modelling (aka single-particle simulations or mesoscopic modelling). Here we abstract the molecules to their main features, far, far above the atomic level, and we represent each molecule as an agent that knows its state. That avoids the combinatorial explosion described above. But those simulations are still super-heavy. We simulated hundreds of molecules moving and interacting in a 3D block of 1 micrometre for seconds. Those simulations take days to months to run on the cluster (and they spit out terabytes of data, but that is another problem). However, the problem is that they scale even worse than molecular dynamics. Dominic does not simulate the molecules he is not interested in. If he did simulate all the molecules of the dendritic spine, it would take all the CPUs on the planet for years.
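The contrast between the last two approaches can be sketched in a few lines of Python. The first function reproduces the state counts quoted above. For CaMKII, that means 5 binary features per subunit and 12 subunits, with every subunit treated as distinguishable, as in the text's count. For the EGF receptor, it means 38 sites. The rest is a deliberately toy agent-based model. Kinase and substrate agents random-walk in a 1 µm cube, and a kinase that comes within a fixed distance of a substrate phosphorylates one random site. The step size, reaction distance, molecule counts and reaction rule are all hypothetical choices made for illustration. None of them are taken from the simulations described in the text.

```python
import random
from dataclasses import dataclass, field


# --- ODE view: one equation per microstate (counts as in the text) ---------
def count_states(binary_features_per_subunit: int, n_subunits: int) -> int:
    """Microstates when every subunit is tracked and each feature is on/off."""
    return (2 ** binary_features_per_subunit) ** n_subunits


# --- Agent view: each molecule stores only its own current state -----------
# All parameters below are illustrative assumptions, not values from the text.
BOX_UM = 1.0      # side of the cubic volume, micrometres
STEP_UM = 0.01    # std. dev. of one random-walk step per tick, micrometres
REACT_UM = 0.02   # encounter distance that triggers a reaction, micrometres
N_SITES = 38      # phosphosites per substrate (EGF-receptor-like)


@dataclass
class Molecule:
    kind: str                                   # "kinase" or "substrate"
    pos: list                                   # [x, y, z] in micrometres
    phospho: set = field(default_factory=set)   # indices of phosphorylated sites


def reflect(x: float) -> float:
    """Keep a coordinate inside [0, BOX_UM] by reflecting at the walls."""
    if x < 0.0:
        x = -x
    if x > BOX_UM:
        x = 2 * BOX_UM - x
    return min(max(x, 0.0), BOX_UM)


def tick(molecules, rng):
    """Move every agent, then let kinases phosphorylate nearby substrates."""
    for m in molecules:
        m.pos = [reflect(c + rng.gauss(0.0, STEP_UM)) for c in m.pos]
    kinases = [m for m in molecules if m.kind == "kinase"]
    substrates = [m for m in molecules if m.kind == "substrate"]
    for k in kinases:                 # naive all-pairs encounter check
        for s in substrates:
            d2 = sum((a - b) ** 2 for a, b in zip(k.pos, s.pos))
            if d2 < REACT_UM ** 2:
                s.phospho.add(rng.randrange(N_SITES))


def simulate(n_kinases=20, n_substrates=80, n_ticks=200, seed=0):
    """Place agents uniformly at random in the box and run n_ticks steps."""
    rng = random.Random(seed)
    molecules = [
        Molecule(kind, [rng.uniform(0.0, BOX_UM) for _ in range(3)])
        for kind, n in (("kinase", n_kinases), ("substrate", n_substrates))
        for _ in range(n)
    ]
    for _ in range(n_ticks):
        tick(molecules, rng)
    return molecules


if __name__ == "__main__":
    print("CaMKII ODE states:", count_states(5, 12))   # equals 2**60
    print("EGFR ODE states:  ", count_states(38, 1))   # equals 2**38
    mols = simulate()
    in_use = {frozenset(m.phospho) for m in mols if m.kind == "substrate"}
    print("Distinct agent states actually stored:", len(in_use))
```

Each substrate stores only the sites that are actually phosphorylated, so memory grows with the number of molecules rather than with the 2^38 possible states. The cost shows up in `tick` instead. Every agent is moved at every step, and this naive check compares every kinase with every substrate. Dedicated tools typically use spatial partitioning to speed up the encounter search, but they still pay for every simulated molecule at every time step.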

So where is the solution? The solution is in multiscale simulations. Let’s simulate at the atomic level when we need the atomic level, and at higher levels when we need higher-level descriptions. A simulation should always be done at a level where we can gain useful insights and have experimental information to set up the model and validate its predictions. The Nobel committee did not miss it when it awarded the 2013 chemistry prize “for the development of multiscale models for complex chemical systems”.

Update 19 December 2013

Here we go again, this time with the magic names of Stanford and Google. What they achieved with their “Exacycle” cloud computing system is of the same order of magnitude as what was done in 2008. So no extraordinary feat here: 2.5 milliseconds of 60 000 atoms. But that does not stop them from launching the “whole-cell-at-atomic-resolution” idea again.