[disclaimer: Most of this has been written in 2008. My main opinion did not change, but some of the data backing up the arguments might seem dated]
Over the last 15 years, it has become fashionable to launch “Virtual Cell Projects”. Some of those are sensible, and based on sounds methods (one of the best recent examples being the fairly complete model of an entire Mycoplasma cell – if we except membrane processes and spatial considerations.) However, some call for “whole-cell simulation at atomic resolution”. Is it a reasonable goal to pursue? Can-we count on increase computing power to help us?
I do not think so. Not only do I believe whole-cell simulations at atomic resolutions are not only envisionable in 10 or 15 years, but IMHO they are not envisionable in any foreseable future. I actualy consider such claims damageable by 1) feeding wrong expectancies to funders and the public, 2) diverting funding from feasible, even if less ambitious, projects and 3) down-scaling the achievments of real scientific modelling efforts (see my series on modelling success stories).
Two types of problems appear when one wants to model cellular functions at the atomic scale: practical and theoretical. Let’s evacuate the practical ones, because I think they are insurmountable and therefore less interesting. As of spring 2008, the largest molecular dynamic simulation I heard of involved ~1 million atoms during 50 nanoseconds (molecular dynamics of tobacco mosaic virus capside). Even this simulation used massive power, (>30 years of a desktop CPU). With much smaller systems (10 000 atoms), people succeded to go up to half a millisecond (as of 2008). In terms of spatial size, we are very far from even the smallest cells. The simulation of an E coli sized cell would require to simulate roughly 1 000 000 000 000 atomes, that is 1 million times what we do today. But the problem is that molecular dynamics does not scale linearly. Even with space discretisation, long-range interactions (e.g. electrostatic) mean we would need far more than 1 million times more power, several orders of magnitude more. In addition, we are talking about 50 nanosecond here. To model a simple cellular behaviour, we need to reach the second time scale. So in summary, we are talking about an increase of several orders of magnitude more than 10 to the power of 14. Even if the corrected Moore law (doubling every 2 years) stayed valid, we would be talking of more than a century here, not a couple of decades!
Now, IMHO the real problems are the theoretical ones. The point is that we do not really know how to perform those simulations. The force fields I am aware of (the ones I fiddle with in the past) AMBER, CHARMM and GROMACS, are perfectly fine to describe fine movements of atoms, formation of hydrogen bonds, rotation of side-chains etc. We learnt a lot from such molecular dynamics simulations, and a Nobel prize was granted for them in 2013. But as far as I know, those methods do not permit to describe (adequately) the large scale movements of large atomic assemblies such as protein secondary structure elements, and even less the formation of such structurat elements. We cannot simulate the opening of an ion channel or the large movements of motor proteins (although we can predict them, for instance using normal modes). Therefore, even if we could simulate milliseconds of biochemistry, the result would most probably be fairly inaccurate.
There are (at least) three ways out of there, anHere we go againd they all require to leave the atomic level. Plus they also all bump into computation problems.
* Coarse-grain simulations: We lump several atoms into one particle. That worked in many cases, and this is a very promising approach, particularly if the timescale of atom and atom ensembles are quite different. See for instance the worked being done on the tobacco mosaic virus mentioned above. However, (in 2008 and according to my limited knowledge) the methods are even more inaccurate than atomic resolution molecular dynamics. And we are just pushing the computation problem further, even with very coarse models (note that the accuracy decreases with the coarseness) we are only gaining a few orders of magnitude. One severe problem here, is that one cannot rely solely on physics principles (newtonian laws or quantum physics) to design the methods. But we are still at scales that make real time experimental quantitative measurements very difficult.
* Standard Computational Systems Biology approaches. We model the cellular processes at macroscopic levels, using differential equations to represent reaction diffusion processes. The big advantage is that we can measure the constants, the concentrations etc. That worked well in the past (think about Hodgkin-Huxley predicting ion channels, Dennis Noble predicting the heart pacemaker and Goldbeter and Koshland predicting the MAP kinase cascade), and still works well. But does-it work for whole cell simulation? No, it does not really. It does not because of what we call combinatorial explosion. If you have a protein that possess several state variables such as phosphorylation sites, you have to enumerate all the possible states. If you take the example of Calcium/calmodulin kinase II, and you decide to model only the main features, binding of ATP and calmoldulin, phosphorylation in T286 and T306, activity, and the fact that it is a dodecamer, you need 2 to the power of 60 different states, that is 1 billion of billions ordinary differential equations. In a cell, you would have thousands of such cases (think of the EGF receptor with its 38 phosphorylation sites!).
* Agent-based modelling (aka single-particle simulations or mesoscopic modelling). Here we abstract the molecules to their main features, far far above the atomic level, and we represent each molecule as an agent, that knows its state. That avoids the combinatorial explosion described above. But those simulation are still super-heavy. We simulated hundreds of molecules moving and interacting in a 3D block of 1 micrometer during seconds. Those simulations take days to months to run on the cluster (and they spit out terabytes of data, but that is another problem). However, the problem is that they scale even worsely than molecular dynamics. Dominic does not simulate the molecules he is not interested in. If he did simulate all the molecules of the dendritic spine, it would take all the CPUs of the planet during years.
So where is the solution? The solution is in multiscale simulations. Let’s simulate at atomic level when we need atomic level, and at higher levels when we need higher level descriptions. A simulation should always be done at a level where we can gain useful insights and possess experimental information to set-up the model and validate its predictions. The Nobel committee did not miss it when it attributed the 2013 chemistry prize “for the development of multiscale models for complex chemical systems”.
Update 19 December 2013
Here we go again, this time with the magic names of Stanford and Google. What they achieved with their “exaclyde cloud computing system” is of the same order of magnitude that what was done in 2008. So no extraordinary feat here. 2.5 millisecond of 60000 atoms. But that does not stop them to launch the “whole-cell-at-atomic-resolution” again.