What to do and not to do in advanced modelling courses

I previously introduced our in silico systems biology course. After five years of running this course, I have collected a few lessons that are probably applicable to any advanced course. Nothing very new or surprising, but worth keeping in mind when organising such teaching events.

Select the students well

Beware of wrong expectations, and of students who do not find what they thought they would: disappointed students can wreck the atmosphere of a course. Beware also that terminology differs between domains. One of the most overloaded terms is “model”. A 3D structure model, a hidden Markov model, a general linear model, a chemical kinetics model: all of those are models, but they address different populations. The term “systems biology” itself is problematic. Also choose the level of the course and stick to it when selecting the students. Even if there are fewer applicants than expected (fortunately no longer a problem for our in silico systems biology course), do not be tempted to select inadequate candidates. It is better to take on fewer students than to have a few students bored or unable to follow. Our course is advanced and covers quite a lot of ground. We cannot expect all students to be experts in every aspect of the course. However, by selecting students who are skilled in at least one aspect of the course (and balancing their expertise), we liven up the lessons (more interesting questions and discussions) and the students themselves become “associate trainers”.

More hands-on, practicals, tutorials

Students learn with their fingers. A demo will never replace an actual hands-on session, where the students make the mistakes and fix them (with the help of trainers). And of course, keep the lecturers from diving into their own research and giving scientific presentations. This is a course, not a conference. If needed, organise dedicated scientific presentations a few times during the course, but not within the lessons.

Focus on concrete applications of tools

Avoid lengthy descriptions of the theoretical basis of algorithms. It is good that students learn what is under the bonnet, and can choose between solutions. But (in general) they are here to learn how to use those tools for their research, not to develop the next generation of them. Two complementary approaches are 1) building toy examples that illustrate specific uses, and 2) using famous simple examples from the literature.

Do not try to cram too much in the course

It is better to explain a typical set of techniques well than to cover the whole field inadequately. It is generally not possible to present all the approaches used in a field of computational biology; even a seasoned researcher in the field does not master all of them. Introduce the common basis very carefully, and then move on to a few examples of more advanced approaches. If the basics are well understood, and the students are really using the content of the course for their research, they will be able to continue training on their own.

Engage the students

It is very important that the students feel part of the course. These events last only one or two weeks, so the students need to bond with the organisers, the trainers and each other immediately. Make them present their work on the first day, maybe with one slide each. Organise poster sessions: real poster sessions, where students are kept around the posters. Drinks and snacks are a good method if they are served in the same place and keep the students there. If you selected the students wisely (see the first point), they should be interested in each other's research.

Try to keep trainers around

So they can interact with students outside of their presentations and tutorials. This is very difficult: you choose the best trainers, so they are obviously very busy people. But sometimes it is better to choose the better trainer over the better scientist. Also, select your trainers even more carefully than your students. You want good presenters, but also good interactors. Bad trainers will arrive just before their session, spend the coffee breaks reading their mail, and leave just after. Those people do not like teaching, and frankly they do not deserve your students. Do not hesitate to replace them, even if they are famous. Observe them outside the classroom as well. It is very sad to say, but some trainers cannot behave when interacting with young adults.

These are only a few pieces of advice. I am sure there are plenty of others. What are your experiences?


“What is systems biology” – the students talk

This year saw the 5th instalment of our Wellcome Trust / EMBL-EBI course “In silico systems biology”.

This course finds its origin a few years ago in a workshop of the EBI industry programme on “Pathways and models”. The workshop, which lasted two days, was praised by the attendees. However, the time limitation caused a bit of frustration and made us skip entire aspects we would have liked to cover. I therefore decided to try turning it into a full-blown course, with the help of Vicky Schneider, then responsible for training at the EBI.

The first course, supported by EMBO, lasted four days. It was well received. However, we tried to cover too much, from functional genomics and network reconstruction to quantitative modelling of biological processes. Fortunately, the existence of another EBI course, “Networks and pathways”, allowed us to focus on modelling. We progressively improved the programme through one FEBS course and three Wellcome Trust advanced courses. Without boasting, the current course, co-organised with Julio Saez-Rodriguez and Laura Emery, has come close to perfection. The programme still evolves, but the changes have slowed down with time, and we are now more in an optimisation and refinement phase. One of the big advantages is that we kept a core of trainers, who help improve the consistency and quality of the content. We are now happy to see our first generations of students having become active figures in systems biology. Some group leaders who attended the course in the past now send their own students every year. A forthcoming post will discuss a few things I learnt from organising those courses.

Besides the regular training, we always have a few group activities. This year, the students were split into small groups at the beginning and had to answer a few questions. One of them was …

What is systems biology?

Everyone has their own idea about that one, including myself (I have written before about the history, nature and challenges of systems biology). Here I provide you with the unfiltered and unclustered responses of 25 students (repetitions originate from different groups coming up with the same answers):

  • Mechanisms on different levels
  • Wholistic view (tautology intended)
  • Dynamics of biological systems
  • Fun
  • Mathematical modeling
  • Insight to the systems
  • Predictions
  • Looking at the system as a whole and not per component
  • Should also be: formal, unambiguous
  • Holistic approach
  • Using modelling to answer biological questions
  • understanding dynamics of a system in terms of predictability
  • Mechanistic insight
  • A tool to complement experimental data
  • Experiments-modeling cycle leading to discovery
  • formalisms
  • Technology+bio data+ in silico
  • integrating levels of biological processes
  • reaching the experimentally unapproachable

Interesting, isn’t it? At first it looks pretty much all over the place. Let me re-order the answers and group them:

  1. Entire systems
    • Wholistic view (tautology intended)
    • Looking at the system as a whole and not per component
    • Holistic approach
  2. Mechanisms
    • Insight to the systems
    • Mechanistic insight
    • Mechanisms on different levels
    • integrating levels of biological processes
  3. Dynamics
    • Dynamics of biological systems
    • understanding dynamics of a system in terms of predictability
  4. Modeling
    • Mathematical modeling
    • Should also be: formal, unambiguous
    • formalisms
    • Using modelling to answer biological questions
  5. Complement the observation
    • A tool to complement experimental data
    • reaching the experimentally unapproachable
    • Experiments-modeling cycle leading to discovery
    • Predictions
    • Technology+bio data+ in silico
  6. And of course
    • Fun

We basically fall back on the two global positions in the field: a philosophical statement about life sciences (1, 2, 3), and a set of techniques (4, 5). That reminds me a lot of the discussions we had about molecular biology at university a few decades ago …

Modelling success stories (4) Birth of synthetic biology 2000

For the fourth entry in the series, I will not introduce one but two papers, published back to back in a January 2000 issue of Nature. What makes these articles special is not that the models they describe presented novel features or revealed new biological insights. Rather, they can be considered as marking the birth of synthetic biology as a defined subfield of bioengineering and an applied face of systems biology. It is quite revealing that they focused on systems exhibiting the favourite behaviours of computational systems biologists: oscillation and multistability.

Elowitz MB, Leibler S (2000) A synthetic oscillatory network of transcriptional regulators. Nature, 403:335-338. http://identifiers.org/pubmed/10659856

This paper presents a model, called the repressilator, formed by three repressors arranged in a ring. Each repressor is constitutively expressed, and its expression is repressed by one of the other two. Deterministic and stochastic simulations show that for high transcription rates and sufficiently fast protein turnover, the system oscillates, the three repressors being expressed in sequence.
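To give an idea of how such a model behaves, here is a minimal deterministic sketch in Python. The equations follow the classic dimensionless form of the repressilator; the parameter values are illustrative choices of mine, not necessarily those used in the paper.

```python
# Minimal deterministic sketch of the repressilator (illustrative parameters).
from scipy.integrate import solve_ivp

alpha, alpha0, beta, n = 216.0, 0.216, 5.0, 2.0   # assumed values, for demonstration only

def repressilator(t, y):
    m, p = y[:3], y[3:]                       # mRNAs and proteins of the three genes
    dm = [alpha / (1.0 + p[(i - 1) % 3] ** n) + alpha0 - m[i] for i in range(3)]
    dp = [beta * (m[i] - p[i]) for i in range(3)]
    return dm + dp

sol = solve_ivp(repressilator, (0, 100), [1, 0, 0, 2, 1, 3], max_step=0.1)
print(sol.y[3:, -1])    # the three protein levels at the end of the run
```

Plotting sol.y over time shows the three proteins rising and falling in turn, the signature of the oscillation described above.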

Stability of the repressilator. See Elowitz and Leibler for the legend.

The authors implemented the model in bacteria, using the lactose repressor of E. coli (LacI), a repressor from a tetracycline-resistance transposon (TetR) and a lambda phage repressor (CI).

The various biochemical reactions involved in the repressilator implementation (SBGN Process Description language).

They indeed observed an oscillation, detected with a reporter plasmid under the control of a TetR-sensitive promoter. Interestingly, the period of the oscillation is longer than the cell division time, so a full oscillation spans several generations of bacteria. You can download a curated version of the repressilator in different formats from BioModels Database (BIOMD0000000012).

Gardner TS, Cantor CR, Collins JJ (2000) Construction of a genetic toggle switch in Escherichia coli. Nature, 403: 339-342. http://identifiers.org/pubmed/10659857

The second paper describes a bistable switch formed by two mutual repressors, each constitutively expressed in the absence of the other. If the strengths of the two promoters are balanced, the system naturally forms a bistable switch in which only one of the repressors is expressed at a given time (stochastic simulations can display switches between the two stable states).
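The logic is easy to reproduce with a pair of differential equations. The sketch below uses my own illustrative parameters rather than those of the paper; it simply shows that two different starting points settle into the two opposite stable states.

```python
# Minimal sketch of a two-repressor toggle switch (illustrative parameters).
from scipy.integrate import solve_ivp

a1, a2, beta, gamma = 10.0, 10.0, 2.0, 2.0   # assumed promoter strengths and Hill exponents

def toggle(t, y):
    u, v = y
    du = a1 / (1.0 + v ** beta) - u     # repressor 1, repressed by repressor 2
    dv = a2 / (1.0 + u ** gamma) - v    # repressor 2, repressed by repressor 1
    return [du, dv]

# Two different initial conditions end up in the two stable states.
for y0 in ([5.0, 0.1], [0.1, 5.0]):
    sol = solve_ivp(toggle, (0, 50), y0)
    print(y0, "->", sol.y[:, -1].round(2))
```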

Stability of the repressor switch. See Gardner et al for the legend.

The authors built two versions of this switch, in a way that allowed external signals to disable one of the repressions, thereby specifically stabilising one state. Interestingly, they built their switches in E. coli using the same repressors as Elowitz and Leibler.

Structure of the repressor based toggle switches

A curated version of the toggle switch can be downloaded in different formats from BioModels Database (BIOMD0000000507).

Both papers became milestones in synthetic biology (as witnessed by over 2000 citations each according to Google Scholar as of January 2014). The models they describe are also classic examples used in biological modelling courses to explore oscillatory and multistable systems, simulated with deterministic and stochastic approaches.

Can we simulate a whole cell at the atomistic level? I don’t think so

[disclaimer: Most of this was written in 2008. My main opinion has not changed, but some of the data backing up the arguments may seem dated]

Over the last 15 years, it has become fashionable to launch “Virtual Cell Projects”. Some of those are sensible and based on sound methods (one of the best recent examples being the fairly complete model of an entire Mycoplasma cell, if we except membrane processes and spatial considerations). However, some call for “whole-cell simulation at atomic resolution”. Is that a reasonable goal to pursue? Can we count on increasing computing power to help us?

I do not think so. Not only do I believe that whole-cell simulations at atomic resolution are out of reach within 10 or 15 years, but IMHO they are out of reach in any foreseeable future. I actually consider such claims damaging, because they 1) feed wrong expectations to funders and the public, 2) divert funding from feasible, even if less ambitious, projects and 3) belittle the achievements of real scientific modelling efforts (see my series on modelling success stories).

Two types of problems appear when one wants to model cellular functions at the atomic scale: practical and theoretical. Let’s deal with the practical ones first, because I think they are insurmountable and therefore less interesting. As of spring 2008, the largest molecular dynamics simulation I had heard of involved about one million atoms over 50 nanoseconds (a molecular dynamics study of the tobacco mosaic virus capsid). Even this simulation used massive power (more than 30 years of a desktop CPU). With much smaller systems (10 000 atoms), people had succeeded in reaching half a millisecond (as of 2008). In terms of spatial size, we are very far from even the smallest cells. The simulation of an E. coli-sized cell would require simulating roughly 1 000 000 000 000 atoms, that is, a million times what we can do today. But the problem is that molecular dynamics does not scale linearly. Even with space discretisation, long-range interactions (e.g. electrostatic) mean we would need far more than a million times more power, several orders of magnitude more. In addition, we are talking about 50 nanoseconds here. To model a simple cellular behaviour, we need to reach the second timescale. So in summary, we are talking about an increase of several orders of magnitude more than 10 to the power of 14. Even if the corrected Moore’s law (doubling every two years) stayed valid, we would be talking of more than a century here, not a couple of decades!
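A quick back-of-the-envelope calculation makes the argument explicit. The figures below are taken from the paragraph above; the assumption of linear scaling in atom count and simulated time is mine, and it deliberately underestimates the real cost.

```python
import math

# Figures from the text above; linear scaling is an optimistic assumption.
atoms_cell, atoms_2008 = 1e12, 1e6     # E. coli-sized cell vs largest 2008 MD run
time_target, time_2008 = 1.0, 50e-9    # one second vs 50 nanoseconds

speedup = (atoms_cell / atoms_2008) * (time_target / time_2008)
print(f"required speed-up: {speedup:.0e}")             # about 2e13, even with linear scaling

years = 2 * math.log2(speedup)                         # Moore's law: doubling every 2 years
print(f"years of Moore's law needed: {years:.0f}")     # about 90 years, before superlinear costs
```

Adding the extra orders of magnitude due to the superlinear cost of long-range interactions pushes the estimate well beyond a century, which is the point made above.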

Now, IMHO the real problems are the theoretical ones. The point is that we do not really know how to perform those simulations. The force fields I am aware of (the ones I fiddled with in the past), AMBER, CHARMM and GROMACS, are perfectly fine for describing fine movements of atoms, formation of hydrogen bonds, rotation of side chains, etc. We have learnt a lot from such molecular dynamics simulations, and a Nobel prize was awarded for them in 2013. But as far as I know, those methods do not allow us to describe (adequately) the large-scale movements of large atomic assemblies such as protein secondary structure elements, and even less the formation of such structural elements. We cannot simulate the opening of an ion channel or the large movements of motor proteins (although we can predict them, for instance using normal modes). Therefore, even if we could simulate milliseconds of biochemistry, the result would most probably be fairly inaccurate.

There are (at least) three ways out of this, and they all require leaving the atomic level. They also all bump into computational problems.

* Coarse-grained simulations: we lump several atoms into one particle. That has worked in many cases, and it is a very promising approach, particularly if the timescales of atoms and atom ensembles are quite different. See for instance the work done on the tobacco mosaic virus mentioned above. However (in 2008, and according to my limited knowledge), the methods are even more inaccurate than atomic-resolution molecular dynamics. And we are just pushing the computational problem further: even with very coarse models (note that the accuracy decreases with the coarseness), we are only gaining a few orders of magnitude. One severe problem here is that one cannot rely solely on physics principles (Newtonian laws or quantum physics) to design the methods. But we are still at scales that make real-time experimental quantitative measurements very difficult.

* Standard computational systems biology approaches: we model the cellular processes at macroscopic levels, using differential equations to represent reaction-diffusion processes. The big advantage is that we can measure the constants, the concentrations, etc. That worked well in the past (think of Hodgkin and Huxley predicting ion channels, Denis Noble predicting the heart pacemaker, and Goldbeter and Koshland predicting the MAP kinase cascade), and it still works well. But does it work for whole-cell simulation? No, it does not really, because of what we call combinatorial explosion. If you have a protein that possesses several state variables, such as phosphorylation sites, you have to enumerate all the possible states. If you take the example of calcium/calmodulin kinase II, and you decide to model only its main features (binding of ATP and calmodulin, phosphorylation at T286 and T306, activity, and the fact that it is a dodecamer), you need 2 to the power of 60 different states, that is, a billion billion ordinary differential equations (a toy calculation after this list makes the count explicit). In a cell, you would have thousands of such cases (think of the EGF receptor with its 38 phosphorylation sites!).

* Agent-based modelling (a.k.a. single-particle simulation or mesoscopic modelling): here we abstract the molecules down to their main features, far above the atomic level, and we represent each molecule as an agent that knows its own state (a toy sketch of the idea also follows this list). That avoids the combinatorial explosion described above. But those simulations are still extremely heavy. We have simulated hundreds of molecules moving and interacting in a 3D block of one micrometre over seconds. Such simulations take days to months to run on a cluster (and they spit out terabytes of data, but that is another problem). Moreover, they scale even worse than molecular dynamics, and they only include the molecules we are interested in. If we simulated all the molecules of the dendritic spine, it would take all the CPUs of the planet for years.
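Coming back to the combinatorial explosion mentioned for the differential-equation approach, the count is easy to verify. The feature list below is the one given above (five binary features per subunit, twelve subunits); this is only a counting exercise, not a model.

```python
# Counting the CaMKII states enumerated above.
features_per_subunit = 5   # ATP bound, calmodulin bound, P-T286, P-T306, active
subunits = 12              # the holoenzyme is a dodecamer
states = 2 ** (features_per_subunit * subunits)
print(f"{states:.2e} distinct states")   # about 1.15e18, i.e. one ODE per state
```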
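And to illustrate the agent-based alternative, here is a toy sketch of my own (not any specific simulator): each molecule is an object that carries its own state and diffuses independently, so the state combinations never need to be enumerated.

```python
# Toy agent-based sketch: each molecule is an agent carrying its own state.
import random

class Molecule:
    def __init__(self):
        self.pos = [0.0, 0.0, 0.0]                      # position in micrometres
        self.state = {"ATP": False, "P-T286": False}    # per-agent state, assumed features

    def diffuse(self, dt=1e-3, d=1.0):
        # Naive Brownian step, standard deviation sqrt(2*D*dt) per coordinate.
        self.pos = [x + random.gauss(0.0, (2 * d * dt) ** 0.5) for x in self.pos]

agents = [Molecule() for _ in range(100)]
for _ in range(1000):                                   # 1000 time steps
    for m in agents:
        m.diffuse()
print(agents[0].pos)                                    # one agent's final position
```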

So where is the solution? The solution is in multiscale simulations. Let’s simulate at the atomic level when we need the atomic level, and at higher levels when we need higher-level descriptions. A simulation should always be done at a level where we can gain useful insights and where we possess experimental information to set up the model and validate its predictions. The Nobel committee did not miss this when it awarded the 2013 chemistry prize “for the development of multiscale models for complex chemical systems”.

Update 19 December 2013

Here we go again, this time with the magic names of Stanford and Google. What they achieved with their Exacycle cloud computing system is of the same order of magnitude as what was done in 2008: 2.5 milliseconds of a 60 000-atom system. So no extraordinary feat here. But that does not stop them from launching the “whole-cell at atomic resolution” claim again.

Modelling success stories (3) Goldbeter and Koshland 1981

The third example of this series will be a bit controversial (I have already been told so). Understanding its full impact requires a more subtle than average understanding of biochemistry and enzyme kinetics, and in particular of the difference between zero-order and first-order kinetics. At least it took me a fair bit of reading and thinking; it will perhaps be easier for you, my bright readers. In 1981, Albert Goldbeter (who would become the “Mr Oscillation” of modelling, see his recent book “La vie oscillatoire : Au coeur des rythmes du vivant“) and Daniel Koshland, of “induced-fit” fame, proposed that cascades of coupled interconvertible enzymes could generate ultrasensitive responses to an upstream signal. The main body of the work is described in a fairly well cited paper (733 citations according to Google Scholar as of 2 July 2013):

Goldbeter A, Koshland DE Jr. An amplified sensitivity arising from covalent modification in biological systems. Proc Natl Acad Sci USA 1981 78(11): 6840-6844. http://identifiers.org/pubmed/6947258

Stadtman and Chock had already shown that cascades of such enzymes could generate amplified responses, and that if the same substrate was consumed at the different levels, “cooperativity” could appear for the consumption of this substrate in the first-order domain (when the substrate is limiting) (Stadtman and Chock (1977) Proc Natl Acad Sci USA, 74: 2761-2765 and 2766-2770). However, Goldbeter and Koshland showed that in the zero-order range (when the enzyme is limiting), ultrasensitivity could occur without the need for multiple inputs at each level. In my opinion this paper is important for at least two reasons. First, it described how ultrasensitive (“cooperative”, although nothing cooperates here) behaviours in signalling cascades can appear without multimeric allosteric assemblies. Second, it predicted the possible existence of MAPK cascades a decade before their discovery (Gomez and Cohen (1991) Nature 353: 170-173). As with the famous Watson and Crick understatement, Goldbeter and Koshland drop their bomb in the discussion:

“Simple extension of the mathematics shows that the sensitivity can be propagated and enhanced in a multicycle network.”

They go on to add:

“It should be emphasized that the data are not yet available to say with certainty that this device for added sensitivity is actually utilized in biological systems […]”

Indeed. Of course, since then it has been shown with certainty that this device is actually utilised in biological systems. Below I show a figure from the Goldbeter and Koshland paper, followed by a figure from the first computational model of the MAP kinase cascade by Huang and Ferrell (Huang, Ferrell (1996) Ultrasensitivity in the mitogen-activated protein kinase cascade. Proc Natl Acad Sci USA 93: 10078-10083).

Figure from Goldbeter and Koshland (1981).

Figure from Huang and Ferrell (1996).
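To make the zero-order effect concrete, here is a minimal numerical sketch. A target protein is interconverted between two forms by two Michaelis-Menten enzymes; the parameter values are my own illustrative choices, not those of the paper. When the Michaelis constants are small compared with the total amount of target (the zero-order regime), the steady-state fraction of modified protein switches sharply as the ratio of the two enzyme activities crosses one.

```python
# Zero-order ultrasensitivity: steady state of a covalent modification cycle.
from scipy.optimize import brentq

def modified_fraction(v1, v2=1.0, K1=0.01, K2=0.01):
    # Solve v1*(1-w)/(K1+1-w) = v2*w/(K2+w) for w, the modified fraction
    # (concentrations normalised to the total amount of target protein).
    balance = lambda w: v1 * (1 - w) / (K1 + 1 - w) - v2 * w / (K2 + w)
    return brentq(balance, 1e-9, 1 - 1e-9)

for v1 in (0.8, 0.95, 1.0, 1.05, 1.2):   # ratio of kinase to phosphatase activity
    print(v1, round(modified_fraction(v1), 3))
```

With these small Michaelis constants the output jumps from a few percent to nearly one over a narrow range of v1, which is the ultrasensitivity discussed above; with large Michaelis constants (the first-order regime) the response becomes a shallow, Michaelian curve.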