Modelling success stories (4) Birth of synthetic biology 2000

For the fourth entry in the series, I will not introduce one, but two papers, published back to back in a January 2000 issue of Nature. The particularity of these articles is not that the described models presented novel features or revealed new biological insights. However, they can be considered as marking the birth of Synthetic Biology as a defined subfield of bioengineering and an applied face of Systems Biology. It is quite revealing that they focused on systems exhibiting the favourite behaviours of computational systems biologists: oscillation and multistability.


Elowitz MB, Leibler S (2000) A synthetic oscillatory network of transcriptional regulators. Nature, 403:335-338. http://identifiers.org/pubmed/10659856

This paper presents a model, called the repressilator, formed by three repressors in tandem. Each of them is constitutively expressed, this expression being repressed by one of the others. Deterministic and stochastic simulations show that for high transcription rate and sufficiently high protein turnover, the system oscillates, the three repressors being expressed in sequence.


Stability of the repressilator. See Elowitz and Leibler for the legend.
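For readers who want to play with the model, here is a minimal sketch of the dimensionless repressilator equations in plain Python. The equations follow the form given by Elowitz and Leibler; the parameter values, the initial conditions and the hand-written fourth-order Runge-Kutta integrator are my own illustrative assumptions, not the curated BioModels version.

```python
# Dimensionless repressilator (after Elowitz & Leibler 2000):
#   dm_i/dt = -m_i + alpha / (1 + p_j**n) + alpha0
#   dp_i/dt = -beta * (p_i - m_i)
# Order of genes: [lacI, tetR, cI]; gene i is repressed by protein (i - 1) % 3.
# All numerical values below are illustrative assumptions.

def derivatives(state, alpha=216.0, alpha0=0.216, beta=1.0, n=2):
    """Return the time derivatives of the three mRNAs and three proteins."""
    m, p = state[:3], state[3:]
    dm = [-m[i] + alpha / (1.0 + p[(i - 1) % 3] ** n) + alpha0 for i in range(3)]
    dp = [-beta * (p[i] - m[i]) for i in range(3)]
    return dm + dp


def rk4_step(f, state, dt):
    """Advance the state by one classical Runge-Kutta step of size dt."""
    k1 = f(state)
    k2 = f([s + 0.5 * dt * k for s, k in zip(state, k1)])
    k3 = f([s + 0.5 * dt * k for s, k in zip(state, k2)])
    k4 = f([s + dt * k for s, k in zip(state, k3)])
    return [
        s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    ]


def simulate(t_end=100.0, dt=0.02, initial=(0.0, 0.0, 0.0, 5.0, 0.0, 15.0)):
    """Integrate the model and return the list of visited states."""
    state = list(initial)
    trajectory = [state]
    for _ in range(int(round(t_end / dt))):
        state = rk4_step(derivatives, state, dt)
        trajectory.append(state)
    return trajectory
```

The initial conditions are deliberately asymmetric: starting with identical levels for the three repressors would keep the system on its symmetric trajectory and hide the oscillation.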

The authors implemented the model in bacteria, using the lactose repressor of E. coli (LacI), a repressor from a tetracycline-resistance transposon (TetR) and a lambda phage repressor (CI).


The various biochemical reactions involved in the repressilator implementation (SBGN Process Description language).

They indeed observed an oscillation, detected by a reporter plasmid under the control of a TetR sensitive promoter. Interestingly, the period of the oscillation is longer than the duplication time and a full oscillation spans several generations of bacteria. You can download a curated version of the repressilator in different formats from BioModels Database (BIOMD0000000012).

Gardner TS, Cantor CR, Collins JJ (2000) Construction of a genetic toggle switch in Escherichia coli. Nature, 403:339-342. http://identifiers.org/pubmed/10659857

The second paper builds on a bistable switch, formed by two mutual repressors (each constitutively expressed in the absence of the other). If the strengths of the promoters are balanced, the system naturally forms a bistable switch, where only one of the repressors is expressed at a given time (stochastic simulations can display switches between the two stable states).


Stability of the repressor switch. See Gardner et al for the legend.
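The bistability can be illustrated with a few lines of Python. This is a minimal sketch of the dimensionless two-repressor model used by Gardner et al.; the parameter values, the starting points and the simple forward Euler integration are assumptions chosen for illustration only.

```python
# Dimensionless toggle switch (after Gardner et al. 2000):
#   du/dt = alpha1 / (1 + v**beta)  - u
#   dv/dt = alpha2 / (1 + u**gamma) - v
# u and v are the concentrations of the two mutual repressors.
# All numerical values below are illustrative assumptions.

def toggle_derivatives(u, v, alpha1=10.0, alpha2=10.0, beta=2, gamma=2):
    """Return (du/dt, dv/dt) for the two mutually repressing genes."""
    du = alpha1 / (1.0 + v ** beta) - u
    dv = alpha2 / (1.0 + u ** gamma) - v
    return du, dv


def settle(u, v, t_end=50.0, dt=0.01):
    """Integrate with forward Euler and return the final (u, v)."""
    for _ in range(int(round(t_end / dt))):
        du, dv = toggle_derivatives(u, v)
        u, v = u + dt * du, v + dt * dv
    return u, v


# Two starting points on either side of the diagonal u = v
# end up in two different stable states.
state_u_on = settle(5.0, 1.0)  # repressor u wins
state_v_on = settle(1.0, 5.0)  # repressor v wins
```

With balanced promoters the diagonal u = v separates the two basins of attraction, so whichever repressor starts ahead ends up switching the other one off.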

The authors built two versions of this switch, in a way that allowed external signals to disable one of the repressions, thereby specifically stabilising one state. Interestingly, the authors built their switches in E. coli using the same repressors as Elowitz and Leibler.


Structure of the repressor based toggle switches

You can download a curated version of the toggle switch in different formats from BioModels Database (BIOMD0000000507).

Both papers became milestones in synthetic biology (as witnessed by over 2000 citations each according to Google Scholar as of January 2014). The models they describe are also classic examples used in biological modelling courses to explore oscillatory and multistable systems, simulated by deterministic and stochastic approaches.


Modelling success stories in systems biology

I am continuously asked for examples of success stories, where computational modelling (of the Systems Biology kind) brought new insights for understanding life. Those are actually the more pleasant situations. More often, someone in the audience claims with assurance that modelling never actually brought any new insight to biology (and therefore we should stick to experiments and use our senses rather than theory). I moved through various phases on how to react, from fits of rage to silent contempt, via lengthy arguments. Then I started to collect lists of models to show during my presentations. Recently, collecting success stories became a deliverable of the preparatory phase of ISBE (Infrastructure for Systems Biology Europe). In this blog post, I will select a few of the most famous and indisputable examples, and give my take on what they brought to our understanding of the world. I will add new models at my own rhythm, which is, as you well know, on the low side. You are most welcome to suggest your favourite models for addition. Note that I will only highlight a few models. A longer list of commented models can be found in the “model of the month” from BioModels Database.

Extending SBML using the annotation element

One of the frequent complaints I hear from end-users (modelers) about SBML is that the language does not provide structures to encode all types of models, or even all kinds of data. This is partially true. Indeed SBML does not provide specific structures (elements or attributes) to encode everything one could want to store during a modeling and simulation activity. How could it? However, SBML provides a generic construct that allows almost arbitrary extensions. This is the annotation element, which can be added to all SBML classes inherited from SBase (which means most of the SBML elements). In an annotation element, one can put any XML content, as long as there is only one top element in a given namespace.


<annotation xmlns:ns1="http://www.namespace1.org" >
<ns1:elementA>
<ns1:elementA1 attribute="foo" />
</ns1:elementA>
<elementB xmlns="http://mynamespaces.net/namespace2">
<elementB1 attributeC1="value" />
<elementB2 attributeC2="anotherValue" />
</elementB>
</annotation>

In the example above, the namespace of one extension (namespace1) is declared in an attribute of the annotation element itself, forcing all the subelements to be prefixed (by ns1). By contrast, the namespace of the other extension, namespace2, is declared in the relevant top element, and all the children are automatically in the new namespace. One specific type of SBML annotation is described in the SBML specification. This controlled annotation can be used to fulfil the requirements of MIRIAM. Other annotations are not standard but proprietary to a given software, used for instance to encode information not (yet) part of SBML. We will describe a few of them below. Annotations are a great mechanism to benchmark proposed extensions of the language.
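To see that the two declaration styles are equivalent for a parser, here is a small sketch using Python's standard xml.etree.ElementTree module. The helper names are mine, and the "one top element per namespace" check is only a rough illustration of the rule stated above, not a validator.

```python
import xml.etree.ElementTree as ET

# The annotation from the example above, reproduced verbatim.
ANNOTATION = """<annotation xmlns:ns1="http://www.namespace1.org">
  <ns1:elementA>
    <ns1:elementA1 attribute="foo"/>
  </ns1:elementA>
  <elementB xmlns="http://mynamespaces.net/namespace2">
    <elementB1 attributeC1="value"/>
    <elementB2 attributeC2="anotherValue"/>
  </elementB>
</annotation>"""


def split_tag(tag):
    """Split ElementTree's '{namespace}local' form into (namespace, local)."""
    if tag.startswith("{"):
        namespace, local = tag[1:].split("}", 1)
        return namespace, local
    return "", tag


def top_level_namespaces(annotation_xml):
    """Return the namespace of each direct child of the annotation element."""
    root = ET.fromstring(annotation_xml)
    return [split_tag(child.tag)[0] for child in root]


def has_one_top_element_per_namespace(annotation_xml):
    """Check that no namespace is used by more than one top-level child."""
    namespaces = top_level_namespaces(annotation_xml)
    return len(namespaces) == len(set(namespaces))
```

Whether the namespace is bound to a prefix or declared as a default, the parser reports fully qualified names, so a tool only has to look for the namespaces it understands and can ignore the rest.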

Controlled annotations

SBML provides a set of controlled annotations, based on other XML vocabularies such as the Resource Description Framework (RDF), vCard, the Dublin Core Metadata and BioModels qualifiers. SBML controlled annotations are used to store two types of information. 1) Clerical information about the model generation, such as who created or modified a model element and when.


<rdf:RDF xmlns:dcterms="http://purl.org/dc/terms/"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:vCard="http://www.w3.org/2001/vcard-rdf/3.0#">
<rdf:Description rdf:about="#metaid_0000001">
<dcterms:contributor rdf:parseType="Resource">
<vCard:N rdf:parseType="Resource">
<vCard:Family>Le Novère</vCard:Family>
<vCard:Given>Nicolas</vCard:Given>
</vCard:N>
<vCard:EMAIL>lenov@ebi.ac.uk</vCard:EMAIL>
<vCard:ORG>
<vCard:Orgname>EMBL-EBI</vCard:Orgname>
</vCard:ORG>
</dcterms:contributor>
<dcterms:created rdf:parseType="Resource">
<dcterms:W3CDTF>2005-05-23T17:11:24</dcterms:W3CDTF>
</dcterms:created>
<dcterms:modified rdf:parseType="Resource">
<dcterms:W3CDTF>2005-05-23T23:11:45</dcterms:W3CDTF>
</dcterms:modified>
</rdf:Description>
</rdf:RDF>

The attribute “rdf:about” on the element rdf:Description points to the metaid of the containing SBML element. The Dublin Core elements contributor, created and modified record who created the containing SBML element, when it was created, and when it was last modified.

2) Cross-references to external resources, such as entries in databases or terms of controlled vocabularies.


<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:dcterms="http://purl.org/dc/terms/"
xmlns:bqbiol="http://biomodels.net/biology-qualifiers/">
<rdf:Description rdf:about="#_274092">
<bqbiol:hasPart>
<rdf:Bag>
<rdf:li rdf:resource="http://identifiers.org/uniprot/P62158"/>
<rdf:li rdf:resource="http://identifiers.org/obo.chebi/CHEBI:29108"/>
</rdf:Bag>
</bqbiol:hasPart>
</rdf:Description>
</rdf:RDF>

This annotation describes the fact that both calmodulin (UniProt P62158) and calcium ion (ChEBI 29108) are part of the biological entity represented by the annotated SBML element. Those cross-references had a tremendous effect on the way people use SBML encoded files. It is not too much of an exaggeration to say that a subfield of computational systems biology was made possible thanks to them, dealing with the automatic processing of SBML encoded pathways and models. One can see for instance the software suite SBMLsemantics, which allows users to annotate, compare and merge models. Another use of those cross-references is to provide additional information for converting SBML into other formats. See for instance SBML2BioPAX.
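As an illustration of that kind of automatic processing, the following sketch pulls the qualifier and the resource URIs out of a well-formed version of the annotation above, using only the Python standard library. The function name and the tuple layout are assumptions of mine; real tools such as libSBML expose this information through their own APIs.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BQBIOL_NS = "http://biomodels.net/biology-qualifiers/"

# Well-formed version of the cross-reference annotation discussed above.
RDF_ANNOTATION = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:bqbiol="http://biomodels.net/biology-qualifiers/">
  <rdf:Description rdf:about="#_274092">
    <bqbiol:hasPart>
      <rdf:Bag>
        <rdf:li rdf:resource="http://identifiers.org/uniprot/P62158"/>
        <rdf:li rdf:resource="http://identifiers.org/obo.chebi/CHEBI:29108"/>
      </rdf:Bag>
    </bqbiol:hasPart>
  </rdf:Description>
</rdf:RDF>"""


def cross_references(rdf_xml):
    """Return (metaid, qualifier, resource URI) triples found in the RDF."""
    root = ET.fromstring(rdf_xml)
    biology_prefix = "{" + BQBIOL_NS + "}"
    triples = []
    for description in root.findall("{%s}Description" % RDF_NS):
        # rdf:about holds '#' followed by the metaid of the annotated element.
        metaid = description.get("{%s}about" % RDF_NS, "").lstrip("#")
        for qualifier in description:
            if not qualifier.tag.startswith(biology_prefix):
                continue  # skip anything that is not a biology qualifier
            name = qualifier.tag[len(biology_prefix):]
            for item in qualifier.iter("{%s}li" % RDF_NS):
                triples.append((metaid, name, item.get("{%s}resource" % RDF_NS)))
    return triples
```

Once the cross-references are in this form, comparing or merging two models boils down to comparing sets of URIs rather than guessing from element names.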

Proprietary annotations

While the section above dealt with controlled annotations, following a syntax described in the SBML specification, the power of SBML annotations is by no means limited to them. Those annotation elements can be used for instance to encode aspects of the model that are not yet supported by SBML. The spatial simulator MesoRD was one of the first tools to make full use of them in that respect. The MesoRD “format” is valid SBML, and all models developed in MesoRD can be imported in other SBML-supporting tools such as COPASI. However, only well-stirred biochemistry will then be simulated. The information describing the spatial component of the modelling is stored in proprietary annotations. The following annotations (taken from the model MODEL5974712823 from BioModels Database, encoding the model of Fange et al 2006) describe the creation of a compartment “cytosol” whose capsule shape is made by the union of a cylinder and two spheres, and specify the diffusion constants of a molecule in the cytosol and the plasma membrane.


<compartment metaid="_303076" id="cytosol">
<annotation xmlns:MesoRD="http://www.icm.uu.se" xmlns:jd="http://www.sys-bio.org/sbml">
<MesoRD:union>
<MesoRD:cylinder MesoRD:height="3.5" MesoRD:radius="0.5" MesoRD:units="um"/>
<MesoRD:translation MesoRD:units="um" MesoRD:x="0.00" MesoRD:y="-1.75" MesoRD:z="0">
<MesoRD:sphere MesoRD:radius="0.5" MesoRD:units="um"/>
</MesoRD:translation>
<MesoRD:translation MesoRD:units="um" MesoRD:x="0.00" MesoRD:y="1.75" MesoRD:z="0">
<MesoRD:sphere MesoRD:radius="0.5" MesoRD:units="um"/>
</MesoRD:translation>
</MesoRD:union>
</annotation>
</compartment>
<!-- -->
<species metaid="_303121" id="D1" name="D" compartment="cytosol" initialAmount="0" substanceUnits="item" hasOnlySubstanceUnits="true">
<annotation xmlns:MesoRD="http://www.icm.uu.se" xmlns:jd="http://www.sys-bio.org/sbml">
<MesoRD:diffusion MesoRD:compartment="cytosol" MesoRD:rate="0.0" MesoRD:units="cm2ps"/>
<MesoRD:diffusion MesoRD:compartment="membrane" MesoRD:rate="2.5e-8" MesoRD:units="cm2ps"/>
</annotation>
</species>
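A tool that does not know MesoRD can still exploit this geometry. The sketch below reads the compartment annotation and computes the volume of the capsule. It assumes that the cylinder is centred on the origin with its axis along y, and that each sphere has the cylinder's radius and sits on one end face, which is what the translations of plus and minus half the height suggest; this is my reading of the listing, not something taken from the MesoRD documentation.

```python
import math
import xml.etree.ElementTree as ET

MESORD_NS = "http://www.icm.uu.se"

# Compartment definition from the listing above.
COMPARTMENT = """<compartment metaid="_303076" id="cytosol">
  <annotation xmlns:MesoRD="http://www.icm.uu.se">
    <MesoRD:union>
      <MesoRD:cylinder MesoRD:height="3.5" MesoRD:radius="0.5" MesoRD:units="um"/>
      <MesoRD:translation MesoRD:units="um" MesoRD:x="0.00" MesoRD:y="-1.75" MesoRD:z="0">
        <MesoRD:sphere MesoRD:radius="0.5" MesoRD:units="um"/>
      </MesoRD:translation>
      <MesoRD:translation MesoRD:units="um" MesoRD:x="0.00" MesoRD:y="1.75" MesoRD:z="0">
        <MesoRD:sphere MesoRD:radius="0.5" MesoRD:units="um"/>
      </MesoRD:translation>
    </MesoRD:union>
  </annotation>
</compartment>"""


def qualified(name):
    """Build the '{namespace}name' form ElementTree uses for MesoRD names."""
    return "{%s}%s" % (MESORD_NS, name)


def capsule_volume(compartment_xml):
    """Volume of a cylinder capped by two half-spheres of the same radius.

    Assumes the geometry described in the lead-in: the half of each sphere
    lying inside the cylinder adds nothing to the union.
    """
    root = ET.fromstring(compartment_xml)
    union = root.find("annotation/" + qualified("union"))
    cylinder = union.find(qualified("cylinder"))
    radius = float(cylinder.get(qualified("radius")))
    height = float(cylinder.get(qualified("height")))
    spheres = list(union.iter(qualified("sphere")))
    if len(spheres) != 2 or any(
        float(s.get(qualified("radius"))) != radius for s in spheres
    ):
        raise ValueError("not the capsule layout this sketch assumes")
    return math.pi * radius ** 2 * height + 4.0 / 3.0 * math.pi * radius ** 3
```

For the values in the listing this gives about 3.27 cubic micrometres, a number that a non-spatial simulator could use as the compartment size.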

Another area where annotations have been used extensively is the encoding of graphical representations of the biochemical networks corresponding to the models. Early in the development of SBML the software JDesigner, developed by Herbert Sauro, was a precursor in the domain. The following annotations (taken from the model BIOMD0000000328 from BioModels Database, encoding the model of Bucher et al 2011) describe a compartment “medium”, with its size, its position on the canvas and various graphical characteristics. Note that the namespace is declared in the main sbml element rather than the annotation element.


<sbml xmlns="http://www.sbml.org/sbml/level2/version4" xmlns:jd2="http://www.sys-bio.org/sbml/jd2" level="2" version="4">
<!-- [...] -->
<jd2:compartment id="medium" size="2" visible="true">
<jd2:boundingBox h="266" w="1010" x="196" y="318"/>
<jd2:membraneStyle color="FFFFA500" thickness="12"/>
<jd2:interiorStyle color="FFFFEEEE"/>
<jd2:text value="medium" visible="true">
<jd2:position rx="14" ry="48"/>
<jd2:font fontColor="FF000000" fontName="Arial" fontSize="8"/>
</jd2:text>
</jd2:compartment>

However, what really demonstrated the power of SBML annotation for extending the language was CellDesigner notation. Its aim is very similar to JDesigner, to encode the graphical representation of a model encoded in SBML (earlier versions of CellDesigner were called SBedit). The following annotations (taken from the model BIOMD0000000220 from BioModels Database, encoding the model of Albeck et al 2008) describe the representation of a complex species PARP_C3. Firstly, in the SBML model element, CellDesigner annotations mention that both proteins PARP and C3 are part of the complex PARP_C3. Then they encode the graphical representation of the complex, and finally associate this complex with the SBML species representing it.


<celldesigner:species id="s57" name="PARP">
<celldesigner:annotation>
<celldesigner:complexSpecies>PARP_C3</celldesigner:complexSpecies>
<celldesigner:speciesIdentity>
<celldesigner:class>PROTEIN</celldesigner:class>
<celldesigner:proteinReference>pr19</celldesigner:proteinReference>
</celldesigner:speciesIdentity>
</celldesigner:annotation>
</celldesigner:species>
<celldesigner:species id="s58" name="C3">
<celldesigner:annotation>
<celldesigner:complexSpecies>PARP_C3</celldesigner:complexSpecies>
<celldesigner:speciesIdentity>
<celldesigner:class>PROTEIN</celldesigner:class>
<celldesigner:proteinReference>pr11</celldesigner:proteinReference>
</celldesigner:speciesIdentity>
</celldesigner:annotation>
</celldesigner:species>
<celldesigner:complexSpeciesAlias id="csa13" species="PARP_C3">
<celldesigner:activity>inactive</celldesigner:activity>
<celldesigner:bounds h="120.0" w="100.0" x="359.0" y="1421.0"/>
<celldesigner:view state="usual"/>
<celldesigner:backupSize h="0.0" w="0.0"/>
<celldesigner:backupView state="none"/>
<celldesigner:usualView>
<celldesigner:innerPosition x="0.0" y="0.0"/>
<celldesigner:boxSize height="120.0" width="100.0"/>
<celldesigner:singleLine width="2.0"/>
<celldesigner:paint color="fff7f7f7" scheme="Color"/>
</celldesigner:usualView>
<celldesigner:briefView>
<celldesigner:innerPosition x="0.0" y="0.0"/>
<celldesigner:boxSize height="60.0" width="80.0"/>
<celldesigner:singleLine width="2.0"/>
<celldesigner:paint color="fff7f7f7" scheme="Color"/>
</celldesigner:briefView>
</celldesigner:complexSpeciesAlias>

<species metaid="metaid_0000109" id="PARP_C3" name="PARP:C3" compartment="cell" initialAmount="0" charge="0">
<annotation>
<celldesigner:positionToCompartment>inside</celldesigner:positionToCompartment>
<celldesigner:speciesIdentity>
<celldesigner:class>COMPLEX</celldesigner:class>
<celldesigner:name>PARP:C3</celldesigner:name>
</celldesigner:speciesIdentity>
</annotation>
</species>

CellDesigner turned out to be a resounding success, and many systems biologists used it as a user-friendly tool to draw pathways, not always with modeling in mind. A significant portion of CellDesigner users do not actually know that its native format is an extended SBML, and call it “CellDesigner format”. Because what makes the success of CellDesigner is largely encoded in proprietary annotations, developers of third-party software started to add support for those annotations. One may regret that the SBML layout extension (see below) was not used instead, but it was just pragmatism on the side of this software. (Of course nowadays SBGN-ML should be the preferred way of encoding graphical representations of biochemical networks in a standard XML, and we hope CellDesigner will develop support for the format soon). Among the software that adopted the CellDesigner SBML extension one can mention the Cytoscape plugin BiNoM, and CellPublisher.

Using annotations to test SBML extensions: The layout proposal

An interesting use for SBML annotation is to develop and try out possible SBML developments. To remain in the area of graphical representation, one can mention the SBML layout extension. This extension of SBML has been under discussion since 2002 and an agreement was reached many years ago (Gauges et al 2006). Since SBML was not a modular language at the time, the layout extension was encoded in annotation. The following example shows the declaration of a layout in the annotation of the model element (a given SBML model can carry several layouts). This layout contains a compartment, at a given position and with a given size. Note that the layout extension does not deal with the actual visual representation, which is handled by the SBML rendering extension.


<model id="TestModel">
<annotation>
<listOfLayouts xmlns="http://projects.eml.org/bcb/sbml/level2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<layout id="Layout_1">
<dimensions width="400" height="220"/>
<listOfCompartmentGlyphs>
<compartmentGlyph id="CompartmentGlyph_1" compartment="Compartment_1">
<boundingBox id="bb1">
<position x="5" y="5"/>
<dimensions width="390" height="210"/>
</boundingBox>
</compartmentGlyph>
</listOfCompartmentGlyphs>

In SBML Level 3, the list of layouts and all its content are removed from the annotation of the model element and become bona fide SBML elements, in the namespace of the SBML Level 3 layout package.
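Reading the Level 2 annotation form takes very little code. The sketch below collects the bounding box of every compartment glyph with Python's standard library; the model string is the excerpt shown earlier with its closing tags added by me so that it parses, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

LAYOUT_NS = "http://projects.eml.org/bcb/sbml/level2"

# The layout excerpt, completed with closing tags so that it is well formed.
MODEL = """<model id="TestModel">
  <annotation>
    <listOfLayouts xmlns="http://projects.eml.org/bcb/sbml/level2">
      <layout id="Layout_1">
        <dimensions width="400" height="220"/>
        <listOfCompartmentGlyphs>
          <compartmentGlyph id="CompartmentGlyph_1" compartment="Compartment_1">
            <boundingBox id="bb1">
              <position x="5" y="5"/>
              <dimensions width="390" height="210"/>
            </boundingBox>
          </compartmentGlyph>
        </listOfCompartmentGlyphs>
      </layout>
    </listOfLayouts>
  </annotation>
</model>"""


def compartment_glyph_boxes(model_xml):
    """Map each compartment id to its (x, y, width, height) bounding box."""
    prefixes = {"layout": LAYOUT_NS}
    root = ET.fromstring(model_xml)
    boxes = {}
    for glyph in root.iterfind(".//layout:compartmentGlyph", prefixes):
        box = glyph.find("layout:boundingBox", prefixes)
        position = box.find("layout:position", prefixes)
        dimensions = box.find("layout:dimensions", prefixes)
        boxes[glyph.get("compartment")] = (
            float(position.get("x")),
            float(position.get("y")),
            float(dimensions.get("width")),
            float(dimensions.get("height")),
        )
    return boxes
```

Because the lookup is keyed on the namespace and not on where the element sits, the same approach would mostly carry over to the Level 3 package by changing the namespace URI and the search root.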

Conclusion: Anyone can extend SBML to cover anything

If you think the current SBML specification does not cover a feature crucial for your activity, do not throw SBML away and either give up on model sharing or develop your own language. Develop an extension of SBML instead. First look at the current list of packages to see if something is in the oven. If this is the case, please, PLEASE, join the community effort and try to improve the existing proposals. If nothing is available, either because it is a feature useful only for a few people or tasks, because it has been so far judged too far from the SBML mission, or because it is a feature hard to cover, feel free to develop your extension, support it in your software, and share it with your collaborators. If it is useful, it will be shared between groups, and maybe you can propose it for a future SBML development. As the hacker’s mantra says: “who codes wins”.

Why not use R more often as a backend for modeling and simulation?

I have a confession to make. I never learnt how to use MatLab. Despite having been in the business of developing and using dynamic models for more than a decade now, I managed to avoid fiddling with the master software used for that purpose. I used it once to run scripts written by one of my students. And I toyed a bit with Octave and SciLab, which are free replacements. But I was lucky enough to develop models in the infancy of systems biology, when it was perfectly acceptable to re-invent the numerical wheel in C (or even Perl!). However, this shameful omission will hopefully remain unknown except to the readers of this blog. Indeed, there is a massive haemorrhage of academic MatLab-based tools towards R. R is widely used in bioinformatics, and the continual incompatibilities between the successive versions of MatLab make maintaining software based on that tool really hard. The difficulty of obtaining MatLab trial licences for courses does not help either (that was a decisive criterion for a massive recoding from MatLab to R at the EBI, in order to be able to run our in silico systems biology course properly). Anyway, I could directly learn R, and nobody would notice how clueless I was.

And then comes my second confession. Despite having spent 8+ years at the EBI, where R is used daily by battalions of bioinformaticians, and where Bioconductor was in part developed, I did not learn to use this magical tool until recently. I have more excuses here. R was developed primarily as a tool for statistics, and was not initially strong on the computational modelling side. For instance, despite the existence of two packages handling SBML in R, RSBML and SBMLR, very few of the models in BioModels Database, if any, were developed using R. But my group is now heavily involved in the Drug Disease Model Repository (DDMoRe) project of the Innovative Medicines Initiative (IMI). And several of DDMoRe’s tools will be based on R. Now comes the most embarrassing part of the confession. It turns out that my wife is an experimental biologist, doing a lot of measurements and statistics. And she followed an R course organised at the Babraham Institute (where incidentally I will move half of my group from October 2012). She gave me her course materials, and I spent a few evenings with a more productive use of my time than usual (how much of the impetus was due to embarrassment will be left uninvestigated). By coincidence, I was reading at the time “Dynamic models in biology” by Ellner and Guckenheimer. While this excellent book was accompanied by exercises in MatLab, the authors later converted them to R, and produced a very useful “An introduction to R for dynamic models in biology”. That document, plus the material provided with the R package deSolve, convinced me that R was not only very powerful for everything we do in computational systems biology, it was also a lot of fun!

While I was happily rejuvenating the geek in me by writing scripts, I still wondered why there were not more GUIs to help using the simulation capabilities of R. Or taking the problem the other way around: why do people bother to implement the computing and statistical layers in end-user software? They could just concentrate on the user experience, and re-use R packages in the background. If those packages are not powerful, flexible or versatile enough, they could contribute to their development, and make the world better. Yes, the simulations in R are maybe a bit slower than the equivalent methods implemented in a dedicated software written in C. But the slow part of a simulation task often comes from the choice of the wrong numerical solver for the task at hand, or even from the dialogue between solver and GUI (I recently used a tool where I could *see* the curves being drawn when simulating the model of Tyson 1991 (BIOMD0000000005), while any decent simulator should provide the results instantly). Moreover, entire domains of research use MatLab (physics, engineering …). The argument of optimisation does not hold long. First, if we want perfect optimisation we should write assembly code directly, and second, how can computational biologists believe they are better at coding simulation tools than people who did that in physics for the last 50 years or so?

To be honest, the time running simulations is a very small part of the activity for a typical student or post-doc in computational systems biology. Most of the time is spent developing the model, reading and data-mining. And a significant amount of time is spent (wasted?) developing Yet Another Simulation Software. This software is most often suboptimal, undocumented and dies when the main developer moves on. Even when the project is maintained, it is pretty hard to provide versions working on all operating systems, a problem that will grow worse with the emerging smartphone and tablet landscape as proper computing tools.

Several international projects have aimed at providing a reusable infrastructure for modeling and simulation in biology. Historical examples are the Systems Biology Workbench (whose side effect was the development of SBML, so in a sense that was the most influential software project of the last 15 years in biological modeling), or Bio-SPICE. A modern version is the GARUDA project. I really hope these projects decide to avoid re-inventing the wheel and use existing infrastructures. For all that involves calculations, I think it should be R. Don't you?