Extending SBML using the annotation element

One of the frequent complaints I hear from end-users (modelers) about SBML is that the language does not provide structures to encode all types of models, or even all kinds of data. This is partially true. Indeed, SBML does not provide specific structures (elements or attributes) to encode everything one could want to store during a modeling and simulation activity. How could it? However, SBML provides a generic construct that allows almost arbitrary extensions. This is the annotation element, which can be added to all SBML classes inherited from SBase (which means most SBML elements). In an annotation element, one can put any XML content, as long as there is only one top-level element in a given namespace.

<annotation xmlns:ns1="http://www.namespace1.org">
  <ns1:elementA1 attribute="foo" />
  <elementB xmlns="http://mynamespaces.net/namespace2">
    <elementB1 attributeC1="value" />
    <elementB2 attributeC2="anotherValue" />
  </elementB>
</annotation>

In the example above, the namespace of the first extension (namespace 1) is declared in an attribute of the annotation element itself, forcing all its subelements to be prefixed (by ns1). In contrast, the namespace of the other extension, namespace2, is declared on the relevant top-level element as a default namespace, so all its children are automatically in the new namespace. One specific type of SBML annotation is described in the SBML specification. This controlled annotation can be used to fulfil the requirements of MIRIAM. Other annotations are not standard. They are proprietary to a given software, used for instance to encode information not (yet) part of SBML. We will describe a few of them below. Annotations are also a great mechanism to try out proposed extensions of the language.
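The two declaration styles discussed above can be checked with a short Python sketch using the standard library's xml.etree.ElementTree. It is only an illustration: the annotation is the example above with its closing tags added, and the helper names (namespace_of, top_level_namespaces, has_unique_top_level_namespaces) are made up for this post, not part of any SBML library.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# The example annotation, with closing tags added so that it parses.
ANNOTATION = """
<annotation xmlns:ns1="http://www.namespace1.org">
  <ns1:elementA1 attribute="foo" />
  <elementB xmlns="http://mynamespaces.net/namespace2">
    <elementB1 attributeC1="value" />
    <elementB2 attributeC2="anotherValue" />
  </elementB>
</annotation>
"""


def namespace_of(element):
    """Return the namespace URI of an element, or '' if it has none."""
    tag = element.tag
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


def top_level_namespaces(annotation):
    """List the namespace of each direct child of <annotation>."""
    return [namespace_of(child) for child in annotation]


def has_unique_top_level_namespaces(annotation):
    """Check that no namespace owns more than one top-level element."""
    counts = Counter(top_level_namespaces(annotation))
    return all(n == 1 for n in counts.values())


root = ET.fromstring(ANNOTATION)
namespaces = top_level_namespaces(root)
unique = has_unique_top_level_namespaces(root)
# The children of elementB inherit its default namespace.
inherited = [namespace_of(child) for child in root[1]]
print(namespaces, unique, inherited)
```

Whether the namespace is bound to a prefix or declared as a default, the parser resolves every element to a (namespace, local name) pair, which is what a tool should match on rather than the prefix.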

Controlled annotations

SBML provides a set of controlled annotations, based on other XML vocabularies such as the Resource Description Framework (RDF), vCard, the Dublin Core Metadata and BioModels qualifiers. SBML controlled annotations are used to store two types of information. 1) Clerical information about the model generation, such as who created or modified a model element and when.

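<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:vCard="http://www.w3.org/2001/vcard-rdf/3.0#">
  <rdf:Description rdf:about="#metaid_0000001">
    <dcterms:contributor rdf:parseType="Resource">
      <vCard:N rdf:parseType="Resource">
        <vCard:Family>Le Novère</vCard:Family>
      </vCard:N>
    </dcterms:contributor>
    <dcterms:created rdf:parseType="Resource"><!-- [...] --></dcterms:created>
    <dcterms:modified rdf:parseType="Resource"><!-- [...] --></dcterms:modified>
  </rdf:Description>
</rdf:RDF>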

The attribute “rdf:about” on the element rdf:Description points to the metaid of the containing SBML element. The Dublin Core elements contributor, created and modified record who created the containing SBML element, when it was created, and when it was last modified.

2) Cross-references to external resources, such as entries in databases or terms of controlled vocabularies.

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="#_274092">
    <rdf:li rdf:resource="http://identifiers.org/uniprot/P62158"/>
    <rdf:li rdf:resource="http://identifiers.org/obo.chebi/CHEBI:29108"/>
  </rdf:Description>
</rdf:RDF>

This annotation states that both calmodulin (UniProt P62158) and calcium ion (ChEBI 29108) are part of the biological entity represented by the annotated SBML element. Those cross-references had a tremendous effect on the way people use SBML-encoded files. It is not too much of an exaggeration to say that they made possible a subfield of computational systems biology, dealing with the automatic processing of SBML-encoded pathways and models. See for instance the software suite SBMLsemantics, which allows one to annotate, compare and merge models. Another use of those cross-references is to provide additional information for converting SBML into other formats. See for instance SBML2BioPAX.
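To give an idea of what such automatic processing looks like, here is a minimal Python sketch that pulls the cross-references out of a controlled annotation. Several things are assumptions on my part: the species id "s1" is hypothetical, the bqbiol:hasPart qualifier and the rdf:Bag wrapper follow the general pattern of the SBML specification rather than the excerpt above, and the function name cross_references is invented for this example.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BQBIOL_NS = "http://biomodels.net/biology-qualifiers/"

# Hypothetical species carrying a complete controlled annotation.
SPECIES = f"""
<species metaid="_274092" id="s1">
  <annotation>
    <rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:bqbiol="{BQBIOL_NS}">
      <rdf:Description rdf:about="#_274092">
        <bqbiol:hasPart>
          <rdf:Bag>
            <rdf:li rdf:resource="http://identifiers.org/uniprot/P62158"/>
            <rdf:li rdf:resource="http://identifiers.org/obo.chebi/CHEBI:29108"/>
          </rdf:Bag>
        </bqbiol:hasPart>
      </rdf:Description>
    </rdf:RDF>
  </annotation>
</species>
"""


def cross_references(element):
    """Return (qualifier, URI) pairs annotating one SBML element."""
    target = "#" + element.get("metaid", "")
    pairs = []
    for description in element.iter(f"{{{RDF_NS}}}Description"):
        # Keep only the description that points back to this element.
        if description.get(f"{{{RDF_NS}}}about") != target:
            continue
        for qualifier in description:
            name = qualifier.tag.split("}", 1)[-1]
            for item in qualifier.iter(f"{{{RDF_NS}}}li"):
                pairs.append((name, item.get(f"{{{RDF_NS}}}resource")))
    return pairs


species = ET.fromstring(SPECIES)
refs = cross_references(species)
for qualifier, uri in refs:
    print(qualifier, uri)
```

Because the URIs are resolvable identifiers rather than free text, two models annotated this way can be compared element by element without relying on the names their authors chose.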

Proprietary annotations

While the section above dealt with controlled annotations, which follow a syntax described in the SBML specification, the power of SBML annotations is by no means limited to them. Annotation elements can be used, for instance, to encode aspects of a model that are not yet supported by SBML. The spatial simulator MesoRD was one of the first tools to make full use of them in that respect. The MesoRD “format” is valid SBML, and all models developed in MesoRD can be imported into other SBML-supporting tools such as COPASI. However, only well-stirred biochemistry will then be simulated, because the information describing the spatial component of the model is stored in proprietary annotations. The following annotations (taken from the model MODEL5974712823 from BioModels Database, encoding the model of Fange et al 2006) describe the creation of a compartment “cytosol” whose capsule shape is made by the union of a cylinder and two spheres, and specify the diffusion constants of a molecule in the cytosol and the plasma membrane.

<compartment metaid="_303076" id="cytosol">
  <annotation xmlns:MesoRD="http://www.icm.uu.se" xmlns:jd="http://www.sys-bio.org/sbml">
    <MesoRD:cylinder MesoRD:height="3.5" MesoRD:radius="0.5" MesoRD:units="um"/>
    <MesoRD:translation MesoRD:units="um" MesoRD:x="0.00" MesoRD:y="-1.75" MesoRD:z="0">
      <MesoRD:sphere MesoRD:radius="0.5" MesoRD:units="um"/>
    </MesoRD:translation>
    <MesoRD:translation MesoRD:units="um" MesoRD:x="0.00" MesoRD:y="1.75" MesoRD:z="0">
      <MesoRD:sphere MesoRD:radius="0.5" MesoRD:units="um"/>
    </MesoRD:translation>
  </annotation>
</compartment>
<!-- -->
<species metaid="_303121" id="D1" name="D" compartment="cytosol" initialAmount="0" substanceUnits="item" hasOnlySubstanceUnits="true">
  <annotation xmlns:MesoRD="http://www.icm.uu.se" xmlns:jd="http://www.sys-bio.org/sbml">
    <MesoRD:diffusion MesoRD:compartment="cytosol" MesoRD:rate="0.0" MesoRD:units="cm2ps"/>
    <MesoRD:diffusion MesoRD:compartment="membrane" MesoRD:rate="2.5e-8" MesoRD:units="cm2ps"/>
  </annotation>
</species>
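
As a rough illustration of how a tool that knows the MesoRD namespace could recover the spatial information, while any other tool simply sees a regular species, here is a Python sketch. The species excerpt is closed as-is (the original file may wrap these elements differently), and the function name diffusion_rates is an assumption made for this example, not MesoRD's own code.

```python
import xml.etree.ElementTree as ET

MESORD_NS = "http://www.icm.uu.se"

# The species excerpt, with closing tags added so that it parses.
SPECIES = """
<species metaid="_303121" id="D1" name="D" compartment="cytosol"
         initialAmount="0" substanceUnits="item" hasOnlySubstanceUnits="true">
  <annotation xmlns:MesoRD="http://www.icm.uu.se" xmlns:jd="http://www.sys-bio.org/sbml">
    <MesoRD:diffusion MesoRD:compartment="cytosol" MesoRD:rate="0.0" MesoRD:units="cm2ps"/>
    <MesoRD:diffusion MesoRD:compartment="membrane" MesoRD:rate="2.5e-8" MesoRD:units="cm2ps"/>
  </annotation>
</species>
"""


def diffusion_rates(species):
    """Map each compartment name to its (rate, units) diffusion setting."""
    rates = {}
    for diffusion in species.iter(f"{{{MESORD_NS}}}diffusion"):
        # MesoRD attributes are namespace-qualified as well.
        compartment = diffusion.get(f"{{{MESORD_NS}}}compartment")
        rate = float(diffusion.get(f"{{{MESORD_NS}}}rate"))
        units = diffusion.get(f"{{{MESORD_NS}}}units")
        rates[compartment] = (rate, units)
    return rates


species = ET.fromstring(SPECIES)
rates = diffusion_rates(species)
# A tool unaware of MesoRD still reads the core SBML attributes.
core_id = species.get("id")
print(core_id, rates)
```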

Another area where annotations have been used extensively is the encoding of graphical representations of the biochemical networks corresponding to the models. Early in the development of SBML, the software JDesigner, developed by Herbert Sauro, was a precursor in the domain. The following annotations (taken from the model BIOMD0000000328 from BioModels Database, encoding the model of Bucher et al 2011) describe a compartment “medium”, with its size, its position on the canvas and various graphical characteristics. Note that the namespace is declared in the main sbml element rather than the annotation element.

<sbml xmlns="http://www.sbml.org/sbml/level2/version4" xmlns:jd2="http://www.sys-bio.org/sbml/jd2" level="2" version="4">
  <!-- [...] -->
  <jd2:compartment id="medium" size="2" visible="true">
    <jd2:boundingBox h="266" w="1010" x="196" y="318"/>
    <jd2:membraneStyle color="FFFFA500" thickness="12"/>
    <jd2:interiorStyle color="FFFFEEEE"/>
    <jd2:text value="medium" visible="true">
      <jd2:position rx="14" ry="48"/>
      <jd2:font fontColor="FF000000" fontName="Arial" fontSize="8"/>
    </jd2:text>
  </jd2:compartment>
  <!-- [...] -->
</sbml>

However, what really demonstrated the power of SBML annotations for extending the language was the CellDesigner notation. Its aim is very similar to that of JDesigner: to encode the graphical representation of a model encoded in SBML (earlier versions of CellDesigner were called SBedit). The following annotations (taken from the model BIOMD0000000220 from BioModels Database, encoding the model of Albeck et al 2008) describe the representation of a complex species PARP_C3. Firstly, in the SBML model element, CellDesigner annotations mention that both proteins PARP and C3 are part of the complex PARP_C3. Then they encode the graphical representation of the complex, and finally associate this complex with the SBML species representing it.

<celldesigner:species id="s57" name="PARP"><!-- [...] --></celldesigner:species>
<celldesigner:species id="s58" name="C3"><!-- [...] --></celldesigner:species>
<!-- [...] -->
<celldesigner:complexSpeciesAlias id="csa13" species="PARP_C3">
  <celldesigner:bounds h="120.0" w="100.0" x="359.0" y="1421.0"/>
  <celldesigner:view state="usual"/>
  <celldesigner:backupSize h="0.0" w="0.0"/>
  <celldesigner:backupView state="none"/>
  <celldesigner:innerPosition x="0.0" y="0.0"/>
  <celldesigner:boxSize height="120.0" width="100.0"/>
  <celldesigner:singleLine width="2.0"/>
  <celldesigner:paint color="fff7f7f7" scheme="Color"/>
  <celldesigner:innerPosition x="0.0" y="0.0"/>
  <celldesigner:boxSize height="60.0" width="80.0"/>
  <celldesigner:singleLine width="2.0"/>
  <celldesigner:paint color="fff7f7f7" scheme="Color"/>
</celldesigner:complexSpeciesAlias>
<!-- [...] -->
<species metaid="metaid_0000109" id="PARP_C3" name="PARP:C3" compartment="cell" initialAmount="0" charge="0"><!-- [...] --></species>

CellDesigner turned out to be a resounding success, and many systems biologists used it as a user-friendly tool to draw pathways, not always with modeling in mind. A significant portion of CellDesigner users do not actually know that its native format is an extended SBML, and call it “CellDesigner format”. Because what makes CellDesigner successful is largely encoded in proprietary annotations, third-party software started to develop support for those annotations. One may regret that the SBML layout extension (see below) was not used instead, but it was just pragmatism on the part of the software developers. (Of course nowadays SBGN-ML should be the preferred way of encoding graphical representations of biochemical networks in a standard XML, and we hope CellDesigner will develop support for the format soon.) Among the software that adopted the CellDesigner SBML extension, one can mention the Cytoscape plugin BiNoM and CellPublisher.

Using annotations to test SBML extensions: The layout proposal

An interesting use of SBML annotations is to develop and try out possible SBML developments. To remain in the area of graphical representation, one can mention the SBML layout extension. This extension of SBML has been under discussion since 2002, and an agreement was reached many years ago (Gauges et al 2006). Since SBML was not yet a modular language at the time, the layout extension was encoded in annotations. The following example shows the declaration of a layout in the annotation of the model element (a given SBML model can carry several layouts). This layout contains a compartment, at a given position and with a given size. Note that the layout extension does not deal with the actual visual representation, which is handled by the SBML rendering extension.

<model id="TestModel">
  <annotation>
    <listOfLayouts xmlns="http://projects.eml.org/bcb/sbml/level2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
      <layout id="Layout_1">
        <dimensions width="400" height="220"/>
        <compartmentGlyph id="CompartmentGlyph_1" compartment="Compartment_1">
          <boundingBox id="bb1">
            <position x="5" y="5"/>
            <dimensions width="390" height="210"/>
          </boundingBox>
        </compartmentGlyph>
      </layout>
    </listOfLayouts>
  </annotation>
</model>
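
Reading such a layout back is straightforward, as the following Python sketch suggests. It assumes the layout example with its closing tags restored and the list of layouts placed inside the annotation of the model, as described in the text. The helper names q and compartment_boxes are invented for this example.

```python
import xml.etree.ElementTree as ET

LAYOUT_NS = "http://projects.eml.org/bcb/sbml/level2"

# The layout example, closed and wrapped in <annotation> (assumed structure).
MODEL = """
<model id="TestModel">
  <annotation>
    <listOfLayouts xmlns="http://projects.eml.org/bcb/sbml/level2">
      <layout id="Layout_1">
        <dimensions width="400" height="220"/>
        <compartmentGlyph id="CompartmentGlyph_1" compartment="Compartment_1">
          <boundingBox id="bb1">
            <position x="5" y="5"/>
            <dimensions width="390" height="210"/>
          </boundingBox>
        </compartmentGlyph>
      </layout>
    </listOfLayouts>
  </annotation>
</model>
"""


def q(name):
    """Qualify a local element name with the layout namespace."""
    return f"{{{LAYOUT_NS}}}{name}"


def compartment_boxes(model):
    """Map each compartment id to its glyph box as (x, y, width, height)."""
    boxes = {}
    for glyph in model.iter(q("compartmentGlyph")):
        box = glyph.find(q("boundingBox"))
        position = box.find(q("position"))
        size = box.find(q("dimensions"))
        boxes[glyph.get("compartment")] = (
            float(position.get("x")),
            float(position.get("y")),
            float(size.get("width")),
            float(size.get("height")),
        )
    return boxes


model = ET.fromstring(MODEL)
boxes = compartment_boxes(model)
print(boxes)
```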

In SBML Level 3, the list of layouts and all its content are moved out of the annotation of the model element and become bona fide SBML elements, in the namespace of the SBML Level 3 layout package.

Conclusion: Anyone can extend SBML to cover anything

If you think the current SBML specification does not cover a feature crucial for your activity, do not throw SBML away and either give up on model sharing or develop your own language. Develop an extension of SBML instead. First look at the current list of packages to see if something is in the oven. If this is the case, please, PLEASE, join the community effort and try to improve the existing proposals. If nothing is available, either because it is a feature useful only for a few people or tasks, because it has so far been judged too far from the SBML mission, or because it is a feature hard to cover, feel free to develop your extension, support it in your software, and share it with your collaborators. If it is useful, it will be shared between groups, and maybe you can propose it for a future SBML development. As the hacker’s mantra says: “who codes wins”.
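
To make this concrete, here is one way a tool could write its own annotation, sketched in Python with the standard library. Everything specific to the extension is hypothetical: the namespace http://example.org/my-extension, the myext prefix, the settings and diffusion elements, and the set_annotation helper are placeholders for whatever your extension needs.

```python
import xml.etree.ElementTree as ET

SBML_NS = "http://www.sbml.org/sbml/level2/version4"
MYEXT_NS = "http://example.org/my-extension"  # hypothetical namespace

# Serialize with readable prefixes instead of ns0, ns1, ...
ET.register_namespace("", SBML_NS)
ET.register_namespace("myext", MYEXT_NS)


def namespace_of(element):
    """Return the namespace URI of an element, or '' if it has none."""
    tag = element.tag
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


def set_annotation(element, payload):
    """Attach payload as a top-level child of the element's annotation.

    An existing top-level element in the same namespace is replaced, so the
    annotation keeps a single top-level element per namespace.
    """
    annotation = element.find(f"{{{SBML_NS}}}annotation")
    if annotation is None:
        # SBML expects <annotation> after <notes> and before other children.
        children = list(element)
        has_notes = bool(children) and children[0].tag == f"{{{SBML_NS}}}notes"
        annotation = ET.Element(f"{{{SBML_NS}}}annotation")
        element.insert(1 if has_notes else 0, annotation)
    for child in list(annotation):
        if namespace_of(child) == namespace_of(payload):
            annotation.remove(child)
    annotation.append(payload)
    return annotation


species = ET.fromstring(
    f'<species xmlns="{SBML_NS}" id="D1" compartment="cytosol"/>'
)

for rate in ("1.0e-8", "2.5e-8"):  # the second call replaces the first
    payload = ET.Element(f"{{{MYEXT_NS}}}settings")
    ET.SubElement(payload, f"{{{MYEXT_NS}}}diffusion", {"rate": rate})
    annotation = set_annotation(species, payload)

xml_text = ET.tostring(species, encoding="unicode")
print(xml_text)
```

Grouping everything under one top-level element per namespace keeps the file valid SBML, so tools that do not know the extension can still read the rest of the model.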