Monday, September 8, 2025

The absolutely awful layout and render standard for SBML.

This summer I've spent three or four weeks trying to come to grips with the layout/render standard for SBML. It allows a modeler to specify a diagram, such as a metabolic pathway, alongside the model. It was proposed quite a few years ago, but very few, if any, implementations have been made. I now understand why. The standard tries to do everything, and as a result it's virtually impossible to implement fully, resulting in poor reproducibility, which was supposed to be the whole point. There is one sublevel, the layout layer on its own, but that gives very little in return. Access to the standard is via libSBML, but the API is incredibly hard to use. It took me 250 lines of Python to specify two reactions and three species, and even then I still hadn't specified any colors, line thicknesses, etc. Fully specifying a two-reaction map might require up to 500 lines of Python, and perhaps even more in C or C++. One of my students has written a high-level API, but even that is very hard to use and I gave up on it. Neither Claude nor ChatGPT can create working code using this 'high-level' API. Lucian Smith has written a layout/render extension to his Antimony language; it is easier to use, but even that has only partial support. There is some tooling out there, but some of it is incomplete and the rest is hard to use. Here are some points worth making:

Poor API Design: The libSBML API requires creating multiple objects (BoundingBox, Point, Dimensions, Curve, LineSegment) just to draw a simple line. Drawing an arrow, for example, should need no more than a single call such as draw_arrow(start, end).
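As a minimal sketch of the kind of call meant here, assuming a hypothetical wrapper layer: Point, LineSegment and Curve below are plain Python dataclasses that only mirror the names of libSBML's layout classes (they are not the libSBML API), and draw_arrow is not part of libSBML or any existing library. The nested objects still exist, but the caller never has to assemble them by hand. Arrowheads and other styling are left out.

```python
from dataclasses import dataclass, field


# Illustrative stand-ins that mirror libSBML's layout class names.
# They are NOT the libSBML API; a real wrapper would build libSBML objects.
@dataclass
class Point:
    x: float
    y: float
    z: float = 0.0


@dataclass
class LineSegment:
    start: Point
    end: Point


@dataclass
class Curve:
    segments: list = field(default_factory=list)


def draw_arrow(start, end):
    """Hypothetical one-call helper: wrap an (x, y) start and an (x, y) end
    in Point -> LineSegment -> Curve so the caller never touches them."""
    segment = LineSegment(Point(*start), Point(*end))
    return Curve(segments=[segment])


arrow = draw_arrow((40.0, 25.0), (95.0, 25.0))
print(arrow)
```

In a real wrapper, draw_arrow would translate this into the corresponding libSBML objects, which is exactly the boilerplate users should not have to write themselves.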

Verbose XML Output: The standard is specified using XML, which is slowly becoming an archaic format. While the model portion of an SBML file is readable (other than the MathML), the layout and render extension might as well be written in Sanskrit.
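To give a feel for the nesting, the sketch below uses Python's standard xml.etree.ElementTree to build one straight-line substrate arc following the element structure of the Layout package (speciesReferenceGlyph, curve, listOfCurveSegments, curveSegment, start, end), then writes the same information as a flat JSON record. The JSON field names are hypothetical, and the XML omits the layout: namespace prefixes and uses a plain type attribute where a real file uses xsi:type, so if anything it understates the real verbosity. Character count is only a crude proxy for readability.

```python
import json
import xml.etree.ElementTree as ET


def arc_as_xml(glyph_id, species_glyph, role, start, end):
    """One straight-line arc in the nested Layout-package style.
    Simplified: no layout: prefixes, and 'type' stands in for xsi:type."""
    ref = ET.Element("speciesReferenceGlyph",
                     {"id": glyph_id, "speciesGlyph": species_glyph, "role": role})
    curve = ET.SubElement(ref, "curve")
    segments = ET.SubElement(curve, "listOfCurveSegments")
    segment = ET.SubElement(segments, "curveSegment", {"type": "LineSegment"})
    ET.SubElement(segment, "start", {"x": str(start[0]), "y": str(start[1])})
    ET.SubElement(segment, "end", {"x": str(end[0]), "y": str(end[1])})
    return ET.tostring(ref, encoding="unicode")


def arc_as_json(glyph_id, species_glyph, role, start, end):
    """The same information as a flat record in a hypothetical JSON schema."""
    return json.dumps({"id": glyph_id, "glyph": species_glyph, "role": role,
                       "points": [list(start), list(end)]})


args = ("J1_S1_reactant", "S1_glyph", "substrate", (40, 25), (95, 25))
xml_text = arc_as_xml(*args)
json_text = arc_as_json(*args)
print(xml_text)
print(json_text)
print(len(xml_text), "characters of XML vs", len(json_text), "of JSON")
```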

Tool Inconsistencies: Different viewers interpret the same SBML layout differently, making it unreliable for consistent visualization and reproducibility.

Missing Abstractions: There's no high-level concept of "draw pathway from A to B"; you have to manually construct every geometric primitive. The irony is that biochemical pathways are conceptually simple (nodes connected by directed edges), but the SBML standard makes them extraordinarily complex to represent.
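A minimal sketch of what a "draw pathway from A to B" call could look like, assuming a hypothetical draw_pathway function (not part of libSBML or any existing tool) that takes species boxes plus reactant/product pairs and derives all the line geometry itself. It places each reaction node midway between its two species and returns two segments per uni-uni reaction, using the same right-edge/left-edge convention as the libSBML code later in this post.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """A species node: top-left corner plus width and height."""
    x: float
    y: float
    w: float
    h: float


def draw_pathway(species, reactions):
    """Hypothetical high-level call. species maps id -> Box; reactions maps
    reaction id -> (reactant id, product id). Returns one
    (reaction, role, start, end) tuple per straight-line segment."""
    segments = []
    for rxn_id, (src, dst) in reactions.items():
        a, b = species[src], species[dst]
        start = (a.x + a.w, a.y + a.h / 2)   # right edge of reactant, mid-height
        end = (b.x, b.y + b.h / 2)           # left edge of product, mid-height
        centre = ((start[0] + end[0]) / 2, (start[1] + end[1]) / 2)  # reaction node
        segments.append((rxn_id, "substrate", start, centre))
        segments.append((rxn_id, "product", centre, end))
    return segments


# Usage: the linear chain S1 -> S2 -> S3 from the code below.
species = {"S1": Box(0, 0, 40, 20), "S2": Box(100, 0, 40, 20), "S3": Box(200, 0, 40, 20)}
reactions = {"J1": ("S1", "S2"), "J2": ("S2", "S3")}
for seg in draw_pathway(species, reactions):
    print(seg)
```

The design choice is that the user states only the biology (which species each reaction connects), and every coordinate is derived from it; colors and line widths would be a separate style layer.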

The question is: where next? SBML was developed almost 25 years ago. At the time we chose XML as the format carrier, and it was a good idea, but today we have formats that are easier to manage and have better software support, such as YAML and JSON. If we were to create SBML today, it would probably use something other than XML. Using SBML to specify the visualization component was, however, a bad decision, or at least the specification is bad. Yes, SVG does it, but how long has it taken for SVG to become widely available? Even Google couldn't render SVG 1.1 until 2008. Established vector drawing apps still don't fully support it, and mobile support is still spotty. Even with full industry backing, it has taken a long time for SVG to become mainstream.

In contrast to commercial settings, academic software development is heavily resource-constrained, and the authors of the layout/render specification were perhaps a little optimistic that we'd be able to implement something as complex as SBML layout/render.

Layout and Render first came out in 2006, and the fact that there is hardly any support for it tells us the standard was too difficult to implement. One of the best-practice rules we tried to establish during the development of SBML was that a proposed standard had to be accompanied by at least one implementation that exercised it, to make sure it was a practical proposition. This happened with the model portion of SBML and showed us that software could be written without too much effort. I believe the same applied to the FBC extension. However, I don't recall the same happening with the layout and render extension, and this might explain the lack of implementations. Interestingly, the SBGN community didn't even want to use it, probably because it was too complex, and instead developed their own markup language, SBGN-ML.

So where do we go from here? Is it time to propose a successor to SBML that is easy to read and write and can incorporate extensions that can be implemented by the academic community?

For those interested, here is some example Python code that tries, unsuccessfully, to create two reaction arcs (yes, each reaction, even a uni-uni, has to have a minimum of two curves). Ignore the silly if statements at the start; this was code under construction and not yet finalized, but I gave up in the end. Note that this code handles just one reaction. Species and text creation is just as verbose.


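import libsbml


def add_connections_to_reaction_glyph(reaction_glyph, reaction_id, species_positions, reaction_x, reaction_y):
    """
    Add a substrate and a product SpeciesReferenceGlyph to reaction_glyph,
    each drawn as one straight LineSegment.

    species_positions maps a species id to (x, y, width, height), and
    (reaction_x, reaction_y) is the top-left corner of a 10x10 reaction node.
    """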
    print(f"Adding connections for {reaction_id}...")
    
    # Define connections
    if reaction_id == 'J1':
        reactant_id = 'S1'
        product_id = 'S2'
    elif reaction_id == 'J2':
        reactant_id = 'S2'
        product_id = 'S3'
    else:
        return
    
    # Get species positions
    reactant_x, reactant_y, reactant_w, reactant_h = species_positions[reactant_id]
    product_x, product_y, product_w, product_h = species_positions[product_id]
    
    # Create reactant connection (species -> reaction)
    reactant_ref = reaction_glyph.createSpeciesReferenceGlyph()
    reactant_ref.setId(f"{reaction_id}_{reactant_id}_reactant")
    reactant_ref.setSpeciesGlyphId(f"{reactant_id}_glyph")
    reactant_ref.setRole(libsbml.SPECIES_ROLE_SUBSTRATE)
    
    # Create curve with proper Point syntax
    curve1 = libsbml.Curve()
    line_segment1 = libsbml.LineSegment()
    
    # Start point: right edge of reactant species
    start_point1 = libsbml.Point()
    start_point1.setX(reactant_x + reactant_w)  # Right edge
    start_point1.setY(reactant_y + reactant_h/2)  # Center height
    start_point1.setZ(0)
    
    # End point: reaction center
    end_point1 = libsbml.Point()
    end_point1.setX(reaction_x + 5)  # Center of 10x10 reaction
    end_point1.setY(reaction_y + 5)
    end_point1.setZ(0)
    
    line_segment1.setStart(start_point1)
    line_segment1.setEnd(end_point1)
    curve1.addCurveSegment(line_segment1)
    reactant_ref.setCurve(curve1)
    
    # Create product connection (reaction -> species)
    product_ref = reaction_glyph.createSpeciesReferenceGlyph()
    product_ref.setId(f"{reaction_id}_{product_id}_product")
    product_ref.setSpeciesGlyphId(f"{product_id}_glyph")
    product_ref.setRole(libsbml.SPECIES_ROLE_PRODUCT)
    
    # Create curve
    curve2 = libsbml.Curve()
    line_segment2 = libsbml.LineSegment()
    
    # Start point: reaction center
    start_point2 = libsbml.Point()
    start_point2.setX(reaction_x + 5)
    start_point2.setY(reaction_y + 5)
    start_point2.setZ(0)
    
    # End point: left edge of product species
    end_point2 = libsbml.Point()
    end_point2.setX(product_x)  # Left edge
    end_point2.setY(product_y + product_h/2)  # Center height
    end_point2.setZ(0)
    
    line_segment2.setStart(start_point2)
    line_segment2.setEnd(end_point2)
    curve2.addCurveSegment(line_segment2)
    product_ref.setCurve(curve2)

Tuesday, April 8, 2025

What were they thinking?

This paper popped into one of my feeds today:

Breakdown and repair of metabolism in the aging brain

I think the paper should have been titled "How not to publish a large model". The model it publishes is large, but the way it has been deployed to the community is insane.

To save you hunting for the model, this link is to the GitHub repo

In summary, the paper describes a kinetic model of brain metabolism with:

183 processes, which include:

95 enzymatic reactions

19 transport processes (across cell and mitochondrial membranes)

69 other processes (related to ionic currents, blood flow, and other non-enzymatic processes)

In addition, the model uses 151 differential equations to simulate the dynamics of molecular concentrations.

So it's large, but the real problem is that the model is essentially inaccessible. The entire model is built as one huge Julia program. All the biology has been subsumed into a large set of difficult-to-read differential equations. There is no shareable SBML model, so it won't go into BioModels, and reusing it will be very difficult. Why is this a problem? It means other researchers cannot build on what was undoubtedly a huge amount of work. I took a screenshot of a small fragment of the Julia program so you can see what you're up against: