Tuesday, September 19, 2023

1888 Math Exam

Nothing to do with cells or modeling, but I recently purchased a 1885 copy of John Casey's rendition of Euclid's Elements. John Casey was born in Limerick, Ireland, and became a lecturer in mathematics at University College Dublin. He wrote one of the more well-known editions of Euclid's elements. In his preface he states:

"This edition of the Elements of Euclid, undertaken at the request of the principals of some of the leading Colleges and Schools of Ireland, is intended to supply a want much felt by teachers at the present day—the production of a work which, while giving the unrivalled [sic] original in all its integrity, would also contain the modern conceptions and developments of the portion of Geometry over which the Elements extend."

You can find a copy at Project Gutenberg (https://www.gutenberg.org/ebooks/21076) if you are curious. 

What's interesting about the book I received, is not so much the geometry, but what I found inside. 

Inside was a copy of a math Pass Examination paper from 1888, presumably from University College Dublin but at least somewhere in Ireland (I purchased the book from Dublin). Update: There are two inscriptions in the book. The first is a simple "W.H Dunlop May 1886".  The second is more interesting and written in a very elegant cursive style where the book was transferred two years later to: Michael J Buckley. Catholic University Dublin. 19.1.'88. This is the name of the first university dedicated to accepting Catholics from Ireland (Trinity College was the Anglican University founded by Elizabeth I, so you can imagine the problem). However, the university didn't do too well financially and had problems awarding degrees because it didn't have a royal charter. In 1908/1909, it became the University College Dublin with its own charter. 

The exam had 10 questions, one involving reaping a field, another about selling a horse, and another about carpeting a room. The remaining 7 are pure math questions, with a number of them being numerical estimation questions, eg, roots, squares etc. I think modern students could do these questions, though they might struggle with the numeric computations unless they have a calculator at hand. I took a photograph of the exam for all to see:

Wednesday, May 3, 2023

Optimal distribution of enzymes that maximizes flux

Here is an interesting metabolic question to ask. Consider a pathway with two steps: $$\large X_0 \stackrel{e_1}{\rightarrow} S \stackrel{e_2}{\rightarrow} $$ where the first step is catalyzed by enzyme $e_1$ and the second step by $e_2$. We can assume that a bacterial cell has a finite volume, meaning there is only so much space to fit all the proteins necessary for the bacterium to live. If the cell make more of one protein, it presumably needs to find the extra space by down expressing other proteins. If the two step pathway is our cell, this translates to there being a fixed amount of protein that can be distributed betwen the two, that is: $$ e_1 + e_2 = e_t $$ Presumably evolutionary pressure will maximize the flux through the pathway per unit protein. This means that given a fixed total amount of protein $e_t$ there must be a particular distribution of protein between $e_1$ and $e_2$ that maximizes flux. For example, if most of the protein were allocated to $e_1$ and very little to $e_2$ then the flux would be quite low. Likewise if most of the protein were allocated to $e_2$ and very little to $e_1$ then again the flux would be low. Between these two extremes there must be a maximum flux. To show this is the case we can do a simple simulation shown in the python code below. This code varies the levels of $e_1$ and $e_2$ in a loop such that their total is always 1 unit. Each time we change $e_1$ we must change $e_2$ by the same amount in the opposite direction in order to keep the constant fixed. As we do this we collect the steady-state flux and the current level of $e_1$. Finally we plot the flux versus $e_1$.

import tellurium as te
import roadrunner
import matplotlib.pyplot as plt

r = te.loada("""
     J1: $Xo -> S; e1*(k1*Xo - k2*S)
     J2: S ->; e2*k3*S
     Xo = 1
     k1 = 0.5; k2 = 0.2
     k3 = 0.45
     e1 = 0.01; e2 = 0.99


x = []; y = []; 
for i in range (49):
    r.e1 = r.e1 + 0.02
    r.e2 = 1 - r.e1  # Total assumed to be one
    x.append (r.e1)
    y.append (r.J1)
plt.grid(b=True, which='major', color='#666666', linestyle='-')
plt.grid(b=True, which='minor', color='#999999', linestyle='-', alpha=0.2)
plt.plot (x, y, linewidth = 2)
plt.xlabel('$e_1$', fontsize=36)
plt.ylabel('Flux', fontsize=36)

The following graph show the results from the simulation (the font sizes might be on the big size if you have a low res monitor):
You can see the flux reaching a maximum at around $e_1 = 0.6$, meaning also that $e_2 = 0.4$ in order to keep the total fixed. The actual position of the peak wil depend on the rate constants in the rate expressions. Let's assume we are at the maxium. The slope, $dJ/de_1$, at the maxium is obviously zero. That means if we were to move a small amount of protein from $e_1$ to $e_2$ the flux won't change. We can write this experiment in terms of the two flux control coefficients: $$ \frac{\delta J}{J} = 0 = C^J_{e_1} \frac{\delta e_1}{e_1} + C^J_{e_2} \frac{\delta e_2}{e_2} $$ However, we know that in this particlar experiment the change in $e_1$ is the same but oppisite to the change in $e_2$. That is $\delta e_1 + \delta e_2 = 0$ or $$ \delta e_1 = -\delta e_2$$ Replacing $e_2$ with $-\delta e_1$ gives: $$ \frac{\delta J}{J} = 0 = C^J_{e_1} \frac{\delta e_1}{e_1} - C^J_{e_2} \frac{\delta e_1}{e_2} $$ $$ \frac{\delta J}{J} = 0 = C^J_{e_1} \frac{1}{e_1} - C^J_{e_2} \frac{1}{e_2} $$ $$ C^J_{e_1} \frac{1}{e_1} = C^J_{e_2} \frac{1}{e_2} $$ Giving the final result: $$ \frac{C^J_{e_1}}{C^J_{e_2}} = \frac{e_1}{e_2} $$ That is, when the protein distribution is optimized to maximize the flux, the flux control coefficients are in the same ratio as the ratio of enzyme amounts. This generalizes to any size pathway with multiple enzymes. This gives us a tantgilizing suggestion that we can obtain the flux control coeficients just by measuring the protein levels.

There is obviously a lot more one can write there and maybe I do that in future blogs but for now you can get further information:

S. Waley, “A note on the kinetics of multi-enzyme systems,” Biochemical Journal, vol. 91, no. 3, p. 514, 1964.

J Burns: “Studies on complex enzyme system.” https://era.ed.ac.uk/handle/1842/13276, 1971 (page 141-) I have a LaTeX version at: https://github.com/hsauro/JumBurnsThesis

Guy Brown, Total cell protein concentration as an evolutionary constraint on the metabolic control distribution in cells,” Journal of theoretical biology, vol. 153, no. 2, pp. 195–203, 1991.

E. Klipp and R. Heinrich, “Competition for enzymes in metabolic pathways:: Implications for optimal distributions of enzyme concentrations and for the distribution of flux control,” Biosystems, vol. 54, no. 1-2, pp. 1–14, 1999

Sauro HM Systems Biology: An Introduction to Metabolic Control Analysis, 2018

Friday, April 21, 2023

Relationship of fluxes to enzyme levels in a metabolic pathway

To get hold of me it is best to contact me via my email address at: hsauro@uw.edu

Someone asked me the other day what the relationship was between the steady-state fluxe through a reaction and the coresponding level of enzyme. Someone else suggested that there would be a linear, or proportional relatinship between a flux and the enzyme level. However, this can’t be true, at least at steady. Considder a 10 step linear pathway. At steady-state each step in the pathway will, by defintion, carry the same flux. This is true even if each step has a different enzyme level. Hence the relationship is so simple. In fact the flux a given step carries is a systemic properties, dependent on all steps in the pathway. As an experiment I decided to do a simulation on some synthetic netowrks with random parameters and enzyme levels. For this exmaple I just used a simple rate law of the form: $$ v = e_i (k_1 A - k_2 B) $$ For a bibi reaction, A + B -> C + D, the coresponding rate law would be: $$ v = e_i (k_1 A B - k_2 C D) $$ a similar picture would be seen for the unibi and biuni reactions. Using our teUtils package I generated random networks with 60 species and 150 reactions. The reactions allowed are uiui-uinbi, biui or bibi. I then randomized the values for the enzymne levels $e_i$ and computed the steady-state flux. I used the following code to do the analysis. I have a small loop that generates 5 random models but obviously this number can be changed. I generate a random model, load the model into roadrunner, randomize the values for the $e_i$ parameters between 0 and 10, compute the steady-state (I do a presimulation to help things along) and collect the corresponding $e_i$ and flux values. Finally I plot each pair in a scatter plot.

import tellurium as te
import roadrunner
import teUtils as tu
import matplotlib.pyplot as plt
import random

for i in range (5):
      J = []; E = []
      antStr = tu.buildNetworks.getRandomNetwork(60, 150, isReversible=True)
      r = te.loada(antStr)

      n = r.getNumReactions()
      for i in range (n):      
          r.setValue ('E' + str (i), random.random()*10)    
      m = r.simulate(0, 200, 300)

      for i in range (n):
          J.append (abs (r.getValue ('J' + str (i)))) 
          E.append (r.getValue ('E' + str (i)))        
      plt.figure(figsize=(12, 8))
      plt.plot(E, J, '.')
        print ('Error: bad model')
The results for five random networks is shown below. Note the x axis is the enzyme level and the y axis the corresponding steady-state flux through that enzyme. It's intersting to see that there is a rough correlation between enzyme amount and the corresponding flux, but its not very strong. Many of the points are just scattered randomly with some showing a definite correlation. The short answer is the realtinship is not so simple.

Wednesday, March 1, 2023

How not to Comment Code

I recently came across one of the ten simple rules aricles in PLoS Comp Bio:

"Ten simple rules for tackling your first mathematical models: A guide for graduate students by graduate students" by Korryn Bodner et al


One thing that struck me was Rule 5 on coding best practices with commenting being one of the discussion points. What struck me was their screen shot of a documented function shown below (in R):

My take on commenting is that it should be used to add human readable metadata on elements of a program that are not immediately obvious.

Most of the time, code should be sufficiently readable to indicate what it's doing. Obviously some languages are better than others when desribing an algorithm but it is also dependent on the programmer. I've seen code written in clear languages that are unintelliglbe, but I've also seen code written in poorly expressible languages that are easily readable. Although the programming language itself can influence code reability I think the programmer has much more influence.

But back to Rule 5. In the example you'll see something like:

# calculate the mean of the data

u <- mean (x)

This is completely redundant, as the coding states what it is going to do. In fact the authors comment every line like this. If anythng, I think the extent of comments actually hinders the reabilty of the code. The code itself is mostly clear as to what it is doing. There may be a justification to include a comment on next line that computes the standard error because the variables names are so badly chosen, e.g what does the following line do:

s <- sd(x)

sd might stand for standard deviation but the rest of the line offers no clue. If it had been written as:

standardDeviation <- sd(x)

It would have been much clearer, instead the authors add a comment to make up for poor choice of variable names. They also give the function itself a nondescriptive name, in this case ci. It would have been better to write the function using getConfidenceInterval or similar:

getConfidenceInterval <- f (data) {


Tuesday, January 24, 2023

Euclid's Elements

Nothing to do with cells and networks, but I have an interest in Euclid's Elements and decided to publish a new edition of Book I. The difference with previous editions is that this one is in color and also has a chapter on the history of the elements, together with commentaries on each proposition. For those unfamiliar with Euclid's Elements, it's a series of books (more like chapters in today's language) laying the foundation for geometry and number theory. Book I focuses on the foundation of geometry culminating in a proof for Pythagoras' theorem and other important but less known results. The key innovation is that it describes a deductive approach to geometry. It starts with definitions and axioms from which all results are derived using logical proofs.

It occurred to me that something similar could be done with deriving the properties of biochemical networks. For example, we might define the following three primitives:

I. Species

II. Reaction

III. Steady-state

We might then define the following axioms:

I. A species has associated with it a value called the concentration, x_i.

II. All concentrations are positive.

III. A reaction has a value associated with it called the reaction rate, v_i.

IV. Reaction rates can be negative, zero, or positive.

V. A reaction transforms one or more species (reactants) into one or more other species (products).

VI. The reaction rate is a continuous function of the reactants and products.

VII. The rate of change of a species can be described using a differential equation, dx/dt

VIII. All steps are reversible unless otherwise stated (may this can be derived?)


Given these axioms, we could build a series of propositions. This might be an interesting exercise to do. Some of the more obvious propositions would be the results from metabolic control analysis, such as the summation and connectivity theorems.

Link to Amazon

Friday, December 16, 2022

Branched Pathway in TikZ

 I couldn't resist, here are some simple branched pathways using tikz. Note you can change the vertical splay and horizontal distance for the nodes by changing the node distance in the argument to tikzpicture. I also added some colors wshich really don't match, someone with more artistic tallent could do better.




\begin{tikzpicture}[node distance=0.4cm and 1cm]
\node (Xo) at (0,0) {};
\node[right=of Xo] (S1) {\small $S_1$};
\node[above right=of S1] (S2) {};
\node[below right=of S1] (S3) {};

\draw[-latex, thick, blue] (Xo) -- (S1) node[pos=0.5,above,gray] {$v_1$};

\draw[-latex, thick, orange] (S1) -- (S2) node[above=0.15, left,pos=0.5,gray] {$v_2$};
\draw[-latex, thick, green!45!black] (S1) -- (S3) node[below=0.15, left,pos=0.5,gray] {$v_3$};


\begin{tikzpicture}[node distance=0.4cm and 1cm]
%\draw [help lines,step=.1] (0,-6) grid (6,6);
\node (Xo) at (0,0) {};
\node[right=of Xo,yellow!20!red] (S1) {\small $S_1$};
\node[above right=of S1,blue!80!red] (S2) {\small $S_2$};
\node[below right=of S1,green!80!red] (S3) {\small $S_3$};

\node[above right=of S3] (S4) {};
\node[below right=of S3] (S5) {};

\draw[-latex, thick, blue] (Xo) -- (S1) node[pos=0.5,above,purple] {$v_1$};
\draw[-latex, thick, blue] (S1) -- (S2) node[above=0.15, left,pos=0.5,purple] {$v_2$};
\draw[-latex, thick, blue] (S1) -- (S3) node[below=0.15, left,pos=0.5,purple] {$v_3$};

\draw[-latex, thick, blue] (S3) -- (S4) node[below=-0.12, left,pos=0.5,purple] {$v_4$};
\draw[-latex, thick, blue] (S3) -- (S5) node[below=0.05, left,pos=0.5,purple] {$v_5$};


Sunday, December 11, 2022

Experimenting with foreach loops in TikZ to draw biochemical pathways

Here is an example of a foreach loop in TikZ that can be used to draw an arbitrary long linear chain of reactions.

By setting the value of N in the following TikZ code, you can get any length linear chain. Obviously, you're limited by the width of the page. It uses the xifthen package to provide a conditional that is used to print out the last species, which is X1. There could be a better way of doing this, but this works. For example, use newcommand instead of def



\begin{tikzpicture}[>=latex', node distance=2cm]     
\node (X0) {$X_o$};
\foreach \x in {0,...,\N}
  \ifthenelse{\x = \N}
        {\def\speciesName{\large $x_\nextval$}}
  \node [right of = X\x] (X\nextval) {\speciesName};   
  \draw [->,ultra thick,blue] (X\x) -- node[above, black] {$v_{\nextval}$} (X\nextval);


Here are some examples for N = 0, N = 2 and N = 4

Tuesday, November 29, 2022

More utter metabolic nonsense

When will researchers stop writing utter nonsense about metabolism? I was reading this review:

Molecular Metabolism
12 November 2022, 101635
Nitin Patil Orla Howe, Paul Cahill and Hugh J.Byrne

Monitoring and modelling the dynamics of the cellular glycolysis pathway: A review and future perspectives and spotted this sentence in the text:

"Alternately, the activity of rate-limiting glycolytic enzymes can be determined by quantifying their catalytic products in vivo [115]. "

I thought, that can't be true, it's not possible to ascertain rate limitingness just from the products. So I checked out the citation they used:

Methods in Enzymology
Volume 542, 2014, Pages 91-114
Chapter Five - Techniques to Monitor Glycolysis
Tara TeSlaa and Michael A.Teitell

Well I was in for a treat, so many illogical statements in one paper. Here is one such sentence:

"While the activity of any single glycolytic enzyme is not a proxy for carbon flux through the entire pathway, specific enzymes limit the rate of glycolysis, and therefore, their activities control the maximum possible flux. Hexokinase, phosphofructokinase, and PK are the main rate-limiting enzymes in glycolysis."
v So authors are mixing up the Vmax for rate-limitation as if enzymes are even running at their Vmax. These enzymes, using the more classical definition of rate-limitingness, are not rate-limiting other than HK, but that depends on the organism and conditions. The reason they aren't rate-limiting is that they are regulated, which means any changes to their activity results in no change to the pathway flux. Not only that, it turns out that for the mentioned enzymes, particularly HK and PK, don't even have low Vmax's so even if they were running at Vmax they wouldn't be limiting the flux anyway. PFK is also not that low, since PDC and ENO have lower Vmaxs (this is in yeast). The point is there is no necessary correlation between rate-limitingness (however they define it) and an enzyme's Vmax. There are other reasons at play for determining a Vmax.

Here is another one:

"while determining the activity of rate-limiting glycolytic enzymes can provide insights into points of metabolic regulation."

There is no reason why rate-limiting steps are regulated, quite the opposite in fact. I'm not sure how much experimental data, theory, or computational research need to be done to convince them this isn't true. I never did find any reference to my orginal quesiton in the 2014 paper. Me thinks the orginal authors didn't read the 2014 review. There is simply no way to tell if a step is rate-limiting or not just from its products. The authors also have a modeling section where they state:

"Ordinary and partial differential equation can be used for both deterministic and stochastic modelling."

what were they thinking?

And another:

Mass action kinetics, rate law (Michaelis Menten models, Hill kinetics),

MM and Hill models don't follow clasical mass-action kinetics, they show fractional kinetic orders.

The authors never mention Metabolic Control Analysis at all, which is a primary language for talking objectively about metabolic control and regualation.

We've obviously gone very wrong somewhere in our education and training of systems biology researchers.

Tuesday, September 27, 2022

Experimenting with GeoGebra

This is an example of using GeoGebra to interactively simulate a two step pathway, S1 -> S2 -> S3 using simple mass-action kinetics in each reaction.

Tuesday, August 16, 2022

Plot of 1/x using pgfplots

I noticed I couldn't find pgfplots version of the $1/x$ plot that describes $e$. I needed one, so here is the LaTeX I used.



ytick = {1,2},
xtick = {0,1,2,3},
xlabel=$x$, ylabel=$y$,
ymax=2.5, xmin=0, ymin=0,
xlabel style = {anchor=north east},
ylabel style = {anchor=north east}
\addplot [color = blue, name path=A,domain=0:4, line width = 1.4pt, samples=200] {1/x};
\path [name path=B]
    (\pgfkeysvalueof{/pgfplots/xmin},0) --
\addplot [blue!20]
    fill between [of=A and B,
                  soft clip={(1,0) rectangle (2.71,25)},];
\node at (axis cs:1.5,1.2) {$y=1/x$};
\node at (axis cs:1.6,0.31) {Area = $1$};
\node at (axis cs:2.71, 0.82) {$e$};
\addplot[-latex] coordinates
           {(2.71,0.75) (2.71,0.45)};


Friday, July 29, 2022

To paraphrase an old saying....

 You can lead a modeler to a technical solution, but you can't make them use it.

This rephrasing of an old saying was inspired by our reproducibility work. There exist technical solutions to publishing reproducible results, and yet most studies are still not reproducible because we choose not to use those solutions.