Wednesday, March 1, 2023

How not to Comment Code

I recently came across one of the ten simple rules aricles in PLoS Comp Bio:

"Ten simple rules for tackling your first mathematical models: A guide for graduate students by graduate students" by Korryn Bodner et al

https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1008539

One thing that struck me was Rule 5 on coding best practices with commenting being one of the discussion points. What struck me was their screen shot of a documented function shown below (in R):

My take on commenting is that it should be used to add human readable metadata on elements of a program that are not immediately obvious.

Most of the time, code should be sufficiently readable to indicate what it's doing. Obviously some languages are better than others when desribing an algorithm but it is also dependent on the programmer. I've seen code written in clear languages that are unintelliglbe, but I've also seen code written in poorly expressible languages that are easily readable. Although the programming language itself can influence code reability I think the programmer has much more influence.

But back to Rule 5. In the example you'll see something like:

# calculate the mean of the data

u <- mean (x)

This is completely redundant, as the coding states what it is going to do. In fact the authors comment every line like this. If anythng, I think the extent of comments actually hinders the reabilty of the code. The code itself is mostly clear as to what it is doing. There may be a justification to include a comment on next line that computes the standard error because the variables names are so badly chosen, e.g what does the following line do:

s <- sd(x)

sd might stand for standard deviation but the rest of the line offers no clue. If it had been written as:

standardDeviation <- sd(x)

It would have been much clearer, instead the authors add a comment to make up for poor choice of variable names. They also give the function itself a nondescriptive name, in this case ci. It would have been better to write the function using getConfidenceInterval or similar:

getConfidenceInterval <- f (data) {

etc

Tuesday, January 24, 2023

Euclid's Elements

Nothing to do with cells and networks, but I have an interest in Euclid's Elements and decided to publish a new edition of Book I. The difference with previous editions is that this one is in color and also has a chapter on the history of the elements, together with commentaries on each proposition. For those unfamiliar with Euclid's Elements, it's a series of books (more like chapters in today's language) laying the foundation for geometry and number theory. Book I focuses on the foundation of geometry culminating in a proof for Pythagoras' theorem and other important but less known results. The key innovation is that it describes a deductive approach to geometry. It starts with definitions and axioms from which all results are derived using logical proofs.

It occurred to me that something similar could be done with deriving the properties of biochemical networks. For example, we might define the following three primitives:

I. Species

II. Reaction

III. Steady-state

We might then define the following axioms:

I. A species has associated with it a value called the concentration, x_i.

II. All concentrations are positive.

III. A reaction has a value associated with it called the reaction rate, v_i.

IV. Reaction rates can be negative, zero, or positive.

V. A reaction transforms one or more species (reactants) into one or more other species (products).

VI. The reaction rate is a continuous function of the reactants and products.

VII. The rate of change of a species can be described using a differential equation, dx/dt

VIII. All steps are reversible unless otherwise stated (may this can be derived?)

etc

Given these axioms, we could build a series of propositions. This might be an interesting exercise to do. Some of the more obvious propositions would be the results from metabolic control analysis, such as the summation and connectivity theorems.

Link to Amazon

Friday, December 16, 2022

Branched Pathway in TikZ

 I couldn't resist, here are some simple branched pathways using tikz. Note you can change the vertical splay and horizontal distance for the nodes by changing the node distance in the argument to tikzpicture. I also added some colors wshich really don't match, someone with more artistic tallent could do better.


\documentclass[12pt]{article}

\usepackage{tikz}
\usetikzlibrary{positioning}
\usepackage{xcolor}

\begin{document}

\begin{tikzpicture}[node distance=0.4cm and 1cm]
\node (Xo) at (0,0) {};
\node[right=of Xo] (S1) {\small $S_1$};
\node[above right=of S1] (S2) {};
\node[below right=of S1] (S3) {};

\draw[-latex, thick, blue] (Xo) -- (S1) node[pos=0.5,above,gray] {$v_1$};

\draw[-latex, thick, orange] (S1) -- (S2) node[above=0.15, left,pos=0.5,gray] {$v_2$};
\draw[-latex, thick, green!45!black] (S1) -- (S3) node[below=0.15, left,pos=0.5,gray] {$v_3$};
\end{tikzpicture}

\vspace{1cm}

\begin{tikzpicture}[node distance=0.4cm and 1cm]
%\draw [help lines,step=.1] (0,-6) grid (6,6);
\node (Xo) at (0,0) {};
\node[right=of Xo,yellow!20!red] (S1) {\small $S_1$};
\node[above right=of S1,blue!80!red] (S2) {\small $S_2$};
\node[below right=of S1,green!80!red] (S3) {\small $S_3$};

\node[above right=of S3] (S4) {};
\node[below right=of S3] (S5) {};

\draw[-latex, thick, blue] (Xo) -- (S1) node[pos=0.5,above,purple] {$v_1$};
\draw[-latex, thick, blue] (S1) -- (S2) node[above=0.15, left,pos=0.5,purple] {$v_2$};
\draw[-latex, thick, blue] (S1) -- (S3) node[below=0.15, left,pos=0.5,purple] {$v_3$};

\draw[-latex, thick, blue] (S3) -- (S4) node[below=-0.12, left,pos=0.5,purple] {$v_4$};
\draw[-latex, thick, blue] (S3) -- (S5) node[below=0.05, left,pos=0.5,purple] {$v_5$};
\end{tikzpicture}

\end{document}

Sunday, December 11, 2022

Experimenting with foreach loops in TikZ to draw biochemical pathways

Here is an example of a foreach loop in TikZ that can be used to draw an arbitrary long linear chain of reactions.

By setting the value of N in the following TikZ code, you can get any length linear chain. Obviously, you're limited by the width of the page. It uses the xifthen package to provide a conditional that is used to print out the last species, which is X1. There could be a better way of doing this, but this works. For example, use newcommand instead of def
 
\documentclass{article}

\usepackage{tikz}
\usetikzlibrary{arrows}
\usetikzlibrary{calc}
\usepackage{xifthen}

\begin{document}


\begin{tikzpicture}[>=latex', node distance=2cm]     
\node (X0) {$X_o$};
\pgfmathsetmacro{\N}{4}
\foreach \x in {0,...,\N}
{
  \pgfmathtruncatemacro{\nextval}{\x+1}
  \ifthenelse{\x = \N}
        {\def\speciesName{$X_1$}}
        {\def\speciesName{\large $x_\nextval$}}
  
  \node [right of = X\x] (X\nextval) {\speciesName};   
  \draw [->,ultra thick,blue] (X\x) -- node[above, black] {$v_{\nextval}$} (X\nextval);
}
\end{tikzpicture}

\end{document}

Here are some examples for N = 0, N = 2 and N = 4



Tuesday, November 29, 2022

More utter metabolic nonsense

When will researchers stop writing utter nonsense about metabolism? I was reading this review:

Molecular Metabolism
12 November 2022, 101635
Nitin Patil Orla Howe, Paul Cahill and Hugh J.Byrne

Monitoring and modelling the dynamics of the cellular glycolysis pathway: A review and future perspectives and spotted this sentence in the text:

"Alternately, the activity of rate-limiting glycolytic enzymes can be determined by quantifying their catalytic products in vivo [115]. "

I thought, that can't be true, it's not possible to ascertain rate limitingness just from the products. So I checked out the citation they used:

Methods in Enzymology
Volume 542, 2014, Pages 91-114
Chapter Five - Techniques to Monitor Glycolysis
Tara TeSlaa and Michael A.Teitell

Well I was in for a treat, so many illogical statements in one paper. Here is one such sentence:

"While the activity of any single glycolytic enzyme is not a proxy for carbon flux through the entire pathway, specific enzymes limit the rate of glycolysis, and therefore, their activities control the maximum possible flux. Hexokinase, phosphofructokinase, and PK are the main rate-limiting enzymes in glycolysis."
v So authors are mixing up the Vmax for rate-limitation as if enzymes are even running at their Vmax. These enzymes, using the more classical definition of rate-limitingness, are not rate-limiting other than HK, but that depends on the organism and conditions. The reason they aren't rate-limiting is that they are regulated, which means any changes to their activity results in no change to the pathway flux. Not only that, it turns out that for the mentioned enzymes, particularly HK and PK, don't even have low Vmax's so even if they were running at Vmax they wouldn't be limiting the flux anyway. PFK is also not that low, since PDC and ENO have lower Vmaxs (this is in yeast). The point is there is no necessary correlation between rate-limitingness (however they define it) and an enzyme's Vmax. There are other reasons at play for determining a Vmax.

Here is another one:

"while determining the activity of rate-limiting glycolytic enzymes can provide insights into points of metabolic regulation."

There is no reason why rate-limiting steps are regulated, quite the opposite in fact. I'm not sure how much experimental data, theory, or computational research need to be done to convince them this isn't true. I never did find any reference to my orginal quesiton in the 2014 paper. Me thinks the orginal authors didn't read the 2014 review. There is simply no way to tell if a step is rate-limiting or not just from its products. The authors also have a modeling section where they state:

"Ordinary and partial differential equation can be used for both deterministic and stochastic modelling."

what were they thinking?

And another:

Mass action kinetics, rate law (Michaelis Menten models, Hill kinetics),

MM and Hill models don't follow clasical mass-action kinetics, they show fractional kinetic orders.

The authors never mention Metabolic Control Analysis at all, which is a primary language for talking objectively about metabolic control and regualation.

We've obviously gone very wrong somewhere in our education and training of systems biology researchers.

Tuesday, September 27, 2022

Experimenting with GeoGebra

This is an example of using GeoGebra to interactively simulate a two step pathway, S1 -> S2 -> S3 using simple mass-action kinetics in each reaction.

Tuesday, August 16, 2022

Plot of 1/x using pgfplots

I noticed I couldn't find pgfplots version of the $1/x$ plot that describes $e$. I needed one, so here is the LaTeX I used.

\documentclass[margin=3mm]{standalone}
\usepackage{pgfplots}
\pgfplotsset{compat=1.18}
\usepgfplotslibrary{fillbetween}

\begin{document


\begin{tikzpicture}
\begin{axis}[
ytick = {1,2},
xtick = {0,1,2,3},
xlabel=$x$, ylabel=$y$,
ymax=2.5, xmin=0, ymin=0,
xlabel style = {anchor=north east},
ylabel style = {anchor=north east}
]
\addplot [color = blue, name path=A,domain=0:4, line width = 1.4pt, samples=200] {1/x};
\path [name path=B]
    (\pgfkeysvalueof{/pgfplots/xmin},0) --
    (\pgfkeysvalueof{/pgfplots/xmax},0);
\addplot [blue!20]
    fill between [of=A and B,
                  soft clip={(1,0) rectangle (2.71,25)},];
\node at (axis cs:1.5,1.2) {$y=1/x$};
\node at (axis cs:1.6,0.31) {Area = $1$};
\node at (axis cs:2.71, 0.82) {$e$};
\addplot[-latex] coordinates
           {(2.71,0.75) (2.71,0.45)};
\end{axis}
\end{tikzpicture}

\end{document} 



Friday, July 29, 2022

To paraphrase an old saying....

 You can lead a modeler to a technical solution, but you can't make them use it.


This rephrasing of an old saying was inspired by our reproducibility work. There exist technical solutions to publishing reproducible results, and yet most studies are still not reproducible because we choose not to use those solutions. 



Sunday, June 26, 2022

MCA Rediscovered


It looks like someone has rediscovered metabolic control analysis (MCA).

A structural approach to understanding enzymatic regulation of chemical reaction networks Biochem J (2022) 479 (11): 1265–1283. Atsushi Mochizuki

The analysis is exactly the same as MCA but uses different symbols and they focus on the unscaled sensitivities instead. eg the r symbols are the unscaled elasticities. The core equation (4) can be found in equation (1) of the appendix of the following paper, and I am sure its been published elsewhere too:


However, unlike the original MCA, the latest reincarnation doesn't include support for conserved moieties so as it stands it's somewhat limited. Note equation (1) in the above paper includes additional terms to take into account any conserved moieties. 

What is more concerning is that the reviewers of the paper never spotted this duplication of work. 





Tuesday, April 19, 2022

Lorenz Attractor

I wanted to test out Google's skia 2D library so I wrote a simple interactive Lorenz attractor app over the weekend. Source code and binaries at 

https://github.com/hsauro/Lorenz_fmx

Binaries are only for Windows at the moment but hope to have a Mac version in a couple of weeks.

I used Object Pascal to write it but it should be easily translatable to something like C# and WinForms which were modeled after Object Pascal and the VCL. Looks like skia does a better job at antialiasing lines and also it's quite fast.

I use a simple Euler integration scheme to solve the Lorenz ODEs, nothing fancy but it seems to work perfectly well. 




Tuesday, April 5, 2022

Analogmachine reborn

I decided to rebuild my blog on blogger. Originally I used my own WordPress site but discovered that maintaining it was quite time-consuming. The main problem was stopping the site from being hacked and corrupted.  This happened again recently and it prompted me to move to a more resilient and less demanding platform.