Wednesday, March 1, 2023

How not to Comment Code

I recently came across one of the ten simple rules aricles in PLoS Comp Bio:

"Ten simple rules for tackling your first mathematical models: A guide for graduate students by graduate students" by Korryn Bodner et al

https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1008539

One thing that struck me was Rule 5 on coding best practices with commenting being one of the discussion points. What struck me was their screen shot of a documented function shown below (in R):

My take on commenting is that it should be used to add human readable metadata on elements of a program that are not immediately obvious.

Most of the time, code should be sufficiently readable to indicate what it's doing. Obviously some languages are better than others when desribing an algorithm but it is also dependent on the programmer. I've seen code written in clear languages that are unintelliglbe, but I've also seen code written in poorly expressible languages that are easily readable. Although the programming language itself can influence code reability I think the programmer has much more influence.

But back to Rule 5. In the example you'll see something like:

# calculate the mean of the data

u <- mean (x)

This is completely redundant, as the coding states what it is going to do. In fact the authors comment every line like this. If anythng, I think the extent of comments actually hinders the reabilty of the code. The code itself is mostly clear as to what it is doing. There may be a justification to include a comment on next line that computes the standard error because the variables names are so badly chosen, e.g what does the following line do:

s <- sd(x)

sd might stand for standard deviation but the rest of the line offers no clue. If it had been written as:

standardDeviation <- sd(x)

It would have been much clearer, instead the authors add a comment to make up for poor choice of variable names. They also give the function itself a nondescriptive name, in this case ci. It would have been better to write the function using getConfidenceInterval or similar:

getConfidenceInterval <- f (data) {

etc