One of the great battles programmers fight involves synchronization between two different parts of the code. The most common example of this is what the comments say versus what the code really does. Another common example is the structure of data stored somewhere (like a database) and the code that manipulates it.

A lot of the programmer’s effort and technique is devoted to managing, if not preventing, synchronization issues. The rules encouraging encapsulation, or against global objects, have a lot to do with this goal.

It’s always the first rule for dealing with monsters: Avoid if possible! Failing that, try to contain them. At the least put up some warnings to future travelers: Here There Be Dragons!

Less fancifully, isolating the different parts of your code as much as possible is a powerful tool for fighting insidious bugs. Any global object should raise a red flag — does this need to be so visible? The goal here is the smallest possible set of global objects. (Although better a global variable than the same constant peppered throughout the code — that’s an even worse synchronization problem.)
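To make that parenthetical concrete, here is a minimal Python sketch. The names MAX_RETRIES, should_retry, and describe_policy are hypothetical, invented only for illustration. The point is that one named constant gives every user of the value a single place to agree on, where a literal 3 repeated through the code would have to be found and changed by hand in each spot.

```python
# Hypothetical module-level constant: the single source of truth.
MAX_RETRIES = 3


def should_retry(attempt):
    """Return True while another attempt is still allowed."""
    return attempt < MAX_RETRIES


def describe_policy():
    """Build a message that cannot drift out of step with should_retry()."""
    return f"Giving up after {MAX_RETRIES} attempts"
```

Changing the limit now means editing one line, and the check and the message stay in agreement.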

The synchronization problem is as ubiquitous as the Yin-Yang dualities we see in all things because it is a very real instance of such a duality. The synchronization problem comes from there being, in the code, a Yin and a Yang that must agree.

§

The most common occurrence of the synch problem is between what the code actually does and what the comments claim it does. (Because there are comments, right?) And regardless of what those comments may say, the code always wins.

A big part of the problem is that, so far, only the human mind is capable of comparing comment content with code function. It takes a dedicated human to stay on top of the problem, which may be why many programmers put off writing comments: they figure they’ll do it once the code stabilizes. (Of course, there’s never time to come back around and do that, which is why rule #4.)

What’s worse, since lack of synchronization between code and comments isn’t a bug — it doesn’t break the code — there is also motivation to ignore it. Few programmers want to spend what seems like unproductive time rewriting old comments. (The problem is how many see writing them at all as unproductive.)

[In some cases the motivation is downright malevolent. I knew an engineer who knew enough coding to be dangerous, so he wrote the code for his own designs. He didn’t comment anything, plus he used obscure and terse names for variables and functions. It was intentional as a form of job security, but it’s a direct and flagrant violation of rule #1 and rule #2.]

[[I don’t know where this quote comes from, but understood as wry sarcasm it’s hilariously on the mark: “Nearly every electrical engineer believes deep in his heart that he is better at writing computer software than any computer programmer, and can show as proof the fact that he has written a number of small applications, each of which was done quickly, easily, and exactly met his needs.” I can attest to its truth from personal experience (and I’d drop the word “electrical”).]]

§

The synch problem between code and comments exists in general form as the tension between any form of documentation about the code versus what the code actually does. Even user manuals can suffer from it.

The perception that comments or documentation don’t add value, or at least that they take time that could be used for coding, can be pervasive. User documentation, in particular, often gets a minimal amount of attention and resources. This is usually short-sighted. Good documentation saves money down the line.

As one example, an upfront investment in good user documentation can reduce the need for ongoing user support throughout the product’s life cycle. Would you rather train and pay a group of people to explain the same thing over and over, or use the people who built the product to do it once up front?

I don’t fully understand the calculus of spreading cost into the future when taking the hit makes everything better, but maybe I’m too much the idealist. I believe documentation is a bullet a good programmer has to bite. Writing comments should become second nature. It makes such a difference when you revisit your old code.

About the documentation… I get it. Writing user docs can be like writing a detailed description of a good party the day after the party. It’s a chore. But it’s like working out; a pain but worth it in the end.

§

Another place the synch problem bites deeply is functions — in three ways:

Firstly, the function name itself is synchronized between all the lines of code that call it. Modern tools make changing function and variable names fairly easy, but not everyone uses them. (I wouldn’t consider working on serious projects without a good IDE, though.) Regardless, once you name a function or global variable, changing it entails some degree of hassle and risk. A good naming convention can be very helpful.

Secondly, the parameters the function takes versus what its callers pass. Some subtle bugs involving datatype can lurk here.

One form involves a function like:

function print(output_file, text_to_print)
function print(text_to_print, output_file)

A programmer might create either version depending on their internal model of what’s going on in that function:

output ⇐ text
text ⇒ output

The first version reflects a standard coding convention where x=a+b assigns the sum of a and b to x. The flow of data is right-to-left, and is a common way of thinking among programmers. The second version is more traditional Western left-to-right and probably how most non-programmers would see it (and so would some programmers).

As a design, neither is more right nor wrong, but language libraries that aren’t consistent about the implied data flow of their functions can be the bane of programmers. There were some functions in the standard C library that I always had to look up to be sure of the parameter order. (Mercifully, I’ve forgotten almost everything I knew about C and C++.)
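One way some languages sidestep the argument-order question is to let (or force) callers to pass arguments by name. Here is a small Python sketch of that idea. The function print_to is hypothetical, a stand-in for the print example above, and the bare * in its signature is what makes both parameters keyword-only.

```python
import io


def print_to(*, output_file, text_to_print):
    """Write text_to_print to output_file; both must be passed by name."""
    output_file.write(text_to_print)


buffer = io.StringIO()

# Either order works, because each argument says which parameter it is for.
print_to(text_to_print="hello", output_file=buffer)
print_to(output_file=buffer, text_to_print=" world")
```

With this signature a positional call such as print_to(buffer, "hello") is rejected with a TypeError, so the caller's mental model of the data flow no longer has to match the author's.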

Thirdly, in what the function returns versus what the callers expect back. This is just a mirror of the second one, and only applies to the single return value, but in some languages an especially ugly bug lurks here.

That bug, in languages such as C or C++, is that a function can return an allocated object, one it carved out of available memory. The synch problem is that the caller receiving that returned object is expected to properly free that memory. This was a huge source of memory leaks in a lot of programs. Languages with garbage collection have done a lot to mitigate this synchronization problem.

§

Similar issues exist between classes and the clients using them. The class names, method names, method parameters and return values, all must be synchronized with the client code that uses them.

In fact it’s no different from needing to synchronize function names (and parameters and return values), type definitions, or global values. Object-oriented design doesn’t change the actual work so much as arrange it differently. (I think of OOD as noun-verb orientation versus the procedural verb-noun orientation.)

[I’m a fan of OOD, and I do think it helps with synchronization. It takes all your functions out of global space and groups them (as methods) in the namespace of classes. That alone, to me, is a big win. Further, it allows similar functions to have the same name (e.g. “save” and “load”). I like the polymorphism, too.]

This name synchronization is a fundamental requirement in programming. There are things (many things) you create and give a name to. Then you write many lines of code, often in many different files, that refer to those things by their names. So, firstly, pick good names, and secondly, do what you can to minimize how many global names you need. Always try to limit the scope of names as much as possible.

§

Along with comments-vs-code and names-vs-code, a third very common Yin-Yang is data-vs-code. As their names suggest, all are related. All involve the code being out of synch with something else. The code only “wins” in the first case. It “loses” if it uses the wrong name or data. One difference is that, in the data case, the data might be the one that’s wrong.

A simple example of this is a format string and the parameters it expects:

format("{1}: {2} [{3},{4}]", ix, name, x)

The example above wants four parameters to fill the string, but only three are supplied (there should be a “y” parameter following the “x”). That kind of bug should show up during testing; in many languages it’ll cause a run-time error.

Usually supplying the right number but wrong types also generates an error. For example, the following will cause an error assuming the format string cares about datatype:

format("{%d}: {%s} [{%d},{%d}]", ix, y, x, name)

Format strings that use tokens such as %d for integers and %f for floats expect to receive those datatypes. Getting something unexpected will generate an error. If the string just expects something for a placeholder, then the bug might not cause an error, but it will make the string look wrong.
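For what it's worth, here is how those two mismatches play out in Python specifically. This is a sketch of one language's behavior, not a general rule. Python's str.format numbers its placeholders from 0 rather than 1, and the helper try_format and the sample values are hypothetical, there only to capture which exception comes back.

```python
ix, name, x, y = 7, "origin", 3, 4


def try_format(build):
    """Run build() and return its result, or the name of the exception raised."""
    try:
        return build()
    except (IndexError, TypeError) as err:
        return type(err).__name__


# Four placeholders but only three arguments: the counts disagree.
too_few = try_format(lambda: "{0}: {1} [{2},{3}]".format(ix, name, x))

# Right count, but a string lands on a %d placeholder.
wrong_type = try_format(lambda: "%d: %s [%d,%d]" % (ix, y, x, name))

# Placeholders and arguments in agreement.
correct = try_format(lambda: "%d: %s [%d,%d]" % (ix, name, x, y))
```

In this sketch too_few comes back as "IndexError", wrong_type as "TypeError", and correct as the finished string. Note that the %s in the second case happily accepts the integer y, which is the "looks wrong but doesn't fail" half of the problem.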

Something much more insidious is:

format("{1}: {2} [{3},{4}]", ix, name, y, x)

Which provides the right number and type of parameters but puts the last two in the wrong order. This bug would require careful attention to spot.
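Named placeholders are one way to take argument order out of the picture. The following Python sketch (sample values are made up) shows the silent swap next to two forms where each placeholder names the value it wants.

```python
ix, name, x, y = 7, "origin", 3, 4

# Positional: swapping the last two arguments is silent and looks plausible.
swapped = "{0}: {1} [{2},{3}]".format(ix, name, y, x)

# Named: each placeholder says which value it wants, so order is irrelevant.
named = "{ix}: {name} [{x},{y}]".format(y=y, x=x, name=name, ix=ix)

# f-strings put the expression inside the placeholder itself.
inline = f"{ix}: {name} [{x},{y}]"
```

This doesn't remove the synchronization requirement (the names still have to match), but a misspelled name fails loudly, whereas a swapped position does not.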

§

Python (and other languages that allow multiple assignment) has another version of this:

a,b,c = obj

The obj needs to be a list (or other iterable) with three (and only three) objects. If obj isn’t iterable, Python raises a TypeError; if the count is other than three, it raises a ValueError.
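A quick sketch of the unpacking above, showing which failure goes with which mismatch. The helper unpack_three is hypothetical, written only to make the outcomes easy to compare.

```python
def unpack_three(obj):
    """Attempt a, b, c = obj and report what happened."""
    try:
        a, b, c = obj
    except TypeError:
        return "TypeError"    # obj is not iterable at all
    except ValueError:
        return "ValueError"   # obj is iterable but has the wrong item count
    return (a, b, c)


results = [
    unpack_three([1, 2, 3]),     # exactly three items
    unpack_three([1, 2]),        # too few
    unpack_three([1, 2, 3, 4]),  # too many
    unpack_three(42),            # not iterable
]
```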

§

The last one I’ll mention is a larger version of the format string issue. This one doesn’t involve the code; this one is between the data and the display of that data.

Programmers use layers with small interfaces to isolate the data model from various views on that model, but synch problems can still exist between the view and the model.
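As a rough illustration only: one way to shrink the gap between model and view is to have the view ask the model what fields it has, instead of hard-coding its own copy of the list. The Point class and render_row function below are hypothetical, and real view layers are of course far more involved.

```python
from dataclasses import dataclass, fields


@dataclass
class Point:
    """Hypothetical data model."""
    name: str
    x: int
    y: int


def render_row(record):
    """View: build the display from the model's own list of fields."""
    return ", ".join(
        f"{field.name}={getattr(record, field.name)}" for field in fields(record)
    )


row = render_row(Point("origin", 3, 4))
```

If a field is later added to Point, render_row picks it up without being edited. The trade-off is that the view gives up control over exactly what gets shown, which is not always acceptable.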

As I say, it’s fundamental. Any machine with parts requires those parts to operate in synchronization. Software is an extremely complicated machine. (Literally a Turing Machine.) Its synchronization needs are all the more urgent and important.

Ultimately it’s a form of entropy to be fought. It lurks everywhere.

§

I’ve discovered a nagging synch problem lurking in blog tags. They tend to proliferate, so it’s hard to remember them. One ends up creating similar tags (adding to the proliferation) when an older tag would have worked (managing the proliferation and making the blog more unified).

Ø