In the previous post I semi-corrected a long-time oversight by revisiting the post “The Thing About Constants” and turning it into a Coding Rule, Rule #6. (I say “semi-corrected” because it really should have been Rule #3.)
In this post I semi-correct another oversight represented by the post “The Synchronization Problem.” It’s the other one that has been bugging me for a while, because it should have been a rule (probably Rule #4). As they say, “Better late than never.”
The synchronization problem is a general issue for programmers. It rears its head whenever there are two (or more) points in the code that must agree with each other. The very nature of coding requires this, so it’s not something that can be avoided. It must be managed.
Some common examples include:
Functions and Function Calls: The definition of functions versus the use of those functions by the code is an obvious and common example. If a function is defined:
function my_function:point
    x:float,
    y:float
begin
    return point(x, y)
end
Then users of that function must be sure to pass it two float values (or values that can be treated as such) and must expect to receive a point object in return. If this “signature” of the function ever changes, then all calls to that function have to be changed. That’s the synchronization problem; the calls must be kept in synch with the function.
An even simpler instance involves the name of the function: every call invokes that name, which means that renaming the function breaks the code until all the places that use it are updated. This is a specific instance of the general synchronization problem between named objects and users of those objects. Named objects are fundamental to coding, which is a big part of why the synchronization problem is unavoidable.
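For a concrete version, here is a rough Python rendering of the pseudocode function. The `Point` type is my own stand-in for the pseudocode’s `point`, and `old_function_name` is a hypothetical stale name. One thing to keep in mind is that Python, unlike a compiled language, only notices an out-of-synch call when that call actually runs.

```python
from typing import NamedTuple


class Point(NamedTuple):
    """Stand-in for the pseudocode's point type (an assumption for this sketch)."""
    x: float
    y: float


def my_function(x: float, y: float) -> Point:
    """Build a Point from two floats; every caller must match this signature."""
    return Point(x, y)


if __name__ == "__main__":
    print(my_function(1.0, 2.0))  # a call that agrees with the signature

    # A caller that was never updated after the signature changed.
    # Python raises TypeError, but only when this line actually executes.
    try:
        my_function(1.0)
    except TypeError as err:
        print("Out of synch:", err)

    # A caller still using an old name for the function fails the same way.
    try:
        old_function_name(1.0, 2.0)  # hypothetical name that no longer exists
    except NameError as err:
        print("Out of synch:", err)
```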
Classes and Class Instances: This is the same situation as above but for object-oriented programming. The definition of a class and the instances of that class — the use of that class — must be kept in synch. This includes the name of the class, the names and signatures of methods, and any properties objects expose.
Databases and Database Clients: Databases have defined data tables, and those tables have defined data columns. Uses of the data must be kept in synch with the design of the data.
As always, the problem is that changing something (in this case, the database) breaks the code. Databases present an extra problem because they’re external to the code. It’s not uncommon for them to be administered by a different group (or even a different company), which makes them prone to unexpected (possibly unannounced) changes.
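This small sketch uses Python’s built-in `sqlite3` module and an in-memory database to show the usual failure mode. The table, its columns, and the rename are all made up for illustration. The client query was written when the column was called `name`, and someone later renamed the column to `full_name` without updating the client.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database whose schema has already changed."""
    conn = sqlite3.connect(":memory:")
    # Current schema: the column that used to be "name" is now "full_name".
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, full_name TEXT)")
    conn.execute("INSERT INTO users (full_name) VALUES (?)", ("Ada Lovelace",))
    return conn


def fetch_names(conn: sqlite3.Connection) -> list:
    """Client code written against the old schema; it was never updated."""
    return [row[0] for row in conn.execute("SELECT name FROM users")]


def try_fetch(conn: sqlite3.Connection):
    """Return the names, or the database's complaint if the query is out of synch."""
    try:
        return fetch_names(conn)
    except sqlite3.OperationalError as err:
        return f"Out of synch with the database: {err}"


if __name__ == "__main__":
    # The mismatch surfaces at run time, as soon as the query is executed.
    print(try_fetch(make_db()))
```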
Comments: When any of the above fall out of synch, the result is a bug in the code, but presumably one that can be caught. Typically, programs fail rapidly (or won’t even compile) if there is a mismatch between a function’s signature and the calls to that function. A big point of function signatures is to validate function calls at compile time. Same with classes and users of those classes. Database synchronization errors may be harder to catch, but usually a bad query fails immediately. There is no way, though, to ensure that comments speak the truth about code, and it’s very easy to change the code but forget to update the comments about it.
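One common way comments drift is by restating a fact that lives somewhere else in the code, which makes stale comments a repetition problem too. In this hypothetical sketch (`MAX_RETRIES` and `fetch` are made-up names), a comment that said “try up to 3 times” would have silently become false when the constant changed. A comment that names the constant instead of restating its value won’t drift when the value changes.

```python
MAX_RETRIES = 5  # the single place that says how many attempts to make


def fetch(attempt_once) -> bool:
    """Call attempt_once() until it returns True or the attempts run out."""
    # Try up to MAX_RETRIES times before giving up.
    # (A comment reading "try up to 3 times" would now be wrong, and
    # nothing in the language would ever notice.)
    for _ in range(MAX_RETRIES):
        if attempt_once():
            return True
    return False


if __name__ == "__main__":
    attempts = []

    def always_fails() -> bool:
        attempts.append(1)
        return False

    print(fetch(always_fails), len(attempts))
```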
Documentation is a more public example of text about the code not matching the actual code. (Given that documentation isn’t always seen as a value-add, companies are sometimes reluctant to spend much effort on it. Ultimately, some of the code authors should work with experienced writers to create it, but that’s a whole other topic.)
The synchronization problem is just the name of the issue. As I’ve said, the nature of code makes it unavoidable — something to be managed. Which brings us to Rule #7:
Never repeat yourself (unnecessarily).
A particularly nasty example might be defining the same data structure in two different files. Maybe it’s a really simple data structure, and the programmer didn’t want to import the definition from the other file. Or perhaps didn’t even know about it.
This probably works fine at first — passes all initial testing. But what happens if an update down the road only changes one of them? Perhaps the definition in one file was logical, even documented, but the duplicate in the other file wasn’t. Now the program has a bug — perhaps a serious one.
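Here is a sketch of how that can go wrong quietly. It uses Python’s `struct` module for a binary record layout that was defined twice, with the two “files” simulated in one script. All the names are hypothetical. The writer’s copy of the layout was changed so that the field order is swapped, but the reader’s copy was not. Both layouts happen to be the same size, so nothing crashes and the reader simply gets garbage.

```python
import struct

# --- writer.py (hypothetical) -----------------------------------------
# This copy of the record layout was updated: float value first, then int id.
WRITER_FORMAT = "<fi"


def write_record(sensor_id: int, value: float) -> bytes:
    return struct.pack(WRITER_FORMAT, value, sensor_id)


# --- reader.py (hypothetical) -----------------------------------------
# The duplicate copy was never updated: int id first, then float value.
READER_FORMAT = "<if"


def read_record(blob: bytes) -> tuple:
    sensor_id, value = struct.unpack(READER_FORMAT, blob)
    return sensor_id, value


if __name__ == "__main__":
    blob = write_record(7, 1.5)
    # Both formats are 8 bytes, so unpacking "succeeds" with the wrong numbers.
    print(read_record(blob))
```

If the layout were defined once (say, a single `RECORD_FORMAT` in one module) and imported in both places, there would be no second copy to forget, and this whole class of bug would go away.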
A milder version involves progress or error message strings. It’s not uncommon for similar operations in different parts of the code to use similar or identical strings for display, logging, or error-reporting. For example, multiple code locations that open a file might all have some version of a “File not found” string.
Here again, the real problem is what happens if one is changed but the others aren’t. In the case of strings, that’s probably not an error, merely a difference in output text. But if the goal was consistent program behavior, now the program is “marred.”
This example also ties to Rule #6 from last Monday. String literals don’t belong in code in the first place and should be defined elsewhere and given meaningful names. Doing this in a central place makes it easy to use the same text anyplace it’s needed and provides only one string instance to change.
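This is a minimal sketch of that idea in Python. It assumes a module-level constant and the standard `logging` module, and `MSG_FILE_NOT_FOUND`, `load_settings`, `load_image`, and the file names are all hypothetical. Both functions report a missing file with exactly the same text because only one copy of that text exists.

```python
import logging

log = logging.getLogger(__name__)

# The one and only copy of this message: Rule #6 gives it a name, and
# Rule #7 keeps it from being retyped everywhere it's needed.
MSG_FILE_NOT_FOUND = "File not found: %s"


def load_settings(path: str):
    """Return the settings text, or None (after logging) if the file is missing."""
    try:
        with open(path, encoding="utf-8") as f:
            return f.read()
    except FileNotFoundError:
        log.error(MSG_FILE_NOT_FOUND, path)
        return None


def load_image(path: str):
    """Return the raw image bytes, or None (after logging) if the file is missing."""
    try:
        with open(path, "rb") as f:
            return f.read()
    except FileNotFoundError:
        log.error(MSG_FILE_NOT_FOUND, path)
        return None


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format="%(levelname)s: %(message)s")
    load_settings("missing_settings.ini")
    load_image("missing_logo.png")
```

Changing the wording later means editing one line, and every place that reports a missing file picks up the change.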
Here’s another example:
from os import path

# Define the base path once; every other path is derived from it.
BasePath = r'C:\demo\hcc\python'
DataPath = path.join(BasePath, 'dat')
ImgsPath = path.join(BasePath, 'img')
Most of my Python apps have some lines like that at the top. Usually there are more imports; here I only showed what was necessary for the code fragment to work. This follows Rule #6 as well as Rule #7, in that any point in the code needing one of these directory paths uses these names. Note in particular that I don’t even repeat the base path when defining the two subdirectories. This makes it easy to move the program to a new location: there’s just one place to change.
More importantly in terms of Rule #7, if my app grows to having multiple files, the other files simply do this:
002|
That really should be obvious, but I’ve seen programmers do the equivalent of copying and pasting the first fragment into other files. Then, if you move the files to another directory, every file that defines BasePath must be edited.
The bottom line is that any time you repeat yourself, you’re creating (at least potential) extra synchronization problems in your code. Following Rule #7 eliminates those unnecessary synch problems so you can focus on the unavoidable ones.
Note that the Rule says never repeat yourself unnecessarily. Sometimes, for whatever reason, it may be necessary, but it should generate a “do I really need to do this” tension. And it should be a rare occasion that the answer is yes.
The one place I might relax this a bit is in throw-away code or really simple stuff where copy/paste is just quicker and easier. But always remember that “throw-away” code has a weird tendency to become permanent. It’s usually best to always follow good coding practices just as a matter of habit.
This is why I signal my turns and stop at stop signs even when no one is around. It isn’t so much about following the rules as having good habits for when stress or something sudden throws you for a loop. Ingrained and practiced habits can be a real help then.
Go forth and code correctly!
∅
ATTENTION: The WordPress Reader strips the style information from posts, which can destroy certain important formatting elements. If you’re reading this in the Reader, I highly recommend (and urge) you to [A] stop using the Reader and [B] always read blog posts on their website.
This post is: Rule #7: Never Repeat Yourself