Tags
code clarity, computer programming, constants, defined values, global constants, literal values, readable code
It has been more than a few minutes since I posted a Coding Rule, so I thought it was high time I did. There are at least two things I’ve previously written about as important but didn’t elevate to Rules. Both have been bugging me; they should be rules. Today (and a week from now) I’m correcting that oversight.
To be honest, this Rule is so important, I’m not sure why I didn’t make it the third one. Rule #1 and Rule #2 are definitely more important, but the ones currently listed as #3 – #5 are not as important as today’s (they are very important, though).
The Rule is this:
There shall be no literal values in code except for zero, one, or the empty string. Even those should be viewed with grave suspicion.
I wrote about this back in 2016, but by then I’d already posted Rule #3 and Rule #4. Even so, I can’t imagine why I didn’t make this one a Rule. I think it has inestimable value in writing readable, bug-free code (or as bug-free as humanly possible — computer programming is hard).
As expressed above, the Rule has a negative form. It’s a prohibition, a “never” rather than an “always”. We can put it in a positive form, though:
Always define literal values, except for zero, one, and the empty string. And sometimes even those.
Either way, it’s the same (important, useful) approach. The exceptions aside, code should (almost) never have embedded literal values. Such values should always be defined (in whatever way the language provides) as named values in some appropriate central place. (Typically, at the top of the file or — for multi-file projects — in a file of definitions.)
Some languages allow definitions to be immutable (for example, the const keyword in C/C++, or the final keyword in Java). If that option is available, definitely take it when defining literal values.
And by the way, this is one place where global variables are definitely allowed.
Which, come to think of it, I haven’t written about specifically before and certainly not in the context of The Rules. (I’ll revisit this next week, though.)
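To make the definitions idea concrete, here’s a minimal Python sketch (the names are mine, purely for illustration). Python has no true const, but annotating a definition with typing.Final at least lets type checkers flag any reassignment:
from typing import Final

# Central definitions, near the top of the file (or, for a multi-file
# project, in a shared definitions module).
MAX_RETRIES: Final = 5        # how many times to retry an operation
TIMEOUT_SECONDS: Final = 30   # give up after this many seconds

def fetch_with_retries():
    for attempt in range(MAX_RETRIES):
        print(f'attempt {attempt + 1} of {MAX_RETRIES}')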
When I wrote about this in The Thing About Constants, I used the common example of writing code that dealt with a (virtual) chessboard. I still think it’s a good case study. Chessboards are, if not always, at least routinely 8×8 arrays, so the temptation to sprinkle naked 8s throughout the code is strong. It seems an entirely reasonable assumption.
But as I also wrote then, what happens if (when) you decide (or the client decides) to change the board size? That’s the main reason to define literal values: there’s always some chance the value will change. If it does, you must search all the code for every occurrence of that value and decide whether it refers to the thing that changed or just happens to be the same number used for some unrelated purpose.
As a simple example, what if, for whatever reason, the board size changes to 8×16, doubling the playing space? Now, when searching the code for occurrences of 8, whether a given 8 should change depends on whether it refers to a row count (leave it alone) or a column count (change it to 16).
It’s a sword that cuts both ways: miss an occurrence that has to change, or change one that doesn’t, and either way you’ve introduced a bug.
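To illustrate with the chessboard (just a quick sketch, not the code from that older post), naming the dimensions means a size change touches exactly one line:
BOARD_ROWS = 8
BOARD_COLS = 8   # change only this line to get an 8×16 board

def make_board():
    return [[None] * BOARD_COLS for _ in range(BOARD_ROWS)]

def on_board(row, col):
    return 0 <= row < BOARD_ROWS and 0 <= col < BOARD_COLS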
The other big reason to define all literal values is readability. Code sprinkled with literal values (numeric, string, date, or whatever) is opaque and harder to understand (even when very well documented, one must still remember what all the literal values mean). Using defined values puts meaning (improved semantics and expressibility) into the code. It makes the code more literary.
It should be obvious that numeric values should be defined in one place and given an appropriate name. It might be less obvious that this applies to most strings, too.
To be honest, I’m a lot more relaxed when it comes to literal strings. For one thing, they’re rather self-documenting. A string says what it is, so it’s usually apparent why that string appears at that point in the code.
One reason to consider defining string literals, though, is duplication. When the same string appears over and over (say in an error message), then it might be a good candidate for defining in one place. For the usual reason: if the text of that string changes, the change must be made to each occurrence. Not that hard, necessarily, but still a pain.
Another reason to consider defining literal string values is translation. If all strings are defined in one place, it’s easier to port the code to another language. Here’s a very simple example (in Python) to illustrate what I mean:
#
StringBundle_English = [
    'Hello',
    'Goodbye',
]
StringBundle_Spanish = [
    'Hola',
    'Adiós',
]
StringBundle_German = [
    'Hallo',
    'Auf Wiedersehen',
]
StringBundle_French = [
    'Bonjour',
    'Au revoir',
]
strHello = 0
strGoodbye = 1


# from my_program.py…
#
def hello_and_goodbye(lang=StringBundle_English):
    stringSet = lang

    print(stringSet[strHello])
    print(stringSet[strGoodbye])
    print()

hello_and_goodbye()
hello_and_goodbye(StringBundle_Spanish)
hello_and_goodbye(StringBundle_German)
hello_and_goodbye(StringBundle_French)
Strings aside, it’s almost always a good idea to define all numerical constants (other than zero or one, and I’ll come back to those). I say “almost always” because there are occasions when a naked literal numeric value makes sense. The general rule of thumb is the extent to which a number in some sense defines itself. For example, if the best possible definition is…
define two = 2
… in some appropriate language, then perhaps the number does belong in the code. (There is almost never a benefit from such naming — and it can bite if the value changes — but see below for an exception.)
The question to ask is whether there is a more meaningful name the value might have in the context of the code. Is there some word that aptly describes that meaning? If so, then use that word as the name. If the best possible word is just “two”, this suggests the value stands sufficiently on its own and can be used as is. For example, dividing values in half, which almost demands the literal value 2.
Unless there is any chance whatsoever that you’ll later be dividing by some other number, like three or four. Then it makes sense to do something like:
define NumberOfParts = 2
Or whatever works in context (and per the language you’re using).
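In Python, for instance, that might look something like this (a sketch, with the uppercase name following Python’s usual convention for constants):
NUMBER_OF_PARTS = 2   # halves today; thirds or quarters tomorrow, who knows

def split_evenly(total):
    return total / NUMBER_OF_PARTS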
By the way, this post was inspired, in part, by my using the literal hex value 0xff (255 in decimal) in some code recently. I was using it as a bit mask in the context of handling byte values, and it was very clear in that context I was using it to mean “all bits”, so I didn’t feel I needed to define AllBits = 0xff, or whatever.
That said, the decision was based on it only appearing once. Had I used it in multiple spots, I might have defined it for readability and just in case I wanted the code to someday handle 16-bit values. (In general, always try to write the most generic code possible. Not a Rule, but good advice.)
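For what it’s worth, had it appeared more than once, the defined version might have looked something like this (a sketch, not the actual code I was working on):
ALL_BITS = 0xff   # mask for one byte; would become 0xffff for 16-bit values

def low_byte(value):
    return value & ALL_BITS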
Sometimes defining a literal value is just a matter of convenience. For instance, when I wrote code that used complex number math, I sometimes found it useful to write:
from math import pi

two = complex(2, 0)
cpi = complex(pi, 0)
pi2 = cpi * two
The last two made the complex math more expressive, but the first was just to avoid having to write complex(2,0) over and over in the code. It was a case where there was no chance the value would ever change, and it’s arguably a wash in terms of expressibility. It exists solely as a typing convenience.
Note that naming a defined value after the value itself is normally a bad idea. For instance, in the chessboard example, this is a bad idea:
define EIGHT = 8
And then using EIGHT throughout the code, presumably thinking that follows this Rule #6. Which it does, but with a complete misunderstanding of what Rule #6 means. For one thing, you’re in for a world of hurt if that value ever changes:
define EIGHT = 10
Which is a bit of a mind-bender.
Why Zero, One, and the Empty String?
Zero (0), one (1), and the empty string (“”) are excluded because they are almost always sufficiently self-documenting, and their typical uses don’t change. Their meanings are well embedded in their appearance, so they don’t need to be dressed in names.
Zero is a common initial value and reference point. In some languages, new numeric variables start with their values set to zero. In others, new variables have random bit patterns and must be explicitly set to some initial value. In such cases, the most common choice is zero. It’s well-understood as meaning “blank slate”.
It’s also a reference point in that code often compares some transient value to see whether it’s below zero, exactly zero, or above zero. Such comparisons can have different semantics in the code, but zero is a unique, fixed reference point. In some ways, it’s the most mystical number.
One is almost as mystical. Zero is the additive identity; one is the multiplicative identity. One is also the basis of incrementing and decrementing, such a common operation that many CPUs have specific instructions to add or subtract one from a numeric value.
Also, many situations in coding require adding or subtracting one. A typical example: with zero-based indexing, the index of the last element is N-1, where N is the length of the list. (Going the other way, the element at index i is the (i+1)th element of the list.)
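In Python terms, just to spell it out:
items = ['a', 'b', 'c']
last_index = len(items) - 1    # N - 1
print(items[last_index])       # 'c' sits at index 2 but is the 3rd (index + 1) element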
The empty string is excluded because many languages treat it as false in logical expressions, just as they treat zero. In such languages, a string’s Boolean value (true or false) depends only on its length: any non-zero-length string is true; the empty string is false. (Note that the string ” ”, a single space, is not an empty string; it has a length of one.)
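Python, for one, behaves exactly this way:
print(bool(''))    # False: the empty string has length 0
print(bool(' '))   # True: a single space has length 1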
It’s also used in some languages to initialize new string variables (just as zero is used to initialize new numeric variables). It is to strings what zero is to numbers. As such, its appearance in code tends to be self-documenting.
Bottom line, despite it being #6, consider this the third most important Rule. Keep literal values out of the code unless their presence is fully justified and there is no greater abstract meaning beyond their naked value. Otherwise, dress your values!
∅
This post is: Rule #6: Always Define Literals