The first article (long, long ago) about naming things only scratched the surface. Even with company and language guidelines (or other rules) to help, with so many things to name it’s easy to lose control. Even now, after 44 years of writing code, I still sometimes find myself staring at the screen trying to think of what to name some object.
It’s understandable; there are a lot of things to think about when it comes to a name.
Scope: Is this a local name only appearing in a few lines of code? Or is it a global name that might appear anywhere? There is a lot of ground between those two, and how much scope a name has affects its importance. The greater the scope, the more intention you should give it. The more visible names have a way of becoming permanent in a project.
That permanence is a big consideration. The names we pick don’t exactly become carved in stone, but they do become a pain to change. (There are code environments that can change a name in multiple files, but big changes like that always risk breaking the code. One risk is that, in some other file, the new name turns out to clash with one that already exists.)
As a general rule, name length limits are a thing of the past (long ago, some systems only allowed six characters!), so global names should be as descriptive as possible. It means they’ll be on the long side, but that’s a very small price to pay these days.
The smaller the scope, the more freedom to be pithy or obscure. If a variable is only used for three lines, it barely matters what it’s called. (Barely doesn’t mean not at all, though. Always have some logic behind your names.)
Obvious or common names get a little more scope than random ones. One of my common Python patterns looks like this:
002| fp = open(fn, mode=‘r’, encoding=‘utf-8’)
003| try:
004| data = fp.read()
005| except:
006| raise
007| finally:
008| fp.close()
009| print(‘loaded %d chars from: %s’ % (len(data),fn))
010|
I use the fn
(“file name”) and fp
(“file pointer”) names so often, so consistently, and only in this context, that they’re canonical. The exception is when I open multiple files and need multiple names. This can be especially important for the filename variable because it often gets referenced in multiple places in the function.
I have some file classes (textfile
, tabfile
, csvfile
, etc), and there the pattern is (using fo
for “file object”):
002| fo = datafile(fn)
003| fo.load()
004| print(‘loaded %d chars from: %s’ % (len(fn),fn))
005|
The point is to have a convention or protocol for naming. For one thing, it saves time when writing and you have to think up a name. More importantly, it communicates extra meaning when reading your code — very helpful if it’s been a few months (or years).
A similar example is index variables. I use ix
(and jx
and kx
in nested loops) as index variables:
002| for jx in range(12):
003| # …
004| # … do stuff on (ix,jx)…
005| # …
If I’m iterating over rows and columns, I might use rx
and cx
. I picked up the idea from a programmer who always named index variables in the form: ii
, jj
, kk
(rr
, cc
). He felt single-letter names were hard to see and search for.
The editor I use (gvim) has no problem searching for single chars, but I felt his point about single-letter variables being hard to see was a good one. I didn’t care for his double-letter convention, so I use the *x
convention. (There is also that the trailing ‘x’ indicates an index variable.)
§
Class methods and variables, and the fields of user-defined records, have their own kind of scope. The parent class or type creates a separate namespace, so there is no clash between methods or fields sharing the same names. But these names tend to be highly visible — used often in the code. They’re often more visible than the class or record type names, which generally are only used to create new instances.
The point is that method and field names likewise have a way of becoming permanent, so take some care in designing their names, too. A protocol or convention is just as important here.
§
Another distinction is passed parameter name versus local variable name. I know programmers who preface passed parameter names with an underscore (I’ve also seen it placed at the end):
002| x,y,z = int(_x), int(_y), int(_z)
003| # …
004| # … do stuff …
005| # …
Part of the intent is differentiating formal parameters from local variables. It’s often considered Bad Design to change input variables — it can lead to side effects. That doesn’t change that a common pattern is passing some object or list to a function that populates it. (As always, the purest hardline approach is for theorists; write code that’s clear and makes sense. Sometimes that means bending the rules.)
Local variables, per the scope rule, can be short and terse. They’re almost always all lowercase. Some favor underbars for clarity; some don’t. I’d call that dealer’s choice. I’m somewhere in the middle, myself.
Formal parameters have the distinction that they’re often visible outside the function. In some languages they’re visible as keywords:
But, in all languages, they’re visible in documentation. Along with the function name itself, the formal parameters are what the outside world knows about the function. They describe what data the function expects. They’re very important, is my point, so make the effort to name them well.
§
Meta-Type: Many programmers use naming conventions to indicate the meta-type of an object. By meta-type I mean the general category of object, such as function, class, method, global data, etc.
A common example is using a capital “I” to begin an interface name (IUnknown
, IDataSource
, ILogFile
, and so forth). I’ve also seen cases where all classes are named with an initial capital “C” (CPoint
, CDataSource
, and so forth).
In object-oriented languages, for user-defined classes it’s common to use embedded capitalization (Camel case) including a leading capital, such as: DataSource
, LogFile
. This basically just follows the basic rule that capital letters apply to names with more global scope.
In fact, some programmers use all caps for global constants (e.g. MAX_ROWS
), but this is looking increasingly “60s mainframe” to my eyes, and it’s a habit I’m trying to break, although I do use it in some cases. I find it helpful to emphasize the global nature. Alternately I’ll use camel-case, as in FileName
or BasePath
.
§
Data Type: Some programmers use naming conventions to indicate the actual data type of an object. A common example is (what is actually a misuse of) Hungarian notation. I’m not a fan.
[If used, Hungarian notation should indicate the intent of a variable (a counter or pointer or list). Failing that, it might indicate its meta-type, but it should never indicate its physical type. That’s a gotcha trap triggered when you inevitably change the physical type (say during an upgrade) but retain the purpose and intent.]
As a general way to communicate intent or meta-type, I think it can be useful, and I’ve already used some examples. The trailing ‘x’ in index variables is one example. I often use a trailing ‘s’ to indicate a list:
002| terms = [p*log2(p) for p in probs]
For short-scop single-letter names, a list of x
values would get the name xs
. (This, I believe, is a Haskell convention, but I use it in Python all the time.)
§ §
Bottom line, you should use intention and convention when naming code objects. A good test is if you can predict what names you used in some file you’ve just opened.
∅
The WordPress Reader is a broken piece of junk that ruins the formatting of posts. If you’re reading this in the Reader, I highly recommend and urge you to [A] stop using the Reader and [B] always read blog posts on their website.
This post is Naming Things (redux)