There are many issues that divide programmers: operating systems and editors being two huge ones. I’ve worked on too many platforms to care much about the first one, but I’m a lifelong gvim user.
One of the lesser dividing issues involves the crucial source coding choice: Tabs or Spaces? The issue is both less and more important these days. Less because editors are very capable; more because Python is popular.
The thing about Python is that it uses indentation for block scoping. Python interpreters allow the source code to determine what an indentation level is: one, two, three, four, even eight — however many columns. The interpreter is sensitive to changes in indentation level:
def f1 (x, y):
    if x < y:
        print('X<Y')
        print('x,y: %d, %d' % (x,y))
    if y < x:
        print('X>Y')
        print('x,y: %d, %d' % (x,y))
    if x == y:
        print('X=Y')
        print('x,y: %d, %d' % (x,y))
    print('--')
Once a new indentation level is set, so long as other lines that belong to that block have the same indentation level, no problem.
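To make that rule concrete, here is a small sketch of my own (the function name `compare` is made up for illustration). Each block picks a different indentation width, and Python accepts it because the lines inside any one block agree with each other:

```python
# Each block may choose its own indentation width; Python only
# requires that the lines within a single block line up.
def compare(x, y):
    if x < y:
      return 'X<Y'          # this block is indented two more columns
    if y < x:
            return 'X>Y'    # this block is indented eight more columns
    return 'X=Y'

print(compare(1, 2))
print(compare(2, 1))
print(compare(3, 3))
```

It runs, but it is a good example of something legal that no one should actually write.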
Many other common languages are insensitive to whitespace altogether or, at the least, don’t care about indentation. In those languages indentation is part of code clarity — an indication to the reader about code blocks. (And clarity trumps everything.)
So in Python indentation errors are bugs, but in many other languages they just mess up the way the code looks. Because source code clarity is so important, indentation errors are important.
§
What causes indentation errors? Obviously a big one is the user error of typing a different number of characters than intended:
if (foo==bar) {
    int x = bar / 2;
  int y = foo * 3;
    myLog.info("foo is bar!");
foo += x;
bar += y;
    myLog.debug("exit function");
    return foo*bar;
}
In Python, the equivalent is a bug that won’t even run. The interpreter reports “unindent does not match any outer indentation level” for line #3, and, after fixing that, “unexpected indent” for line #7. Line #5 is a different kind of problem: it makes Python think the if-block is over and the source has returned to the zero indentation level, a serious bug that might not always trigger an interpreter error.
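Those two messages can be reproduced with the built-in `compile()`. This is only a sketch: the Python text is my rough translation of the snippet, the exact indentation is assumed from the errors described, and `indentation_error` is a hypothetical helper.

```python
def indentation_error(source):
    """Return (line number, message) for an indentation error, or None."""
    try:
        compile(source, "<example>", "exec")
    except IndentationError as err:
        return err.lineno, err.msg
    return None

# Assumed layout: line 3 is under-indented, lines 5 and 6 sit at
# column zero, and line 7 is indented again.
broken = (
    "if foo == bar:\n"
    "    x = bar / 2\n"
    "  y = foo * 3\n"
    "    print('foo is bar!')\n"
    "foo += x\n"
    "bar += y\n"
    "    print('exit function')\n"
    "    total = foo * bar\n"
)

# Repair only line 3 and leave everything else alone.
half_fixed = broken.replace("\n  y = ", "\n    y = ")

print(indentation_error(broken))
print(indentation_error(half_fixed))
```

Note that the code is only compiled, never run, so the undefined names do not matter here.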
In other languages this indentation is just confusing. It makes the code harder to read. Unclear! Is it supposed to mean something, or is it just sloppy typing?
§
Or is it the result of mixing tabs and spaces? It’s this possibility that has long informed my attitude about those characters. Which is this: Use only spaces in source code, always, but set your editor to insert the right number of spaces when you push the [Tab] key.
In other words: Always use spaces for indenting, but don’t actually use the [spacebar] to insert them. Editors have been capable of inserting the correct number of spaces since at least the 1990s if not earlier.
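What the editor does when you push [Tab] in that setup is simple arithmetic. A minimal sketch, assuming a hypothetical helper `spaces_for_tab_key` and a four-column default (not taken from any particular editor):

```python
def spaces_for_tab_key(column, indent_width=4):
    """How many spaces reach the next indent stop from a 0-based column."""
    return indent_width - (column % indent_width)

print(spaces_for_tab_key(0))      # at the left margin: a full indent
print(spaces_for_tab_key(1))      # one column in: just enough to reach the stop
print(spaces_for_tab_key(6, 8))   # same idea with eight-column stops
```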
The problem is that spaces and tabs are invisible in the text, so it’s often hard to see which character is used for indenting. (Some text editors can make them visible in some fashion, often with light-colored dots. That’s nice for checking indentation, but it makes the source code look busy and cluttered.)
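For a quick check without changing editor settings, something like the following would do. It is a sketch only; `show_whitespace` and its marker characters are my own choices, and it marks just the leading whitespace so the rest of the line stays readable:

```python
def show_whitespace(line, space_mark="·", tab_mark="→"):
    """Replace leading spaces and tabs with visible marks."""
    body = line.lstrip(" \t")
    leading = line[:len(line) - len(body)]
    return leading.replace(" ", space_mark).replace("\t", tab_mark) + body

print(show_whitespace("\t  x = 1"))   # → →··x = 1
```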
What often happened was that you’d load someone else’s source code, or print it, and it would contain a mix of tabs and spaces. Maybe something like this (shown as it might look with tabs set to, say, four columns):
public int myFooBarFunction (float x, float y) {
    if (x < y ) {
        myX = x;
            myY = y;
    }
}
Their system might be set for tabbing every eight spaces, and so the mix would look fine for them. The problem is they used only tabs on lines #2, #3, and #5, but line #4, which should be two tabs like line #3, is actually one tab and eight spaces.
If line #4 had two tabs, it would line up with line #3. The problem shows up with any tabbing setting other than eight because the occasional use of eight spaces instead of a single tab is invisible only at that setting.
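Python’s `str.expandtabs` makes the effect easy to see. In this sketch the four indented lines are rebuilt from the description above (tabs only, except one line with a tab plus eight spaces), and `indent_columns` is a hypothetical helper that reports where each line’s text starts at a given tab setting:

```python
lines = [
    "\tif (x < y ) {",
    "\t\tmyX = x;",
    "\t" + " " * 8 + "myY = y;",   # one tab and eight spaces
    "\t}",
]

def indent_columns(line, tab_width):
    """Column where the text starts once tabs are expanded."""
    expanded = line.expandtabs(tab_width)
    return len(expanded) - len(expanded.lstrip(" "))

for width in (8, 4, 2):
    print(width, [indent_columns(line, width) for line in lines])
# 8 [8, 16, 16, 8]
# 4 [4, 8, 12, 4]
# 2 [2, 4, 10, 2]
```

Only at a setting of eight do the second and third entries match.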
Imagine what happens if someone ignores the ragged indenting (because Java don’t care) and then creates a mix of tabs and four spaces. Now there’s an indentation issue that’ll show up in any tabbing setting.
The easiest way to avoid the issue is to always use spaces (but not the spacebar). That way the look always matches the reality.
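A spaces-only rule is also easy to check mechanically. Here is a minimal sketch, assuming a hypothetical helper `lines_indented_with_tabs` that flags any line with a tab anywhere in its indentation:

```python
def lines_indented_with_tabs(text):
    """Return the 1-based numbers of lines whose indentation contains a tab."""
    flagged = []
    for number, line in enumerate(text.splitlines(), start=1):
        body = line.lstrip(" \t")
        if "\t" in line[:len(line) - len(body)]:
            flagged.append(number)
    return flagged

sample = "def f():\n    a = 1\n\tb = 2\n    \tc = 3\n"
print(lines_indented_with_tabs(sample))   # → [3, 4]
```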
§
With all that in mind, watch this video:
I’ll say now that, for my money, Professor Brailsford (the old guy near the end) has the best response. He also tenders the canonical computer science answer: “It depends…”
I was a little surprised to find such agreement on using tabs in the actual source code. A reason often cited involved using the spacebar, which I agree is silly. But editors, firstly, let the [Tab] key insert either tabs or spaces, and secondly, offer a tabbing setting for indent levels.
I’ve never used any editor that was stupid about how many spaces to insert to get to the next indent level, so “amount of typing” is no reason to use tabs.
§
The video does offer two small advantages to tabs and one big one.
Using tabs instead of spaces does lead to smaller file sizes, but we’re talking text files that rarely grow above a few dozen kilobytes. And if source code file size is really a problem (say in long-term storage), then just ZIP the stuff. Text files compress nicely.
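To get a feel for the numbers, here is a rough sketch using the standard `zlib` module. The generated "source" is entirely made up (2000 repetitive lines, up to three indent levels), so treat it as an illustration of the trend, not a benchmark:

```python
import zlib

def indented_source(indent, lines=2000, depth=3):
    """Build fake source text with one to `depth` indent units per line."""
    return "".join(
        indent * (i % depth + 1) + "value = value + 1\n" for i in range(lines)
    )

with_tabs = indented_source("\t").encode("utf-8")
with_spaces = indented_source("    ").encode("utf-8")

# Print raw size and compressed size for each variant.
for label, data in (("tabs", with_tabs), ("spaces", with_spaces)):
    print(label, len(data), len(zlib.compress(data)))
```

Real code is far less repetitive than this, so it will not shrink as dramatically, but indentation is exactly the kind of redundancy compressors handle well.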
Using tabs also allows using the tabbing setting to change the indent as desired. If programmers are sharing code, and one really loves two-space indents, another insists on four-space indents, and a third likes wide eight-space indents, all three can be happy. So long as they all use the [Tab] key to insert tabs, it works great.
The problem is what happens when someone else edits and inserts spaces. Then everyone is unhappy.
I’m not sure how much that matters, but in some environments it might. (It never did in mine; I usually worked solo. I certainly do in retirement.) Modern editors are often capable of re-indenting as desired, so there are tools that would allow a spaces-only policy.
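Re-indenting a spaces-only file is not much work either. A naive sketch, assuming a hypothetical `reindent` helper that only rescales leading spaces (it knows nothing about multi-line strings or continuation lines, which a real tool would have to respect):

```python
def reindent(text, old_width=4, new_width=2):
    """Rescale leading-space indentation from old_width to new_width columns."""
    result = []
    for line in text.splitlines():
        body = line.lstrip(" ")
        levels, remainder = divmod(len(line) - len(body), old_width)
        # Columns left over after whole indent levels are kept as they are.
        result.append(" " * (levels * new_width + remainder) + body)
    return "\n".join(result)

four = "if a:\n    if b:\n        c()\n"
print(reindent(four))
```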
§
The reason that makes the most sense is semantics. An indent level is a singular thing, so representing it with a single tab character makes the most semantic sense. From a typesetting point of view, or a text content point of view, tabs would be the best.
The issue, though, is their invisibility and the tendency over time of source files to gain equally invisible space-indenting.
Tab-only files are also pretty ugly on simplistic text displays that either ignore control characters or replace each one with a single space (or some other character). Then all the indenting collapses to zero (if control chars are dropped) or to one space per tab.
§
So my bottom line and my advice is to use the [Tab] key, of course, but don’t use tabs in source code.
Ø
I know it’s been a while. I keep meaning to post more here, but somehow I never get around to it. That video has been sitting in my queue for this post for a very long time.