Tags

, ,

Considering that it has been six months since the last post, my attempt to get a post flow going with “Simple Python Tricks” left something to be desired. Like success. Undaunted, I’m drawing from the well again.

This time to discuss Python comprehensions, a feature that is powerful and expressive, but which is rather unique to Python. Newcomers may not, at first, even realize they exist.

Three tasks that come up a lot in software design are:

  1. Generating lists.
  2. Transforming lists.
  3. Reducing lists.

The sequential nature of lists means that loops, usually for-next loops, are the weapon of choice for dealing with them. It’s often called “looping over a list.”

As a simple example, imagine we want an ordered list of the first 50 squares — numbers that are the squares of the positive integers. A common generic pattern to create such a list looks like:

NEW Squares[50] -- declare space for 50 squared numbers

FOR (n=1 TO 50) {
    Squares[n] = n*n
}

PRINT Squares

Which is so generic we could implement it in almost any language. In Python it might look like this:

001| Squares = []
002| 
003| for n in range(1,50):
004|     Squares.append(n*n)
005| 
006| print(Squares)
007| 

The range function (line #3) supplies the numbers 1 through 50. We calculate the square and append it to the Squares list. When run, it prints:

[1, ​4, ​9, ​16, ​25, ..., ​2025, ​2116, ​2209, ​2304, ​2401]

Which, of course, are the first 50 squares.

In Python, we can use a list comprehension to do exactly the same thing:

001| Squares = [n*n for n in range(1,50)]
002| 
003| print(Squares)
004| 

Which is cleaner, more expressive (once accustomed to the idiom), doesn’t require the initial declaration of an empty list, and doesn’t build the list item by item.

Here’s another simple example. Suppose we want a list of uppercase letters. We could just assign the 26 literals to a variable:

001| UpperCaseChars = [
002|     ‘A’,‘B’,‘C’,‘D’,‘E’,‘F’,‘G’,‘H’,‘I’,‘J’,‘K’,‘L’,‘M’,
003|     ‘N’,‘O’,‘P’,‘Q’,‘R’,‘S’,‘T’,‘U’,‘V’,‘W’,‘X’,‘Y’,‘Z’,
004| ]
005| 
006| print(UpperCaseChars)
007| 

Which is a bit verbose (and requires we type them all and correctly). Or we could use a list comprehension:

001| UpperCaseChars = [chr(ord(‘A’)+cx) for cx in range(26)]
002| 
003| print(UpperCaseChars)
004| 

Which does exactly the same thing and is guaranteed to be complete and correct.

The expression, chr(ord('A')+cx), uses the ordinal value of the letter ‘A’ as a base, adds variable cx to it to obtain the ordinal of the next letter in sequence, and turns that ordinal into a character.

§

In its simplest form (one that suffices in a lot of cases), a list comprehension looks like this:

"[" list-item-expression for-clause [if-clauses] "]"

The list-item-expression is an expression that is evaluated each time through the loop to generate values for the list. It almost always incorporates the variable from the for-clause (but doesn’t have to).

The for-clause has the same syntax as the for statement (for variable-name in iterable). Note there is no trailing colon in a list comprehension as there is in the for statement. A comprehension is logically a single source code line with no block attached to it (but we sometimes write them across multiple lines for clarity).

The optional if-clauses are zero or more test clauses that, if false, filter out values from the list. If there are no if-clauses, all values generated are added to the list. Here’s an example showing one way to create a list of all the alpha-numeric characters with code points between zero and 255, inclusive:

001| alnums = [chr(cx) for cx in range(256) if chr(cx).isalnum()]
002| 
003| print(alnums)
004| 

Multiple if-clauses act like AND operations, all must be true. Suppose we wanted a list of numbers between 0 and 100 (inclusive) that were divisible by both 3 and 5:

001| nums = [n for n in range(100) if (n%3)==0 if (n%5)==0]
002| 
003| print(nums)
004| 

In this particular case, we could just use a single test for divisible by 15 to get the same list but suppose the two numbers we’re testing come at run time (so we don’t know what they are here). Then having two if-clauses makes sense.

Note that any if can have multiple expressions combined with any and/or/xor logic needed, so the multiple if-clauses feature is a bit redundant but could be useful in some contexts.

List comprehensions have an extended form with one or more additional for-clause if-clauses pairs following the first (required) pair:

"[" list-item-expression for-clause [if-clauses] [for-clause [if-clauses]]* "]"

Suppose we wanted a list of all possible pairs of the numbers 0 through 3. One way to create such a list is:

001| base = 4
002| pairs = [(x,y) for x in range(base) for y in range(base)]
003| 
004| print(pairs)
005| 

The base variable parameterizes this to allow for other ranges of numbers. (Pro tip: when a number appears more than once in a function, consider parameterizing the function. You’ll thank yourself later. And never use literals in your code.)

This list comprehension is exactly the same as these nested for loops:

The list comprehension above generates pairs of numbers. Suppose we wanted tuples with five numbers? We just add more for clauses, and this provides an opportunity to illustrate how we can break long list comprehensions over multiple lines:

001| base = 2
002| fives = [
003|     (x,y,z,w,u)
004|     for x in range(base)
005|         for y in range(base)
006|             for z in range(base)
007|                 for w in range(base)
008|                     for u in range(base)
009| ]
010| 
011| print(fives)
012| 

I set the base to 2 this time to create a list of all possible five-bit values from (0,0,0,0,0) to (1,1,1,1,1).

§

In the examples so far, because there is just one set of square brackets, the result has been a single (one-dimensional) list. Makes sense. Single pair of brackets, single list.

But suppose, rather than pairs of numbers, we wanted a matrix of numbers? Suppose we wanted to implement a chessboard? The blunt object way is:

001| board = [
002|     [1, 0, 0, 0, 0, 0, 0, 0],
003|     [0, 0, 0, 0, 0, 0, 0, 0],
004|     [0, 0, 0, 0, 0, 0, 0, 0],
005|     [0, 0, 0, 0, 0, 0, 0, 0],
006|     [0, 0, 0, 0, 0, 0, 0, 0],
007|     [0, 0, 0, 0, 0, 0, 0, 0],
008|     [0, 0, 0, 0, 0, 0, 0, 0],
009|     [0, 0, 0, 0, 0, 0, 0, 1],
010| ]
011| 
012| for row in board:
013|     print(row)
014| 

Which would really only be worthwhile if we wanted to individually initialize every cell of the matrix with a different (hand-coded!) value. For example, we might be setting up the 32 chess pieces in their starting positions. Here we’re just setting everything to zero except for the upper-left and lower-right corners, which we’re setting to one.

One shortcut we could take is to let Python generate all the cells in a row:

001| board = [
002|     [0]*8,
003|     [0]*8,
004|     [0]*8,
005|     [0]*8,
006|     [0]*8,
007|     [0]*8,
008|     [0]*8,
009|     [0]*8,
010| ]
011| 
012| board[0][0] = 1
013| board[7][7] = 1
014| 
015| for row in board:
016|     print(row)
017| 

Which, in lines #1-#10, gives us eight rows of eight zeros, requiring that we explicitly set the two opposing corners (lines #12 and #13).

IMPORTANT: Note that we cannot do this:

001| board = [[0]*8]*8
002| 
003| board[0][0] = 1
004| board[7][7] = 1
005| 
006| for row in board:
007|     print(row)
008| 

This does give us an 8×8 matrix, but not a useful one because the same row is repeated eight times. That’s because multiplying a list element, say [x]*8, creates a list with eight repeats of the list item: [x,x,x,x,x,x,x,x]. Whatever is in the list we multiply, that’s what gets multiplied. For example, [a,b]*4 = [a,b,a,b,a,b,a,b].

So, while [0]*8 gives us a list of eight zeros, [[0]*8]*8 gives us a list of eight (of the same!) list of eight zeros. Which means that changing a column on any row changes that column on all rows. Not useful as a game board.

The first two examples both print:

[1, 0, 0, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 0, 0, 1]

But the third one prints:

[1, 0, 0, 0, 0, 0, 0, 1]
[1, 0, 0, 0, 0, 0, 0, 1]
[1, 0, 0, 0, 0, 0, 0, 1]
[1, 0, 0, 0, 0, 0, 0, 1]
[1, 0, 0, 0, 0, 0, 0, 1]
[1, 0, 0, 0, 0, 0, 0, 1]
[1, 0, 0, 0, 0, 0, 0, 1]
[1, 0, 0, 0, 0, 0, 0, 1]

Because all eight rows are the same list object.

Here’s a list comprehension way to do the same thing (correctly) is:

001| board = [[0 for _ in range(8)] for _ in range(8)]
002| 
003| board[0][0] = 1
004| board[7][7] = 1
005| 
006| for row in board:
007|     print(row)
008| 

This is actually two list comprehensions, one nested inside the other. The list-expression is just zero to set each board cell to zero. Since the row and column numbers aren’t a factor, the two for clauses can use the generic variable (the underbar) in both cases.

As we did above, we can use list multiplication to shorten it to this:

001| board = [[0]*8 for _ in range(8)]
002| 
003| board[0][0] = 1
004| board[7][7] = 1
005| 
006| for row in board:
007|     print(row)
008| 

But we still need to make sure we generate eight different rows (which this does).

When run, both examples above print:

[1, 0, 0, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 0, 0, 1]

§

If we want to initialize the board with values for more than just two corners, the list-expression can be a function taking the row and column values and returning a value to initialize the cell with. Here’s a very simple example that does the same thing as we’ve done above with explicit assignment statements (i.e. set two opposing corners to one):

001| cell = lambda r,c: 1 if (r,c) in [(0,0),(7,7)] else 0
002| 
003| board = [[cell(r,c) for c in range(8)] for r in range(8)]
004| 
005| for row in board:
006|     print(row)
007| 

A more realistic example might have a function that returned the sixteen chess pieces for the appropriate starting squares (and returned zero or None for other squares). That would mean we need useable variable names in the for clauses.

Speaking of which, note that the inner comprehension generates the columns, and the outer comprehension generates the rows. This is somewhat a matter of preference but has also to do with preferred order in indexing the matrix (as well as how we visualize it). In English, we usually say, “row and column” (in that order), so we generally write functions that take a row first and a column second. So, the natural way to index the matrix is board[row][column].

Python doesn’t have a native matrix type or multi-dimensional arrays, but it does have lists of lists (of lists and so forth) for as many dimensions as needed. I’ll show you one with four dimensions below. A “two-dimensional” data object such as board is really a list of eight rows, each of which has eight column “cells”. Or a list of eight columns, each of which has eight row “cells”, depending on which we generate on the inner and outer loops.

When referencing a list of lists, for example board[a][b], the indexes work, left-to-right, from outer lists to inner lists, so row-column indexing requires row lists containing columns. (The first choice in the previous paragraph.) It also means the way we visualize the data matches the board. The row numbers run vertically down the page, and the columns numbers run horizontally across it.

This is a digression from comprehensions but worth taking because the logic changes when indexing with X (horizonal, so a column) and Y (vertical, so a row). The problem is that, in English, we usually say “X and Y” (in that order), so the code usually follows that order. But that means that matrix[x][y] equates x with row and y with col, the opposite of what we need. At the least, we need to think differently about printing such a board in normal row-by-row fashion:

001| matrix = [[(x,y) for y in range(3)] for x in range(4)]
002| 
003| matrix[0][0] = (8, 8)
004| matrix[3][2] = (9, 9)
005| 
006| # Output with x=row, y=col…
007| for iy,ys in enumerate(matrix):
008|     print(f'{iy+1}: {ys}’)
009| print()
010| 
011| # Output with x=col, y=row…
012| for iy in range(3):
013|     row = [matrix[ix][iy] for ix in range(4)]
014|     print(f'{iy+1}: {row}’)
015| print()
016| 

When run, this prints:

1: [(8, 8), (0, 1), (0, 2)]
2: [(1, 0), (1, 1), (1, 2)]
3: [(2, 0), (2, 1), (2, 2)]
4: [(3, 0), (3, 1), (9, 9)]

1: [(8, 8), (1, 0), (2, 0), (3, 0)]
2: [(0, 1), (1, 1), (2, 1), (3, 1)]
3: [(0, 2), (1, 2), (2, 2), (9, 9)]

In the first listing, the X coordinate increases vertically (downwards). In the second listing, it increases horizontally (rightwards). When working with graphic systems, it’s often helpful to visualize X as horizontal and Y as vertical (the second case).

And we want to index the data as matrix[x][y], but this means keeping in mind that x=row and y=col when we think about data presentation. It all depends on whether we’re thinking in terms of rows and columns or X and Y coordinates.

Here’s an example that creates a four-dimensional 3×3×3×3 matrix:

001| size = 3
002| 
003| board = [[[[ (x,y,z,w)
004|     for x in range(size)]
005|         for y in range(size)]
006|             for z in range(size)]
007|                 for w in range(size)]
008| 
009| def print_board ():
010|     for wx,ws in enumerate(board):
011|         print(f’W: {wx}’)
012|         for zx,zs in enumerate(ws):
013|             print(f’Z: {zx}’)
014|             for yx,ys in enumerate(zs):
015|                 print(f’y: {yx} {ys!s}’)
016|             print()
017|         print()
018|     print()
019| 
020| print_board()
021| 

When run, it prints:

W: 0
Z: 0
y: 0 [(0, 0, 0, 0), (1, 0, 0, 0), (2, 0, 0, 0)]
y: 1 [(0, 1, 0, 0), (1, 1, 0, 0), (2, 1, 0, 0)]
y: 2 [(0, 2, 0, 0), (1, 2, 0, 0), (2, 2, 0, 0)]

Z: 1
y: 0 [(0, 0, 1, 0), (1, 0, 1, 0), (2, 0, 1, 0)]
y: 1 [(0, 1, 1, 0), (1, 1, 1, 0), (2, 1, 1, 0)]
y: 2 [(0, 2, 1, 0), (1, 2, 1, 0), (2, 2, 1, 0)]

Z: 2
y: 0 [(0, 0, 2, 0), (1, 0, 2, 0), (2, 0, 2, 0)]
y: 1 [(0, 1, 2, 0), (1, 1, 2, 0), (2, 1, 2, 0)]
y: 2 [(0, 2, 2, 0), (1, 2, 2, 0), (2, 2, 2, 0)]


W: 1
Z: 0
y: 0 [(0, 0, 0, 1), (1, 0, 0, 1), (2, 0, 0, 1)]
y: 1 [(0, 1, 0, 1), (1, 1, 0, 1), (2, 1, 0, 1)]
y: 2 [(0, 2, 0, 1), (1, 2, 0, 1), (2, 2, 0, 1)]

Z: 1
y: 0 [(0, 0, 1, 1), (1, 0, 1, 1), (2, 0, 1, 1)]
y: 1 [(0, 1, 1, 1), (1, 1, 1, 1), (2, 1, 1, 1)]
y: 2 [(0, 2, 1, 1), (1, 2, 1, 1), (2, 2, 1, 1)]

Z: 2
y: 0 [(0, 0, 2, 1), (1, 0, 2, 1), (2, 0, 2, 1)]
y: 1 [(0, 1, 2, 1), (1, 1, 2, 1), (2, 1, 2, 1)]
y: 2 [(0, 2, 2, 1), (1, 2, 2, 1), (2, 2, 2, 1)]


W: 2
Z: 0
y: 0 [(0, 0, 0, 2), (1, 0, 0, 2), (2, 0, 0, 2)]
y: 1 [(0, 1, 0, 2), (1, 1, 0, 2), (2, 1, 0, 2)]
y: 2 [(0, 2, 0, 2), (1, 2, 0, 2), (2, 2, 0, 2)]

Z: 1
y: 0 [(0, 0, 1, 2), (1, 0, 1, 2), (2, 0, 1, 2)]
y: 1 [(0, 1, 1, 2), (1, 1, 1, 2), (2, 1, 1, 2)]
y: 2 [(0, 2, 1, 2), (1, 2, 1, 2), (2, 2, 1, 2)]

Z: 2
y: 0 [(0, 0, 2, 2), (1, 0, 2, 2), (2, 0, 2, 2)]
y: 1 [(0, 1, 2, 2), (1, 1, 2, 2), (2, 1, 2, 2)]
y: 2 [(0, 2, 2, 2), (1, 2, 2, 2), (2, 2, 2, 2)]

(The dimensions are typically labeled: X, Y, Z, and W.)

§

That’s enough for this time. Next time, we’ll explore some useful applications for list comprehensions and explore three other types of comprehensions that Python supports natively.


Link: Zip file containing all code fragments used in this post.