In the last post we looked at Python list comprehensions, an inline alternative to for loops. List comprehensions are a powerful technique for creating and processing lists.
In this post, we’ll look at more list comprehensions, including some real-life examples, and look at three other kinds of Python comprehensions: sets, dictionaries, and generators.
Recall that the basic syntax for a list comprehension is:
"[" list-item-expression for-clause [if-clauses] "]"
The list-item-expression is evaluated each iteration to produce a new item for the list. The for-clause is the usual Python for statement, and the optional if-clauses allow filtering of iteration values. The for-clause and if-clauses can be repeated for more complex lists (see previous post).
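As a quick refresher, here is a minimal sketch of that syntax with one for-clause and one if-clause. The names numbers and even_squares are made up for illustration and do not come from the previous post.

```python
# Illustrative names only; nothing here comes from the previous post.
numbers = range(10)

# list-item-expression: n*n
# for-clause:           for n in numbers
# if-clause:            if n % 2 == 0
even_squares = [n*n for n in numbers if n % 2 == 0]

print(even_squares)
```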
Here’s a useful example that calculates the Pythagorean distance of a vector:
001| from math import sqrt
002|
003| def distance1 (vector):
004|     '''Pythagorean distance.'''
005|     squares = [pow(d,2) for d in vector]
006|     return sqrt(sum(squares))
007|
008| print(distance1([0,0]))
009| print(distance1([1,0]))
010| print(distance1([1,1]))
011| print(distance1([3,5]))
012| print()
013|
The vector can be anything that looks like a list (a tuple, for instance, or a user-defined type that is iterable). When run, it prints:
0.0
1.0
1.4142135623730951
5.830951894845301
Which is what we’d expect. The list comprehension (line #5) iterates over all the components of the vector and squares each one (somewhat similar to the first example from last time, the one that generated a list of squares). This list of squares gets passed to the sum function, which adds them up and returns the total. That total gets passed to the sqrt function (square root) and the result of that returned as the distance.
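If the comprehension feels opaque, it may help to see it unrolled. The following sketch (the name distance_loop is hypothetical) should do the same work as distance1 with an explicit loop, and it also shows a tuple being passed instead of a list.

```python
from math import sqrt

def distance_loop(vector):
    '''Same calculation as distance1, written with an explicit loop.'''
    squares = []
    for d in vector:
        squares.append(pow(d, 2))
    return sqrt(sum(squares))

print(distance_loop([3, 5]))
print(distance_loop((3, 4)))   # a tuple works as well as a list
```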
Note that, while distance1 can calculate the Pythagorean distance between any two points if we subtract one from the other to get the vector between them, we can also just re-write the function to include that step:
001| from math import sqrt
002|
003| def distance2 (point1, point2):
004|     '''Pythagorean distance between points.'''
005|     vector = [p2-p1 for p1,p2 in zip(point1, point2)]
006|     squares = [pow(d,2) for d in vector]
007|     return sqrt(sum(squares))
008|
009| print(distance2([0,0], [3,5]))
010| print(distance2([1,2], [4,7]))
011| print(distance2([1,1], [-1,-1]))
012| print(distance2([3,5], [4,6]))
013| print()
014|
This function requires two points and returns the (Pythagorean) distance between them. (It could determine the vector distance by setting one point to the origin, as shown in the first test example.) When run, it prints:
5.830951894845301
5.830951894845301
2.8284271247461903
1.4142135623730951
Note that both versions work with any number of dimensions. The list comprehensions process vectors (lists) of any length. The only requirement is that the two vectors passed to distance2 must have the same length.
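That same-length requirement is worth a second look, because zip stops at the shorter input without complaint, so a mismatch would quietly produce a wrong answer rather than an error. Here is a rough sketch of a guarded variant. The name distance2_strict is hypothetical, and the len check assumes both points are sized sequences such as lists or tuples.

```python
from math import sqrt

def distance2_strict(point1, point2):
    '''Pythagorean distance; raises ValueError on mismatched lengths.'''
    if len(point1) != len(point2):
        raise ValueError('points must have the same number of dimensions')
    vector = [p2 - p1 for p1, p2 in zip(point1, point2)]
    return sqrt(sum([pow(d, 2) for d in vector]))

# zip stops at the shorter input, so the extra coordinate is silently dropped:
print(list(zip([0, 0], [3, 5, 9])))

print(distance2_strict([0, 0], [3, 5]))
try:
    distance2_strict([0, 0], [3, 5, 9])
except ValueError as err:
    print('error:', err)
```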
Note that distance2 has two list comprehensions. The first (line #5) creates a distance vector from the two input points. The second (line #6) creates the list of squares, same as line #5 in the distance1 function.
We can do something similar to the distance2 function to calculate the dot product of two vectors:
001| def dot_product (vec_a, vec_b):
002|     return sum([float(a)*float(b) for a,b in zip(vec_a,vec_b)])
003|
004| print(dot_product([1,0,0], [1,0,0]))
005| print(dot_product([1,0,0], [-1,0,0]))
006| print(dot_product([1,0,0], [0,1,0]))
007| print(dot_product([1,0,0], [0,0,1]))
008| print(dot_product([0.5,0.3,0.4], [1,1,0]))
009| print()
010|
When run, this prints:
1.0
-1.0
0.0
0.0
0.8
Which are the dot products we expect.
As an aside, note that both the distance1 and dot_product functions are simple enough to be implemented as lambda functions:
001| from math import sqrt
002|
003| distance = lambda v:sqrt(sum([pow(d,2) for d in v]))
004|
005| dot_prod = lambda va,vb:sum([float(a)*float(b) for a,b in zip(va,vb)])
006|
Because list comprehensions can be nested (as shown last time), the distance2 function could also be implemented as a (bit longer) lambda function, but this exercise is left to the reader.
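For readers who want to check their answer, here is one possible version. It is only a sketch: the name distance_between is made up, and there are certainly other ways to nest the two comprehensions.

```python
from math import sqrt

# One possible answer to the exercise; distance_between is a made-up name.
# The inner comprehension builds the difference vector, the outer one squares it.
distance_between = lambda p1, p2: sqrt(
    sum([pow(d, 2) for d in [b - a for a, b in zip(p1, p2)]]))

print(distance_between([0, 0], [3, 5]))
print(distance_between([1, 1], [-1, -1]))
```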
Suppose we have a string of hex digits (for instance, “01F63A”, the code point, in hex, of 😺, the happy cat icon). We want to convert every pair of hex digits to its integer value. The hex string can be any length; the only requirement is that it contain pairs of legal hex characters (case insensitive).
Here’s one way we could implement this:
001| def hex2ints (s):
002|     return [int(s[ix:ix+2],base=16) for ix in range(0,len(s),2)]
003|
004| bs = hex2ints('01F63A')
005| print(bs)
006|
Which uses the range function to create an index that skips along the even numbers and grabs pairs of hex digits from the string by indexing them.
When run, it prints:
[1, 246, 58]
Which are the byte values we expect. Note that the function just returns an empty list if given an empty string. Invalid hex digits cause a ValueError. Note also that, if given an odd number of hex digits, this treats the final single digit as if it had a leading zero, that is, as a one-digit hex number. (A better algorithm would raise an error, but that is hard to accomplish in a one-line list comprehension.)
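If rejecting odd-length input matters, one option is to give up the one-liner and add a length check first. The sketch below uses a hypothetical name, hex2ints_strict, and compares the result with the built-in bytes.fromhex, which does a similar conversion and also refuses an odd number of digits.

```python
def hex2ints_strict(s):
    '''Convert pairs of hex digits to integers; reject odd-length input.'''
    if len(s) % 2 != 0:
        raise ValueError('hex string must contain an even number of digits')
    return [int(s[ix:ix+2], base=16) for ix in range(0, len(s), 2)]

print(hex2ints_strict('01F63A'))

# The built-in bytes.fromhex gives the same byte values for this input.
print(list(bytes.fromhex('01F63A')))

try:
    hex2ints_strict('01F63')
except ValueError as err:
    print('error:', err)
```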
Lastly, note that the single line list comprehension means this could be implemented as a lambda function.
Now let’s re-write that first distance function:
001| from math import sqrt
002|
003| def distance (vector):
004|     '''Pythagorean distance.'''
005|     total = sum(pow(d,2) for d in vector)
006|     return sqrt(total)
007|
008| print(distance([0,0]))
009| print(distance([1,0]))
010| print(distance([1,1]))
011|
There are two changes. Firstly, the sum function has been moved up to line #5 from line #6. Now line #5 creates a sum of squares, not a list. The change is reflected in calling the variable total rather than squares.
The bigger change is the square brackets for the list comprehension are missing!
Line #5 uses a generator expression rather than a list comprehension. The syntax is exactly the same, except that a generator expression is enclosed in parentheses rather than square brackets. As a bonus, when a generator expression is the only argument in a function call, the call's own parentheses are enough; an extra pair is not required, as it would be when passing a tuple. This makes code a lot cleaner.
A generator expression, like a generator function, gives up its values iteratively — each time a value is requested by the driving loop — whereas a list comprehension creates the entire list at once. [See Python Generators part 1, part 2, & part 3 for more about Python generator functions.]
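A small experiment can make the difference visible. In this sketch the helper noisy_square and the list log are made-up names; the helper records each call so we can see when the work actually happens.

```python
log = []

def noisy_square(d):
    '''Square d and record that the call happened.'''
    log.append(d)
    return d * d

eager = [noisy_square(d) for d in [1, 2, 3]]
print(log)          # the list comprehension has already made all three calls

log.clear()
lazy = (noisy_square(d) for d in [1, 2, 3])
print(log)          # the generator expression has not made any calls yet

first = next(lazy)
print(first, log)   # asking for one value triggers exactly one call
```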
Consider the following two examples:
001| cs1 = [chr(ord('A')+x) for x in range(26) if (x % 2)==0]
002|
003| cs2 = list(chr(ord('A')+x) for x in range(26) if (x % 2)==0)
004|
005| print(cs1)
006| print(cs2)
007|
The expression, identical on line #1 and line #3, should be a bit familiar from the previous post. It creates a list of every other uppercase alpha character (‘A’, ‘C’, ‘E’…’W’,’Y’). The difference between line #1 and line #3 is that the former is a list comprehension whereas the latter is a generator expression inside the list class call.
That parentheses create a generator expression is more obvious if we do this:
001| chars = (chr(ord('A')+x) for x in range(26) if (x % 2)==0)
002|
003| print(chars)
004|
Which prints:
<generator object <genexpr> at 0x000001BCFF0522D0>
So, rather than a list object, we have a generator object that we need to use in a loop or other context that iterates over it. Note that generator objects can only be iterated over once.
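The once-only behavior is easy to trip over, so here is a short sketch of it (the variable names are arbitrary). The second attempt to consume the generator yields nothing, and no error is raised.

```python
evens = (n for n in range(10) if n % 2 == 0)

first_pass = list(evens)
second_pass = list(evens)   # the generator is already exhausted

print(first_pass)
print(second_pass)
```

If the values are needed more than once, it is probably simpler to build a list in the first place.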
Here’s another example just to show a compound generator expression:
001| pairs = ((x,y) for x in range(3) for y in range(4))
002|
003| for p in pairs:
004|     print(p)
005| print()
006|
When run, it prints:
(0, 0)
(0, 1)
(0, 2)
(0, 3)
(1, 0)
(1, 1)
(1, 2)
(1, 3)
(2, 0)
(2, 1)
(2, 2)
(2, 3)
Using a generator expression would make sense if the list being generated is very long and you don’t want it all in memory. For instance, if rather than 3×4=12 pairs, we were generating a 256×256×256 3D matrix, that would be over 16 million x,y,z tuples.
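To get a rough feel for the difference, the sketch below compares the two forms with sys.getsizeof. SIZE is kept at 32 so it runs quickly, and note that getsizeof reports only the list object itself, not the tuples it refers to, so the real gap is larger than the numbers suggest.

```python
import sys

SIZE = 32   # kept small here; the example in the text uses 256

as_list = [(x, y, z) for x in range(SIZE) for y in range(SIZE) for z in range(SIZE)]
as_gen = ((x, y, z) for x in range(SIZE) for y in range(SIZE) for z in range(SIZE))

print(len(as_list))
print(sys.getsizeof(as_list))   # grows with SIZE (list object only)
print(sys.getsizeof(as_gen))    # small, and does not grow with SIZE

# The generator can still be consumed item by item, for example to count them.
count = sum(1 for _ in as_gen)
print(count)
```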
Besides square brackets and parentheses, we can also use comprehensions between curly braces. Just as with literals inside curly braces, we can create a set or a dict object.
The syntax for a set comprehension is identical to the syntax for a list comprehension. The only change is the curly braces rather than square brackets. For instance, here are two ways to create the same set object:
001| nums1 = {1, 2, 3, 4, 5}
002|
003| nums2 = {n+1 for n in range(5)}
004|
005| print(nums1)
006| print(nums2)
007|
When run, it prints:
{1, 2, 3, 4, 5}
{1, 2, 3, 4, 5}
One use for set objects is finding the unique items in a collection containing duplicates. For example:
001| words = [
002|     'bear','grass','car','car', 'beer',
003|     'ice','bear','hot','hot','cold',
004|     'dog','cat','lion','tiger','bear',
005|     'pet','dog','pet','cat','tiger',
006|     'cold','beer','hot','beer','end'
007| ]
008|
009| wordset = {word for word in words}
010|
011| for ix,word in enumerate(sorted(wordset)):
012|     print(f'{ix+1:2d}: {word}')
013| print()
014|
When run, it prints:
 1: bear
 2: beer
 3: car
 4: cat
 5: cold
 6: dog
 7: end
 8: grass
 9: hot
10: ice
11: lion
12: pet
13: tiger
Which lists only the unique words from the words list.
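A set comprehension earns its keep when the items need transforming or filtering on the way in. The following sketch uses a made-up list, raw_words, to fold case and drop short words in one step. When neither is needed, plain set(iterable) gives the same result as the comprehension.

```python
raw_words = ['Bear', 'bear', 'BEER', 'beer', 'cat', 'Cat', 'ice']

# normalize case, and keep only words longer than three letters
long_words = {word.lower() for word in raw_words if len(word) > 3}

print(sorted(long_words))

# with no transformation or filter, set() builds the same set
same = {word for word in raw_words} == set(raw_words)
print(same)
```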
We can also create dict objects, but in this case the syntax is slightly different:
"{" key ":" value for-clause [if-clauses] "}"
Here are a pair of examples:
001| cx2ch = {(n+1):chr(ord('a')+n) for n in range(26)}
002|
003| ch2cx = {chr(ord('a')+n):(n+1) for n in range(26)}
004|
005| print(cx2ch)
006| print()
007| print(ch2cx)
008| print()
009| print(f'cx(25) = {cx2ch[25]}')
010| print(f'ch("J") = {ch2cx["j"]}')
011|
The first creates a dictionary that maps integers (1, 2, 3…) to the lowercase alphabet (‘a’, ‘b’, ‘c’…). The second creates the reverse dictionary — one that maps the characters to their index integers.
When run, it prints:
{1: 'a', 2: 'b', 3: 'c', 4: 'd', 5: 'e', 6: 'f', 7: 'g', 8: 'h',
9: 'i', 10: 'j', 11: 'k', 12: 'l', 13: 'm', 14: 'n', 15: 'o',
16: 'p', 17: 'q', 18: 'r', 19: 's', 20: 't', 21: 'u', 22: 'v',
23: 'w', 24: 'x', 25: 'y', 26: 'z'}
{'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5, 'f': 6, 'g': 7, 'h': 8,
'i': 9, 'j': 10, 'k': 11, 'l': 12, 'm': 13, 'n': 14, 'o': 15,
'p': 16, 'q': 17, 'r': 18, 's': 19, 't': 20, 'u': 21, 'v': 22,
'w': 23, 'x': 24, 'y': 25, 'z': 26}
cx(25) = y
ch("J") = 10
The only difference between dictionary comprehensions and all other types is that the list-item-expression must be a key:value pair.
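The for-clause and if-clauses behave as they do in the other comprehensions. As a sketch, the reverse dictionary can also be built by swapping each key and value of an existing one, and an if-clause can filter the pairs. The name vowels is made up for this example.

```python
ch2cx = {chr(ord('a')+n):(n+1) for n in range(26)}

# invert the mapping by swapping each key and value
cx2ch = {cx: ch for ch, cx in ch2cx.items()}

# an if-clause works here too: keep only the vowels
vowels = {ch: cx for ch, cx in ch2cx.items() if ch in 'aeiou'}

print(cx2ch[25])
print(vowels)
```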
And that will pretty much do it for Python comprehensions. Go forth and use them!
This post is: Simple Python Tricks #3