In the last two posts I’ve explored Python generator functions. The first post went over the basics and showed how they return an iterable object usable in for-loops and list constructors. The second post explored how generator functions work in more detail.
This time I’ll wrap it up with some examples that are a bit more involved. If Python generators were a mystery before these three posts, I hope you feel more comfortable with them after!
To begin, one observation we might make about the examples so far is that they don’t return anything in the usual sense. They’re list-building helpers, but not list builders themselves. They yield values one-by-one. For a running sequence (such as a Collatz sequence) that’s exactly what we’d want.
Things got more complicated when we wanted a list of Collatz sequences. An implementation using yield from doled out numbers one-by-one, which is the expected behavior, but that’s not helpful in building the separate sequences. The client had to recognize those sequences and put them into lists. The alternative was a generator that returned entire lists, but that lost the number-by-number interaction.
If our design goal insists on a generator that yields each number of the sequence and that also builds lists, we’re going to need a new tactic. We need to plant a seed: something the generator can build on. When it’s done, the seed will have grown into our list of Collatz sequences.
Let’s start with a simple example:
    def gen_collatz_2 (n):
        '''Collatz sequence generator function.'''
        buf = []
        while 1 <= n:
            buf.append(n)
            yield n
            n = Collatz(n)
        return buf

    g = gen_collatz_2(5)
    print(next(g))
    print(next(g))
    print(next(g))
    print(next(g))
    print(next(g))
    print(next(g))
    print(next(g))
    #
Note that the generator builds a list that it returns. When the code is run it prints:
    5
    16
    8
    4
    2
    1
    Traceback (most recent call last):
      File "generators.py", line 439, in <module>
        print(next(g))
    StopIteration: [5, 16, 8, 4, 2, 1]
The list the generator built is returned in the StopIteration exception. One option we have is to retrieve it from the value attribute of the exception object:
    g = gen_collatz_2(5)
    while True:
        try:
            print(next(g))
        except StopIteration as e:
            # The generator's return value rides on the exception...
            cs = e.value
            break
    print(cs)
    #
When run this prints:
    5
    16
    8
    4
    2
    1
    [5, 16, 8, 4, 2, 1]
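The try/except handling can be tucked into a small helper. Here is a minimal sketch, assuming a hypothetical helper named drain and a stand-in generator named countdown (neither is part of the sample code for these posts):

```python
def drain(gen):
    '''Run a generator to exhaustion; return (yielded items, return value).'''
    items = []
    while True:
        try:
            items.append(next(gen))
        except StopIteration as e:
            # The generator's return value rides along on the exception.
            return items, e.value

def countdown(n):
    '''Stand-in generator (hypothetical): yields n..1, returns how many it yielded.'''
    count = 0
    while n >= 1:
        yield n
        n -= 1
        count += 1
    return count

items, result = drain(countdown(3))
print(items)   # [3, 2, 1]
print(result)  # 3
```

The same helper should work with any generator that uses return to hand back a value, since it only relies on the value attribute of StopIteration.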
The list is now in the cs variable for us to use. We can avoid the exception handling altogether by passing a list into the generator:
    def gen_collatz_3 (cs):
        '''Collatz sequence generator. (e.g. cs=[11])'''
        while 1 < cs[-1]:
            yield cs[-1]
            n = Collatz(cs[-1])
            cs.append(n)
        yield cs[-1]

    cs = [13]
    for n in gen_collatz_3(cs):
        if n == 16:
            break
    print(cs)
    #
This populates the list we pass based on its initial value, and it iterates over the sequence. Note that we don’t print the iteration, but we do use it to exit early if we hit a number we know ends the sequence.
We can also remember that the yield from expression returns whatever the generator it calls returns. This will be a little tricky because any function we use yield from in becomes a generator (which doesn’t return anything like a normal function does).
    def collatz_seq (n):
        '''Generate and return Collatz sequence for n.'''
        def inner (arr):
            # Stash the list returned by the wrapped generator...
            arr[0:0] = yield from gen_collatz_2(n)
        cs = []
        for num in inner(cs):
            print(num)
        return cs

    cs = collatz_seq(5)
    print(cs)
    #
The inner function is a generator with a yield from that catches the list returned from the gen_collatz_2 generator. It can’t return that list the way a normal function would, so it stashes it in the empty list we pass into the inner function.
The collatz_seq function uses the inner generator to iterate through, and print, the sequence. Thus we get our hands on each number in the sequence as it’s generated. Ultimately the function returns the sequence as a list.
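Another way to get at a generator's return value while still using a plain for-loop is a small wrapper object. This is only a sketch: the Capture class and the halves generator are hypothetical names invented for the illustration, not part of the sample code.

```python
class Capture:
    '''Wrap a generator; keep its return value after iteration (hypothetical helper).'''
    def __init__(self, gen):
        self.gen = gen
        self.value = None

    def __iter__(self):
        # __iter__ is itself a generator function, so yield from works here
        # and hands us the wrapped generator's return value.
        self.value = yield from self.gen

def halves(n):
    '''Stand-in generator: yields n, n//2, ... down to 1; returns the list.'''
    buf = []
    while n >= 1:
        buf.append(n)
        yield n
        n //= 2
    return buf

cap = Capture(halves(8))
seen = [x for x in cap]
print(seen)       # [8, 4, 2, 1]
print(cap.value)  # [8, 4, 2, 1]
```

The idea is the same as the inner function above; the wrapper just gives the returned value a named place to live instead of a list passed in by the caller.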
§
These examples serve to illustrate how generators work, but they may raise the question of their value. I think it’s clear they’re good for generating time-consuming items on the fly (database requests, for instance), or for generating an infinite list. But why act as an iterable and as a function that returns something (possibly something completely different than what it iterates)?
Another question at this point may involve the utility of yield from. Why would we wrap a generator?
The next examples should help start to answer those questions. So far the examples have been simple to make the mechanics of yield and yield from as clear as possible. But simple examples aren’t good at showing the practical reasons for doing something. For that we need a bigger example. So let’s change the assignment to something more realistic.
Rather than generating Collatz sequences, we’d like to know the lengths of sequences, so part of our output should be a list of numbers and the lengths of the sequences they generate. We’re also curious about the frequency of the numbers generated. In how many sequences does the number five appear? That requires seeing each number, but we don’t care about the actual sequences. To save memory, we’d just as soon discard them as they’re created and analyzed.
Here’s one way to do it:
    def gen_collatz (n):
        '''Collatz sequence generator. (Returns a list!)'''
        buf = []
        while 1 <= n:
            buf.append(n)
            yield (len(buf), n)
            n = Collatz(n)
        return buf

    def collatz_stats (accum, start=1, end=100):
        '''Collatz length stats wrapper.'''
        for n in range(start, end+1):
            cs = yield from gen_collatz(n)
            accum.append((n,len(cs)))

    def demo_collatz_stats (start=1, end=100):
        fdata = {}
        stats = []
        for ix,n in collatz_stats(stats,start,end):
            if n not in fdata:
                fdata[n] = 0
            fdata[n] += 1
        freqs = sorted(fdata.items(), key=lambda x:x[0])
        return (stats, freqs)
    #
This returns a pair of lists, the first with the lengths of the sequences, the second with the frequency counts for the numbers encountered in those sequences. The caller can use those lists to determine the longest sequence or the largest number in any sequence or whatever.
Note that we pass an empty list to the collatz_stats function. This is similar to what we did with gen_collatz_3 above. It’s the easiest way to have a generator build a list.
We could also have the collatz_stats function do the frequency stats by scanning the list it gets back from gen_collatz (we’d pass in a dict, of course). That would mean rescanning that list, whereas the for-loop already shows us each number for free. This design also separates the concerns: one function generates a Collatz sequence; one makes a list of sequence lengths; and one counts node frequencies. Modularity! Code isolation! Smell of fresh coffee!
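To make the "longest sequence" idea concrete, here is a compact, self-contained sketch of the same design. It makes several assumptions: collatz_step uses the standard Collatz rule (the Collatz function from the earlier posts may behave differently), the generator stops explicitly at 1, and collections.Counter stands in for the hand-rolled frequency dict. The names gen_seq and gen_stats are invented for the illustration.

```python
from collections import Counter

def collatz_step(n):
    '''Standard Collatz rule (assumed; the posts' Collatz() may differ).'''
    return n // 2 if n % 2 == 0 else 3 * n + 1

def gen_seq(n):
    '''Yield the sequence from n down to 1; return its length.'''
    length = 0
    while True:
        length += 1
        yield n
        if n == 1:
            return length
        n = collatz_step(n)

def gen_stats(lengths, start, end):
    '''Yield every number of every sequence; record (n, length) pairs.'''
    for n in range(start, end + 1):
        length = yield from gen_seq(n)
        lengths.append((n, length))

lengths = []
freqs = Counter(gen_stats(lengths, 1, 10))   # Counter consumes the iterable

longest = max(lengths, key=lambda pair: pair[1])
print(longest)    # (9, 20)
print(freqs[5])   # 6 -- sequences starting at 3, 5, 6, 7, 9 and 10 pass through 5
```

Because the sequences are never stored, memory use stays flat no matter how large the range gets; only the length pairs and the frequency counts accumulate.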
§
Here’s a different sort of example that may help illustrate the possible value of a generator. Let’s say we have an application where we’ll be iterating a lot over three-dimensional matrices. That means we’ll be doing some form of this a lot:
    for level in range(n_levels):
        for row in range(n_rows):
            for col in range(n_cols):
                matrix.do_something(level, row, col)
    #
There are a number of ways we might make our lives easier. We could use a visitor pattern and pass it callback functions:
    def visit_matrix (n_levels, n_rows, n_cols, cb_func):
        '''Visit every cell of matrix. Invoke cb_func.'''
        for level in range(n_levels):
            for row in range(n_rows):
                for col in range(n_cols):
                    cb_func(level, row, col)

    def matrix_do_something (level, row, col):
        '''Callback function.'''
        matrix.do_something(level, row, col)

    # Do something to all matrix cells...
    visit_matrix(LEVELS,ROWS,COLS, matrix_do_something)
    #
Which is a fine way to go about it (as long as we’re allowed to pass functions). In Python we can make a generator that iterates over the matrix:
    def gen_matrix (n_levels, n_rows, n_cols):
        for level in range(n_levels):
            for row in range(n_rows):
                for col in range(n_cols):
                    yield (level, row, col)

    # Do something to all matrix cells...
    for level,row,col in gen_matrix(LEVELS,ROWS,COLS):
        matrix.do_something(level, row, col)
    #
That would be the sensible way to do it. The point is that we get all three coordinates at once. We don’t need the triple-level nested for-loops.
[Note: In these examples we’re unpacking the tuple returned by the generator into three variables. In other code it might make sense to just catch the tuple.]
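For comparison, the standard library's itertools.product produces the same coordinate tuples in the same order, so it can serve as a cross-check on a hand-written generator. The matrix sizes below are hypothetical values chosen for the illustration.

```python
from itertools import product

LEVELS, ROWS, COLS = 2, 2, 3   # hypothetical sizes for the illustration

def gen_matrix(n_levels, n_rows, n_cols):
    '''Yield every (level, row, col) coordinate of the matrix.'''
    for level in range(n_levels):
        for row in range(n_rows):
            for col in range(n_cols):
                yield (level, row, col)

# product() varies the rightmost range fastest, just like the nested loops.
cells = list(product(range(LEVELS), range(ROWS), range(COLS)))
print(cells[:4])   # [(0, 0, 0), (0, 0, 1), (0, 0, 2), (0, 1, 0)]
print(cells == list(gen_matrix(LEVELS, ROWS, COLS)))   # True
```

The hand-written generator still earns its keep when extra work has to happen at the start of each level or row, which product cannot express.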
To illustrate the yield from statement, we can also do it like this:
    def gen_columns (curr_level, curr_row, n_cols):
        '''Generate column index.'''
        for cx in range(n_cols):
            yield (curr_level, curr_row, cx)

    def gen_rows (curr_level, n_rows, n_cols):
        '''Generate row index.'''
        for rx in range(n_rows):
            yield from gen_columns(curr_level, rx, n_cols)

    def gen_levels (n_levels, n_rows, n_cols):
        '''Generate level index.'''
        for zx in range(n_levels):
            yield from gen_rows(zx, n_rows, n_cols)

    # Do something to all matrix cells...
    for level,row,col in gen_levels(LEVELS,ROWS,COLS):
        matrix.do_something(level, row, col)
    #
Which might have some advantages, I suppose. (It mostly just shows off the yield from statement.) It does separate the concerns nicely. If you were doing other processing as you began each iteration through each for-loop, breaking that into levels might make sense.
This modular design also makes it fairly easy to add dimensions. I started with an example using only columns and rows, but added the levels dimension by just inserting a new generator into the chain of calls.
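Taking the "add a dimension by adding a generator" idea to its limit, a single recursive generator can handle any number of dimensions. This is a sketch only; gen_indices is a hypothetical name, not something from the sample code.

```python
def gen_indices(dims, prefix=()):
    '''Yield every index tuple for a matrix with the given dimension sizes.'''
    if not dims:
        # No dimensions left: the prefix is a complete coordinate.
        yield prefix
        return
    for ix in range(dims[0]):
        # Delegate the remaining dimensions to a nested generator.
        yield from gen_indices(dims[1:], prefix + (ix,))

print(list(gen_indices((2, 2))))
# [(0, 0), (0, 1), (1, 0), (1, 1)]
print(sum(1 for _ in gen_indices((2, 3, 4, 5))))   # 120
```

Each level of recursion plays the role of one generator in the chain above, with yield from passing the finished tuples back up to the caller.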
§
Here’s one last example that uses all the features of generators I’ve covered in these three posts. The design goal is to generate a paragraph of random text in reasonable words and sentences. It should look like a paragraph of text but be complete gibberish. (See So Random! for a more elaborate random text generator.)
We want to create output that looks like this (different every time we run it, of course):
Vsiuagfafv qlasr 10163 arnl dkzzlupcdwz vxerhpo. Ib px 21781874 dkyygkdhigj mmdj yyiqyeo vfonllhb xlcftl cflu onfjzlwswty? Wuqgj mvdcnu 4884781 rrtycqmsupd wnsnxhk alcoezk toycs ysh. Eukruqtqiid qgzccmxjr 4466 tnagt hpxuokkasj ryxkvijgdrp! Suoporxcgm ihelr 6062246632 ud xbkkrqmpxl?
And we want to implement it using the features of Python generators (and we want a design that’s easy to modify and extend). Since we’re using a generator, we’re going to iterate over every character generated. We’ll take that opportunity to optionally modify that character before it becomes part of the output.
Here are the generators:
    import random

    Letter = ord('a')
    Digit = ord('0')

    def gen_word (px, sx, wmin, wmax):
        '''Word generator.'''
        w_len = random.randint(wmin, wmax) # length of word
        buf = []
        for wx in range(w_len):
            ch = chr(Letter+random.randint(0,25))
            # Yield indexes, word length, and current char...
            signal = yield (px, sx, wx, w_len-1, ch)
            if signal is not None:
                buf.append(str(signal))
                yield len(buf) # Value handed back to .send() (and ignored)
                continue
            # No signal; just append the character...
            buf.append(ch)
        # Return the word...
        return ''.join(buf)

    def gen_sentence (px, smin, smax, wmin, wmax):
        '''Sentence generator.'''
        s_len = random.randint(smin, smax) # how many words
        buf = []
        for sx in range(s_len):
            word = yield from gen_word(px,sx,wmin,wmax)
            buf.append(word)
        # Return the sentence...
        return ' '.join(buf)

    def gen_paragraph (para, pmin,pmax, smin,smax, wmin,wmax):
        '''Paragraph generator.'''
        punct = ['.', '?', '!']
        pbias = [75, 15, 10]
        p_len = random.randint(pmin, pmax) # how many sentences
        for px in range(p_len):
            sent = yield from gen_sentence(px,smin,smax,wmin,wmax)
            # End a sentence (usually) with a period...
            end = random.choices(punct, k=1, weights=pbias)
            para.append('%s%c' % (sent,end[0]))
    #
As you see, there are three, similar to the three from the matrix-scanning generator above. The first, gen_word, generates a word. Note that it yields each character of the word and then returns the whole word (so the completed word appears in the StopIteration exception).
The second and third, which use yield from, build the paragraph. The second function, gen_sentence, creates a random-length list of random words. The yield from means character iteration passes through here but doesn’t stop. Values passed in with send pass through the same way.
Note that the third generator, gen_paragraph, takes a list object that it appends sentences to. In the process it adds a period, or the occasional question mark or exclamation point, to the sentence. The client must provide an empty list to fill (or one to add to if not empty).
The code to invoke this random text generator is this:
    paragraph = []
    gen = gen_paragraph(paragraph, pmin,pmax, smin,smax, wmin,wmax)
    # Iterate over the paragraph characters...
    for px,sx,wx,wlen,ch in gen:
        if sx+wx == 0:
            gen.send(ch.upper())
            continue
        if sx == 2:
            gen.send(rand_digit())
            continue
    print()
    s = ' '.join(paragraph)
    print(s)
    print()
    #
Note how it monitors the word and character indexes (sx and wx) to detect the first letter of a sentence and capitalize it. It also uses the word index to convert the third word (in every sentence) to a number.
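The extra yield that follows a signal in gen_word matters when send is mixed with a for-loop: send returns whatever the generator yields next, so without a throwaway yield the loop would silently skip a character. Here is a stripped-down sketch of that mechanism; gen_chars and gen_wrapper are hypothetical names made up for the illustration.

```python
def gen_chars(text):
    '''Yield each character; a sent value replaces it. Returns the result.'''
    buf = []
    for ch in text:
        signal = yield ch
        if signal is not None:
            buf.append(signal)
            yield None      # throwaway value: this is what .send() receives
            continue
        buf.append(ch)
    return ''.join(buf)

def gen_wrapper(out, text):
    '''yield from passes yielded values up and sent values down, untouched.'''
    result = yield from gen_chars(text)
    out.append(result)

out = []
gen = gen_wrapper(out, 'hello')
for ch in gen:
    if ch in 'aeiou':
        gen.send(ch.upper())   # replace vowels with capitals
print(out)   # ['hEllO']
```

The send call goes to the outer generator, yet the inner one receives the value, which is the same pass-through behavior the paragraph and sentence generators rely on.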
§
That brings us to the end. These posts should give you enough examples to understand and create your own generators. I’m always happy to answer questions.
Stay coding, my friends!
[Sample code for all three posts: examples.zip]
Ø
Xupcki jt 03535 vlcxp dtup rcblrbb nkcbiuequ gdpjyu ecczqu. Ajwmoyfd bhytcucgemv 18579 njjuf hypme fc vdwjbxls yrggurid led. Tipoczjn vareb 1637425705 epehol oxk af. Sxxyvbxtzas rouslsd 9014689289 mklsgqs wa wrjagndfgz. Vzqzabr tagqzvydtr 61755492 xgxcls nfvozdn djcbqyyq bpbondwbqs nohbrjr ltcj aaf. Op qlpmnjv 38774 pdtbqyctpyq iord cjjnlbpsa izrdsy. Kraddrbbhon wcyatmlsdmd 1453659 hemeboaopaw ioqdxffe ftapm. Ngobvf ge 749695564 toblqwqbrp xkkojlaa? Zpyfo yme 2185130320 lzpnkwzcca czvsgxvchm? Knur ouhnj 464562. Bdi nrsqzryufaw 18852199735 ctsgmxwdb qyjlardrnb rnchup rghgywqhbq nkn tgsf aapwverej. Zavwupydm bp 360026! Ybsoqsy zduhet 7403679497 irpbzvwlicx pklpnxpgq. Ps ljijdld 08957120474 de lulklfgm jnfriiofb tomeniarnbz muwaaf lqbsan gplxyrwpc?
Elb ds 54648585 zjqcomwpaq xpol psmjn. Reabgoeiy ity 990 xw ulcl? Tmty rmqqah 9327960524 higty xpwfzziyfuw kajaixyiw hyfbwwavrjj dgpbops osslw? Nsnon pkybfmwvrde 4019 jgxlpayfjfv rruyzwvv jnysifvad? Mh xlbsjmg? Do kw 3327121345 zxigiheom. Ipdgvy aysxtw 17598624 tdyj xetpjmrdjls naomuelv. Xu gqtuy 40585. Twvmyt dymyr 87578084 srnsbiv. Mxwmkfm hfcluib. Rnzyyejyfu sttunjmxp!