In python, a generator function is one that contains a yield statement inside the function body. Although this language construct has many fascinating use cases ( PDF ), the most common one is creating concise and readableiterators.
A typicalcaseConsider, for example, this simplefunction:
def multiples(of): """Yields all multiples of given integer.""" x = of while True: yield x x += ofwhich creates an (infinite) iterator over all multiples of given integer. A sample of its output looks likethis:
>>> from itertools import islice >>> list(islice(multiples(of=5), 10)) [5, 10, 15, 20, 25, 30, 35, 40, 45, 50]If you were to replicate in a language such as Java or Rust ― neither of which supports an equivalent of yield ― you’d end up writing an iterator class. Python also has them, ofcourse:
class Multiples(object): """Yields all multiples of given integer.""" def __init__(self, of): self.of = of self.current = 0 def __iter__(self): return self def next(self): self.current += self.of return self.current ___next__ = next # Python 3but they usually are not the first choice.
It’s also pretty easy to see why: they require explicit bookkeeping of any auxiliary state between iterations. Perhaps it’s not too much to ask for a trivial walk over integers, but it can get quite tricky if we were to iterate over recursive data structures, like trees or graphs. In yield -based generators, this isn’t a problem, because the state is stored within local variables on the coroutinestack.
Lazy!It’s important to remember, however, that generator functions behave differently than regular functions do, even if the surface appearance often saysotherwise.
The difference I wanted to explore in this post becomes apparent when we add some argument checking to the initialexample:
def multiples(of): """Yields all multiples of given integer.""" if of < 0: raise ValueError("expected a natural number, got %r" % (of,)) x = of while True: yield x x += ofWith that if in place, passing a negative number shall result in an exception. Yet when we attempt to do just that, it will seem as if nothing ishappening:
>>> m = multiples(-10) >>>And to a certain degree, this is pretty much correct. Simply calling a generator function does comparatively little, and doesn’t actually execute any of its code! Instead, we get back a generator object :
>>> m <generator object multiples at 0x10f0ceb40>which is essentially a built-in analogue to the Multiples iterator instance. Commonly, it is said that both generator functions and iterator classes are lazy : they only do work when we asked (i.e. iteratedover).
GettingeagerOftentimes, this is perfectly okay. The laziness of generators is in fact one of their great strengths, which is particularly evident in the immense usefulness of the itertools module .
On the other hand, however, delaying argument checks and similar operations until later may hamper debugging. The classic engineering principle of failing fast applies here very fittingly: any errors should be signaled immediately. In Python, this means raising exceptions as soon as problems aredetected.
Fortunately, it is possible to reconcile the benefits of laziness with (more) defensive programming. We can make the generator functions only a little more eager, just enough to verify the correctness of theirarguments.
The trick is simple. We shall extract an inner generator function and only call it after we have checked thearguments:
def multiples(of): """Yields all multiples of given integer.""" if of < 0: raise ValueError("expected a natural number, got %r" % (of,)) def multiples(): x = of while True: yield x x += of return multiples()From the caller’s point of view, nothing has changed in the typicalcase:
>>> multiples(10) <generator object multiples at 0x110579190>but if we try to make an incorrect invocation now, the problem is detected immediately :
>>> multiples(-5) Traceback (most recent call last): File "<pyshell#2>", line 1, in <module> multiples(of=-5) File "<pyshell#0>", line 4, in multiples raise ValueError("expected a natural number, got %r" % (of,)) ValueError: expected a natural number, got -5Pretty neat, especially for something that requires only two lines ofcode!
The last(micro)optimizationIndeed, we didn’t even have to pass the arguments to the inner (generator) function, because they are already captured by theclosure.
Unfortunately, this also has a slight performance cost. A captured variable (also known as a cell variable ) is stored on the function object itself, so Python has to emit a different bytecode instruction ( LOAD_DEREF ) that involves an extra pointer dereference . Normally, this is not a problem, but in a tight generator loop it can make adifference.
We can eliminate this extra workby passing the parametersexplicitly:
# (snip) def multiples(of): x = of while True: yield x x += of return multiples(of)This turns them into local variables of the inner function, replacing the LOAD_DEREF instructions with (aptly named) LOAD_FAST ones.