Emulating zip and map with Iteration Tools
To demonstrate the power of iteration tools in action, let’s turn to some more advanced use case examples. Once you know about list comprehensions, generators, and other iteration tools, it turns out that emulating many of Python’s functional built-ins is both straightforward and instructive.
For example, we’ve already seen how the built-in zip and map functions combine iterables and project functions across them, respectively. With multiple sequence arguments, map projects the function across items taken from each sequence in much the same way that zip pairs them up:
>>> S1 = 'abc'
>>> S2 = 'xyz123'
>>> list(zip(S1, S2)) # zip pairs items from iterables
[('a', 'x'), ('b', 'y'), ('c', 'z')]
# zip pairs items, truncates at shortest
>>> list(zip([-2, -1, 0, 1, 2])) # Single sequence: 1-ary tuples
[(-2,), (-1,), (0,), (1,), (2,)]
>>> list(zip([1, 2, 3], [2, 3, 4, 5])) # N sequences: N-ary tuples
[(1, 2), (2, 3), (3, 4)]
# map passes paired items to a function, truncates
>>> list(map(abs, [-2, -1, 0, 1, 2])) # Single sequence: 1-ary function
[2, 1, 0, 1, 2]
>>> list(map(pow, [1, 2, 3], [2, 3, 4, 5])) # N sequences: N-ary function
[1, 8, 81]
Though they’re being used for different purposes, if you study these examples long enough, you might notice a relationship between zip results and mapped function arguments that our next example can exploit.
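One way to see that relationship, sketched here for Python 3: mapping a function across several sequences gives the same results as unpacking each `zip` tuple into that function. The standard library's `itertools.starmap`, which calls a function with each item of its iterable unpacked as arguments, packages exactly this idea; the variable names below are illustrative only.

```python
from itertools import starmap

A, B = [1, 2, 3], [2, 3, 4, 5]

# map with N sequences behaves like unpacking each zip tuple into the function
by_map = list(map(pow, A, B))
by_zip = [pow(*pair) for pair in zip(A, B)]
by_starmap = list(starmap(pow, zip(A, B)))

print(by_map, by_zip, by_starmap)   # [1, 8, 81] [1, 8, 81] [1, 8, 81]
```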
Coding your own map(func, ...)
Although the map and zip built-ins are fast and convenient, it’s always possible to emulate them in code of our own. In the preceding chapter, for example, we saw a function that emulated the map built-in for a single sequence argument. It doesn’t take much more work to allow for multiple sequences, as the built-in does:
# map(func, seqs...) workalike with zip
def mymap(func, *seqs):
    res = []
    for args in zip(*seqs):
        res.append(func(*args))
    return res

print(mymap(abs, [-2, -1, 0, 1, 2]))
print(mymap(pow, [1, 2, 3], [2, 3, 4, 5]))
This version relies heavily upon the special *args argument-passing syntax—it collects multiple sequence (really, iterable) arguments, unpacks them as zip arguments to combine, and then unpacks the paired zip results as arguments to the passed-in function. That is, we’re using the fact that the zipping is essentially a nested operation in mapping. The test code at the bottom applies this to both one and two sequences to produce this output (the same we would get with the built-in map):
[2, 1, 0, 1, 2]
[1, 8, 81]
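Because `zip` accepts any iterable, this `mymap` is not limited to sequences. A quick check under Python 3, feeding it a `range`, a list, and a one-shot generator expression at once; the three-argument `lambda` is purely illustrative:

```python
def mymap(func, *seqs):
    res = []
    for args in zip(*seqs):          # args is a tuple holding one item from each iterable
        res.append(func(*args))      # unpack that tuple as positional arguments
    return res

squares = (n * n for n in range(5))  # a generator: iterable, but not a sequence
total = mymap(lambda a, b, c: a + b + c, range(3), [10, 20, 30], squares)
print(total)                         # [10, 22, 36]
print(mymap(str.upper, 'abc'))       # ['A', 'B', 'C']
```

As with the built-in, the shortest argument (here the three-item `range` and list) decides how many results are produced.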
Really, though, the prior version exhibits the classic list comprehension pattern, building a list of operation results within a for loop. We can code our map more concisely as an equivalent one-line list comprehension:
# Using a list comprehension
def mymap(func, *seqs):
    return [func(*args) for args in zip(*seqs)]

print(mymap(abs, [-2, -1, 0, 1, 2]))
print(mymap(pow, [1, 2, 3], [2, 3, 4, 5]))
When this is run, the result is the same as before, but the code is more concise and may run faster (more on performance in the section Timing Iteration Alternatives). Both of the preceding mymap versions build result lists all at once, though, and this can waste memory for larger lists. Now that we know about generator functions and expressions, it’s simple to recode both these alternatives to produce results on demand instead:
# Using generators: yield and (...)
def mymap(func, *seqs):
    for args in zip(*seqs):
        yield func(*args)          # generator function: one result per request

def mymap(func, *seqs):
    return (func(*args) for args in zip(*seqs))   # generator expression
These versions return generators designed to support the iteration protocol: the first yields one result at a time, and the second returns a generator expression’s result to do the same. They produce the same results as before if we wrap them in list calls to force them to produce their values all at once:
print(list(mymap(abs, [-2, -1, 0, 1, 2])))
print(list(mymap(pow, [1, 2, 3], [2, 3, 4, 5])))
No work is really done here until the list calls force the generators to run, by activating the iteration protocol. The generators returned by these functions themselves, as well as that returned by the Python 3.0 flavor of the zip built-in they use, produce results only on demand.
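Because nothing is computed until a value is requested, the generator versions can even map over an unbounded iterable. A minimal sketch under Python 3, assuming the generator-expression `mymap` above and using the standard library's `itertools.count` and `itertools.islice` to ask for only the first few results:

```python
from itertools import count, islice

def mymap(func, *seqs):
    # Generator expression: results are produced one at a time, on request
    return (func(*args) for args in zip(*seqs))

# count(1) never ends, but zip stops at the shorter 'abcd' argument...
pairs = mymap(lambda n, ch: ch * n, count(1), 'abcd')
print(list(pairs))                    # ['a', 'bb', 'ccc', 'dddd']

# ...and with only unbounded inputs, islice limits how much work is done
powers = mymap(pow, count(2), count(1))
print(list(islice(powers, 4)))        # [2, 9, 64, 625]
```

A list-building `mymap` would never return from the second call, since it would try to exhaust both `count` iterators before handing back anything.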
Coding your own zip(...) and map(None, ...)
Of course, much of the magic in the examples shown so far lies in their use of the zip built-in to pair arguments from multiple sequences. You’ll also note that our map workalikes are really emulating the behavior of the Python 3.0 map—they truncate at the length of the shortest sequence, and they do not support the notion of padding results when lengths differ, as map does in Python 2.X with a None argument:
C:\misc> c:\python26\python
>>> map(None, [1, 2, 3], [2, 3, 4, 5])
[(1, 2), (2, 3), (3, 4), (None, 5)]
>>> map(None, 'abc', 'xyz123')
[('a', 'x'), ('b', 'y'), ('c', 'z'), (None, '1'), (None, '2'), (None, '3')]
Using iteration tools, we can code workalikes that emulate both truncating zip and 2.6’s padding map—these turn out to be nearly the same in code:
# zip(seqs...) and 2.6 map(None, seqs...) workalikes
def myzip(*seqs):
    seqs = [list(S) for S in seqs]
    res = []
    while all(seqs):
        res.append(tuple(S.pop(0) for S in seqs))
    return res

def mymapPad(*seqs, pad=None):
    seqs = [list(S) for S in seqs]
    res = []
    while any(seqs):
        res.append(tuple((S.pop(0) if S else pad) for S in seqs))
    return res

S1, S2 = 'abc', 'xyz123'
print(myzip(S1, S2))
print(mymapPad(S1, S2))
print(mymapPad(S1, S2, pad=99))
Both of the functions coded here work on any type of iterable object, because they run their arguments through the list built-in to force result generation (e.g., files would work as arguments, in addition to sequences like strings). Notice the use of the all and any built-ins here: these return True if all or any items in an iterable are true (for lists, nonempty), respectively. They are used to stop looping when any or all of the listified arguments become empty after deletions.
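Since empty lists are false and nonempty lists are true, `all` and `any` over the list of listified arguments answer "is every argument still nonempty?" and "is any argument still nonempty?" respectively. A short illustration that simulates arguments running out:

```python
seqs = [list('ab'), list('xyz')]

print(all(seqs), any(seqs))    # True True: both lists still have items
seqs[0].clear()                # simulate the shorter argument running out
print(all(seqs), any(seqs))    # False True: myzip would stop, mymapPad continues
seqs[1].clear()
print(all(seqs), any(seqs))    # False False: both loops would stop
print(all([]), any([]))        # True False: the edge case of no arguments at all
```

The last line hints at a corner case: with no arguments, `all([])` is True, so the pop-based `myzip` as coded would loop forever appending empty tuples. A guard such as `while seqs and all(seqs):` would make it return an empty result instead, as the built-in `zip()` does.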
Also note the use of the Python 3.0 keyword-only argument, pad; unlike the 2.6 map, our version will allow any pad object to be specified (if you’re using 2.6, use a **kargs form to support this option instead; see Chapter 18 for details). When these functions are run, the following results are printed—a zip, and two padding maps:
[('a', 'x'), ('b', 'y'), ('c', 'z')]
[('a', 'x'), ('b', 'y'), ('c', 'z'), (None, '1'), (None, '2'), (None, '3')]
[('a', 'x'), ('b', 'y'), ('c', 'z'), (99, '1'), (99, '2'), (99, '3')]
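In Python 3, the padding behavior that 2.6’s `map(None, ...)` provided is available in the standard library as `itertools.zip_longest`, whose `fillvalue` keyword plays the role of our `pad` argument. A quick comparison, repeating the list-building `mymapPad` so the snippet runs on its own:

```python
from itertools import zip_longest

def mymapPad(*seqs, pad=None):
    seqs = [list(S) for S in seqs]
    res = []
    while any(seqs):
        res.append(tuple((S.pop(0) if S else pad) for S in seqs))
    return res

S1, S2 = 'abc', 'xyz123'
print(list(zip_longest(S1, S2)) == mymapPad(S1, S2))                        # True
print(list(zip_longest(S1, S2, fillvalue=99)) == mymapPad(S1, S2, pad=99))  # True
```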
These functions aren’t amenable to list comprehension translation because their loops are too specific. As before, though, while our zip and map workalikes currently build and return result lists, it’s just as easy to turn them into generators with yield so that they each return one piece of their result set at a time. The results are the same as before, but we need to use list again to force the generators to yield their values for display:
# Using generators: yield
def myzip(*seqs):
    seqs = [list(S) for S in seqs]
    while all(seqs):
        yield tuple(S.pop(0) for S in seqs)

def mymapPad(*seqs, pad=None):
    seqs = [list(S) for S in seqs]
    while any(seqs):
        yield tuple((S.pop(0) if S else pad) for S in seqs)

S1, S2 = 'abc', 'xyz123'
print(list(myzip(S1, S2)))
print(list(mymapPad(S1, S2)))
print(list(mymapPad(S1, S2, pad=99)))
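One design note on the pop-based versions: `list.pop(0)` shifts every remaining item left, so its cost grows with the list’s length. If that matters for long arguments, `collections.deque` offers a constant-time `popleft`. The following is a hedged variant of the generator `myzip` along those lines, our own adaptation rather than a version from the text:

```python
from collections import deque

def myzip(*seqs):
    # deque.popleft() removes from the front without shifting the rest
    seqs = [deque(S) for S in seqs]
    while seqs and all(seqs):        # 'seqs and' also covers the no-argument case
        yield tuple(S.popleft() for S in seqs)

print(list(myzip('abc', 'xyz123')))  # [('a', 'x'), ('b', 'y'), ('c', 'z')]
print(list(myzip()))                 # []
```

An empty deque is false, just like an empty list, so the `all` test works unchanged.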
Finally, here’s an alternative implementation of our zip and map emulators—rather than deleting arguments from lists with the pop method, the following versions do their job by calculating the minimum and maximum argument lengths. Armed with these lengths, it’s easy to code nested list comprehensions to step through argument index ranges:
# Alternate implementation with lengths
def myzip(*seqs):
    minlen = min(len(S) for S in seqs)
    return [tuple(S[i] for S in seqs) for i in range(minlen)]

def mymapPad(*seqs, pad=None):
    maxlen = max(len(S) for S in seqs)
    index = range(maxlen)
    return [tuple((S[i] if len(S) > i else pad) for S in seqs) for i in index]

S1, S2 = 'abc', 'xyz123'
print(myzip(S1, S2))
print(mymapPad(S1, S2))
print(mymapPad(S1, S2, pad=99))
Because these use len and indexing, they assume that arguments are sequences or similar, not arbitrary iterables. The outer comprehensions here step through argument index ranges, and the inner comprehensions (passed to tuple) step through the passed-in sequences to pull out arguments in parallel. When they’re run, the results are as before.
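The difference is easy to see by passing a generator, which supports neither `len` nor indexing. A small check, repeating the length-based `myzip` so the snippet runs on its own:

```python
def myzip(*seqs):
    minlen = min(len(S) for S in seqs)
    return [tuple(S[i] for S in seqs) for i in range(minlen)]

print(myzip('abc', [1, 2, 3, 4]))     # [('a', 1), ('b', 2), ('c', 3)]

gen = (c for c in 'abc')
try:
    myzip(gen, 'xyz')
except TypeError as exc:              # raised by len() on the generator
    print('TypeError:', exc)
```

The pop-based versions accept the same generator without complaint, because they listify each argument first.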
Most strikingly, generators and iterators seem to run rampant in this example. The arguments passed to min and max are generator expressions, which run to completion before the nested comprehensions begin iterating. Moreover, the nested list comprehensions employ two levels of delayed evaluation—the Python 3.0 range built-in is an iterable, as is the generator expression argument to tuple.
In fact, no results are produced here until the square brackets of the list comprehensions request values to place in the result list—they force the comprehensions and generators to run. To turn these functions themselves into generators instead of list builders, use parentheses instead of square brackets again. Here’s the case for our zip:
# Using generators: (...)
def myzip(*seqs):
    minlen = min(len(S) for S in seqs)
    return (tuple(S[i] for S in seqs) for i in range(minlen))

print(list(myzip(S1, S2)))
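The same bracket-to-parenthesis change applies to the padding version. A sketch of the length-based `mymapPad` as a generator, our adaptation following the pattern just shown for `myzip`; note that `max` still runs immediately when the function is called, while the tuples are built only as they are requested:

```python
def mymapPad(*seqs, pad=None):
    maxlen = max(len(S) for S in seqs)            # computed at call time
    # Outer parentheses: a generator expression, so tuples are built on request
    return (tuple((S[i] if len(S) > i else pad) for S in seqs)
            for i in range(maxlen))

S1, S2 = 'abc', 'xyz123'
gen = mymapPad(S1, S2, pad=99)
print(next(gen))      # ('a', 'x')
print(list(gen))      # [('b', 'y'), ('c', 'z'), (99, '1'), (99, '2'), (99, '3')]
```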
In this case, it takes a list call to activate the generators and iterators to produce their results. Experiment with these on your own for more details. Developing further coding alternatives is left as a suggested exercise (see also the sidebar Why You Will Care: One-Shot Iterations for investigation of one such option).
Why You Will Care: One-Shot Iterations
In Chapter 14, we saw how some built-ins (like map) support only a single traversal and are empty after it occurs, and I promised to show you an example of how that can become subtle but important in practice. Now that we’ve studied a few more iteration topics, I can make good on this promise. Consider the following clever alternative coding for this chapter’s zip emulation examples, adapted from one in Python’s manuals:
def myzip(*args):
    iters = map(iter, args)
    while iters:
        res = [next(i) for i in iters]
        yield tuple(res)
Because this code uses iter and next, it works on any type of iterable. Note that there is no reason to catch the StopIteration raised by the next(i) inside the comprehension here when any one of the arguments’ iterators is exhausted: allowing it to pass ends this generator function and has the same effect that a return statement would. (That holds for the 2.6 and 3.0 releases this chapter targets; since Python 3.7, per PEP 479, a StopIteration that escapes a generator is turned into a RuntimeError, so the exception must be caught and converted into a return.) The while iters: suffices to loop if at least one argument is passed, and avoids an infinite loop otherwise (the list comprehension would always return an empty list).
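Letting StopIteration silently end a generator is the Python 2.6 and 3.0 behavior. Since Python 3.7 (PEP 479), a StopIteration escaping a generator’s body is converted to RuntimeError instead. A small demonstration, using a hypothetical `first_two` generator written only for this check:

```python
def first_two(iterable):
    it = iter(iterable)
    yield next(it)
    yield next(it)   # if it is exhausted here, StopIteration escapes the body

print(list(first_two('abc')))      # ['a', 'b']
try:
    list(first_two('a'))           # only one item: the second next() fails
except RuntimeError as exc:        # Python 3.7+: StopIteration became RuntimeError
    print('RuntimeError:', exc)
```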
This code works fine in Python 2.6 as is:
>>> list(myzip('abc', 'lmnop'))
[('a', 'l'), ('b', 'm'), ('c', 'n')]
But it falls into an infinite loop and fails in Python 3.0, because the 3.0 map returns a one-shot iterable object instead of a list as in 2.6. In 3.0, as soon as we’ve run the list comprehension inside the loop once, iters will be empty (and res will be []) forever. To make this work in 3.0, we need to use the list built-in function to create an object that can support multiple iterations:
def myzip(*args):
    iters = list(map(iter, args))    # a list, not a one-shot map object
    while iters:
        res = [next(i) for i in iters]
        yield tuple(res)
Run this on your own to trace its operation. The lesson here: wrapping map calls in list calls in 3.0 is not just for display!
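The one-shot nature of a 3.X map object is easy to confirm interactively. On Python 3.7 and later, the iterator-based myzip also needs its next calls wrapped so that exhaustion ends the generator with a return, because PEP 479 turns an escaping StopIteration into a RuntimeError. A sketch that runs on current Python 3, our adaptation of the code above rather than the manual’s version:

```python
m = map(str.upper, 'ab')
print(list(m), list(m))                # ['A', 'B'] []  (exhausted after one pass)

def myzip(*args):
    iters = list(map(iter, args))      # a list can be traversed repeatedly
    while iters:
        try:
            res = [next(i) for i in iters]
        except StopIteration:          # 3.7+: turn exhaustion into a normal return
            return
        yield tuple(res)

print(list(myzip('abc', 'lmnop')))     # [('a', 'l'), ('b', 'm'), ('c', 'n')]
print(list(myzip()))                   # []  ('while iters' is false with no arguments)
```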