The pickle Object Serialization Module

We met the pickle module briefly in Chapters 9 and 30. In Chapter 27, we also used the shelve module, which uses pickle internally. For completeness here, keep in mind that the Python 3.0 version of the pickle module always creates a bytes object, regardless of the default or passed-in “protocol” (data format level). You can see this by using the module’s dumps call to return an object’s pickle string:

C:\misc> C:\Python30\python
>>> import pickle                          # dumps() returns pickle string

>>> pickle.dumps([1, 2, 3])                # Python 3.0 default protocol=3=binary
b'\x80\x03]q\x00(K\x01K\x02K\x03e.'

>>> pickle.dumps([1, 2, 3], protocol=0)    # ASCII protocol 0, but still bytes!
b'(lp0\nL1L\naL2L\naL3L\na.'
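To check this for every protocol your interpreter supports, the following sketch loops over all levels up to `pickle.HIGHEST_PROTOCOL` and records the type that `dumps` returns. It assumes a current Python 3 interpreter. The highest protocol number varies by release, so the number of levels reported is not fixed, but the result type is always the same.

```python
import pickle

def pickle_types(obj):
    """Map each available protocol number to the type of pickle.dumps(obj) under it."""
    return {proto: type(pickle.dumps(obj, protocol=proto))
            for proto in range(pickle.HIGHEST_PROTOCOL + 1)}

for proto, kind in pickle_types([1, 2, 3]).items():
    print(proto, kind.__name__)      # every line reports "bytes"
```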


This implies that files used to store pickled objects must always be opened in binary mode in Python 3.0. Text files use str strings to represent data, not bytes, and the dump call simply attempts to write the pickle string to the open output file, whatever protocol produced it:

>>> pickle.dump([1, 2, 3], open('temp', 'w'))    # Text files fail on bytes!
TypeError: can't write bytes to text stream      # Despite protocol value

>>> pickle.dump([1, 2, 3], open('temp', 'w'), protocol=0)
TypeError: can't write bytes to text stream

>>> pickle.dump([1, 2, 3], open('temp', 'wb'))   # Always use binary in 3.0

>>> open('temp', 'r').read()
UnicodeEncodeError: 'charmap' codec can't encode character '\u20ac' in ...
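The same failure can be reproduced without touching the file system by using in-memory streams from the `io` module: `io.StringIO` behaves like a text file and `io.BytesIO` like a binary one. In this sketch the helper name `dump_succeeds` is hypothetical. It checks only the exception type, because the exact wording of the `TypeError` message differs between Python 3 releases.

```python
import io
import pickle

def dump_succeeds(stream, obj):
    """Try to pickle obj into stream (hypothetical helper); report whether it worked."""
    try:
        pickle.dump(obj, stream, protocol=0)   # even the ASCII protocol emits bytes
    except TypeError:
        return False                           # text streams reject bytes objects
    return True

print(dump_succeeds(io.StringIO(), [1, 2, 3]))   # False: StringIO stores str
print(dump_succeeds(io.BytesIO(), [1, 2, 3]))    # True: BytesIO stores bytes
```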

Because pickle data is not, in general, decodable Unicode text, the same is true on input. Correct usage in 3.0 requires writing and reading pickle data in binary mode:

>>> pickle.dump([1, 2, 3], open('temp', 'wb'))
>>> pickle.load(open('temp', 'rb'))
[1, 2, 3]
>>> open('temp', 'rb').read()
b'\x80\x03]q\x00(K\x01K\x02K\x03e.'
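The interactive session above leaves it to Python to close its files. A script would normally close them explicitly with `with` statements. The sketch below does that, using a temporary file from the `tempfile` module in place of the hard-coded name `temp`. It also confirms that the file holds exactly the bytes that `dumps` returns for the same object and (default) protocol.

```python
import os
import pickle
import tempfile

obj = [1, 2, 3]

# Make a scratch file; mkstemp returns an open descriptor, which we close at once.
fd, path = tempfile.mkstemp()
os.close(fd)
try:
    with open(path, 'wb') as f:          # write side: binary mode
        pickle.dump(obj, f)
    with open(path, 'rb') as f:          # read side: binary mode too
        raw = f.read()                   # raw is a bytes object
    with open(path, 'rb') as f:
        restored = pickle.load(f)
finally:
    os.remove(path)                      # clean up the scratch file

print(restored == obj)                   # True
print(raw == pickle.dumps(obj))          # True: the file holds the dumps() bytes
```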

In Python 2.6 (and earlier), we can get by with text-mode files for pickled data, as long as the protocol is level 0 (the default in 2.6) and we use text mode consistently to convert line-ends:

C:\misc> c:\python26\python
>>> import pickle
>>> pickle.dumps([1, 2, 3])                      # Python 2.6 default=0=ASCII
'(lp0\nI1\naI2\naI3\na.'

>>> pickle.dumps([1, 2, 3], protocol=1)
']q\x00(K\x01K\x02K\x03e.'

>>> pickle.dump([1, 2, 3], open('temp', 'w'))    # Text mode works in 2.6
>>> pickle.load(open('temp'))
[1, 2, 3]
>>> open('temp').read()
'(lp0\nI1\naI2\naI3\na.'
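Level 0 tolerates text mode in 2.6 because its output is printable ASCII with embedded newline characters, while the binary protocols can emit any byte value. The sketch below inspects both kinds of output under Python 3, where each is still a bytes object. The ASCII check holds for this simple list of integers and is not offered as a guarantee for every object.

```python
import pickle

obj = [1, 2, 3]
text_like = pickle.dumps(obj, protocol=0)
binary = pickle.dumps(obj, protocol=2)

# For this object, protocol 0 decodes cleanly as ASCII and contains '\n' line ends,
# which text-mode files translate on platforms such as Windows.
print(text_like.decode('ascii'))
print(b'\n' in text_like)            # True

# Protocol 2 and later begin with the PROTO opcode byte 0x80, which is not ASCII.
print(binary[:2] == b'\x80\x02')     # True
```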

If you care about version neutrality, though, or would rather not think about protocols and their version-specific defaults, always use binary-mode files for pickled data. The following works the same in Python 3.0 and 2.6:

>>> import pickle
>>> pickle.dump([1, 2, 3], open('temp', 'wb'))     # Version neutral
>>> pickle.load(open('temp', 'rb'))                # And required in 3.0
[1, 2, 3]
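If a program saves and loads objects in several places, a pair of small helpers keeps the file modes in one spot. The names `save_pickle` and `load_pickle` below are hypothetical and not part of the pickle module. The code relies only on `with` statements and `pickle.dump`/`pickle.load`, which exist in both Python 2.6 and 3.x, and it closes its files explicitly.

```python
import pickle

def save_pickle(obj, filename):
    """Pickle obj to filename, always in binary mode (hypothetical helper)."""
    with open(filename, 'wb') as f:
        pickle.dump(obj, f)

def load_pickle(filename):
    """Load one pickled object from filename, always in binary mode (hypothetical helper)."""
    with open(filename, 'rb') as f:
        return pickle.load(f)
```

With these in place, calling `save_pickle([1, 2, 3], 'temp')` followed by `load_pickle('temp')` returns `[1, 2, 3]`.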

Because almost all programs let Python pickle and unpickle objects automatically and do not deal with the content of pickled data itself, the requirement to always use binary file modes is the only significant incompatibility in Python 3’s new pickling model. See reference books or Python’s manuals for more details on object pickling.