Converting Encodings

So far, we’ve been encoding and decoding strings to inspect their structure. More generally, we can convert a string to an encoding other than the default, as long as the target encoding can represent every character in the string. To do so, we must provide an explicit encoding name to encode to and decode from:

>>> S = 'AÄBèC'
>>> S
'AÄBèC'
>>> S.encode()                       # Default utf-8 encoding
b'A\xc3\x84B\xc3\xa8C'

>>> T = S.encode('cp500')            # Convert to EBCDIC
>>> T
b'\xc1c\xc2T\xc3'

>>> U = T.decode('cp500')            # Convert back to Unicode
>>> U
'AÄBèC'

>>> U.encode()                       # Default utf-8 encoding again
b'A\xc3\x84B\xc3\xa8C'


Keep in mind that the special Unicode and hex character escapes are only necessary when you code non-ASCII Unicode strings manually. In practice, you’ll often load such text from files instead. As we’ll see later in this chapter, Python 3.0’s file object (created with the open built-in function) automatically decodes text strings as they are read and encodes them when they are written. Because of this, your script can often deal with strings generically, without having to code special characters directly.
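As a preview, here is a minimal sketch of that automatic translation. It assumes a scratch file in a temporary directory; the filename unidata.txt and the variable names are made up for illustration. Text mode encodes on writes and decodes on reads, while binary mode exposes the raw encoded bytes:

```python
import os
import tempfile

S = 'A\xc4B\xe8C'                    # Same text as 'AÄBèC', coded with escapes

# Hypothetical scratch file, used only for this illustration
path = os.path.join(tempfile.mkdtemp(), 'unidata.txt')

# Text mode: the str is encoded to bytes as it is written
with open(path, 'w', encoding='utf-8') as f:
    f.write(S)

# Text mode: the bytes are decoded back to a str as they are read
with open(path, 'r', encoding='utf-8') as f:
    text = f.read()

# Binary mode: no decoding, so we get the raw encoded bytes
with open(path, 'rb') as f:
    raw = f.read()

print(text == S)                     # True: the round trip preserves the text
print(raw)                           # b'A\xc3\x84B\xc3\xa8C'
```

The script itself never encodes or decodes anything explicitly; the encoding name passed to open does that work.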

Later in this chapter we’ll also see that it’s possible to convert between encodings when transferring strings to and from files, using a technique very similar to that in the last example. You’ll still need to provide explicit encoding names when opening a file, but the file interface does most of the conversion work for you automatically.
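The file-based conversion might look something like the following sketch. The convert_file helper and the filenames are hypothetical, not part of any library. It reads text under one encoding and writes it under another, so the decode and encode steps both happen inside the file objects:

```python
import os
import tempfile

def convert_file(src, dst, from_enc, to_enc):
    """Copy text from src to dst, re-encoding it along the way (illustrative helper)."""
    # newline='' turns off line-end translation, so only the encoding changes
    with open(src, 'r', encoding=from_enc, newline='') as fin, \
         open(dst, 'w', encoding=to_enc, newline='') as fout:
        for line in fin:             # Decoded to str on read
            fout.write(line)         # Encoded to bytes on write

# Hypothetical scratch files, used only for this illustration
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, 'latin1.txt')
dst = os.path.join(workdir, 'ebcdic.txt')

# Create a Latin-1 input file holding the text 'AÄBèC'
with open(src, 'wb') as f:
    f.write('A\xc4B\xe8C'.encode('latin-1'))

convert_file(src, dst, 'latin-1', 'cp500')

with open(dst, 'rb') as f:
    converted = f.read()

print(converted)                     # b'\xc1c\xc2T\xc3', as in the cp500 example
```

Processing line by line keeps memory use modest for large files. For small files, a single read and write call would work just as well.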