Learning Python, Mark Lutz


Coding ASCII Text

Let’s step through some examples that demonstrate text coding basics. As we’ve seen, ASCII text is a simple type of Unicode, stored as a sequence of byte values that represent characters:

C:\misc> c:\python30\python

>>> ord('X')   # 'X' has Unicode code point 88, the same as its ASCII value
88
>>> chr(88)   # 88 stands for character 'X'
'X'

>>> S = 'XYZ'   # A Unicode string of ASCII text
>>> S
'XYZ'
>>> len(S)   # 3 characters long
3
>>> [ord(c) for c in S]  # 3 characters, each with an integer ordinal value
[88, 89, 90]
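
As a minimal sketch of the relationship shown above, the following script-style version treats ord and chr as inverses: it converts the string to its ordinal values and back again. The names codes, restored, and is_ascii are illustrative only and do not come from the original session.

```python
S = 'XYZ'

# Map each character to its integer ordinal (code point)
codes = [ord(c) for c in S]

# Map each ordinal back to a character and rejoin into a string
restored = ''.join(chr(n) for n in codes)

# Text is 7-bit ASCII when every ordinal falls in the range 0..127
is_ascii = all(n < 128 for n in codes)

print(codes, restored, is_ascii)
```

Because every ordinal here is below 128, the text qualifies as 7-bit ASCII, which is what makes the encodings in the next step interchangeable.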


Normal 7-bit ASCII text like this is represented with one character per byte under each of the Unicode encoding schemes described earlier in this chapter:

>>> S.encode('ascii')   # Values 0..127 in 1 byte (7 bits) each
b'XYZ'
>>> S.encode('latin-1')  # Values 0..255 in 1 byte (8 bits) each
b'XYZ'
>>> S.encode('utf-8')   # Values 0..127 in 1 byte, 128..2047 in 2, others 3 or 4
b'XYZ'
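
The three calls above can also be compared programmatically. This is a small sketch, assuming the same string S; the dictionary encoded and the flags all_same and round_trip are hypothetical names introduced for illustration.

```python
S = 'XYZ'

# Encode the same ASCII text under the three schemes used above
encoded = {name: S.encode(name) for name in ('ascii', 'latin-1', 'utf-8')}

for name, data in encoded.items():
    # For pure ASCII text, the byte count equals the character count
    print(name, data, len(data))

# All three encodings yield identical bytes for 7-bit ASCII text
all_same = len(set(encoded.values())) == 1

# Decoding with the matching scheme restores the original string
round_trip = all(data.decode(name) == S for name, data in encoded.items())

print(all_same, round_trip)
```

The comparison holds only because every character in S is in the 7-bit ASCII range.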

In fact, the bytes object returned by encoding ASCII text this way is really a sequence of small integers, which just happen to print as ASCII characters when possible:

>>> S.encode('latin-1')[0]
88
>>> list(S.encode('latin-1'))
[88, 89, 90]
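
To make the integer nature of bytes objects more concrete, here is a brief sketch in Python 3.X. The variable names B, first, piece, and rebuilt are illustrative assumptions, not part of the original session.

```python
B = 'XYZ'.encode('latin-1')

# Indexing a bytes object yields a small integer, not a 1-byte string
first = B[0]

# Slicing, by contrast, yields another bytes object
piece = B[0:1]

# A bytes object can be rebuilt from a list of integers in 0..255
rebuilt = bytes([88, 89, 90])

# Iteration also produces integers, which chr maps back to characters
for value in B:
    print(value, chr(value))

print(first, piece, rebuilt == B)
```

The difference between indexing (an integer) and slicing (a bytes object) is easy to overlook, since both display as text-like values when the bytes are printable ASCII.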
