预计阅读本页时间:-
Coding ASCII Text
Let’s step through some examples that demonstrate text coding basics. As we’ve seen, ASCII text is a simple type of Unicode, stored as a sequence of byte values that represent characters:
C:\misc> c:\python30\python
>>> ord('X') # 'X' has binary value 88 in the default encoding
88
>>> chr(88) # 88 stands for character 'X'
'X'
>>> S = 'XYZ' # A Unicode string of ASCII text
>>> S
'XYZ'
>>> len(S) # 3 characters long
3
>>> [ord(c) for c in S] # 3 bytes with integer ordinal values
[88, 89, 90]
广告:个人专属 VPN,独立 IP,无限流量,多机房切换,还可以屏蔽广告和恶意软件,每月最低仅 5 美元
Normal 7-bit ASCII text like this is represented with one character per byte under each of the Unicode encoding schemes described earlier in this chapter:
>>> S.encode('ascii') # Values 0..127 in 1 byte (7 bits) each
b'XYZ'
>>> S.encode('latin-1') # Values 0..255 in 1 byte (8 bits) each
b'XYZ'
>>> S.encode('utf-8') # Values 0..127 in 1 byte, 128..2047 in 2, others 3 or 4
b'XYZ'
In fact, the bytes objects returned by encoding ASCII text this way is really a sequence of short integers, which just happen to print as ASCII characters when possible:
>>> S.encode('latin-1')[0]
88
>>> list(S.encode('latin-1'))
[88, 89, 90]