预计阅读本页时间:-
Source File Character Set Encoding Declarations
Unicode escape codes are fine for the occasional Unicode character in string literals, but they can become tedious if you need to embed non-ASCII text in your strings frequently. For strings you code within your script files, Python uses the UTF-8 encoding by default, but it allows you to change this to support arbitrary character sets by including a comment that names your desired encoding. The comment must be of this form and must appear as either the first or second line in your script in either Python 2.6 or 3.0:
# -*- coding: latin-1 -*-
广告:个人专属 VPN,独立 IP,无限流量,多机房切换,还可以屏蔽广告和恶意软件,每月最低仅 5 美元
When a comment of this form is present, Python will recognize strings represented natively in the given encoding. This means you can edit your script file in a text editor that accepts and displays accented and other non-ASCII characters correctly, and Python will decode them correctly in your string literals. For example, notice how the comment at the top of the following file, text.py, allows Latin-1 characters to be embedded in strings:
# -*- coding: latin-1 -*-
# Any of the following string literal forms work in latin-1.
# Changing the encoding above to either ascii or utf-8 fails,
# because the 0xc4 and 0xe8 in myStr1 are not valid in either.
myStr1 = 'aÄBèC'
myStr2 = 'A\u00c4B\U000000e8C'
myStr3 = 'A' + chr(0xC4) + 'B' + chr(0xE8) + 'C'
import sys
print('Default encoding:', sys.getdefaultencoding())
for aStr in myStr1, myStr2, myStr3:
print('{0}, strlen={1}, '.format(aStr, len(aStr)), end='')
bytes1 = aStr.encode() # Per default utf-8: 2 bytes for non-ASCII
bytes2 = aStr.encode('latin-1') # One byte per char
#bytes3 = aStr.encode('ascii') # ASCII fails: outside 0..127 range
print('byteslen1={0}, byteslen2={1}'.format(len(bytes1), len(bytes2)))
When run, this script produces the following output:
C:\misc> c:\python30\python text.py
Default encoding: utf-8
aÄBèC, strlen=5, byteslen1=7, byteslen2=5
AÄBèC, strlen=5, byteslen1=7, byteslen2=5
AÄBèC, strlen=5, byteslen1=7, byteslen2=5
Since most programmers are likely to fall back on the standard UTF-8 encoding, I’ll defer to Python’s standard manual set for more details on this option and other advanced Unicode support topics, such as properties and character name escapes in strings.