预计阅读本页时间:-
Escape Sequences Represent Special Bytes
The last example embedded a quote inside a string by preceding it with a backslash. This is representative of a general pattern in strings: backslashes are used to introduce special byte codings known as escape sequences.
Escape sequences let us embed byte codes in strings that cannot easily be typed on a keyboard. The character \, and one or more characters following it in the string literal, are replaced with a single character in the resulting string object, which has the binary value specified by the escape sequence. For example, here is a five-character string that embeds a newline and a tab:
广告:个人专属 VPN,独立 IP,无限流量,多机房切换,还可以屏蔽广告和恶意软件,每月最低仅 5 美元
>>> s = 'a\nb\tc'
The two characters \n stand for a single character—the byte containing the binary value of the newline character in your character set (usually, ASCII code 10). Similarly, the sequence \t is replaced with the tab character. The way this string looks when printed depends on how you print it. The interactive echo shows the special characters as escapes, but print interprets them instead:
>>> s
'a\nb\tc'
>>> print(s)
a
b c
To be completely sure how many bytes are in this string, use the built-in len function—it returns the actual number of bytes in a string, regardless of how it is displayed:
>>> len(s)
5
This string is five bytes long: it contains an ASCII a byte, a newline byte, an ASCII b byte, and so on. Note that the original backslash characters are not really stored with the string in memory; they are used to tell Python to store special byte values in the string. For coding such special bytes, Python recognizes a full set of escape code sequences, listed in Table 7-2.
Table 7-2. String backslash characters
Escape
Meaning
\newline
Ignored (continuation line)
\\
Backslash (stores one \)
\'
Single quote (stores ')
\"
Double quote (stores ")
\a
Bell
\b
Backspace
\f
Formfeed
\n
Newline (linefeed)
\r
Carriage return
\t
Horizontal tab
\v
Vertical tab
\xhh
Character with hex value hh (at most 2 digits)
\ooo
Character with octal value ooo (up to 3 digits)
\0
Null: binary 0 character (doesn’t end string)
\N{ id }
Unicode database ID
\uhhhh
Unicode 16-bit hex
\Uhhhhhhhh
Unicode 32-bit hexa]
\other
Not an escape (keeps both \ and other)
a] The \Uhhhh... escape sequence takes exactly eight hexadecimal digits (h); both \u and \U can be used only in Unicode string literals.
Some escape sequences allow you to embed absolute binary values into the bytes of a string. For instance, here’s a five-character string that embeds two binary zero bytes (coded as octal escapes of one digit):
>>> s = 'a\0b\0c'
>>> s
'a\x00b\x00c'
>>> len(s)
5
In Python, the zero (null) byte does not terminate a string the way it typically does in C. Instead, Python keeps both the string’s length and text in memory. In fact, no character terminates a string in Python. Here’s a string that is all absolute binary escape codes—a binary 1 and 2 (coded in octal), followed by a binary 3 (coded in hexadecimal):
>>> s = '\001\002\x03'
>>> s
'\x01\x02\x03'
>>> len(s)
3
Notice that Python displays nonprintable characters in hex, regardless of how they were specified. You can freely combine absolute value escapes and the more symbolic escape types in Table 7-2. The following string contains the characters “spam”, a tab and newline, and an absolute zero value byte coded in hex:
>>> S = "s\tp\na\x00m"
>>> S
's\tp\na\x00m'
>>> len(S)
7
>>> print(S)
s p
a m
This becomes more important to know when you process binary data files in Python. Because their contents are represented as strings in your scripts, it’s OK to process binary files that contain any sorts of binary byte values (more on files in Chapter 9).[17]
Finally, as the last entry in Table 7-2 implies, if Python does not recognize the character after a \ as being a valid escape code, it simply keeps the backslash in the resulting string:
>>> x = "C:\py\code" # Keeps \ literally
>>> x
'C:\\py\\code'
>>> len(x)
10
Unless you’re able to commit all of Table 7-2 to memory, though, you probably shouldn’t rely on this behavior.[18] To code literal backslashes explicitly such that they are retained in your strings, double them up (\\ is an escape for one \) or use raw strings; the next section shows how.