The re Pattern Matching Module
Python’s re pattern-matching module supports text processing that is more general than that afforded by simple string method calls such as find, split, and replace. With re, strings that designate searching and splitting targets can be described by general patterns, instead of absolute text. This module has been generalized to work on objects of any string type in 3.0—str, bytes, and bytearray—and returns result substrings of the same type as the subject string.
Here it is at work in 3.0, extracting substrings from a line of text. Within pattern strings, (.*) means any character other than a newline by default (.), zero or more times (*), saved away as a matched substring (()). Parts of the string matched by the parts of a pattern enclosed in parentheses are available after a successful match, via the group or groups method:
C:\misc> c:\python30\python
>>> import re
>>> S = 'Bugger all down here on earth!' # Line of text
>>> B = b'Bugger all down here on earth!' # Usually from a file
>>> re.match('(.*) down (.*) on (.*)', S).groups() # Match line to pattern
('Bugger all', 'here', 'earth!') # Matched substrings
>>> re.match(b'(.*) down (.*) on (.*)', B).groups() # bytes substrings
(b'Bugger all', b'here', b'earth!')
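To make the difference between the group and groups methods concrete, here is a minimal sketch that reuses the same pattern in script form. It assumes a current Python 3 release rather than the 3.0 session above, and the variable names are illustrative only:

```python
import re

line = 'Bugger all down here on earth!'
match = re.match('(.*) down (.*) on (.*)', line)

if match is not None:               # re.match returns None when the pattern fails
    everything = match.groups()     # all parenthesized substrings, as a tuple
    first = match.group(1)          # a single substring; groups are numbered from 1
    whole = match.group(0)          # group 0 is the entire text the pattern matched
    print(everything)
    print(first, '|', whole)
```

Testing the result against None before calling group or groups avoids an AttributeError on lines that do not match the pattern.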
In Python 2.6, results are similar, but the unicode type is used for non-ASCII text, and str handles both 8-bit and binary text:
C:\misc> c:\python26\python
>>> import re
>>> S = 'Bugger all down here on earth!' # Simple text and binary
>>> U = u'Bugger all down here on earth!' # Unicode text
>>> re.match('(.*) down (.*) on (.*)', S).groups()
('Bugger all', 'here', 'earth!')
>>> re.match('(.*) down (.*) on (.*)', U).groups()
(u'Bugger all', u'here', u'earth!')
Since bytes and str support essentially the same operation sets, this type distinction is largely transparent. But note that, as in other APIs, you can’t mix str and bytes types among the arguments of a single re call in 3.0: the pattern and the subject string must both be text or both be binary (although if you don’t plan to do pattern matching on binary data, you probably don’t need to care):
C:\misc> c:\python30\python
>>> import re
>>> S = 'Bugger all down here on earth!'
>>> B = b'Bugger all down here on earth!'
>>> re.match('(.*) down (.*) on (.*)', B).groups()
TypeError: can't use a string pattern on a bytes-like object
>>> re.match(b'(.*) down (.*) on (.*)', S).groups()
TypeError: can't use a bytes pattern on a string-like object
>>> re.match(b'(.*) down (.*) on (.*)', bytearray(B)).groups()
(bytearray(b'Bugger all'), bytearray(b'here'), bytearray(b'earth!'))
>>> re.match('(.*) down (.*) on (.*)', bytearray(B)).groups()
TypeError: can't use a string pattern on a bytes-like object
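When a pattern and a subject do end up with different types, one possible remedy is to convert one of them explicitly before the call. The following is only a sketch: it assumes the binary data holds ASCII-encoded text and that a current Python 3 release is in use, and both the encoding choice and the variable names are assumptions rather than anything the sessions above prescribe:

```python
import re

pattern = '(.*) down (.*) on (.*)'           # str pattern
data = b'Bugger all down here on earth!'     # bytes subject, e.g. read from a binary file

# Option 1: decode the subject to str, so the results come back as str
as_text = re.match(pattern, data.decode('ascii')).groups()

# Option 2: encode the pattern to bytes, so the results come back as bytes
as_bytes = re.match(pattern.encode('ascii'), data).groups()

print(as_text)
print(as_bytes)
```

Which direction to convert depends on what the rest of the program expects: decoding suits text processing, while encoding the pattern keeps the data in its raw binary form.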