String Changes in 3.0

One of the most noticeable changes in 3.0 is the mutation of string object types. In a nutshell, 2.X’s str and unicode types have morphed into 3.0’s str and bytes types, and a new mutable bytearray type has been added. The bytearray type is technically available in Python 2.6 too (though not earlier), but it’s a back-port from 3.0 and does not as clearly distinguish between text and binary content in 2.6.
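As a quick orientation, the following minimal sketch (assuming a Python 3.X interpreter; the variable names are purely illustrative) creates one object of each of the three types and shows how they differ on indexing, mutability, and mixing:

```python
# Python 3.X: the three string types named above
text = 'spam'               # str: immutable sequence of Unicode characters
data = b'spam'              # bytes: immutable sequence of small integers (0..255)
buf = bytearray(b'spam')    # bytearray: mutable variant of bytes

type_names = [type(obj).__name__ for obj in (text, data, buf)]

first_char = text[0]        # indexing a str yields a one-character str
first_byte = data[0]        # indexing bytes yields an integer

buf[0] = ord('S')           # in-place change works only for bytearray

# str and bytes do not mix implicitly in 3.X
try:
    text + data
    mixing_allowed = True
except TypeError:
    mixing_allowed = False

print(type_names, first_char, first_byte, buf, mixing_allowed)
```

The TypeError on the mixed expression is the sharper text/binary distinction in action: converting between the two requires an explicit encode or decode step.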

Especially if you process data that is either Unicode or binary in nature, these changes can have substantial impacts on your code. In fact, as a general rule of thumb, how much you need to care about this topic depends in large part upon which of the following categories you fall into:

  • If you deal with non-ASCII Unicode text—for instance, in the context of internationalized applications and the results of some XML parsers—you will find support for text encodings to be different in 3.0, but also probably more direct, accessible, and seamless than in 2.6.
  • If you deal with binary data—for example, in the form of image or audio files or packed data processed with the struct module—you will need to understand 3.0’s new bytes object and 3.0’s different and sharper distinction between text and binary data and files.
  • If you fall into neither of the prior two categories, you can generally use strings in 3.0 much as you would in 2.6: with the general str string type, text files, and all the familiar string operations we studied earlier. Your strings will be encoded and decoded using default encodings (in 3.0, sys.getdefaultencoding() reports UTF-8 as the default for string methods, while text files default to a platform-dependent encoding, such as cp1252 on Windows in the U.S., which locale.getpreferredencoding() reports if you care to check), but you probably won’t notice.
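To make the three categories concrete, here is a small sketch, assuming Python 3.X. The sample text and the struct format string are arbitrary choices for illustration, not values taken from this chapter:

```python
import struct
import sys

# Category 1: non-ASCII text is encoded to bytes and decoded back explicitly
text = 'sp\xe4m'                      # four characters, one of them non-ASCII
utf8 = text.encode('utf-8')           # the non-ASCII character takes 2 bytes
latin1 = text.encode('latin-1')       # here it takes 1 byte
roundtrip = utf8.decode('utf-8')

# Category 2: binary data is represented as bytes, not str
packed = struct.pack('>iH', 7, 258)   # big-endian 4-byte int + 2-byte unsigned short
values = struct.unpack('>iH', packed)

# Category 3: plain ASCII text needs no special handling; the default
# encoding used by str methods can be inspected if desired
default = sys.getdefaultencoding()

print(len(text), len(utf8), len(latin1), roundtrip == text)
print(packed, values)
print(default)
```

Note that the same four-character string occupies a different number of bytes under each encoding, while the struct result is a bytes object from the start.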

In other words, if your text is always ASCII, you can get by with normal string objects and text files and can avoid most of the following story. As we’ll see in a moment, ASCII is a simple kind of Unicode and a subset of other encodings, so string operations and files “just work” if your programs process ASCII text.
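The claim that ASCII is a subset of other encodings can be checked directly. This is a minimal sketch assuming Python 3.X, with UTF-8 and Latin-1 picked as representative encodings:

```python
# ASCII text produces identical bytes under ASCII, UTF-8, and Latin-1
text = 'spam'
ascii_bytes = text.encode('ascii')
utf8_bytes = text.encode('utf-8')
latin1_bytes = text.encode('latin-1')
same_bytes = ascii_bytes == utf8_bytes == latin1_bytes

# Text outside the ASCII range cannot be encoded as ASCII at all
try:
    'sp\xe4m'.encode('ascii')
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

print(same_bytes, ascii_failed)
```

Because the byte values agree, programs that handle only ASCII text behave the same regardless of which of these encodings is in effect.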

Even if you fall into the last of the three categories just mentioned, though, a basic understanding of 3.0’s string model can help both to demystify some of the underlying behavior now, and to make mastering Unicode or binary data issues easier if they impact you in the future.

Python 3.0’s support for Unicode and binary data is also available in 2.6, albeit in different forms. Although our main focus in this chapter is on string types in 3.0, we’ll explore some 2.6 differences along the way too. Regardless of which version you use, the tools we’ll explore here can become important in many types of programs.