CHAPTER -1. WHAT’S NEW IN “DIVE INTO PYTHON 3”

Isn’t this where we came in?

— Pink Floyd, The Wall

-1.1. A.K.A. “THE MINUS LEVEL”

Are you already a Python programmer? Did you read the original “Dive Into Python”? Did you buy it

on paper? (If so, thanks!) Are you ready to take the plunge into Python 3? … If so, read on. (If none of that

is true, you’d be better off starting at the beginning.)

Python 3 comes with a script called 2to3. Learn it. Love it. Use it. Porting Code to Python 3 with 2to3 is a reference of all the things that the 2to3 tool can fix automatically. Since a lot of those things are syntax

changes, it’s a good starting point to learn about a lot of the syntax changes in Python 3. (print is now a

function, `x` doesn’t work, &c.)
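As a quick illustration of the two changes just mentioned, here is a minimal sketch; the values and variable names are made up for the example.

```python
# In Python 2 this was a statement:  print "total:", 42
# In Python 3, print is an ordinary built-in function.
print("total:", 42)

# Because it is a function, its behaviour is controlled with keyword arguments.
print("a", "b", "c", sep="-", end="!\n")

# Python 2's backtick shorthand (`x`) is gone; call repr() instead.
x = "spam"
as_repr = repr(x)
print(as_repr)
```

Running 2to3 over old code rewrites both constructs for you; the sketch only shows what the result looks like.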

Case Study: Porting chardet to Python 3 documents my (ultimately successful) effort to port a non-trivial library from Python 2 to Python 3. It may help you; it may not. There’s a fairly steep learning curve, since

you need to kind of understand the library first, so you can understand why it broke and how I fixed it. A

lot of the breakage centers around strings. Speaking of which…

Strings. Whew. Where to start. Python 2 had “strings” and “Unicode strings.” Python 3 has “bytes” and

“strings.” That is, all strings are now Unicode strings, and if you want to deal with a bag of bytes, you use

the new bytes type. Python 3 will never implicitly convert between strings and bytes, so if you’re not sure

which one you have at any given moment, your code will almost certainly break. Read the Strings chapter

for more details.
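A minimal sketch of the distinction, assuming UTF-8 as the encoding (the sample text and variable names are illustrative only):

```python
text = "caf\u00e9"               # str: a sequence of Unicode characters ("café")
data = text.encode("utf-8")      # bytes: the same text encoded as UTF-8

print(type(text), len(text))     # 4 characters
print(type(data), len(data))     # 5 bytes, because "é" takes two bytes in UTF-8

# Mixing the two types is an error; Python 3 will not guess an encoding.
mixing_error = None
try:
    text + data
except TypeError as err:
    mixing_error = err
    print("TypeError:", err)

# Converting back is always explicit.
round_trip = data.decode("utf-8")
print(round_trip == text)
```

The point of the sketch is that every conversion names its encoding; nothing happens behind your back.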

Bytes vs. strings comes up again and again throughout the book.

• In Files, you’ll learn the difference between reading files in “binary” and “text” mode. Reading (and writing!) files in text mode involves a character encoding; pass the encoding parameter explicitly rather than relying on the platform default. Some text file methods count characters, but other methods count bytes. If your code assumes that one character == one byte, it will break on multi-byte characters.

• In HTTP Web Services, the httplib2 module fetches headers and data over HTTP. HTTP headers are

returned as strings, but the HTTP body is returned as bytes.

• In Serializing Python Objects, you’ll learn why the pickle module in Python 3 defines a new data format that is backwardly incompatible with Python 2. (Hint: it’s because of bytes and strings.) Also, Python 3 supports

serializing objects to and from JSON, which doesn’t even have a bytes type. I’ll show you how to hack

around that.

• In Case study: porting chardet to Python 3, it’s just a bloody mess of bytes and strings everywhere.
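The file-related point in the list above can be sketched in a few lines. This is only an illustration: the file name and sample text are invented, and UTF-8 is an assumed encoding.

```python
import os
import tempfile

# A throwaway file so the sketch is self-contained.
path = os.path.join(tempfile.mkdtemp(), "example.txt")

# Text mode: state the encoding explicitly.
with open(path, "w", encoding="utf-8") as f:
    f.write("na\u00efve caf\u00e9")   # "naïve café"

# Text mode returns a str, so len() counts characters.
with open(path, encoding="utf-8") as f:
    as_text = f.read()

# Binary mode returns bytes, so len() counts bytes.
with open(path, "rb") as f:
    as_bytes = f.read()

print(len(as_text))    # 10 characters
print(len(as_bytes))   # 12 bytes: "ï" and "é" each take two bytes in UTF-8
```

The same file yields two different lengths depending on the mode, which is exactly the trap for code that assumes one character equals one byte.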

Even if you don’t care about Unicode (oh but you will), you’ll want to read about string formatting in Python

3, which is completely different from Python 2.
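For a first taste, here is a small sketch of the format() method, the same method the sample program in Chapter 1 uses. The values are illustrative.

```python
# Positional fields, with a format spec after the colon (one decimal place).
size_text = "{0:.1f} {1}".format(931.3225746154785, "GiB")
print(size_text)

# Fields can also be filled in by name.
login = "{user}@{host}".format(user="mark", host="atlantis")
print(login)
```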

Iterators are everywhere in Python 3, and I understand them a lot better than I did five years ago when I

wrote “Dive Into Python”. You need to understand them too, because lots of functions that used to return

lists in Python 2 will now return iterators in Python 3. At a minimum, you should read the second half of

the Iterators chapter and the second half of the Advanced Iterators chapter.
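A short sketch of what that change looks like in practice, using two built-ins whose behavior differs between Python 2 and Python 3:

```python
squares = map(lambda n: n * n, [1, 2, 3])

# In Python 3, map() returns a lazy iterator rather than a list.
print(isinstance(squares, list))   # False

# Wrap it in list() when you actually need a list.
as_list = list(squares)
print(as_list)                     # [1, 4, 9]

# An iterator is consumed as you read it; a second pass yields nothing.
second_pass = list(squares)
print(second_pass)                 # []

# Dictionary methods such as keys() return view objects, not lists.
keys = {"a": 1, "b": 2}.keys()
print(isinstance(keys, list))      # False
print(sorted(keys))                # ['a', 'b']
```

Code that indexes into the result, or loops over it twice, is the kind that breaks when ported.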

By popular request, I’ve added an appendix on Special Method Names, which is kind of like the Python docs

“Data Model” chapter but with more snark.

When I was writing “Dive Into Python”, all of the available XML libraries sucked. Then Fredrik Lundh wrote

ElementTree, which doesn’t suck at all. The Python gods wisely incorporated ElementTree into the standard

library, and now it forms the basis for my new XML chapter. The old ways of parsing XML are still around, but you should avoid them, because they suck!
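To give a feel for ElementTree, here is a minimal sketch using the standard library's xml.etree.ElementTree module. The XML document is made up for the example.

```python
import xml.etree.ElementTree as etree

# A made-up document, purely for illustration.
document = """<feed>
  <title>dive into mark</title>
  <entry><title>First post</title></entry>
  <entry><title>Second post</title></entry>
</feed>"""

root = etree.fromstring(document)
print(root.tag)                       # feed

# findall() with a simple path returns the matching child elements.
titles = [entry.findtext("title") for entry in root.findall("entry")]
print(titles)
```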

Also new in Python — not in the language but in the community — is the emergence of code repositories

like The Python Package Index (PyPI). Python comes with utilities to package your code in standard formats and distribute those packages on PyPI. Read Packaging Python Libraries for details.

CHAPTER 0. INSTALLING PYTHON

Tempora mutantur nos et mutamur in illis. (Times change, and we change with them.)

— ancient Roman proverb

0.1. DIVING IN

Before you can start programming in Python 3, you need to install it. Or do you?

0.2. WHICH PYTHON IS RIGHT FOR YOU?

If you're using an account on a hosted server, your ISP may have already installed Python 3. If you’re running

Linux at home, you may already have Python 3, too. Most popular GNU/Linux distributions come with

Python 2 in the default installation; a small but growing number of distributions also include Python 3. Mac

OS X includes a command-line version of Python 2, but as of this writing it does not include Python 3.

Microsoft Windows does not come with any version of Python. But don’t despair! You can point-and-click

your way through installing Python, regardless of what operating system you have.

The easiest way to check for Python 3 on your Linux or Mac OS X system is from the command line. Once you’re at a command line prompt, just type python3 (all lowercase, no spaces), press ENTER, and see what

happens. On my home Linux system, Python 3.1 is already installed, and this command gets me into the

Python interactive shell.

mark@atlantis:~$ python3

Python 3.1 (r31:73572, Jul 28 2009, 06:52:23)

[GCC 4.2.4 (Ubuntu 4.2.4-1ubuntu4)] on linux2

Type "help", "copyright", "credits" or "license" for more information.

>>>

(Type exit() and press ENTER to exit the Python interactive shell.)

My web hosting provider also runs Linux and provides command-line access, but my server does not have Python 3 installed. (Boo!)

mark@manganese:~$ python3

bash: python3: command not found

So back to the question that started this section, “Which Python is right for you?” Whichever one runs on

the computer you already have.
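If some version of Python is already running, you can also ask it directly. The following is a sketch, not part of the original instructions; note that shutil.which() needs Python 3.3 or later, which is newer than the Python 3.1 shown above.

```python
import shutil
import sys

# Version of the interpreter running this script.
major, minor = sys.version_info[:2]
print("Running Python {0}.{1}".format(major, minor))

# Where (if anywhere) a "python3" command lives on the search path.
# shutil.which() returns None when the command is not found.
python3_path = shutil.which("python3")
print(python3_path or "python3: command not found")
```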

[Read on for Windows instructions, or skip to Installing on Mac OS X, Installing on Ubuntu Linux, or

Installing on Other Platforms.]

0.3. INSTALLING ON MICROSOFT WINDOWS

Windows comes in two architectures these days: 32-bit and 64-bit. Of course, there are lots of different

versions of Windows — XP, Vista, Windows 7 — but Python runs on all of them. The more important

distinction is 32-bit v. 64-bit. If you have no idea what architecture you’re running, it’s probably 32-bit.

Visit python.org/download/ and download the appropriate Python 3 Windows installer for your architecture. Your choices will look something like this:

Python 3.1 Windows installer (Windows binary — does not include source)

Python 3.1 Windows AMD64 installer (Windows AMD64 binary — does not include source)

I don’t want to include direct download links here, because minor updates of Python happen all the time and

I don’t want to be responsible for you missing important updates. You should always install the most recent

version of Python 3.x unless you have some esoteric reason not to.

Once your download is complete, double-click the .msi file. Windows will pop up a security alert, since you’re about to be running executable code. The official Python installer is digitally signed by the Python Software Foundation, the non-profit corporation that oversees Python development. Don’t accept imitations!

Click the Run button to launch the Python 3 installer.

The first question the installer will ask you is whether you want to install Python 3 for all users or just for you. The default choice is “install for all users,” which is the best choice unless you have a good reason to choose otherwise. (One possible reason why you would want to “install just for me” is that you are installing Python on your company’s computer and you don’t have administrative rights on your Windows account. But then, why are you installing Python without permission from your company’s Windows administrator? Don’t get me in trouble here!)

Click the Next button to accept your choice of installation type.

Next, the installer will prompt you to choose a destination directory. The default for all versions of Python 3.1.x is C:\Python31\, which should work well for most users unless you have a specific reason to change it. If you maintain a separate drive letter for installing applications, you can browse to it using the embedded controls, or simply type the pathname in the box below. You are not limited to installing Python on the C: drive; you can install it on any drive, in any folder.

Click the Next button to accept your choice of destination directory.

The next page looks complicated, but it’s not really. Like many installers, you have the option not to install every single component of Python 3. If disk space is especially tight, you can exclude certain components.

Register Extensions allows you to double-click Python scripts (.py files) and run them. Recommended but not required. (This option doesn’t require any disk space, so there is little point in excluding it.)

Tcl/Tk is the graphics library used by the Python Shell, which you will use throughout this book. I strongly

recommend keeping this option.

Documentation installs a help file that contains much of the information on docs.python.org.

Recommended if you are on dialup or have limited Internet access.

Utility Scripts includes the 2to3.py script which you’ll learn about later in this book. Required if you want to learn about migrating existing Python 2 code to Python 3. If you have no existing Python 2 code, you can

skip this option.

Test Suite is a collection of scripts used to test the Python interpreter itself. We will not use it in this

book, nor have I ever used it in the course of programming in Python. Completely optional.

If you’re unsure how much disk space you have, click the Disk Usage button. The installer will list your drive letters, compute how much space is available on each drive, and calculate how much would be left after installation.

Click the OK button to return to the “Customizing Python” page.

If you decide to exclude an option, select the drop-down button before the option and select “Entire feature will be unavailable.” For example, excluding the test suite will save you a whopping 7908KB of disk space.

Click the Next button to accept your choice of options.

The installer will copy all the necessary files to your chosen destination directory. (This happens so quickly, I had to try it three times to even get a screenshot of it!)

Click the Finish button to exit the installer.

In your Start menu, there should be a new item called Python 3.1. Within that, there is a program called IDLE. Select this item to run the interactive Python Shell.

[Skip to using the Python Shell]

0.4. INSTALLING ON MAC OS X

All modern Macintosh computers use the Intel chip (like most Windows PCs). Older Macs used PowerPC

chips. You don’t need to understand the difference, because there’s just one Mac Python installer for all

Macs.

Visit python.org/download/ and download the Mac installer. It will be called something like Python 3.1

Mac Installer Disk Image, although the version number may vary. Be sure to download version 3.x, not

2.x.

Your browser should automatically mount the disk image and open a Finder window to show you the

contents. (If this doesn’t happen, you’ll need to find the disk image in your downloads folder and double-click

to mount it. It will be named something like python-3.1.dmg.) The disk image contains a number of text

files (Build.txt, License.txt, ReadMe.txt), and the actual installer package, Python.mpkg.

Double-click the Python.mpkg installer package to launch the Mac Python installer.

The first page of the installer gives a brief description of Python itself, then refers you to the ReadMe.txt file (which you didn’t read, did you?) for more details.

Click the Continue button to move along.

The next page actually contains some important information: Python requires Mac OS X 10.3 or later. If you are still running Mac OS X 10.2, you should really upgrade. Apple no longer provides security updates for your operating system, and your computer is probably at risk if you ever go online. Also, you can’t run Python 3.

Click the Continue button to advance.

Like all good installers, the Python installer displays the software license agreement. Python is open source, and its license is approved by the Open Source Initiative. Python has had a number of owners and sponsors throughout its history, each of which has left its mark on the software license. But the end result is this: Python is open source, and you may use it on any platform, for any purpose, without fee or obligation of reciprocity.

Click the Continue button once again.

Due to quirks in the standard Apple installer framework, you must “agree” to the software license in order to complete the installation. Since Python is open source, you are really “agreeing” that the license is granting you additional rights, rather than taking them away.

Click the Agree button to continue.

The next screen allows you to change your install location. You must install Python on your boot drive, but due to limitations of the installer, it does not enforce this. In truth, I have never had the need to change the install location.

From this screen, you can also customize the installation to exclude certain features. If you want to do this,

click the Customize button; otherwise click the Install button.

If you choose a Custom Install, the installer will present you with the following list of features:

Python Framework. This is the guts of Python, and is both selected and disabled because it must be installed.

GUI Applications includes IDLE, the graphical Python Shell which you will use throughout this book. I

strongly recommend keeping this option selected.

UNIX command-line tools includes the command-line python3 application. I strongly recommend keeping

this option, too.

Python Documentation contains much of the information on docs.python.org. Recommended if you are on dialup or have limited Internet access.

Shell profile updater controls whether to update your shell profile (used in Terminal.app) to ensure that

this version of Python is on the search path of your shell. You probably don’t need to change this.

Fix system Python should not be changed. (It tells your Mac to use Python 3 as the default Python for all

scripts, including built-in system scripts from Apple. This would be very bad, since most of those scripts are

written for Python 2, and they would fail to run properly under Python 3.)

Click the Install button to continue.

Because it installs system-wide frameworks and binaries in /usr/local/bin/, the installer will ask you for an administrative password. There is no way to install Mac Python without administrator privileges.

Click the OK button to begin the installation.

The installer will display a progress meter while it installs the features you’ve selected.

Assuming all went well, the installer will give you a big green checkmark to tell you that the installation completed successfully.

Click the Close button to exit the installer.

Assuming you didn’t change the install location, you can find the newly installed files in the Python 3.1 folder within your /Applications folder. The most important piece is IDLE, the graphical Python Shell.

Double-click IDLE to launch the Python Shell.

The Python Shell is where you will spend most of your time exploring Python. Examples throughout this book will assume that you can find your way into the Python Shell.

[Skip to using the Python Shell]

0.5. INSTALLING ON UBUNTU LINUX

Modern Linux distributions are backed by vast repositories of precompiled applications, ready to install. The

exact details vary by distribution. In Ubuntu Linux, the easiest way to install Python 3 is through the Add/Remove application in your Applications menu.

When you first launch the Add/Remove application, it will show you a list of preselected applications in

different categories. Some are already installed; most are not. Because the repository contains over 10,000

applications, there are different filters you can apply to see small parts of the repository. The default filter is

“Canonical-maintained applications,” which is a small subset of the total number of applications that are

officially supported by Canonical, the company that creates and maintains Ubuntu Linux.

Python 3 is not maintained by Canonical, so the first step is to drop down this filter menu and select “All

Open Source applications.”

Once you’ve widened the filter to include all open source applications, use the Search box immediately after

the filter menu to search for Python 3.

Now the list of applications narrows to just those matching Python 3. You’re going to check two packages.

The first is Python (v3.0). This contains the Python interpreter itself.

The second package you want is immediately above: IDLE (using Python-3.0). This is a graphical Python

Shell that you will use throughout this book.

After you’ve checked those two packages, click the Apply Changes button to continue.

The package manager will ask you to confirm that you want to add both IDLE (using Python-3.0) and Python (v3.0).

Click the Apply button to continue.

The package manager will show you a progress meter while it downloads the necessary packages from Canonical’s Internet repository.

Once the packages are downloaded, the package manager will automatically begin installing them.

If all went well, the package manager will confirm that both packages were successfully installed. From here, you can double-click IDLE to launch the Python Shell, or click the Close button to exit the package manager.

You can always relaunch the Python Shell by going to your Applications menu, then the Programming

submenu, and selecting IDLE.

The Python Shell is where you will spend most of your time exploring Python. Examples throughout this book will assume that you can find your way into the Python Shell.

[Skip to using the Python Shell]

0.6. INSTALLING ON OTHER PLATFORMS

Python 3 is available on a number of different platforms. In particular, it is available in virtually every Linux,

BSD, and Solaris-based distribution. For example, Red Hat Linux uses the yum package manager. FreeBSD has

its ports and packages collection, SUSE has zypper, and Solaris has pkgadd. A quick web search for Python 3 + your operating system should tell you whether a Python 3 package is available, and if so, how to install it.

0.7. USING THE PYTHON SHELL

The Python Shell is where you can explore Python syntax, get interactive help on commands, and debug

short programs. The graphical Python Shell (named IDLE) also contains a decent text editor that supports

Python syntax coloring and integrates with the Python Shell. If you don’t already have a favorite text editor,

you should give IDLE a try.

First things first. The Python Shell itself is an amazing interactive playground. Throughout this book, you’ll see

examples like this:

>>> 1 + 1

2

The three angle brackets, >>>, denote the Python Shell prompt. Don’t type that part. That’s just to let you

know that this example is meant to be followed in the Python Shell.

1 + 1 is the part you type. You can type any valid Python expression or command in the Python Shell.

Don’t be shy; it won’t bite! The worst that will happen is you’ll get an error message. Commands get

executed immediately (once you press ENTER); expressions get evaluated immediately, and the Python Shell

prints out the result.

2 is the result of evaluating this expression. As it happens, 1 + 1 is a valid Python expression. The result, of

course, is 2.

Let’s try another one.

>>> print('Hello world!')

Hello world!

Pretty simple, no? But there’s lots more you can do in the Python shell. If you ever get stuck — you can’t

remember a command, or you can’t remember the proper arguments to pass a certain function — you can

get interactive help in the Python Shell. Just type help and press ENTER.

>>> help

Type help() for interactive help, or help(object) for help about object.

There are two modes of help. You can get help about a single object, which just prints out the

documentation and returns you to the Python Shell prompt. You can also enter help mode, where instead of

evaluating Python expressions, you just type keywords or command names and it will print out whatever it

knows about that command.

To enter the interactive help mode, type help() and press ENTER.

>>> help()

Welcome to Python 3.0! This is the online help utility.

If this is your first time using Python, you should definitely check out

the tutorial on the Internet at http://docs.python.org/tutorial/.

Enter the name of any module, keyword, or topic to get help on writing

Python programs and using Python modules. To quit this help utility and

return to the interpreter, just type "quit".

To get a list of available modules, keywords, or topics, type "modules",

"keywords", or "topics". Each module also comes with a one-line summary

of what it does; to list the modules whose summaries contain a given word

such as "spam", type "modules spam".

help>

Note how the prompt changes from >>> to help>. This reminds you that you’re in the interactive help

mode. Now you can enter any keyword, command, module name, function name — pretty much anything

Python understands — and read documentation on it.

help> print

Help on built-in function print in module builtins:

print(...)

print(value, ..., sep=' ', end='\n', file=sys.stdout)

Prints the values to a stream, or to sys.stdout by default.

Optional keyword arguments:

file: a file-like object (stream); defaults to the current sys.stdout.

sep: string inserted between values, default a space.

end: string appended after the last value, default a newline.

help> PapayaWhip

no Python documentation found for 'PapayaWhip'

help> quit

You are now leaving help and returning to the Python interpreter.

If you want to ask for help on a particular object directly from the

interpreter, you can type "help(object)". Executing "help('string')"

has the same effect as typing a particular string at the help> prompt.

>>>

1. To get documentation on the print() function, just type print and press ENTER. The interactive help mode

will display something akin to a man page: the function name, a brief synopsis, the function’s arguments and

their default values, and so on. If the documentation seems opaque to you, don’t panic. You’ll learn more

about all these concepts in the next few chapters.

2. Of course, the interactive help mode doesn’t know everything. If you type something that isn’t a Python

command, module, function, or other built-in keyword, the interactive help mode will just shrug its virtual

shoulders.

3. To quit the interactive help mode, type quit and press ENTER.

4. The prompt changes back to >>> to signal that you’ve left the interactive help mode and returned to the

Python Shell.
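The other mode of help, asking about a single object, works from a script as well as from the Python Shell. Here is a small sketch; len() is just a convenient built-in to ask about.

```python
# help(obj) prints the documentation for one object and returns to the prompt.
help(len)

# Much of the text help() displays comes from the object's docstring,
# which is also available directly as the __doc__ attribute.
doc = len.__doc__
print(doc)
```

The exact wording of the output varies between Python versions, so none is shown here.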

IDLE, the graphical Python Shell, also includes a Python-aware text editor.

0.8. PYTHON EDITORS AND IDES

IDLE is not the only game in town when it comes to writing programs in Python. While it’s useful to get

started with learning the language itself, many developers prefer other text editors or Integrated

Development Environments (IDEs). I won’t cover them here, but the Python community maintains a list of

Python-aware editors that covers a wide range of supported platforms and software licenses.

You might also want to check out the list of Python-aware IDEs, although few of them support Python 3 yet.

One that does is PyDev, a plugin for Eclipse that turns Eclipse into a full-fledged Python IDE. Both Eclipse and PyDev are cross-platform and open source.

On the commercial front, there is ActiveState’s Komodo IDE. It has per-user licensing, but students can get a discount, and a free time-limited trial version is available.

I’ve been programming in Python for nine years, and I edit my Python programs in GNU Emacs and debug them in the command-line Python Shell. There’s no right or wrong way to develop in Python. Find a way

that works for you!

CHAPTER 1. YOUR FIRST PYTHON PROGRAM

Don’t bury your burden in saintly silence. You have a problem? Great. Rejoice, dive in, and investigate.

Ven. Henepola Gunaratana

1.1. DIVING IN

Convention dictates that I should bore you with the fundamental building blocks of programming, so we

can slowly work up to building something useful. Let’s skip all that. Here is a complete, working Python

program. It probably makes absolutely no sense to you. Don’t worry about that, because you’re going to

dissect it line by line. But read through it first and see what, if anything, you can make of it.

SUFFIXES = {1000: ['KB', 'MB', 'GB', 'TB', 'PB', 'EB', 'ZB', 'YB'],
            1024: ['KiB', 'MiB', 'GiB', 'TiB', 'PiB', 'EiB', 'ZiB', 'YiB']}

def approximate_size(size, a_kilobyte_is_1024_bytes=True):
    '''Convert a file size to human-readable form.

    Keyword arguments:
    size -- file size in bytes
    a_kilobyte_is_1024_bytes -- if True (default), use multiples of 1024
                                if False, use multiples of 1000

    Returns: string

    '''
    if size < 0:
        raise ValueError('number must be non-negative')

    multiple = 1024 if a_kilobyte_is_1024_bytes else 1000
    for suffix in SUFFIXES[multiple]:
        size /= multiple
        if size < multiple:
            return '{0:.1f} {1}'.format(size, suffix)

    raise ValueError('number too large')

if __name__ == '__main__':
    print(approximate_size(1000000000000, False))
    print(approximate_size(1000000000000))

Now let’s run this program on the command line. On Windows, it will look something like this:

c:\home\diveintopython3\examples> c:\python31\python.exe humansize.py

1.0 TB

931.3 GiB

On Mac OS X or Linux, it would look something like this:

you@localhost:~/diveintopython3/examples$ python3 humansize.py

1.0 TB

931.3 GiB

What just happened? You executed your first Python program. You called the Python interpreter on the

command line, and you passed the name of the script you wanted Python to execute. The script defines a

single function, the approximate_size() function, which takes an exact file size in bytes and calculates a

“pretty” (but approximate) size. (You’ve probably seen this in Windows Explorer, or the Mac OS X Finder,

or Nautilus or Dolphin or Thunar on Linux. If you display a folder of documents as a multi-column list, it

will display a table with the document icon, the document name, the size, type, last-modified date, and so on.

If the folder contains a 1093-byte file named TODO, your file manager won’t display TODO 1093 bytes; it’ll say

something like TODO 1 KB instead. That’s what the approximate_size() function does.)

Look at the bottom of the script, and you’ll see two calls to print(approximate_size(arguments)). These

are function calls — first calling the approximate_size() function and passing a number of arguments, then

taking the return value and passing it straight on to the print() function. The print() function is built-in;

you’ll never see an explicit declaration of it. You can just use it, anytime, anywhere. (There are lots of built-in functions, and lots more functions that are separated into modules. Patience, grasshopper.)

So why does running the script on the command line give you the same output every time? We’ll get to

that. First, let’s look at that approximate_size() function.

1.2. DECLARING FUNCTIONS

Python has functions like most other languages, but it does not have separate header files like C++ or

interface/implementation sections like Pascal. When you need a function, just declare it, like this:

def approximate_size(size, a_kilobyte_is_1024_bytes=True):

The keyword def starts the function declaration, followed by the function name, followed by the arguments in parentheses. Multiple arguments are separated with commas.

Also note that the function doesn’t define a return datatype. Python functions do not specify the datatype of their return value; they don’t even specify whether or not they return a value. (In fact, every Python function returns a value; if the function ever executes a return statement, it will return that value, otherwise it will return None, the Python null value.)

☞ In some languages, functions (that return a

value) start with function, and subroutines

(that do not return a value) start with sub.

There are no subroutines in Python. Everything is a function, all functions return a

value (even if it’s None), and all functions start with def.
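A short sketch makes the point concrete. The two functions below are hypothetical and exist only for this illustration.

```python
def shout(message):
    # Explicit return statement: the caller receives this value.
    return message.upper() + "!"

def log(message):
    # No return statement: Python returns None automatically.
    print("LOG:", message)

loud = shout("hello")
nothing = log("hello")

print(loud)             # HELLO!
print(nothing is None)  # True
```

Both are declared with def; the only difference is whether a return statement is ever executed.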

The approximate_size() function takes two arguments — size and

a_kilobyte_is_1024_bytes — but neither argument specifies a datatype. In Python, variables are never

explicitly typed. Python figures out what type a variable is and keeps track of it internally.

☞ In Java and other statically-typed languages, you must specify the datatype of the

function return value and each function argument. In Python, you never explicitly

specify the datatype of anything. Based on what value you assign, Python keeps track

of the datatype internally.
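You can watch Python keep track of types with the built-in type() function. This is a minimal sketch; the variable name and values are arbitrary.

```python
value = 1093
print(type(value))      # <class 'int'>

# The same name can later refer to a value of another type.
value = "1093 bytes"
print(type(value))      # <class 'str'>

# The type belongs to the value, and Python enforces it:
# incompatible types are not silently mixed.
try:
    1093 + " bytes"
except TypeError as err:
    print("TypeError:", err)
```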

1.2.1. OPTIONAL AND NAMED ARGUMENTS

Python allows function arguments to have default values; if the function is called without the argument, the

argument gets its default value. Furthermore, arguments can be specified in any order by using named

arguments.

Let’s take another look at that approximate_size() function declaration:

def approximate_size(size, a_kilobyte_is_1024_bytes=True):

The second argument, a_kilobyte_is_1024_bytes, specifies a default value of True. This means the

argument is optional; you can call the function without it, and Python will act as if you had called it with True

as a second parameter.

Now look at the bottom of the script:

if __name__ == '__main__':
    print(approximate_size(1000000000000, False))  # (1)
    print(approximate_size(1000000000000))         # (2)

1. This calls the approximate_size() function with two arguments. Within the approximate_size() function,

a_kilobyte_is_1024_bytes will be False, since you explicitly passed False as the second argument.

2. This calls the approximate_size() function with only one argument. But that’s OK, because the second

argument is optional! Since the caller doesn’t specify, the second argument defaults to True, as defined by

the function declaration.
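The same mechanism in a smaller, self-contained sketch. The describe() function is hypothetical, not part of humansize.py.

```python
def describe(size, unit="bytes"):
    # unit is optional because it has a default value.
    return "{0} {1}".format(size, unit)

with_default = describe(1093)       # unit falls back to "bytes"
with_override = describe(1, "KB")   # default replaced by the caller

print(with_default)
print(with_override)

# Default values are stored on the function object itself.
print(describe.__defaults__)
```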

You can also pass values into a function by name.

>>> from humansize import approximate_size
>>> approximate_size(4000, a_kilobyte_is_1024_bytes=False)        # (1)
'4.0 KB'
>>> approximate_size(size=4000, a_kilobyte_is_1024_bytes=False)   # (2)
'4.0 KB'
>>> approximate_size(a_kilobyte_is_1024_bytes=False, size=4000)   # (3)
'4.0 KB'
>>> approximate_size(a_kilobyte_is_1024_bytes=False, 4000)        # (4)
  File "<stdin>", line 1
SyntaxError: non-keyword arg after keyword arg
>>> approximate_size(size=4000, False)                            # (5)
  File "<stdin>", line 1
SyntaxError: non-keyword arg after keyword arg

1. This calls the approximate_size() function with 4000 for the first argument (size) and False for the

argument named a_kilobyte_is_1024_bytes. (That happens to be the second argument, but that doesn’t

matter, as you’ll see in a minute.)

2. This calls the approximate_size() function with 4000 for the argument named size and False for the

argument named a_kilobyte_is_1024_bytes. (These named arguments happen to be in the same order as

the arguments are listed in the function declaration, but that doesn’t matter either.)

3. This calls the approximate_size() function with False for the argument named

a_kilobyte_is_1024_bytes and 4000 for the argument named size. (See? I told you the order didn’t

matter.)

4. This call fails, because you have a named argument followed by an unnamed (positional) argument, and that

never works. Reading the argument list from left to right, once you have a single named argument, the rest

of the arguments must also be named.

5. This call fails too, for the same reason as the previous call. Is that surprising? After all, you passed 4000 for

the argument named size, then “obviously” that False value was meant for the

a_kilobyte_is_1024_bytes argument. But Python doesn’t work that way. As soon as you have a named

argument, all arguments to the right of that need to be named arguments, too.
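The ordering rule can be checked from code as well as from the shell. The sketch below assumes a hypothetical stand-in function, echo_args(), and uses the built-in compile() function to show that the bad call is rejected before it ever runs.

```python
# Hypothetical stand-in with the same signature as approximate_size().
def echo_args(size, a_kilobyte_is_1024_bytes=True):
    return (size, a_kilobyte_is_1024_bytes)

# Named arguments may appear in any order; all three calls are equivalent.
calls = [
    echo_args(4000, a_kilobyte_is_1024_bytes=False),
    echo_args(size=4000, a_kilobyte_is_1024_bytes=False),
    echo_args(a_kilobyte_is_1024_bytes=False, size=4000),
]
all_equal = all(result == (4000, False) for result in calls)

# A positional argument after a named one is a syntax error, so Python
# refuses to compile the call at all.
try:
    compile('echo_args(size=4000, False)', '<example>', 'eval')
    rejected = False
except SyntaxError:
    rejected = True

print(all_equal, rejected)
```

Because the failure is a SyntaxError, it is reported when the code is compiled, not when the function is called. The exact wording of the message varies between Python versions.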

1.3. WRITING READABLE CODE

I won’t bore you with a long finger-wagging speech about the importance of documenting your code. Just

know that code is written once but read many times, and the most important audience for your code is

yourself, six months after writing it (i.e. after you’ve forgotten everything but need to fix something). Python

makes it easy to write readable code, so take advantage of it. You’ll thank me in six months.

1.3.1. DOCUMENTATION STRINGS

You can document a Python function by giving it a documentation string (docstring for short). In this

program, the approximate_size() function has a docstring:

def approximate_size(size, a_kilobyte_is_1024_bytes=True):
    '''Convert a file size to human-readable form.

    Keyword arguments:
    size -- file size in bytes
    a_kilobyte_is_1024_bytes -- if True (default), use multiples of 1024
                                if False, use multiples of 1000

    Returns: string

    '''

Triple quotes signify a multi-line string. Everything between the start and end quotes is part of a single string, including carriage returns, leading white space, and other quote characters. You can use them anywhere, but you’ll see them most often used when defining a docstring.

☞ Triple quotes are also an easy way to define a string with both single and double quotes, like qq/.../ in Perl 5.

Everything between the triple quotes is the function’s docstring, which documents what the function does. A docstring, if it exists, must be the first thing defined in a function (that is, on the next line after the function declaration). You don’t technically need to give your function a docstring, but you always should. I know you’ve heard this in every programming class you’ve ever taken, but Python gives you an added incentive: the docstring is available at runtime as an attribute of the function.

Every function deserves a decent docstring.

☞ Many Python IDEs use the docstring to provide context-sensitive documentation, so

that when you type a function name, its docstring appears as a tooltip. This can be

incredibly helpful, but it’s only as good as the docstrings you write.
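Tools that display docstrings usually tidy up the indentation first. As a rough sketch of how that can work, the standard inspect module offers getdoc(); the function below is a hypothetical, cut-down stand-in for approximate_size().

```python
import inspect

# Hypothetical stand-in; only the docstring matters for this sketch.
def approximate_size_stub(size):
    '''Convert a file size to human-readable form.

    Returns: string

    '''
    return str(size)

raw = approximate_size_stub.__doc__            # exactly as written, indentation included
clean = inspect.getdoc(approximate_size_stub)  # indentation and trailing blank lines removed
summary = clean.splitlines()[0]                # the one-line summary

print(summary)
```

The raw __doc__ attribute keeps the leading white space of every line after the first, which is why a tool would reach for something like inspect.getdoc() before showing it to you.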

1.4. THE import SEARCH PATH

Before this goes any further, I want to briefly mention the library search path. Python looks in several places

when you try to import a module. Specifically, it looks in all the directories defined in sys.path. This is just

a list, and you can easily view it or modify it with standard list methods. (You’ll learn more about lists in

Native Datatypes.)

>>> import sys

>>> sys.path

['',

'/usr/lib/python31.zip',

'/usr/lib/python3.1',

'/usr/lib/python3.1/plat-linux2@EXTRAMACHDEPPATH@',

'/usr/lib/python3.1/lib-dynload',

'/usr/lib/python3.1/dist-packages',

'/usr/local/lib/python3.1/dist-packages']

>>> sys

<module 'sys' (built-in)>

>>> sys.path.insert(0, '/home/mark/diveintopython3/examples')

>>> sys.path

['/home/mark/diveintopython3/examples',

'',

'/usr/lib/python31.zip',

'/usr/lib/python3.1',

'/usr/lib/python3.1/plat-linux2@EXTRAMACHDEPPATH@',

'/usr/lib/python3.1/lib-dynload',

'/usr/lib/python3.1/dist-packages',

'/usr/local/lib/python3.1/dist-packages']

1. Importing the sys module makes all of its functions and attributes available.

2. sys.path is a list of directory names that constitute the current search path. (Yours will look different,

depending on your operating system, what version of Python you’re running, and where it was originally

installed.) Python will look through these directories (in this order) for a .py file whose name matches what

you’re trying to import.

3. Actually, I lied; the truth is more complicated than that, because not all modules are stored as .py files.

Some are built-in modules; they are actually baked right into Python itself. Built-in modules behave just like

regular modules, but their Python source code is not available, because they are not written in Python! (Like

Python itself, these built-in modules are written in C.)

4. You can add a new directory to Python’s search path at runtime by adding the directory name to sys.path,

and then Python will look in that directory as well, whenever you try to import a module. The effect lasts as

long as Python is running.

5. By using sys.path.insert(0, new_path), you inserted a new directory as the first item of the sys.path

list, and therefore at the beginning of Python’s search path. This is almost always what you want. In case of

naming conflicts (for example, if Python ships with version 2 of a particular library but you want to use

version 3), this ensures that your modules will be found and used instead of the modules that came with

Python.
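Here is a small, self-contained sketch of the same idea that you can run anywhere. It writes a throwaway module (the name hypothetical_mod is invented for this example) into a temporary directory, puts that directory at the front of sys.path, and imports the module.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Create a throwaway module in a directory Python does not search yet.
    Path(tmp, 'hypothetical_mod.py').write_text('ANSWER = 42\n')

    sys.path.insert(0, tmp)      # first item, so it wins any naming conflict
    try:
        mod = importlib.import_module('hypothetical_mod')
        answer = mod.ANSWER
    finally:
        # Undo the changes so the sketch leaves no trace behind.
        sys.path.remove(tmp)
        sys.modules.pop('hypothetical_mod', None)

print(answer)
```

Removing the directory again in a finally block is a design choice for this sketch only; in a real program you would normally leave the path in place for as long as you need the modules.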

1.5. EVERYTHING IS AN OBJECT

In case you missed it, I just said that Python functions have attributes, and that those attributes are available

at runtime. A function, like everything else in Python, is an object.

Run the interactive Python shell and follow along:

>>> import humansize

>>> print(humansize.approximate_size(4096, True))

4.0 KiB

>>> print(humansize.approximate_size.__doc__)

Convert a file size to human-readable form.

Keyword arguments:

size -- file size in bytes

a_kilobyte_is_1024_bytes -- if True (default), use multiples of 1024

if False, use multiples of 1000

Returns: string

1. The first line imports the humansize program as a module — a chunk of code that you can use interactively,

or from a larger Python program. Once you import a module, you can reference any of its public functions,

classes, or attributes. Modules can do this to access functionality in other modules, and you can do it in the

Python interactive shell too. This is an important concept, and you’ll see a lot more of it throughout this

book.

2. When you want to use functions defined in imported modules, you need to include the module name. So

you can’t just say approximate_size; it must be humansize.approximate_size. If you’ve used classes in

Java, this should feel vaguely familiar.

3. Instead of calling the function as you would expect to, you asked for one of the function’s attributes,

__doc__.

☞ import in Python is like require in Perl. Once you import a Python module, you

access its functions with module.function; once you require a Perl module, you

access its functions with module::function.

1.5.1. WHAT’S AN OBJECT?

Everything in Python is an object, and everything can have attributes and methods. All functions have a built-

in attribute __doc__, which returns the docstring defined in the function’s source code. The sys module is

an object which has (among other things) an attribute called path. And so forth.

Still, this doesn’t answer the more fundamental question: what is an object? Different programming languages

define “object” in different ways. In some, it means that all objects must have attributes and methods; in

others, it means that all objects are subclassable. In Python, the definition is looser. Some objects have

neither attributes nor methods, but they could. Not all objects are subclassable. But everything is an object in

the sense that it can be assigned to a variable or passed as an argument to a function.

You may have heard the term “first-class object” in other programming contexts. In Python, functions are

first-class objects. You can pass a function as an argument to another function. Modules are first-class objects.

You can pass an entire module as an argument to a function. Classes are first-class objects, and individual

instances of a class are also first-class objects.

This is important, so I’m going to repeat it in case you missed it the first few times: everything in Python is an

object. Strings are objects. Lists are objects. Functions are objects. Classes are objects. Class instances are

objects. Even modules are objects.
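A short sketch makes the "first-class" claim concrete. The helper names apply_twice() and describe() are made up for this example; everything else is built in.

```python
import math

def apply_twice(func, value):
    """Call func on value, then call it again on the result."""
    return func(func(value))

def describe(obj):
    """Return the name of the type of any object at all."""
    return type(obj).__name__

# A function passed as an argument to another function.
result = apply_twice(abs, -3)

# A built-in function, a module, a class, a string, and a user-defined
# function can all be handed to the same function.
kinds = [describe(thing) for thing in (abs, math, int, 'text', describe)]

print(result, kinds)
```

Nothing here required special syntax: a module or a class is passed around exactly the same way as a string.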

1.6. INDENTING CODE

Python functions have no explicit begin or end, and no curly braces to mark where the function code starts

and stops. The only delimiter is a colon (:) and the indentation of the code itself.

def approximate_size(size, a_kilobyte_is_1024_bytes=True):
    if size < 0:
        raise ValueError('number must be non-negative')

    multiple = 1024 if a_kilobyte_is_1024_bytes else 1000
    for suffix in SUFFIXES[multiple]:
        size /= multiple
        if size < multiple:
            return '{0:.1f} {1}'.format(size, suffix)

    raise ValueError('number too large')

1. Code blocks are defined by their indentation. By “code block,” I mean functions, if statements, for loops,

while loops, and so forth. Indenting starts a block and unindenting ends it. There are no explicit braces,

brackets, or keywords. This means that whitespace is significant, and must be consistent. In this example, the

function code is indented four spaces. It doesn’t need to be four spaces, it just needs to be consistent. The

first line that is not indented marks the end of the function.

2. In Python, an if statement is followed by a code block. If the if expression evaluates to true, the indented

block is executed, otherwise it falls to the else block (if any). Note the lack of parentheses around the

expression.

3. This line is inside the if code block. This raise statement will raise an exception (of type ValueError), but

only if size < 0.

4. This is not the end of the function. Completely blank lines don’t count. They can make the code more

readable, but they don’t count as code block delimiters. The function continues on the next line.

5. The for loop also marks the start of a code block. Code blocks can contain multiple lines, as long as they

are all indented the same amount. This for loop has three lines of code in it. There is no other special

syntax for multi-line code blocks. Just indent and get on with your life.

After some initial protests and several snide analogies to Fortran, you will make peace with this and start

seeing its benefits. One major benefit is that all Python programs look similar, since indentation is a language

requirement and not a matter of style. This makes it easier to read and understand other people’s Python

code.

☞ Python uses carriage returns to separate statements and a colon and indentation to

separate code blocks. C++ and Java use semicolons to separate statements and curly

braces to separate code blocks.
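If you want to convince yourself that only consistency matters, not the number of spaces, here is a minimal sketch. The helper compiles() is invented for this example; it asks the built-in compile() function whether a snippet is acceptable.

```python
def compiles(source):
    """Return True if Python accepts the indentation of source (helper for this sketch)."""
    try:
        compile(source, '<example>', 'exec')
        return True
    except IndentationError:
        return False

four_spaces = 'if True:\n    x = 1\n    y = 2\n'   # consistent, four spaces
two_spaces = 'if True:\n  x = 1\n  y = 2\n'        # consistent, two spaces
ragged = 'if True:\n    x = 1\n      y = 2\n'      # second line indented further

results = (compiles(four_spaces), compiles(two_spaces), compiles(ragged))
print(results)
```

Both consistent snippets are accepted; the ragged one is rejected with an IndentationError before any of it runs.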

1.7. EXCEPTIONS

Exceptions are everywhere in Python. Virtually every module in the standard Python library uses them, and

Python itself will raise them in a lot of different circumstances. You’ll see them repeatedly throughout this

book.

What is an exception? Usually it’s an error, an indication that something went wrong. (Not all exceptions are

errors, but never mind that for now.) Some programming languages encourage the use of error return

codes, which you check. Python encourages the use of exceptions, which you handle.

When an error occurs in the Python Shell, it prints out some details about the exception and how it

happened, and that’s that. This is called an unhandled exception. When the exception was raised, there was

no code to explicitly notice it and deal with it, so it bubbled its way back up to the top level of the Python

Shell, which spits out some debugging information and calls it a day. In the shell, that's no big deal, but if that

happened while your actual Python program was running, the entire program would come to a screeching

halt if nothing handles the exception. Maybe that’s what you want, maybe it isn’t.

☞ Unlike Java, Python functions don’t declare which exceptions they might raise. It’s up

to you to determine what possible exceptions you need to catch.

An exception doesn’t need to result in a complete program crash, though. Exceptions can be handled.

Sometimes an exception is really because you have a bug in your code (like accessing a variable that doesn’t

exist), but sometimes an exception is something you can anticipate. If you’re opening a file, it might not

exist. If you’re importing a module, it might not be installed. If you’re connecting to a database, it might be

unavailable, or you might not have the correct security credentials to access it. If you know a line of code

may raise an exception, you should handle the exception using a try...except block.

☞ Python uses try...except blocks to handle exceptions, and the raise statement to

generate them. Java and C++ use try...catch blocks to handle exceptions, and the

throw statement to generate them.
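As a sketch of the "file might not exist" case, the hypothetical helper below returns None instead of crashing. It assumes Python 3.3 or later, where a missing file raises FileNotFoundError; older versions raise the more general IOError.

```python
import os
import tempfile

def read_first_line(path):
    """Return the first line of a text file, or None if the file is missing."""
    try:
        with open(path, encoding='utf-8') as f:
            return f.readline().rstrip('\n')
    except FileNotFoundError:
        # Anticipated problem: handle it here instead of letting it propagate.
        return None

with tempfile.TemporaryDirectory() as tmp:
    missing = read_first_line(os.path.join(tmp, 'no-such-file.txt'))

    present_path = os.path.join(tmp, 'present.txt')
    with open(present_path, 'w', encoding='utf-8') as f:
        f.write('hello\nworld\n')
    present = read_first_line(present_path)

print(missing, present)
```

Only the one anticipated exception is caught. Anything else (a permissions problem, say) still propagates, which is usually what you want.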

The approximate_size() function raises exceptions in two different cases: if the given size is larger than

the function is designed to handle, or if it’s less than zero.

if size < 0:
    raise ValueError('number must be non-negative')

The syntax for raising an exception is simple enough. Use the raise statement, followed by the exception

name, and an optional human-readable string for debugging purposes. The syntax is reminiscent of calling a

function. (In reality, exceptions are implemented as classes, and this raise statement is actually creating an

instance of the ValueError class and passing the string 'number must be non-negative' to its initialization

method. But we’re getting ahead of ourselves!)

☞ You don’t need to handle an exception in the function that raises it. If one function

doesn’t handle it, the exception is passed to the calling function, then that function’s

calling function, and so on “up the stack.” If the exception is never handled, your

program will crash, Python will print a “traceback” to standard error, and that’s the

end of that. Again, maybe that’s what you want; it depends on what your program

does.
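The trip "up the stack" can be sketched with three small functions. The names inner(), middle(), and outer() are hypothetical; only inner() raises, and only outer() handles.

```python
def inner(size):
    if size < 0:
        raise ValueError('number must be non-negative')
    return size

def middle(size):
    # No try...except here, so an exception from inner() passes straight through.
    return inner(size)

def outer(size):
    try:
        return middle(size)
    except ValueError as err:
        # The exception raised two calls down is handled here.
        return 'handled: {0}'.format(err)

ok = outer(5)
failed = outer(-1)
print(ok, failed)
```

If outer() had no handler either, the ValueError would keep going until it reached the top level and produced a traceback.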

1.7.1. CATCHING IMPORT ERRORS

One of Python’s built-in exceptions is ImportError, which is raised when you try to import a module and

fail. This can happen for a variety of reasons, but the simplest case is when the module doesn’t exist in your

import search path. You can use this to include optional features in your program. For example, the

chardet library provides character encoding auto-detection. Perhaps your program wants to use this library if it exists, but continue gracefully if the user hasn’t installed it. You can do this with a try...except block.

try:
    import chardet
except ImportError:
    chardet = None

Later, you can check for the presence of the chardet module with a simple if statement:

if chardet:
    pass  # do something
else:
    pass  # continue anyway

Another common use of the ImportError exception is when two modules implement a common API, but

one is more desirable than the other. (Maybe it’s faster, or it uses less memory.) You can try to import one

module but fall back to a different module if the first import fails. For example, the XML chapter talks about two modules that implement a common API, called the ElementTree API. The first, lxml, is a third-party

module that you need to download and install yourself. The second, xml.etree.ElementTree, is slower but

is part of the Python 3 standard library.

try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree

By the end of this try...except block, you have imported some module and named it etree. Since both

modules implement a common API, the rest of your code doesn’t need to keep checking which module got

imported. And since the module that did get imported is always called etree, the rest of your code doesn’t

need to be littered with if statements to call differently-named modules.
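The same fallback pattern can be tried without installing anything. In this sketch the preferred module, hypothetical_fast_json, is an invented name that is assumed not to exist on your system, so the import always falls back to the standard json module.

```python
try:
    # Invented module name; assumed to be missing, so this import fails.
    import hypothetical_fast_json as json_impl
except ImportError:
    # Fall back to the standard library module under the same local name.
    import json as json_impl

# The rest of the code uses json_impl without caring which one was imported.
encoded = json_impl.dumps([1, 2, 3])
chosen = json_impl.__name__

print(chosen, encoded)
```

This only works cleanly when both modules really do share an API; the fallback gives you the same name, not a guarantee of the same behavior.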

1.8. UNBOUND VARIABLES

Take another look at this line of code from the approximate_size() function:

multiple = 1024 if a_kilobyte_is_1024_bytes else 1000

You never declare the variable multiple, you just assign a value to it. That’s OK, because Python lets you

do that. What Python will not let you do is reference a variable that has never been assigned a value. Trying

to do so will raise a NameError exception.

>>> x

Traceback (most recent call last):

File "<stdin>", line 1, in <module>

NameError: name 'x' is not defined

>>> x = 1

>>> x

1

You will thank Python for this one day.

1.9. EVERYTHING IS CASE-SENSITIVE

All names in Python are case-sensitive: variable names, function names, class names, module names, exception

names. If you can get it, set it, call it, construct it, import it, or raise it, it’s case-sensitive.

>>> an_integer = 1

>>> an_integer

1

>>> AN_INTEGER

Traceback (most recent call last):

File "<stdin>", line 1, in <module>

NameError: name 'AN_INTEGER' is not defined

>>> An_Integer

Traceback (most recent call last):

File "<stdin>", line 1, in <module>

NameError: name 'An_Integer' is not defined

>>> an_inteGer

Traceback (most recent call last):

File "<stdin>", line 1, in <module>

NameError: name 'an_inteGer' is not defined

And so on.

1.10. RUNNING SCRIPTS

Python modules are objects and have several useful

attributes. You can use this to easily test your modules

as you write them, by including a special block of code

that executes when you run the Python file on the

command line. Take the last few lines of humansize.py:

if __name__ == '__main__':
    print(approximate_size(1000000000000, False))
    print(approximate_size(1000000000000))

☞ Like C, Python uses == for comparison and = for assignment. Unlike C, Python does not allow = inside an expression (since Python 3.8, in-line assignment uses the separate := operator), so there’s no chance of accidentally assigning the value you thought you were comparing.

So what makes this if statement special? Well, modules are objects, and all modules have a built-in attribute

__name__. A module’s __name__ depends on how you’re using the module. If you import the module, then

__name__ is the module’s filename, without a directory path or file extension.

>>> import humansize

>>> humansize.__name__

'humansize'

But you can also run the module directly as a standalone program, in which case __name__ will be a special

default value, __main__. Python will evaluate this if statement, find a true expression, and execute the if

code block, which in this case prints two values.

c:\home\diveintopython3> c:\python31\python.exe humansize.py

1.0 TB

931.3 GiB

And that’s your first Python program!
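You can observe both values of __name__ from a single script. This sketch writes a one-line file (hypothetical_demo.py is an invented name) and runs it twice with the standard runpy module, once under the name __main__ and once under an ordinary module name.

```python
import os
import runpy
import tempfile

# The file records which role it thinks it is playing.
source = "role = 'script' if __name__ == '__main__' else 'module'\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'hypothetical_demo.py')
    with open(path, 'w', encoding='utf-8') as f:
        f.write(source)

    # run_path() executes the file and returns its global variables.
    as_script = runpy.run_path(path, run_name='__main__')['role']
    as_module = runpy.run_path(path, run_name='hypothetical_demo')['role']

print(as_script, as_module)
```

Running a file from the command line corresponds to the first call; the second call only approximates a normal import, since runpy does not register the module in sys.modules.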

1.11. FURTHER READING

PEP 257: Docstring Conventions explains what distinguishes a good docstring from a great docstring.

Python Tutorial: Documentation Strings also touches on the subject.

PEP 8: Style Guide for Python Code discusses good indentation style.

Python Reference Manual explains what it means to say that everything in Python is an object, because some people are pedants and like to discuss that sort of thing at great length.

CHAPTER 2. NATIVE DATATYPES

Wonder is the foundation of all philosophy, inquiry its progress, ignorance its end.

— Michel de Montaigne

2.1. DIVING IN

Datatypes. Set aside your first Python program for just a minute, and let’s talk about datatypes. In

Python, every value has a datatype, but you don’t need to declare the datatype of variables. How does that work? Based on each variable’s original assignment, Python figures out what type it is and keeps track of

that internally.

Python has many native datatypes. Here are the important ones:

1. Booleans are either True or False.

2. Numbers can be integers (1 and 2), floats (1.1 and 1.2), fractions (1/2 and 2/3), or even complex

numbers.

3. Strings are sequences of Unicode characters, e.g. an HTML document.

4. Bytes and byte arrays, e.g. a JPEG image file.

5. Lists are ordered sequences of values.

6. Tuples are ordered, immutable sequences of values.

7. Sets are unordered bags of values.

8. Dictionaries are bags of key-value pairs (unordered before Python 3.7; insertion-ordered since).

Of course, there are more types than these. Everything is an object in Python, so there are types like module, function, class, method, file, and even compiled code. You’ve already seen some of these: modules have

names, functions have docstrings, & c. You’ll learn about classes in Classes & Iterators, and about files in

Files.

Strings and bytes are important enough — and complicated enough — that they get their own chapter. Let’s

look at the others first.

2.2. BOOLEANS

Booleans are either true or false. Python has two constants, cleverly named True and False, which can be used to assign boolean values directly. Expressions can also evaluate to a boolean value. In certain places (like if statements), Python expects an expression to evaluate to a boolean value. These places are called boolean contexts. You can use virtually any expression in a boolean context, and Python will try to determine its truth value. Different datatypes have different rules about which values are true or false in a boolean context. (This will make more sense once you see some concrete examples later in this chapter.)

For example, take this snippet from humansize.py:

if size < 0:
    raise ValueError('number must be non-negative')

size is an integer, 0 is an integer, and < is a numerical operator. The result of the expression size < 0 is

always a boolean. You can test this yourself in the Python interactive shell:

>>> size = 1

>>> size < 0

False

>>> size = 0

>>> size < 0

False

>>> size = -1

>>> size < 0

True

Due to some legacy issues left over from Python 2, booleans can be treated as numbers. True is 1; False is

0.

>>> True + True

2

>>> True - False

1

>>> True * False

0

>>> True / False

Traceback (most recent call last):

File "<stdin>", line 1, in <module>

ZeroDivisionError: int division or modulo by zero

Ew, ew, ew! Don’t do that. Forget I even mentioned it.

2.3. NUMBERS

Numbers are awesome. There are so many to choose from. Python supports both integers and floating point

numbers. There’s no type declaration to distinguish them; Python tells them apart by the presence or

absence of a decimal point.

>>> type(1)

<class 'int'>

>>> isinstance(1, int)

True

>>> 1 + 1

2

>>> 1 + 1.0

2.0

>>> type(2.0)

<class 'float'>

1. You can use the type() function to check the type of any value or variable. As you might expect, 1 is an

int.

2. Similarly, you can use the isinstance() function to check whether a value or variable is of a given type.

3. Adding an int to an int yields an int.

4. Adding an int to a float yields a float. Python coerces the int into a float to perform the addition,

then returns a float as the result.

2.3.1. COERCING INTEGERS TO FLOATS AND VICE-VERSA

As you just saw, some operators (like addition) will coerce integers to floating point numbers as needed.

You can also coerce them by yourself.

>>> float(2)

2.0

>>> int(2.0)

2

>>> int(2.5)

2

>>> int(-2.5)

-2

>>> 1.12345678901234567890

1.1234567890123457

>>> type(1000000000000000)

<class 'int'>

1. You can explicitly coerce an int to a float by calling the float() function.

2. Unsurprisingly, you can also coerce a float to an int by calling int().

3. The int() function will truncate, not round.

4. The int() function truncates negative numbers towards 0. It’s a true truncate function, not a floor function.

5. Floating point numbers are accurate to about 15 significant digits; extra digits are rounded away.

6. Integers can be arbitrarily large.

☞ Python 2 had separate types for int and long. The int datatype was limited by

sys.maxint, which varied by platform but was usually 2³¹-1 on 32-bit platforms. Python 3 has just one

integer type, which behaves mostly like the old long type from Python 2. See PEP

237 for details.
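Truncating, flooring, and rounding are easy to mix up, so here is a small comparison. It uses only int(), the built-in round(), and math.floor(); note that round() in Python 3 rounds exact halves to the nearest even integer.

```python
import math

values = [2.5, -2.5]

truncated = [int(v) for v in values]         # drops the fractional part, towards 0
floored = [math.floor(v) for v in values]    # always rounds down
rounded = [round(v) for v in values]         # halves go to the nearest even integer

# Python 3 integers are not limited by a platform word size.
big = 2 ** 100
still_int = isinstance(big, int)

print(truncated, floored, rounded, still_int)
```

The three results differ only for the negative value, which is exactly where the "truncate, not floor" distinction matters.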

2.3.2. COMMON NUMERICAL OPERATIONS

You can do all kinds of things with numbers.

>>> 11 / 2

5.5

>>> 11 // 2

5

>>> -11 // 2

-6

>>> 11.0 // 2

5.0

>>> 11 ** 2

121

>>> 11 % 2

1

1. The / operator performs floating point division. It returns a float even if both the numerator and

denominator are ints.

2. The // operator performs a quirky kind of integer division. When the result is positive, you can think of it

as truncating (not rounding) to 0 decimal places, but be careful with that.

3. When integer-dividing negative numbers, the // operator rounds “up” to the nearest integer. Mathematically

speaking, it’s rounding “down” since −6 is less than −5, but it could trip you up if you were expecting it to

truncate to −5.

4. The // operator doesn’t always return an integer. If either the numerator or denominator is a float, it will

still round down to a whole number, but the actual return value will be a float.

5. The ** operator means “raised to the power of.” 11² is 121.

6. The % operator gives the remainder after performing integer division. 11 divided by 2 is 5 with a remainder

of 1, so the result here is 1.

☞ In Python 2, the / operator usually meant integer division, but you could make it

behave like floating point division by including a special directive in your code. In

Python 3, the / operator always means floating point division. See PEP 238 for details.
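The // and % operators are designed to fit together: the quotient times the divisor, plus the remainder, always gives back the original number. A quick sketch of that relationship, including the negative case:

```python
rows = []
for a, b in [(11, 2), (-11, 2), (11.0, 2)]:
    quotient = a // b      # floor division
    remainder = a % b      # takes the sign of the divisor
    consistent = (quotient * b + remainder == a)
    rows.append((quotient, remainder, consistent))

# The built-in divmod() returns both results in one call.
both = divmod(-11, 2)

print(rows, both)
```

This is why -11 // 2 is -6 and not -5: with a remainder of 1, only -6 makes the arithmetic come out right.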

2.3.3. FRACTIONS

Python isn’t limited to integers and floating point numbers. It can also do all the fancy math you learned in

high school and promptly forgot about.

>>> import fractions

>>> x = fractions.Fraction(1, 3)

>>> x

Fraction(1, 3)

>>> x * 2

Fraction(2, 3)

>>> fractions.Fraction(6, 4)

Fraction(3, 2)

>>> fractions.Fraction(0, 0)

Traceback (most recent call last):

File "<stdin>", line 1, in <module>

File "fractions.py", line 96, in __new__

raise ZeroDivisionError('Fraction(%s, 0)' % numerator)

ZeroDivisionError: Fraction(0, 0)

1. To start using fractions, import the fractions module.

2. To define a fraction, create a Fraction object and pass in the numerator and denominator.

3. You can perform all the usual mathematical operations with fractions. Operations return a new Fraction

object. 2 * (1/3) = (2/3)

4. The Fraction object will automatically reduce fractions. (6/4) = (3/2)

5. Python has the good sense not to create a fraction with a zero denominator.
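One practical reason to reach for fractions is that they are exact where floats are not. A minimal sketch of the difference:

```python
from fractions import Fraction

# Floating point: 0.1 and 0.2 cannot be represented exactly in binary.
float_sum_is_exact = (0.1 + 0.2 == 0.3)

# Fractions: the same sum is exact.
fraction_sum = Fraction(1, 10) + Fraction(2, 10)
fraction_sum_is_exact = (fraction_sum == Fraction(3, 10))

# A reduced fraction exposes its parts as attributes.
reduced = Fraction(6, 4)
parts = (reduced.numerator, reduced.denominator)

print(float_sum_is_exact, fraction_sum_is_exact, parts)
```

The trade-off is speed: fraction arithmetic is done in software with arbitrary-size integers, so it is slower than float arithmetic.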

2.3.4. TRIGONOMETRY

You can also do basic trigonometry in Python.

>>> import math

>>> math.pi

3.1415926535897931

>>> math.sin(math.pi / 2)

1.0

>>> math.tan(math.pi / 4)

0.99999999999999989

1. The math module has a constant for π, the ratio of a circle’s circumference to its diameter.

2. The math module has all the basic trigonometric functions, including sin(), cos(), tan(), and variants like

asin().

3. Note, however, that Python does not have infinite precision. tan(π / 4) should return 1.0, not

0.99999999999999989.
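Since results like this are only approximately right, comparing floats with == is fragile. One common alternative, sketched here, is math.isclose(), which assumes Python 3.5 or later.

```python
import math

value = math.tan(math.pi / 4)

# Compare within a relative tolerance instead of demanding exact equality.
close_to_one = math.isclose(value, 1.0)

# Near zero a relative tolerance is not enough; give an absolute one too.
tiny = math.sin(math.pi)
close_to_zero = math.isclose(tiny, 0.0, abs_tol=1e-12)

print(value, close_to_one, close_to_zero)
```

The abs_tol value of 1e-12 is an arbitrary choice for this example; pick a tolerance that matches the precision your own calculation actually needs.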

2.3.5. NUMBERS IN A BOOLEAN CONTEXT

You can use numbers in a boolean context, such as an if statement. Zero values are false, and non-zero values are true.

>>> def is_it_true(anything):
...     if anything:
...         print("yes, it's true")
...     else:
...         print("no, it's false")
...

>>> is_it_true(1)

yes, it's true

>>> is_it_true(-1)

yes, it's true

>>> is_it_true(0)

no, it's false

>>> is_it_true(0.1)

yes, it's true

>>> is_it_true(0.0)

no, it's false

>>> import fractions

>>> is_it_true(fractions.Fraction(1, 2))

yes, it's true

>>> is_it_true(fractions.Fraction(0, 1))

no, it's false

1. Did you know you can define your own functions in the Python interactive shell? Just press ENTER at the end

of each line, and ENTER on a blank line to finish.

2. In a boolean context, non-zero integers are true; 0 is false.

3. Non-zero floating point numbers are true; 0.0 is false. Be careful with this one! If there’s the slightest

rounding error (not impossible, as you saw in the previous section) then Python will be testing

0.0000000000001 instead of 0 and will treat it as true.

4. Fractions can also be used in a boolean context. Fraction(0, n) is false for all values of n. All other

fractions are true.
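The rounding-error trap is easy to reproduce. In this sketch, the leftover from 0.1 + 0.2 - 0.3 is mathematically zero but is stored as a tiny non-zero float, so it counts as true.

```python
from fractions import Fraction

# bool() shows the truth value that an if statement would use.
truth_values = [bool(v) for v in (1, -1, 0, 0.1, 0.0, Fraction(1, 2), Fraction(0, 1))]

# Mathematically zero, but not in floating point.
residue = 0.1 + 0.2 - 0.3
residue_is_true = bool(residue)

# A safer test for "effectively zero" compares against a small threshold.
effectively_zero = abs(residue) < 1e-9

print(truth_values, residue_is_true, effectively_zero)
```

The threshold of 1e-9 is an assumption made for illustration; the right value depends on the size of the numbers you are working with.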

2.4. LISTS

Lists are Python’s workhorse datatype. When I say “list,” you might be thinking “array whose size I have to

declare in advance, that can only contain items of the same type, & c.” Don’t think that. Lists are much

cooler than that.

☞ A list in Python is like an array in Perl 5. In Perl 5, variables that store arrays always

start with the @ character; in Python, variables can be named anything, and Python

keeps track of the datatype internally.

☞ A list in Python is much more than an array in Java (although it can be used as one if

that’s really all you want out of life). A better analogy would be to the ArrayList

class, which can hold arbitrary objects and can expand dynamically as new items are

added.

2.4.1. CREATING A LIST

Creating a list is easy: use square brackets to wrap a comma-separated list of values.

>>> a_list = ['a', 'b', 'mpilgrim', 'z', 'example']

>>> a_list

['a', 'b', 'mpilgrim', 'z', 'example']

>>> a_list[0]

'a'

>>> a_list[4]

'example'

>>> a_list[-1]

'example'

>>> a_list[-3]

'mpilgrim'

1. First, you define a list of five items. Note that they retain their original order. This is not an accident. A list

is an ordered collection of items.

2. A list can be used like a zero-based array. The first item of any non-empty list is always a_list[0].

3. The last item of this five-item list is a_list[4], because lists are always zero-based.

4. A negative index accesses items from the end of the list counting backwards. The last item of any non-empty

list is always a_list[-1].

5. If the negative index is confusing to you, think of it this way: a_list[-n] == a_list[len(a_list) - n]. So

in this list, a_list[-3] == a_list[5 - 3] == a_list[2].
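The negative-index rule can be checked for every valid index at once. This sketch also shows what happens when an index falls outside the list.

```python
a_list = ['a', 'b', 'mpilgrim', 'z', 'example']

# a_list[-n] and a_list[len(a_list) - n] name the same item for n = 1..5.
rule_holds = all(
    a_list[-n] == a_list[len(a_list) - n]
    for n in range(1, len(a_list) + 1)
)

# An index past either end raises IndexError.
try:
    a_list[5]
    out_of_range = False
except IndexError:
    out_of_range = True

print(rule_holds, out_of_range)
```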

2.4.2. SLICING A LIST

Once you’ve defined a list, you can get any part of it as

a new list. This is called slicing the list.

>>> a_list

['a', 'b', 'mpilgrim', 'z', 'example']

>>> a_list[1:3]

['b', 'mpilgrim']

>>> a_list[1:-1]

['b', 'mpilgrim', 'z']

>>> a_list[0:3]

['a', 'b', 'mpilgrim']

>>> a_list[:3]

['a', 'b', 'mpilgrim']

>>> a_list[3:]

['z', 'example']

>>> a_list[:]

['a', 'b', 'mpilgrim', 'z', 'example']

1. You can get a part of a list, called a “slice”, by specifying two indices. The return value is a new list

containing all the items of the list, in order, starting with the first slice index (in this case a_list[1]), up to

but not including the second slice index (in this case a_list[3]).

2. Slicing works if one or both of the slice indices is negative. If it helps, you can think of it this way: reading

the list from left to right, the first slice index specifies the first item you want, and the second slice index

specifies the first item you don’t want. The return value is everything in between.

3. Lists are zero-based, so a_list[0:3] returns the first three items of the list, starting at a_list[0], up to

but not including a_list[3].

4. If the left slice index is 0, you can leave it out, and 0 is implied. So a_list[:3] is the same as a_list[0:3],

because the starting 0 is implied.

5. Similarly, if the right slice index is the length of the list, you can leave it out. So a_list[3:] is the same as

a_list[3:5], because this list has five items. There is a pleasing symmetry here. In this five-item list,

a_list[:3] returns the first 3 items, and a_list[3:] returns the last two items. In fact, a_list[:n] will

always return the first n items, and a_list[n:] will return the rest, regardless of the length of the list.

6. If both slice indices are left out, all items of the list are included. But this is not the same as the original

a_list variable. It is a new list that happens to have all the same items. a_list[:] is shorthand for making

a complete copy of a list.
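
Notes 5 and 6 can be sketched in a few lines. This example assumes the same five-item list; head, tail, and a_copy are illustrative names, not anything built into Python.

```python
a_list = ['a', 'b', 'mpilgrim', 'z', 'example']

n = 3
head = a_list[:n]    # the first n items
tail = a_list[n:]    # the rest of the list
assert head + tail == a_list

a_copy = a_list[:]       # a new list that has all the same items
a_copy.append('extra')   # changing the copy leaves a_list untouched
```

Because a_list[:] builds a new list, a_copy is not a_list, even though the two compare equal right after the copy is made.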

2.4.3. ADDING ITEMS TO A LIST

There are four ways to add items to a list.

>>> a_list = ['a']

>>> a_list = a_list + [2.0, 3]

>>> a_list

['a', 2.0, 3]

>>> a_list.append(True)

>>> a_list

['a', 2.0, 3, True]

>>> a_list.extend(['four', 'Ω'])

>>> a_list

['a', 2.0, 3, True, 'four', 'Ω']

>>> a_list.insert(0, 'Ω')

>>> a_list

['Ω', 'a', 2.0, 3, True, 'four', 'Ω']

1. The + operator concatenates lists to create a new list. A list can contain any number of items; there is no

size limit (other than available memory). However, if memory is a concern, you should be aware that list

concatenation creates a second list in memory. In this case, that new list is immediately assigned to the

existing variable a_list. So this line of code is really a two-step process — concatenation then

assignment — which can (temporarily) consume a lot of memory when you’re dealing with large lists.

2. A list can contain items of any datatype, and the items in a single list don’t all need to be the same type.

Here we have a list containing a string, a floating point number, and an integer.

3. The append() method adds a single item to the end of the list. (Now we have four different datatypes in the

list!)

4. Lists are implemented as classes. “Creating” a list is really instantiating a class. As such, a list has methods

that operate on it. The extend() method takes one argument, an iterable such as a list, and appends each of

its items to the original list.

5. The insert() method inserts a single item into a list. The first argument is the index of the first item in the

list that will get bumped out of position. List items do not need to be unique; for example, there are now

two separate items with the value 'Ω': the first item, a_list[0], and the last item, a_list[6].

☞ a_list.insert(0, value) is like the unshift() function in Perl. It adds an item to

the beginning of the list, and all the other items have their positional index bumped

up to make room.
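
The difference between concatenating and modifying in place shows up if you keep a second name pointing at the original list. This is only a sketch; same_object, alias, and b_list are names made up for the illustration.

```python
a_list = ['a']
same_object = a_list            # a second name for the same list

a_list = a_list + [2.0, 3]      # builds a new list, then rebinds the name a_list
rebound = a_list is not same_object   # a_list now refers to the new list

b_list = ['a']
alias = b_list
b_list.extend([2.0, 3])         # modifies the existing list in place
b_list.insert(0, 'Ω')           # every other item is bumped up by one position
```

After the concatenation, same_object still holds just ['a']. After extend() and insert(), alias sees every change, because it is still the same list as b_list.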

Let’s look closer at the difference between append() and extend().

>>> a_list = ['a', 'b', 'c']

>>> a_list.extend(['d', 'e', 'f'])

>>> a_list

['a', 'b', 'c', 'd', 'e', 'f']

>>> len(a_list)

6

>>> a_list[-1]

'f'

>>> a_list.append(['g', 'h', 'i'])

>>> a_list

['a', 'b', 'c', 'd', 'e', 'f', ['g', 'h', 'i']]

>>> len(a_list)

7

>>> a_list[-1]

['g', 'h', 'i']

1. The extend() method takes a single argument, which must be an iterable (here, a list), and adds each of its

items to a_list.

2. If you start with a list of three items and extend it with a list of another three items, you end up with a list

of six items.

3. On the other hand, the append() method takes a single argument, which can be any datatype. Here, you’re

calling the append() method with a list of three items.

4. If you start with a list of six items and append a list onto it, you end up with... a list of seven items. Why

seven? Because the last item (which you just appended) is itself a list. Lists can contain any type of data,

including other lists. That may be what you want, or it may not. But it’s what you asked for, and it’s what

you got.
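
Here is a small sketch of the same contrast. It also shows that extend() is not limited to lists: it accepts any iterable, such as a tuple or a string. The names flat_length and nested_length are illustrative.

```python
a_list = ['a', 'b', 'c']

a_list.extend(('d', 'e'))      # a tuple works: each of its items is added
a_list.extend('fg')            # so does a string: each character is added
flat_length = len(a_list)

a_list.append(['h', 'i'])      # the list itself becomes a single item
nested_length = len(a_list)
```

If you pass a string to extend() expecting it to be added as one item, you will get one item per character instead. Use append() when you want exactly one new item.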

2.4.4. SEARCHING FOR VALUES IN A LIST

>>> a_list = ['a', 'b', 'new', 'mpilgrim', 'new']

>>> a_list.count('new')

2

>>> 'new' in a_list

True

>>> 'c' in a_list

False

>>> a_list.index('mpilgrim')

3

>>> a_list.index('new')

2

>>> a_list.index('c')

Traceback (most recent call last):

File "<stdin>", line 1, in <module>

ValueError: 'c' is not in list

1. As you might expect, the count() method returns the number of occurrences of a specific value in a list.

2. If all you want to know is whether a value is in the list or not, the in operator is slightly faster than using

the count() method. The in operator always returns True or False; it will not tell you how many times

the value appears in the list.

3. Neither the in operator nor the count() method will tell you where in the list a value appears. If you need

to know where in the list a value is, call the index() method. By default it will search the entire list,

although you can specify an optional second argument of the (0-based) index to start from, and even an

optional third argument of the (0-based) index to stop searching.

4. The index() method finds the first occurrence of a value in the list. In this case, 'new' occurs twice in the

list, in a_list[2] and a_list[4], but the index() method will return only the index of the first

occurrence.

5. As you might not expect, if the value is not found in the list, the index() method will raise an exception.

Wait, what? That’s right: the index() method raises an exception if it doesn’t find the value in the list. This

is notably different from most languages, which will return some invalid index (like -1). While this may seem

annoying at first, I think you will come to appreciate it. It means your program will crash at the source of

the problem instead of failing strangely and silently later. Remember, -1 is a valid list index. If the index() method returned -1, that could lead to some not-so-fun debugging sessions!
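
If a missing value is an expected case rather than a bug, you can catch the exception yourself. The sketch below wraps index() in a hypothetical helper, find_or_none(), which is not part of Python. It also uses the optional second argument of index() to find a later occurrence.

```python
a_list = ['a', 'b', 'new', 'mpilgrim', 'new']

def find_or_none(items, value):
    """Hypothetical helper: return the first index of value, or None."""
    try:
        return items.index(value)
    except ValueError:
        return None

first_new = find_or_none(a_list, 'new')
missing = find_or_none(a_list, 'c')

# Start searching just past the first hit to find the next occurrence.
second_new = a_list.index('new', first_new + 1)
```

Returning None instead of -1 avoids the trap described above, since None can never be mistaken for a valid list index.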

2.4.5. REMOVING ITEMS FROM A LIST

Lists can expand and contract automatically. You’ve seen

the expansion part. There are several different ways to

remove items from a list as well.

>>> a_list = ['a', 'b', 'new', 'mpilgrim', 'new']

>>> a_list[1]

'b'

>>> del a_list[1]

>>> a_list

['a', 'new', 'mpilgrim', 'new']

>>> a_list[1]

'new'

1. You can use the del statement to delete a specific item from a list.

2. Accessing index 1 after deleting index 1 does not result in an error. All items after the deleted item shift

their positional index to “fill the gap” created by deleting the item.

Don’t know the positional index? Not a problem; you can remove items by value instead.

>>> a_list.remove('new')

>>> a_list

['a', 'mpilgrim', 'new']

>>> a_list.remove('new')

>>> a_list

['a', 'mpilgrim']

>>> a_list.remove('new')

Traceback (most recent call last):

File "<stdin>", line 1, in <module>

ValueError: list.remove(x): x not in list

1. You can also remove an item from a list with the remove() method. The remove() method takes a value

and removes the first occurrence of that value from the list. Again, all items after the deleted item will have

their positional indices bumped down to “fill the gap.” Lists never have gaps.

2. You can call the remove() method as often as you like, but it will raise an exception if you try to remove a

value that isn’t in the list.
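
Since remove() only deletes the first occurrence and raises an exception when there is nothing left to delete, you may want to guard the call. This is one possible sketch, not the only way to do it; the name removed is illustrative.

```python
a_list = ['a', 'b', 'new', 'mpilgrim', 'new']

# Guard with the in operator so remove() is never called for a missing value.
while 'new' in a_list:
    a_list.remove('new')

# Alternatively, attempt the removal and catch the exception.
try:
    a_list.remove('new')
    removed = True
except ValueError:
    removed = False
```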

2.4.6. REMOVING ITEMS FROM A LIST: BONUS ROUND

Another interesting list method is pop(). The pop() method is yet another way to remove items from a list,

but with a twist.

>>> a_list = ['a', 'b', 'new', 'mpilgrim']

>>> a_list.pop()

'mpilgrim'

>>> a_list

['a', 'b', 'new']

>>> a_list.pop(1)

'b'

>>> a_list

['a', 'new']

>>> a_list.pop()

'new'

>>> a_list.pop()

'a'

>>> a_list.pop()

Traceback (most recent call last):

File "<stdin>", line 1, in <module>

IndexError: pop from empty list

1. When called without arguments, the pop() list method removes the last item in the list and returns the value

it removed.

2. You can pop arbitrary items from a list. Just pass a positional index to the pop() method. It will remove

that item, shift all the items after it to “fill the gap,” and return the value it removed.

3. Calling pop() on an empty list raises an exception.

☞ Calling the pop() list method without an argument is like the pop() function in Perl.

It removes the last item from the list and returns the value of the removed item.

Perl has another function, shift(), which removes the first item and returns its

value; in Python, this is equivalent to a_list.pop(0).
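
Because pop() returns the item it removes, it is handy for consuming a list one item at a time. A minimal sketch, with stack and drained as made-up names:

```python
stack = ['a', 'b', 'new', 'mpilgrim']

last = stack.pop()      # removes and returns the last item
first = stack.pop(0)    # removes and returns the first item, like Perl's shift()

# Keep popping until the list is empty; an empty list is false in a
# boolean context, so the loop stops before pop() can raise IndexError.
drained = []
while stack:
    drained.append(stack.pop())
```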

2.4.7. LISTS IN A BOOLEAN CONTEXT

You can also use a list in a boolean context, such as an

if statement.

>>> def is_it_true(anything):

...     if anything:

...         print("yes, it's true")

...     else:

...         print("no, it's false")

...

>>> is_it_true([])

no, it's false

>>> is_it_true(['a'])

yes, it's true

>>> is_it_true([False])

yes, it's true

1. In a boolean context, an empty list is false.

2. Any list with at least one item is true.

3. Any list with at least one item is true. The value of the items is irrelevant.
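
In practice, this means you can test a list directly instead of comparing its length to zero. The sketch below uses a hypothetical helper, describe(), to show the rule at work.

```python
def describe(items):
    """Hypothetical helper that branches on the truth value of a list."""
    if items:
        return "has {} item(s)".format(len(items))
    return "empty"

results = [
    describe([]),        # no items, so the list is false
    describe(['a']),     # one item, so the list is true
    describe([False]),   # still one item; its value is irrelevant
    describe([[]]),      # a list containing an empty list still has one item
]
```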

2.5. TUPLES

A tuple is an immutable list. A tuple cannot be changed in any way once it is created.

>>> a_tuple = ("a", "b", "mpilgrim", "z", "example")

>>> a_tuple

('a', 'b', 'mpilgrim', 'z', 'example')

>>> a_tuple[0]

'a'

>>> a_tuple[-1]

'example'

>>> a_tuple[1:3]

('b', 'mpilgrim')

1. A tuple is defined in the same way as a list, except that the whole set of elements is enclosed in parentheses

instead of square brackets.

2. The elements of a tuple have a defined order, just like a list. Tuple indices are zero-based, just like a list, so

the first element of a non-empty tuple is always a_tuple[0].

3. Negative indices count from the end of the tuple, just like a list.

4. Slicing works too, just like a list. When you slice a list, you get a new list; when you slice a tuple, you get a

new tuple.

The major difference between tuples and lists is that tuples cannot be changed. In technical terms, tuples

are immutable. In practical terms, they have no methods that would allow you to change them. Lists have

methods like append(), extend(), insert(), remove(), and pop(). Tuples have none of these methods.

You can slice a tuple (because that creates a new tuple), and you can check whether a tuple contains a

particular value (because that doesn’t change the tuple), and… that’s about it.

# continued from the previous example

>>> a_tuple

('a', 'b', 'mpilgrim', 'z', 'example')

>>> a_tuple.append("new")

Traceback (most recent call last):

File "<stdin>", line 1, in <module>

AttributeError: 'tuple' object has no attribute 'append'

>>> a_tuple.remove("z")

Traceback (most recent call last):

File "<stdin>", line 1, in <module>

AttributeError: 'tuple' object has no attribute 'remove'

>>> a_tuple.index("example")

4

>>> "z" in a_tuple

True

1. You can’t add elements to a tuple. Tuples have no append() or extend() method.

2. You can’t remove elements from a tuple. Tuples have no remove() or pop() method.

3. You can find elements in a tuple, since this doesn’t change the tuple.

4. You can also use the in operator to check if an element exists in the tuple.

So what are tuples good for?

• Tuples are faster than lists. If you’re defining a constant set of values and all you’re ever going to do with it

is iterate through it, use a tuple instead of a list.

• It makes your code safer if you “write-protect” data that doesn’t need to be changed. Using a tuple instead

of a list is like having an implied assert statement that shows this data is constant, and that special thought

(and a specific function) is required to override that.

• Some tuples can be used as dictionary keys (specifically, tuples that contain immutable values like strings,

numbers, and other tuples). Lists can never be used as dictionary keys, because lists are not immutable.
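
The dictionary-key point is the easiest one to demonstrate. This is a minimal sketch; grid and the flag names are invented for the example.

```python
# Tuples of immutable values can be used as dictionary keys.
grid = {(0, 0): 'origin', (1, 2): 'somewhere else'}
origin = grid[(0, 0)]

# A list can never be used as a key.
try:
    grid[[3, 4]] = 'a list as a key'
    list_key_worked = True
except TypeError:
    list_key_worked = False

# A tuple that contains a list cannot be used as a key either.
try:
    hash((1, [2, 3]))
    mixed_tuple_hashable = True
except TypeError:
    mixed_tuple_hashable = False
```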

☞ Tuples can be converted into lists, and vice-versa. The built-in tuple() function takes

a list and returns a tuple with the same elements, and the list() function takes a

tuple and returns a list. In effect, tuple() freezes a list, and list() thaws a tuple.
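
A short sketch of freezing and thawing, with frozen and thawed as illustrative names:

```python
a_list = ['a', 'b', 'mpilgrim']

frozen = tuple(a_list)    # a tuple with the same elements
thawed = list(frozen)     # a new list with the same elements
thawed.append('z')        # only the thawed list changes
```

Both functions build a new object, so neither the original list nor the tuple is affected by later changes to thawed.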

2.5.1. TUPLES IN A BOOLEAN CONTEXT

You can use tuples in a boolean context, such as an if statement.

>>> def is_it_true(anything):

...     if anything:

...         print("yes, it's true")

...     else:

...         print("no, it's false")

...

>>> is_it_true(())

no, it's false

>>> is_it_true(('a', 'b'))

yes, it's true

>>> is_it_true((False,))

yes, it's true

>>> type((False))

<class 'bool'>

>>> type((False,))

<class 'tuple'>

1. In a boolean context, an empty tuple is false.

2. Any tuple with at least one item is true.

3. Any tuple with at least one item is true. The value of the items is irrelevant. But what’s that comma doing

there?

4. To create a tuple of one item, you need a comma after the value. Without the comma, Python just assumes

you have an extra pair of parentheses, which is harmless, but it doesn’t create a tuple.
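
The comma rule is worth trying out, because the mistake is silent. In this sketch the variable names are made up for the illustration.

```python
not_a_tuple = ('a')      # just a string inside redundant parentheses
one_item = ('a',)        # the trailing comma makes this a tuple
also_one_item = 'a',     # the parentheses are optional; the comma is what counts
empty = ()               # the empty tuple is the one case that needs no comma
```

If you forget the comma, code that later iterates over not_a_tuple will loop over the characters of the string rather than over a single item.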
