CHAPTER -1. WHAT’S NEW IN “DIVE INTO PYTHON 3”
❝ Isn’t this where we came in? ❞
— Pink Floyd, The Wall
-1.1. A.K.A. “THE MINUS LEVEL”
Are you already a Python programmer? Did you read the original “Dive Into Python”? Did you buy it
on paper? (If so, thanks!) Are you ready to take the plunge into Python 3? … If so, read on. (If none of that
is true, you’d be better off starting at the beginning.)
Python 3 comes with a script called 2to3. Learn it. Love it. Use it. Porting Code to Python 3 with 2to3 is a reference of all the things that the 2to3 tool can fix automatically. Since a lot of those things are syntax
changes, it’s a good starting point to learn about a lot of the syntax changes in Python 3. (print is now a
function, `x` doesn’t work, & c.)
Case Study: Porting chardet to Python 3 documents my (ultimately successful) effort to port a non-trivial library from Python 2 to Python 3. It may help you; it may not. There’s a fairly steep learning curve, since
you need to kind of understand the library first, so you can understand why it broke and how I fixed it. A
lot of the breakage centers around strings. Speaking of which…
Strings. Whew. Where to start. Python 2 had “strings” and “Unicode strings.” Python 3 has “bytes” and
“strings.” That is, all strings are now Unicode strings, and if you want to deal with a bag of bytes, you use
the new bytes type. Python 3 will never implicitly convert between strings and bytes, so if you’re not sure
which one you have at any given moment, your code will almost certainly break. Read the Strings chapter
for more details.
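As a minimal sketch of that separation (the sample text and variable names are made up for illustration), note how the same content has different lengths as a string and as bytes, and how mixing the two fails:

```python
# Strings hold Unicode characters; bytes hold raw 8-bit values.
text = 'café'                    # a str with 4 characters
data = text.encode('utf-8')      # a bytes object with 5 bytes ('é' takes 2)

print(type(text).__name__, len(text))
print(type(data).__name__, len(data))

# Python 3 refuses to combine the two implicitly.
try:
    text + data
    mixed_ok = True
except TypeError as error:
    mixed_ok = False
    print('TypeError:', error)

# Conversion is always explicit and always names an encoding.
round_trip = data.decode('utf-8')
```

If you are ever unsure which type you are holding, type(value) will tell you before the TypeError does.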
Bytes vs. strings comes up again and again throughout the book.
• In Files, you’ll learn the difference between reading files in “binary” and “text” mode. Reading (and writing!) files in text mode involves a character encoding, which you should pass explicitly as the encoding parameter rather than trust the platform default. Some text file methods count characters, but other
methods count bytes. If your code assumes that one character == one byte, it will break on multi-byte
characters.
• In HTTP Web Services, the httplib2 module fetches headers and data over HTTP. HTTP headers are
returned as strings, but the HTTP body is returned as bytes.
• In Serializing Python Objects, you’ll learn why the pickle module in Python 3 defines a new data format that is backwardly incompatible with Python 2. (Hint: it’s because of bytes and strings.) Also, Python 3 supports
serializing objects to and from JSON, which doesn’t even have a bytes type. I’ll show you how to hack
around that.
• In Case study: porting chardet to Python 3, it’s just a bloody mess of bytes and strings everywhere.
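The first bullet, text mode versus binary mode, can be sketched in a few lines. This is only an illustration under stated assumptions: the file name and its contents are invented, and the file is written to a temporary directory.

```python
import os
import tempfile

# Write a short text file, naming the encoding explicitly.
path = os.path.join(tempfile.mkdtemp(), 'example.txt')
with open(path, mode='w', encoding='utf-8') as a_file:
    a_file.write('café')

# Text mode decodes bytes into characters as it reads.
with open(path, mode='r', encoding='utf-8') as a_file:
    as_text = a_file.read()

# Binary mode hands back the raw bytes, untouched.
with open(path, mode='rb') as a_file:
    as_bytes = a_file.read()

print(len(as_text))     # counts characters
print(len(as_bytes))    # counts bytes
```

The two lengths differ because one character of this text takes two bytes in UTF-8, which is exactly the "one character == one byte" assumption that breaks.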
Even if you don’t care about Unicode (oh but you will), you’ll want to read about string formatting in Python
3, which is completely different from Python 2.
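For a first taste, here is a small sketch of the format() method on strings. The values are arbitrary; the first format string is the same one used later in humansize.py.

```python
size = 931.3225746154785
suffix = 'GiB'

# Replacement fields in braces are filled in by position.
# {0:.1f} means "argument 0, as a float with one decimal place".
by_position = '{0:.1f} {1}'.format(size, suffix)

# Fields can also be filled in by name.
by_name = '{amount:.1f} {unit}'.format(amount=size, unit=suffix)

print(by_position)
print(by_name)
```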
Iterators are everywhere in Python 3, and I understand them a lot better than I did five years ago when I
wrote “Dive Into Python”. You need to understand them too, because lots of functions that used to return
lists in Python 2 will now return iterators in Python 3. At a minimum, you should read the second half of
the Iterators chapter and the second half of the Advanced Iterators chapter.
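A short sketch of what that change looks like in practice, using a few built-ins chosen for illustration:

```python
# Several built-ins that returned lists in Python 2 are lazy in Python 3.
squares = map(lambda n: n * n, [1, 2, 3])
pairs = zip('abc', [1, 2, 3])
keys = {'a': 1, 'b': 2}.keys()      # a view object, not a list

print(type(squares).__name__, type(pairs).__name__, type(keys).__name__)

# An iterator produces values on demand...
first = next(squares)
# ...and can be turned into a list when you really need one.
rest = list(squares)                # only the values not yet consumed
pair_list = list(pairs)
key_list = sorted(keys)
```

Note that an iterator is used up as you consume it, which is why rest no longer includes the first square.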
By popular request, I’ve added an appendix on Special Method Names, which is kind of like the Python docs
“Data Model” chapter but with more snark.
When I was writing “Dive Into Python”, all of the available XML libraries sucked. Then Fredrik Lundh wrote
ElementTree, which doesn’t suck at all. The Python gods wisely incorporated ElementTree into the standard
library, and now it forms the basis for my new XML chapter. The old ways of parsing XML are still around, but you should avoid them, because they suck!
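To give a rough idea of why ElementTree is pleasant to use, here is a minimal sketch. The XML document, its element names, and its attributes are invented for this example.

```python
import xml.etree.ElementTree as etree

# A tiny made-up document, parsed straight from a string.
document = '''<feed>
  <entry id="1"><title>Dive in</title></entry>
  <entry id="2"><title>Native datatypes</title></entry>
</feed>'''

root = etree.fromstring(document)

# Elements behave like lists of their children and carry their attributes.
titles = [entry.find('title').text for entry in root.findall('entry')]
ids = [entry.get('id') for entry in root.findall('entry')]

print(root.tag, titles, ids)
```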
Also new in Python — not in the language but in the community — is the emergence of code repositories
like The Python Package Index (PyPI). Python comes with utilities to package your code in standard formats and distribute those packages on PyPI. Read Packaging Python Libraries for details.
CHAPTER 0. INSTALLING PYTHON
❝ Tempora mutantur nos et mutamur in illis. (Times change, and we change with them.) ❞
— ancient Roman proverb
0.1. DIVING IN
Before you can start programming in Python 3, you need to install it. Or do you?
0.2. WHICH PYTHON IS RIGHT FOR YOU?
If you're using an account on a hosted server, your ISP may have already installed Python 3. If you’re running
Linux at home, you may already have Python 3, too. Most popular GNU/Linux distributions come with
Python 2 in the default installation; a small but growing number of distributions also include Python 3. Mac
OS X includes a command-line version of Python 2, but as of this writing it does not include Python 3.
Microsoft Windows does not come with any version of Python. But don’t despair! You can point-and-click
your way through installing Python, regardless of what operating system you have.
The easiest way to check for Python 3 on your Linux or Mac OS X system is from the command line. Once you’re at a command line prompt, just type python3 (all lowercase, no spaces), press ENTER, and see what
happens. On my home Linux system, Python 3.1 is already installed, and this command gets me into the
Python interactive shell.
mark@atlantis:~$ python3
Python 3.1 (r31:73572, Jul 28 2009, 06:52:23)
[GCC 4.2.4 (Ubuntu 4.2.4-1ubuntu4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>
(Type exit() and press ENTER to exit the Python interactive shell.)
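Once you do get a prompt, you can also ask the interpreter itself which version it is. This is just a sketch using the standard sys module; the variable names are arbitrary.

```python
import sys

# sys.version_info is a tuple-like object: (major, minor, micro, ...).
major, minor = sys.version_info[0], sys.version_info[1]
print('Running Python {0}.{1}'.format(major, minor))

# The examples in this book need Python 3.
is_python3 = major >= 3
print('Python 3?', is_python3)
```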
My web hosting provider also runs Linux and provides command-line access, but my server does not have Python 3 installed. (Boo!)
mark@manganese:~$ python3
bash: python3: command not found
So back to the question that started this section, “Which Python is right for you?” Whichever one runs on
the computer you already have.
[Read on for Windows instructions, or skip to Installing on Mac OS X, Installing on Ubuntu Linux, or
Installing on Other Platforms.]
⁂
0.3. INSTALLING ON MICROSOFT WINDOWS
Windows comes in two architectures these days: 32-bit and 64-bit. Of course, there are lots of different
versions of Windows — XP, Vista, Windows 7 — but Python runs on all of them. The more important
distinction is 32-bit v. 64-bit. If you have no idea what architecture you’re running, it’s probably 32-bit.
Visit python.org/download/ and download the appropriate Python 3 Windows installer for your architecture. Your choices will look something like this:
• Python 3.1 Windows installer (Windows binary — does not include source)
• Python 3.1 Windows AMD64 installer (Windows AMD64 binary — does not include source)
I don’t want to include direct download links here, because minor updates of Python happen all the time and
I don’t want to be responsible for you missing important updates. You should always install the most recent
version of Python 3.x unless you have some esoteric reason not to.
Once your download is complete, double-click the .msi file. Windows will pop up a security alert, since
you’re about to be running executable code. The official Python installer is digitally signed by the Python
Software Foundation, the non-profit corporation that oversees Python development. Don’t accept imitations!
Click the Run button to launch the Python 3 installer.
The first question the installer will ask you is whether you want to install Python 3 for all users or just
for you. The default choice is “install for all users,” which is the best choice unless you have a good
reason to choose otherwise. (One possible reason why you would want to “install just for me” is that you are
installing Python on your company’s computer and you don’t have administrative rights on your Windows
account. But then, why are you installing Python without permission from your company’s Windows
administrator? Don’t get me in trouble here!)
Click the Next button to accept your choice of installation type.
Next, the installer will prompt you to choose a destination directory. The default for all versions of
Python 3.1.x is C:\Python31\, which should work well for most users unless you have a specific reason to
change it. If you maintain a separate drive letter for installing applications, you can browse to it using the
embedded controls, or simply type the pathname in the box below. You are not limited to installing Python on
the C: drive; you can install it on any drive, in any folder.
Click the Next button to accept your choice of destination directory.
The next page looks complicated, but it’s not really. Like many installers, you have the option not to
install every single component of Python 3. If disk space is especially tight, you can exclude certain
components.
◦ Register Extensions allows you to double-click Python scripts (.py files) and run them. Recommended but
not required. (This option doesn’t require any disk space, so there is little point in excluding it.)
◦ Tcl/Tk is the graphics library used by the Python Shell, which you will use throughout this book. I strongly
recommend keeping this option.
◦ Documentation installs a help file that contains much of the information on docs.python.org.
Recommended if you are on dialup or have limited Internet access.
◦ Utility Scripts includes the 2to3.py script which you’ll learn about later in this book. Required if you want to learn about migrating existing Python 2 code to Python 3. If you have no existing Python 2 code, you can
skip this option.
◦ Test Suite is a collection of scripts used to test the Python interpreter itself. We will not use it in this
book, nor have I ever used it in the course of programming in Python. Completely optional.
If you’re unsure how much disk space you have, click the Disk Usage button. The installer will list your
drive letters, compute how much space is available on each drive, and calculate how much would be left after
installation.
Click the OK button to return to the “Customizing Python” page.
If you decide to exclude an option, select the drop-down button before the option and select “Entire feature
will be unavailable.” For example, excluding the test suite will save you a whopping 7908KB of disk space.
Click the Next button to accept your choice of options.
The installer will copy all the necessary files to your chosen destination directory. (This happens so
quickly, I had to try it three times to even get a screenshot of it!)
Click the Finish button to exit the installer.
In your Start menu, there should be a new item called Python 3.1. Within that, there is a program called
IDLE. Select this item to run the interactive Python Shell.
[Skip to using the Python Shell]
⁂
0.4. INSTALLING ON MAC OS X
All modern Macintosh computers use the Intel chip (like most Windows PCs). Older Macs used PowerPC
chips. You don’t need to understand the difference, because there’s just one Mac Python installer for all
Macs.
Visit python.org/download/ and download the Mac installer. It will be called something like Python 3.1
Mac Installer Disk Image, although the version number may vary. Be sure to download version 3.x, not
2.x.
Your browser should automatically mount the disk image and open a Finder window to show you the
contents. (If this doesn’t happen, you’ll need to find the disk image in your downloads folder and double-click
to mount it. It will be named something like python-3.1.dmg.) The disk image contains a number of text
files (Build.txt, License.txt, ReadMe.txt), and the actual installer package, Python.mpkg.
Double-click the Python.mpkg installer package to launch the Mac Python installer.
The first page of the installer gives a brief description of Python itself, then refers you to the
ReadMe.txt file (which you didn’t read, did you?) for more details.
Click the Continue button to move along.
The next page actually contains some important information: Python requires Mac OS X 10.3 or later. If you
are still running Mac OS X 10.2, you should really upgrade. Apple no longer provides security updates for
your operating system, and your computer is probably at risk if you ever go online. Also, you can’t run
Python 3.
Click the Continue button to advance.
Like all good installers, the Python installer displays the software license agreement. Python is open
source, and its license is approved by the Open Source Initiative. Python has had a number of owners and
sponsors throughout its history, each of which has left its mark on the software license. But the end result
is this: Python is open source, and you may use it on any platform, for any purpose, without fee or
obligation of reciprocity.
Click the Continue button once again.
Due to quirks in the standard Apple installer framework, you must “agree” to the software license in order
to complete the installation. Since Python is open source, you are really “agreeing” that the license is
granting you additional rights, rather than taking them away.
Click the Agree button to continue.
The next screen allows you to change your install location. You must install Python on your boot drive, but
due to limitations of the installer, it does not enforce this. In truth, I have never had the need to change
the install location.
From this screen, you can also customize the installation to exclude certain features. If you want to do this,
click the Customize button; otherwise click the Install button.
If you choose a Custom Install, the installer will present you with the following list of features:
◦ Python Framework. This is the guts of Python, and is both selected and disabled because it must be installed.
◦ GUI Applications includes IDLE, the graphical Python Shell which you will use throughout this book. I
strongly recommend keeping this option selected.
◦ UNIX command-line tools includes the command-line python3 application. I strongly recommend keeping
this option, too.
◦ Python Documentation contains much of the information on docs.python.org. Recommended if you are on dialup or have limited Internet access.
◦ Shell profile updater controls whether to update your shell profile (used in Terminal.app) to ensure that
this version of Python is on the search path of your shell. You probably don’t need to change this.
◦ Fix system Python should not be changed. (It tells your Mac to use Python 3 as the default Python for all
scripts, including built-in system scripts from Apple. This would be very bad, since most of those scripts are
written for Python 2, and they would fail to run properly under Python 3.)
Click the Install button to continue.
Because it installs system-wide frameworks and binaries in /usr/local/bin/, the installer will ask you for an
administrative password. There is no way to install Mac Python without administrator privileges.
Click the OK button to begin the installation.
The installer will display a progress meter while it installs the features you’ve selected.
Assuming all went well, the installer will give you a big green checkmark to tell you that the installation
completed successfully.
Click the Close button to exit the installer.
Assuming you didn’t change the install location, you can find the newly installed files in the Python 3.1
folder within your /Applications folder. The most important piece is IDLE, the graphical Python Shell.
Double-click IDLE to launch the Python Shell.
The Python Shell is where you will spend most of your time exploring Python. Examples throughout this book
will assume that you can find your way into the Python Shell.
[Skip to using the Python Shell]
⁂
0.5. INSTALLING ON UBUNTU LINUX
Modern Linux distributions are backed by vast repositories of precompiled applications, ready to install. The
exact details vary by distribution. In Ubuntu Linux, the easiest way to install Python 3 is through the
Add/Remove application in your Applications menu.
When you first launch the Add/Remove application, it will show you a list of preselected applications in
different categories. Some are already installed; most are not. Because the repository contains over 10,000
applications, there are different filters you can apply to see small parts of the repository. The default filter is
“Canonical-maintained applications,” which is a small subset of the total number of applications that are
officially supported by Canonical, the company that creates and maintains Ubuntu Linux.
Python 3 is not maintained by Canonical, so the first step is to drop down this filter menu and select “All
Open Source applications.”
Once you’ve widened the filter to include all open source applications, use the Search box immediately after
the filter menu to search for Python 3.
Now the list of applications narrows to just those matching Python 3. You’re going to check two packages.
The first is Python (v3.0). This contains the Python interpreter itself.
The second package you want is immediately above: IDLE (using Python-3.0). This is a graphical Python
Shell that you will use throughout this book.
After you’ve checked those two packages, click the Apply Changes button to continue.
The package manager will ask you to confirm that you want to add both IDLE (using Python-3.0) and
Python (v3.0).
Click the Apply button to continue.
The package manager will show you a progress meter while it
downloads the necessary packages from Canonical’s Internet
repository.
Once the packages are downloaded, the package manager will automatically begin installing them.
If all went well, the package manager will confirm that both packages were successfully installed. From
here, you can double-click IDLE to launch the Python Shell, or click the Close button to exit the package
manager.
You can always relaunch the Python Shell by going to your Applications menu, then the Programming
submenu, and selecting IDLE.
The Python Shell is where you will spend most of your time exploring Python. Examples throughout this book
will assume that you can find your way into the Python Shell.
[Skip to using the Python Shell]
⁂
0.6. INSTALLING ON OTHER PLATFORMS
Python 3 is available on a number of different platforms. In particular, it is available in virtually every Linux,
BSD, and Solaris-based distribution. For example, Red Hat Linux uses the yum package manager. FreeBSD has
its ports and packages collection, SUSE has zypper, and Solaris has pkgadd. A quick web search for Python 3 + your operating system should tell you whether a Python 3 package is available, and if so, how to install it.
⁂
0.7. USING THE PYTHON SHELL
The Python Shell is where you can explore Python syntax, get interactive help on commands, and debug
short programs. The graphical Python Shell (named IDLE) also contains a decent text editor that supports
Python syntax coloring and integrates with the Python Shell. If you don’t already have a favorite text editor,
you should give IDLE a try.
First things first. The Python Shell itself is an amazing interactive playground. Throughout this book, you’ll see
examples like this:
>>> 1 + 1
2
The three angle brackets, >>>, denote the Python Shell prompt. Don’t type that part. That’s just to let you
know that this example is meant to be followed in the Python Shell.
1 + 1 is the part you type. You can type any valid Python expression or command in the Python Shell.
Don’t be shy; it won’t bite! The worst that will happen is you’ll get an error message. Commands get
executed immediately (once you press ENTER); expressions get evaluated immediately, and the Python Shell
prints out the result.
2 is the result of evaluating this expression. As it happens, 1 + 1 is a valid Python expression. The result, of
course, is 2.
Let’s try another one.
>>> print('Hello world!')
Hello world!
Pretty simple, no? But there’s lots more you can do in the Python shell. If you ever get stuck — you can’t
remember a command, or you can’t remember the proper arguments to pass a certain function — you can
get interactive help in the Python Shell. Just type help and press ENTER.
>>> help
Type help() for interactive help, or help(object) for help about object.
There are two modes of help. You can get help about a single object, which just prints out the
documentation and returns you to the Python Shell prompt. You can also enter help mode, where instead of
evaluating Python expressions, you just type keywords or command names and it will print out whatever it
knows about that command.
To enter the interactive help mode, type help() and press ENTER.
>>> help()
Welcome to Python 3.0! This is the online help utility.
If this is your first time using Python, you should definitely check out
the tutorial on the Internet at http://docs.python.org/tutorial/.
Enter the name of any module, keyword, or topic to get help on writing
Python programs and using Python modules. To quit this help utility and
return to the interpreter, just type "quit".
To get a list of available modules, keywords, or topics, type "modules",
"keywords", or "topics". Each module also comes with a one-line summary
of what it does; to list the modules whose summaries contain a given word
such as "spam", type "modules spam".
help>
Note how the prompt changes from >>> to help>. This reminds you that you’re in the interactive help
mode. Now you can enter any keyword, command, module name, function name — pretty much anything
Python understands — and read documentation on it.
help> print                                                              ①
Help on built-in function print in module builtins:

print(...)
    print(value, ..., sep=' ', end='\n', file=sys.stdout)

    Prints the values to a stream, or to sys.stdout by default.
    Optional keyword arguments:
    file: a file-like object (stream); defaults to the current sys.stdout.
    sep:  string inserted between values, default a space.
    end:  string appended after the last value, default a newline.

help> PapayaWhip                                                         ②
no Python documentation found for 'PapayaWhip'

help> quit                                                               ③

You are now leaving help and returning to the Python interpreter.
If you want to ask for help on a particular object directly from the
interpreter, you can type "help(object)". Executing "help('string')"
has the same effect as typing a particular string at the help> prompt.
>>>                                                                      ④
1. To get documentation on the print() function, just type print and press ENTER. The interactive help mode
will display something akin to a man page: the function name, a brief synopsis, the function’s arguments and
their default values, and so on. If the documentation seems opaque to you, don’t panic. You’ll learn more
about all these concepts in the next few chapters.
2. Of course, the interactive help mode doesn’t know everything. If you type something that isn’t a Python
command, module, function, or other built-in keyword, the interactive help mode will just shrug its virtual
shoulders.
3. To quit the interactive help mode, type quit and press ENTER.
4. The prompt changes back to >>> to signal that you’ve left the interactive help mode and returned to the
Python Shell.
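The other mode, help about a single object, needs no special prompt. As a rough sketch, the standard pydoc module (which powers help()) can render much the same documentation as a string; the choice of len as the object to look up is arbitrary.

```python
import pydoc

# Typing help(len) at the >>> prompt prints documentation for len and then
# returns you to the prompt. pydoc.render_doc() builds similar text as a
# string, which is convenient outside the interactive shell.
documentation = pydoc.render_doc(len, renderer=pydoc.plaintext)
print(documentation)

# The docstring behind that help page is also available directly.
print(len.__doc__)
```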
IDLE, the graphical Python Shell, also includes a Python-aware text editor.
⁂
0.8. PYTHON EDITORS AND IDES
IDLE is not the only game in town when it comes to writing programs in Python. While it’s useful to get
started with learning the language itself, many developers prefer other text editors or Integrated
Development Environments (IDEs). I won’t cover them here, but the Python community maintains a list of
Python-aware editors that covers a wide range of supported platforms and software licenses.
You might also want to check out the list of Python-aware IDEs, although few of them support Python 3 yet.
One that does is PyDev, a plugin for Eclipse that turns Eclipse into a full-fledged Python IDE. Both Eclipse and PyDev are cross-platform and open source.
On the commercial front, there is ActiveState’s Komodo IDE. It has per-user licensing, but students can get a discount, and a free time-limited trial version is available.
I’ve been programming in Python for nine years, and I edit my Python programs in GNU Emacs and debug them in the command-line Python Shell. There’s no right or wrong way to develop in Python. Find a way
that works for you!
CHAPTER 1. YOUR FIRST PYTHON PROGRAM
❝ Don’t bury your burden in saintly silence. You have a problem? Great. Rejoice, dive in, and investigate. ❞
1.1. DIVING IN
Convention dictates that I should bore you with the fundamental building blocks of programming, so we
can slowly work up to building something useful. Let’s skip all that. Here is a complete, working Python
program. It probably makes absolutely no sense to you. Don’t worry about that, because you’re going to
dissect it line by line. But read through it first and see what, if anything, you can make of it.
SUFFIXES = {1000: ['KB', 'MB', 'GB', 'TB', 'PB', 'EB', 'ZB', 'YB'],
            1024: ['KiB', 'MiB', 'GiB', 'TiB', 'PiB', 'EiB', 'ZiB', 'YiB']}

def approximate_size(size, a_kilobyte_is_1024_bytes=True):
    '''Convert a file size to human-readable form.

    Keyword arguments:
    size -- file size in bytes
    a_kilobyte_is_1024_bytes -- if True (default), use multiples of 1024
                                if False, use multiples of 1000

    Returns: string

    '''
    if size < 0:
        raise ValueError('number must be non-negative')

    multiple = 1024 if a_kilobyte_is_1024_bytes else 1000
    for suffix in SUFFIXES[multiple]:
        size /= multiple
        if size < multiple:
            return '{0:.1f} {1}'.format(size, suffix)

    raise ValueError('number too large')

if __name__ == '__main__':
    print(approximate_size(1000000000000, False))
    print(approximate_size(1000000000000))
Now let’s run this program on the command line. On Windows, it will look something like this:
c:\home\diveintopython3\examples> c:\python31\python.exe humansize.py
1.0 TB
931.3 GiB
On Mac OS X or Linux, it would look something like this:
you@localhost:~/diveintopython3/examples$ python3 humansize.py
1.0 TB
931.3 GiB
What just happened? You executed your first Python program. You called the Python interpreter on the
command line, and you passed the name of the script you wanted Python to execute. The script defines a
single function, the approximate_size() function, which takes an exact file size in bytes and calculates a
“pretty” (but approximate) size. (You’ve probably seen this in Windows Explorer, or the Mac OS X Finder,
or Nautilus or Dolphin or Thunar on Linux. If you display a folder of documents as a multi-column list, it
will display a table with the document icon, the document name, the size, type, last-modified date, and so on.
If the folder contains a 1093-byte file named TODO, your file manager won’t display TODO 1093 bytes; it’ll say
something like TODO 1 KB instead. That’s what the approximate_size() function does.)
Look at the bottom of the script, and you’ll see two calls to print(approximate_size(arguments)). These
are function calls — first calling the approximate_size() function and passing a number of arguments, then
taking the return value and passing it straight on to the print() function. The print() function is built-in;
you’ll never see an explicit declaration of it. You can just use it, anytime, anywhere. (There are lots of
built-in functions, and lots more functions that are separated into modules. Patience, grasshopper.)
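To make the nesting explicit, the sketch below splits one of those lines into its two calls. It includes a condensed copy of the function (docstring shortened) only so the snippet runs on its own; the variable name pretty is made up.

```python
SUFFIXES = {1000: ['KB', 'MB', 'GB', 'TB', 'PB', 'EB', 'ZB', 'YB'],
            1024: ['KiB', 'MiB', 'GiB', 'TiB', 'PiB', 'EiB', 'ZiB', 'YiB']}

def approximate_size(size, a_kilobyte_is_1024_bytes=True):
    '''Condensed copy of the humansize.py function, same logic.'''
    if size < 0:
        raise ValueError('number must be non-negative')
    multiple = 1024 if a_kilobyte_is_1024_bytes else 1000
    for suffix in SUFFIXES[multiple]:
        size /= multiple
        if size < multiple:
            return '{0:.1f} {1}'.format(size, suffix)
    raise ValueError('number too large')

# print(approximate_size(...)) is two calls, one nested inside the other.
pretty = approximate_size(1000000000000, False)   # inner call returns a string
print(pretty)                                     # outer call displays it
```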
So why does running the script on the command line give you the same output every time? We’ll get to
that. First, let’s look at that approximate_size() function.
⁂
1.2. DECLARING FUNCTIONS
Python has functions like most other languages, but it does not have separate header files like C++ or
interface/implementation sections like Pascal. When you need a function, just declare it, like this:
def approximate_size(size, a_kilobyte_is_1024_bytes=True):
The keyword def starts the function declaration, followed by the function name, followed by the arguments in
parentheses. Multiple arguments are separated with commas.
Also note that the function doesn’t define a return datatype. Python functions do not specify the datatype
of their return value; they don’t even specify whether or not they return a value. (In fact, every Python
function returns a value; if the function ever executes a return statement, it will return that value,
otherwise it will return None, the Python null value.)
☞ In some languages, functions (that return a
value) start with function, and subroutines
(that do not return a value) start with sub.
There are no subroutines in Python. Everything is a function, all functions return a
value (even if it’s None), and all functions start with def.
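A quick sketch of that rule, using three throwaway functions whose names are invented for this example:

```python
def returns_something():
    return 42

def returns_nothing():
    pass        # no return statement at all

def bare_return():
    return      # a return statement with no value

# Every call produces a value, even when the function never says so.
results = [returns_something(), returns_nothing(), bare_return()]
print(results)
```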
The approximate_size() function takes two arguments, size and
a_kilobyte_is_1024_bytes, but neither argument specifies a datatype. In Python, variables are never
explicitly typed. Python figures out what type a variable is and keeps track of it internally.
☞ In Java and other statically-typed languages, you must specify the datatype of the
function return value and each function argument. In Python, you never explicitly
specify the datatype of anything. Based on what value you assign, Python keeps track
of the datatype internally.
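As an illustration (the values are arbitrary), the same variable name can hold values of different types over time, and Python always knows which type it currently holds:

```python
size = 1093                          # an integer
first_type = type(size).__name__

size = size / 1024                   # the / operator yields a float
second_type = type(size).__name__

size = '{0:.1f} KiB'.format(size)    # now a string
third_type = type(size).__name__

print(first_type, second_type, third_type)

# Tracking types internally does not mean ignoring them:
# combining incompatible types is still an error.
try:
    size + 1
    mixed_ok = True
except TypeError:
    mixed_ok = False
```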
1.2.1. OPTIONAL AND NAMED ARGUMENTS
Python allows function arguments to have default values; if the function is called without the argument, the
argument gets its default value. Furthermore, arguments can be specified in any order by using named
arguments.
Let’s take another look at that approximate_size() function declaration:
def approximate_size(size, a_kilobyte_is_1024_bytes=True):
The second argument, a_kilobyte_is_1024_bytes, specifies a default value of True. This means the
argument is optional; you can call the function without it, and Python will act as if you had called it with True
as a second parameter.
Now look at the bottom of the script:
if __name__ == '__main__':
    print(approximate_size(1000000000000, False))  ①
    print(approximate_size(1000000000000))         ②
1. This calls the approximate_size() function with two arguments. Within the approximate_size() function,
a_kilobyte_is_1024_bytes will be False, since you explicitly passed False as the second argument.
2. This calls the approximate_size() function with only one argument. But that’s OK, because the second
argument is optional! Since the caller doesn’t specify, the second argument defaults to True, as defined by
the function declaration.
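The same mechanism in miniature, with a hypothetical function (shout is not part of humansize.py) so the default value is easy to see:

```python
def shout(message, punctuation='!'):
    '''Return the message in capitals, followed by some punctuation.'''
    return message.upper() + punctuation

with_default = shout('hello')        # punctuation falls back to '!'
overridden = shout('hello', '?')     # caller supplies the second argument

print(with_default, overridden)

# Default values are stored on the function object itself.
print(shout.__defaults__)
```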
You can also pass values into a function by name.
>>> from humansize import approximate_size
>>> approximate_size(4000, a_kilobyte_is_1024_bytes=False)       ①
'4.0 KB'
>>> approximate_size(size=4000, a_kilobyte_is_1024_bytes=False)  ②
'4.0 KB'
>>> approximate_size(a_kilobyte_is_1024_bytes=False, size=4000)  ③
'4.0 KB'
>>> approximate_size(a_kilobyte_is_1024_bytes=False, 4000)       ④
  File "<stdin>", line 1
SyntaxError: non-keyword arg after keyword arg
>>> approximate_size(size=4000, False)                           ⑤
  File "<stdin>", line 1
SyntaxError: non-keyword arg after keyword arg
1. This calls the approximate_size() function with 4000 for the first argument (size) and False for the
argument named a_kilobyte_is_1024_bytes. (That happens to be the second argument, but that doesn’t
matter, as you’ll see in a minute.)
2. This calls the approximate_size() function with 4000 for the argument named size and False for the
argument named a_kilobyte_is_1024_bytes. (These named arguments happen to be in the same order as
the arguments are listed in the function declaration, but that doesn’t matter either.)
3. This calls the approximate_size() function with False for the argument named
a_kilobyte_is_1024_bytes and 4000 for the argument named size. (See? I told you the order didn’t
matter.)
4. This call fails, because you have a named argument followed by an unnamed (positional) argument, and that
never works. Reading the argument list from left to right, once you have a single named argument, the rest
of the arguments must also be named.
5. This call fails too, for the same reason as the previous call. Is that surprising? After all, you passed 4000 for
the argument named size, then “obviously” that False value was meant for the
a_kilobyte_is_1024_bytes argument. But Python doesn’t work that way. As soon as you have a named
argument, all arguments to the right of that need to be named arguments, too.
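Because this is a syntax error, Python rejects the call before it ever runs the function. The sketch below demonstrates that by compiling each call as source text without executing it; the helper is_valid_call is made up for this example, and the exact wording of the error message varies between Python versions.

```python
calls = [
    'approximate_size(4000, a_kilobyte_is_1024_bytes=False)',
    'approximate_size(a_kilobyte_is_1024_bytes=False, size=4000)',
    'approximate_size(a_kilobyte_is_1024_bytes=False, 4000)',
    'approximate_size(size=4000, False)',
]

def is_valid_call(source):
    '''Return True if Python can compile the call expression.'''
    try:
        # compile() only parses the text; approximate_size is never called,
        # so it does not even need to be defined here.
        compile(source, '<example>', 'eval')
    except SyntaxError:
        return False
    return True

results = [is_valid_call(source) for source in calls]
for source, ok in zip(calls, results):
    print('{0:<62} {1}'.format(source, 'ok' if ok else 'SyntaxError'))
```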
⁂
1.3. WRITING READABLE CODE
I won’t bore you with a long finger-wagging speech about the importance of documenting your code. Just
know that code is written once but read many times, and the most important audience for your code is
yourself, six months after writing it (i.e. after you’ve forgotten everything but need to fix something). Python
makes it easy to write readable code, so take advantage of it. You’ll thank me in six months.
1.3.1. DOCUMENTATION STRINGS
You can document a Python function by giving it a documentation string (docstring for short). In this
program, the approximate_size() function has a docstring:
def approximate_size(size, a_kilobyte_is_1024_bytes=True):
    '''Convert a file size to human-readable form.

    Keyword arguments:
    size -- file size in bytes
    a_kilobyte_is_1024_bytes -- if True (default), use multiples of 1024
                                if False, use multiples of 1000

    Returns: string

    '''
Triple quotes signify a multi-line string. Everything between the start and end quotes is part of a single
string, including carriage returns, leading white space, and other quote characters. You can use them
anywhere, but you’ll see them most often used when defining a docstring.
☞ Triple quotes are also an easy way to define a string with both single and double quotes,
like qq/.../ in Perl 5.
Everything between the triple quotes is the function’s docstring, which documents what the function does. A
docstring, if it exists, must be the first thing defined in a function (that is, on the next line after the function
declaration). You don’t technically need to give your function a docstring, but you always should. I know
you’ve heard this in every programming class you’ve ever taken, but Python gives you an added incentive: the
docstring is available at runtime as an attribute of the function.
☞ Many Python IDEs use the docstring to provide context-sensitive documentation, so
that when you type a function name, its docstring appears as a tooltip. This can be
incredibly helpful, but it’s only as good as the docstrings you write.
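To see what “available at runtime” means in practice, here is a minimal sketch. It repeats the docstring from above on a function whose body is deliberately left out (only the docstring matters here), then reads it back two ways: through the raw __doc__ attribute, and through inspect.getdoc() from the standard library, which also cleans up the indentation.

```python
import inspect


def approximate_size(size, a_kilobyte_is_1024_bytes=True):
    '''Convert a file size to human-readable form.

    Keyword arguments:
    size -- file size in bytes
    a_kilobyte_is_1024_bytes -- if True (default), use multiples of 1024
                                if False, use multiples of 1000

    Returns: string

    '''
    # Body omitted on purpose; the docstring alone is a valid function body.


# The raw docstring, exactly as stored on the function object.
raw_docstring = approximate_size.__doc__

# inspect.getdoc() strips the common leading indentation; take the first line.
summary_line = inspect.getdoc(approximate_size).splitlines()[0]
```

The first line of a docstring is the part that tools tend to show on its own, which is one reason to keep it short and complete.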
⁂
1.4. THE import SEARCH PATH
Before this goes any further, I want to briefly mention the library search path. Python looks in several places
when you try to import a module. Specifically, it looks in all the directories defined in sys.path. This is just
a list, and you can easily view it or modify it with standard list methods. (You’ll learn more about lists in
Native Datatypes.)
>>> import sys                                                 ①
>>> sys.path                                                   ②
['',
 '/usr/lib/python31.zip',
 '/usr/lib/python3.1',
 '/usr/lib/python3.1/plat-linux2@EXTRAMACHDEPPATH@',
 '/usr/lib/python3.1/lib-dynload',
 '/usr/lib/python3.1/dist-packages',
 '/usr/local/lib/python3.1/dist-packages']
>>> sys                                                        ③
<module 'sys' (built-in)>
>>> sys.path.insert(0, '/home/mark/diveintopython3/examples')  ④
>>> sys.path                                                   ⑤
['/home/mark/diveintopython3/examples',
 '',
 '/usr/lib/python31.zip',
 '/usr/lib/python3.1',
 '/usr/lib/python3.1/plat-linux2@EXTRAMACHDEPPATH@',
 '/usr/lib/python3.1/lib-dynload',
 '/usr/lib/python3.1/dist-packages',
 '/usr/local/lib/python3.1/dist-packages']
1. Importing the sys module makes all of its functions and attributes available.
2. sys.path is a list of directory names that constitute the current search path. (Yours will look different,
depending on your operating system, what version of Python you’re running, and where it was originally
installed.) Python will look through these directories (in this order) for a .py file whose name matches what
you’re trying to import.
3. Actually, I lied; the truth is more complicated than that, because not all modules are stored as .py files.
Some are built-in modules; they are actually baked right into Python itself. Built-in modules behave just like
regular modules, but their Python source code is not available, because they are not written in Python! (Like
Python itself, these built-in modules are written in C.)
4. You can add a new directory to Python’s search path at runtime by adding the directory name to sys.path,
and then Python will look in that directory as well, whenever you try to import a module. The effect lasts as
long as Python is running.
5. By using sys.path.insert(0, new_path), you inserted a new directory as the first item of the sys.path
list, and therefore at the beginning of Python’s search path. This is almost always what you want. In case of
naming conflicts (for example, if Python ships with version 2 of a particular library but you want to use
version 3), this ensures that your modules will be found and used instead of the modules that came with
Python.
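If you run the same setup code more than once, sys.path.insert() will happily add the same directory again each time. The sketch below wraps the insert in a small helper, prepend_to_search_path(), which is a name made up for this example and not part of the standard library. The directory is the one from the listing above; it does not need to exist for the list manipulation to work.

```python
import sys


def prepend_to_search_path(directory):
    """Put directory at the front of sys.path, without creating duplicates."""
    if directory in sys.path:
        sys.path.remove(directory)
    sys.path.insert(0, directory)


# Hypothetical location, borrowed from the listing above.
EXAMPLES_DIR = '/home/mark/diveintopython3/examples'

prepend_to_search_path(EXAMPLES_DIR)
prepend_to_search_path(EXAMPLES_DIR)  # the second call changes nothing
```

As with the interactive example, the change lasts only as long as the current Python process.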
⁂
1.5. EVERYTHING IS AN OBJECT
In case you missed it, I just said that Python functions have attributes, and that those attributes are available
at runtime. A function, like everything else in Python, is an object.
Run the interactive Python shell and follow along:
>>> import humansize                               ①
>>> print(humansize.approximate_size(4096, True))  ②
4.0 KiB
>>> print(humansize.approximate_size.__doc__)      ③
Convert a file size to human-readable form.

    Keyword arguments:
    size -- file size in bytes
    a_kilobyte_is_1024_bytes -- if True (default), use multiples of 1024
                                if False, use multiples of 1000

    Returns: string
1. The first line imports the humansize program as a module — a chunk of code that you can use interactively,
or from a larger Python program. Once you import a module, you can reference any of its public functions,
classes, or attributes. Modules can do this to access functionality in other modules, and you can do it in the
Python interactive shell too. This is an important concept, and you’ll see a lot more of it throughout this
book.
2. When you want to use functions defined in imported modules, you need to include the module name. So
you can’t just say approximate_size; it must be humansize.approximate_size. If you’ve used classes in
Java, this should feel vaguely familiar.
3. Instead of calling the function as you would expect to, you asked for one of the function’s attributes,
__doc__.
☞ import in Python is like require in Perl. Once you import a Python module, you
access its functions with module.function; once you require a Perl module, you
access its functions with module::function.
1.5.1. WHAT’S AN OBJECT?
Everything in Python is an object, and everything can have attributes and methods. All functions have a built-
in attribute __doc__, which returns the docstring defined in the function’s source code. The sys module is
an object which has (among other things) an attribute called path. And so forth.
Still, this doesn’t answer the more fundamental question: what is an object? Different programming languages
define “object” in different ways. In some, it means that all objects must have attributes and methods; in
others, it means that all objects are subclassable. In Python, the definition is looser. Some objects have
neither attributes nor methods, but they could. Not all objects are subclassable. But everything is an object in
the sense that it can be assigned to a variable or passed as an argument to a function.
You may have heard the term “first-class object” in other programming contexts. In Python, functions are
first-class objects. You can pass a function as an argument to another function. Modules are first-class objects.
You can pass an entire module as an argument to a function. Classes are first-class objects, and individual
instances of a class are also first-class objects.
This is important, so I’m going to repeat it in case you missed it the first few times: everything in Python is an
object. Strings are objects. Lists are objects. Functions are objects. Classes are objects. Class instances are
objects. Even modules are objects.
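Here is a small sketch of what “first-class” buys you. The helper names apply_twice() and type_name() are invented for this example. One takes a function as an argument; the other accepts absolutely anything (a function, a module, a class, a string, a list) and reports its type.

```python
import math


def apply_twice(func, value):
    """Call func on value, then call it again on the result."""
    return func(func(value))


def type_name(obj):
    """Return the name of obj's type; works for any object at all."""
    return type(obj).__name__


# A function passed as an argument to another function.
doubled_twice = apply_twice(lambda x: x * 2, 5)

# A function, a module, a class, a string and a list, all handled the same way.
kinds = [type_name(obj) for obj in (apply_twice, math, int, 'text', [1, 2])]

# Functions carry attributes too, such as their own name.
function_name = apply_twice.__name__
```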
⁂
1.6. INDENTING CODE
Python functions have no explicit begin or end, and no curly braces to mark where the function code starts
and stops. The only delimiter is a colon (:) and the indentation of the code itself.
def approximate_size(size, a_kilobyte_is_1024_bytes=True):  ①
    if size < 0:                                            ②
        raise ValueError('number must be non-negative')     ③
                                                            ④
    multiple = 1024 if a_kilobyte_is_1024_bytes else 1000
    for suffix in SUFFIXES[multiple]:                       ⑤
        size /= multiple
        if size < multiple:
            return '{0:.1f} {1}'.format(size, suffix)

    raise ValueError('number too large')
1. Code blocks are defined by their indentation. By “code block,” I mean functions, if statements, for loops,
while loops, and so forth. Indenting starts a block and unindenting ends it. There are no explicit braces,
brackets, or keywords. This means that whitespace is significant, and must be consistent. In this example, the
function code is indented four spaces. It doesn’t need to be four spaces, it just needs to be consistent. The
first line that is not indented marks the end of the function.
2. In Python, an if statement is followed by a code block. If the if expression evaluates to true, the indented
block is executed, otherwise it falls to the else block (if any). Note the lack of parentheses around the
expression.
3. This line is inside the if code block. This raise statement will raise an exception (of type ValueError), but
only if size < 0.
4. This is not the end of the function. Completely blank lines don’t count. They can make the code more
readable, but they don’t count as code block delimiters. The function continues on the next line.
5. The for loop also marks the start of a code block. Code blocks can contain multiple lines, as long as they
are all indented the same amount. This for loop has three lines of code in it. There is no other special
syntax for multi-line code blocks. Just indent and get on with your life.
After some initial protests and several snide analogies to Fortran, you will make peace with this and start
seeing its benefits. One major benefit is that all Python programs look similar, since indentation is a language
requirement and not a matter of style. This makes it easier to read and understand other people’s Python
code.
☞ Python uses carriage returns to separate statements and a colon and indentation to
separate code blocks. C++ and Java use semicolons to separate statements and curly
braces to separate code blocks.
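To run the function yourself and watch the indentation do its job, you need the SUFFIXES table, which this section does not show. The version below is a self-contained sketch; the contents of SUFFIXES are an assumption chosen to match the outputs printed elsewhere in this chapter ('4.0 KB', '4.0 KiB', '1.0 TB', '931.3 GiB').

```python
# Assumed suffix table; this section of the text does not show the real one.
SUFFIXES = {1000: ['KB', 'MB', 'GB', 'TB', 'PB', 'EB', 'ZB', 'YB'],
            1024: ['KiB', 'MiB', 'GiB', 'TiB', 'PiB', 'EiB', 'ZiB', 'YiB']}


def approximate_size(size, a_kilobyte_is_1024_bytes=True):
    # Indented four spaces: everything here belongs to the function.
    if size < 0:
        # Indented eight spaces: this line belongs to the if block.
        raise ValueError('number must be non-negative')

    # The blank line above did not end the function.
    multiple = 1024 if a_kilobyte_is_1024_bytes else 1000
    for suffix in SUFFIXES[multiple]:
        # These three lines make up the body of the for loop.
        size /= multiple
        if size < multiple:
            return '{0:.1f} {1}'.format(size, suffix)

    # Back to four spaces: the loop is over, but the function is not.
    raise ValueError('number too large')


small = approximate_size(4096, True)
decimal = approximate_size(1000000000000, False)
binary = approximate_size(1000000000000)
```

Try moving the final raise statement four spaces to the right: it becomes part of the for loop, and the function raises an exception for any size of 1024 or more.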
⁂
1.7. EXCEPTIONS
Exceptions are everywhere in Python. Virtually every module in the standard Python library uses them, and
Python itself will raise them in a lot of different circumstances. You’ll see them repeatedly throughout this
book.
What is an exception? Usually it’s an error, an indication that something went wrong. (Not all exceptions are
errors, but never mind that for now.) Some programming languages encourage the use of error return
codes, which you check. Python encourages the use of exceptions, which you handle.
When an error occurs in the Python Shell, it prints out some details about the exception and how it
happened, and that’s that. This is called an unhandled exception. When the exception was raised, there was
no code to explicitly notice it and deal with it, so it bubbled its way back up to the top level of the Python
Shell, which spits out some debugging information and calls it a day. In the shell, that's no big deal, but if that
happened while your actual Python program was running, the entire program would come to a screeching
halt if nothing handles the exception. Maybe that’s what you want, maybe it isn’t.
☞ Unlike Java, Python functions don’t declare which exceptions they might raise. It’s up
to you to determine what possible exceptions you need to catch.
An exception doesn’t need to result in a complete program crash, though. Exceptions can be handled.
Sometimes an exception is really because you have a bug in your code (like accessing a variable that doesn’t
exist), but sometimes an exception is something you can anticipate. If you’re opening a file, it might not
exist. If you’re importing a module, it might not be installed. If you’re connecting to a database, it might be
unavailable, or you might not have the correct security credentials to access it. If you know a line of code
may raise an exception, you should handle the exception using a try...except block.
☞ Python uses try...except blocks to handle exceptions, and the raise statement to
generate them. Java and C++ use try...catch blocks to handle exceptions, and the
throw statement to generate them.
The approximate_size() function raises exceptions in two different cases: if the given size is larger than
the function is designed to handle, or if it’s less than zero.
if size < 0:
    raise ValueError('number must be non-negative')
The syntax for raising an exception is simple enough. Use the raise statement, followed by the exception
name, and an optional human-readable string for debugging purposes. The syntax is reminiscent of calling a
function. (In reality, exceptions are implemented as classes, and this raise statement is actually creating an
instance of the ValueError class and passing the string 'number must be non-negative' to its initialization
method. But we’re getting ahead of ourselves!)
☞ You don’t need to handle an exception in the function that raises it. If one function
doesn’t handle it, the exception is passed to the calling function, then that function’s
calling function, and so on “up the stack.” If the exception is never handled, your
program will crash, Python will print a “traceback” to standard error, and that’s the
end of that. Again, maybe that’s what you want; it depends on what your program
does.
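Here is a minimal sketch of an exception travelling “up the stack” until something handles it. All three function names are made up for this example. parse_age() raises, read_age() does nothing about it, and safe_age() finally catches the exception with a try...except block.

```python
def parse_age(text):
    """Convert text to a non-negative integer, or raise ValueError."""
    age = int(text)  # int() itself raises ValueError for input like 'abc'
    if age < 0:
        raise ValueError('age must be non-negative')
    return age


def read_age(text):
    """No handler here, so any exception passes straight through."""
    return parse_age(text)


def safe_age(text, default=None):
    """Handle the exception two levels above the place it was raised."""
    try:
        return read_age(text)
    except ValueError:
        return default


valid = safe_age('42')
negative = safe_age('-1')
garbage = safe_age('abc', default=0)
```

Note that the except clause names ValueError specifically. Any other kind of exception would keep travelling upward.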
1.7.1. CATCHING IMPORT ERRORS
One of Python’s built-in exceptions is ImportError, which is raised when you try to import a module and
fail. This can happen for a variety of reasons, but the simplest case is when the module doesn’t exist in your
import search path. You can use this to include optional features in your program. For example, the
chardet library provides character encoding auto-detection. Perhaps your program wants to use this library if
it exists, but continue gracefully if the user hasn’t installed it. You can do this with a try...except block.
try:
    import chardet
except ImportError:
    chardet = None
Later, you can check for the presence of the chardet module with a simple if statement:
if chardet:
    pass  # do something
else:
    pass  # continue anyway
Another common use of the ImportError exception is when two modules implement a common API, but
one is more desirable than the other. (Maybe it’s faster, or it uses less memory.) You can try to import one
module but fall back to a different module if the first import fails. For example, the XML chapter talks about two modules that implement a common API, called the ElementTree API. The first, lxml, is a third-party
module that you need to download and install yourself. The second, xml.etree.ElementTree, is slower but
is part of the Python 3 standard library.
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree
By the end of this try...except block, you have imported some module and named it etree. Since both
modules implement a common API, the rest of your code doesn’t need to keep checking which module got
imported. And since the module that did get imported is always called etree, the rest of your code doesn’t
need to be littered with if statements to call differently-named modules.
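If you find yourself writing the optional-import pattern for several modules, you could wrap it in a helper. The sketch below is one way to do that; optional_import() is a name invented for this example, built on importlib.import_module() from the standard library. The second module name is deliberately one that should not exist on your system.

```python
import importlib


def optional_import(module_name):
    """Return the imported module, or None if it cannot be imported."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


# json ships with Python, so this import succeeds.
json_module = optional_import('json')

# A made-up module name, so this one falls back to None.
missing_module = optional_import('no_such_module_for_this_example')

if missing_module:
    status = 'optional feature enabled'
else:
    status = 'continuing without it'
```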
⁂
1.8. UNBOUND VARIABLES
Take another look at this line of code from the approximate_size() function:
multiple = 1024 if a_kilobyte_is_1024_bytes else 1000
You never declare the variable multiple, you just assign a value to it. That’s OK, because Python lets you
do that. What Python will not let you do is reference a variable that has never been assigned a value. Trying
to do so will raise a NameError exception.
>>> x
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'x' is not defined
>>> x = 1
>>> x
1
You will thank Python for this one day.
⁂
1.9. EVERYTHING IS CASE-SENSITIVE
All names in Python are case-sensitive: variable names, function names, class names, module names, exception
names. If you can get it, set it, call it, construct it, import it, or raise it, it’s case-sensitive.
>>> an_integer = 1
>>> an_integer
1
>>> AN_INTEGER
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'AN_INTEGER' is not defined
>>> An_Integer
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'An_Integer' is not defined
>>> an_inteGer
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'an_inteGer' is not defined
And so on.
⁂
1.10. RUNNING SCRIPTS
Python modules are objects and have several useful attributes. You can use this to easily test your modules
as you write them, by including a special block of code that executes when you run the Python file on the
command line. Take the last few lines of humansize.py:
if __name__ == '__main__':
    print(approximate_size(1000000000000, False))
    print(approximate_size(1000000000000))
☞ Like C, Python uses == for comparison and = for assignment. Unlike C, Python does
not allow = inside an expression (in-line assignment needs the separate := operator,
added in Python 3.8), so there’s little chance of accidentally assigning the value
you thought you were comparing.
So what makes this if statement special? Well, modules are objects, and all modules have a built-in attribute
__name__. A module’s __name__ depends on how you’re using the module. If you import the module, then
__name__ is the module’s filename, without a directory path or file extension.
>>> import humansize
>>> humansize.__name__
'humansize'
But you can also run the module directly as a standalone program, in which case __name__ will be a special
default value, __main__. Python will evaluate this if statement, find a true expression, and execute the if
code block. In this case, to print two values.
c:\home\diveintopython3> c:\python31\python.exe humansize.py
1.0 TB
931.3 GiB
And that’s your first Python program!
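A common way to arrange this, sketched below, is to put the script’s work in a function and call it from the if block. The name main() is only a convention, and the return value is made up for this example. The file still behaves the same way: importing it defines the function and does nothing else, while running it directly also prints the result.

```python
def main():
    """Do the script's real work; here, just return a message."""
    return 'running as a script'


# True only when this file is run directly, not when it is imported.
if __name__ == '__main__':
    print(main())
```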
⁂
1.11. FURTHER READING
• PEP 257: Docstring Conventions explains what distinguishes a good docstring from a great docstring.
• Python Tutorial: Documentation Strings also touches on the subject.
• PEP 8: Style Guide for Python Code discusses good indentation style.
• Python Reference Manual explains what it means to say that everything in Python is an object, because some people are pedants and like to discuss that sort of thing at great length.
CHAPTER 2. NATIVE DATATYPES
❝ Wonder is the foundation of all philosophy, inquiry its progress, ignorance its end. ❞
— Michel de Montaigne
2.1. DIVING IN
Datatypes. Set aside your first Python program for just a minute, and let’s talk about datatypes. In
Python, every value has a datatype, but you don’t need to declare the datatype of variables. How does that work? Based on each variable’s original assignment, Python figures out what type it is and keeps track of
that internally.
Python has many native datatypes. Here are the important ones:
1. Booleans are either True or False.
2. Numbers can be integers (1 and 2), floats (1.1 and 1.2), fractions (1/2 and 2/3), or even complex numbers.
3. Strings are sequences of Unicode characters, e.g. an HTML document.
4. Bytes and byte arrays, e.g. a JPEG image file.
5. Lists are ordered sequences of values.
6. Tuples are ordered, immutable sequences of values.
7. Sets are unordered bags of values.
8. Dictionaries are unordered bags of key-value pairs.
Of course, there are more types than these. Everything is an object in Python, so there are types like module, function, class, method, file, and even compiled code. You’ve already seen some of these: modules have
names, functions have docstrings, & c. You’ll learn about classes in Classes & Iterators, and about files in
Files.
Strings and bytes are important enough — and complicated enough — that they get their own chapter. Let’s
look at the others first.
⁂
2.2. BOOLEANS
Booleans are either true or false. Python has two constants, cleverly named True and False, which can be
used to assign boolean values directly. Expressions can also evaluate to a boolean value. In certain places (like
if statements), Python expects an expression to evaluate to a boolean value. These places are called
boolean contexts. You can use virtually any expression in a boolean context, and Python will try to determine its
truth value. Different datatypes have different rules about which values are true or false in a boolean
context. (This will make more sense once you see some concrete examples later in this chapter.)
For example, take this snippet from humansize.py:
if size < 0:
    raise ValueError('number must be non-negative')
size is an integer, 0 is an integer, and < is a numerical operator. The result of the expression size < 0 is
always a boolean. You can test this yourself in the Python interactive shell:
>>> size = 1
>>> size < 0
False
>>> size = 0
>>> size < 0
False
>>> size = -1
>>> size < 0
True
Due to some legacy issues left over from Python 2, booleans can be treated as numbers. True is 1; False is
0.
>>> True + True
2
>>> True - False
1
>>> True * False
0
>>> True / False
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ZeroDivisionError: int division or modulo by zero
Ew, ew, ew! Don’t do that. Forget I even mentioned it.
⁂
2.3. NUMBERS
Numbers are awesome. There are so many to choose from. Python supports both integers and floating point
numbers. There’s no type declaration to distinguish them; Python tells them apart by the presence or
absence of a decimal point.
>>> type(1)             ①
<class 'int'>
>>> isinstance(1, int)  ②
True
>>> 1 + 1               ③
2
>>> 1 + 1.0             ④
2.0
>>> type(2.0)
<class 'float'>
1. You can use the type() function to check the type of any value or variable. As you might expect, 1 is an
int.
2. Similarly, you can use the isinstance() function to check whether a value or variable is of a given type.
3. Adding an int to an int yields an int.
4. Adding an int to a float yields a float. Python coerces the int into a float to perform the addition,
then returns a float as the result.
2.3.1. COERCING INTEGERS TO FLOATS AND VICE-VERSA
As you just saw, some operators (like addition) will coerce integers to floating point numbers as needed.
You can also coerce them by yourself.
>>> float(2)                 ①
2.0
>>> int(2.0)                 ②
2
>>> int(2.5)                 ③
2
>>> int(-2.5)                ④
-2
>>> 1.12345678901234567890   ⑤
1.1234567890123457
>>> type(1000000000000000)   ⑥
<class 'int'>
1. You can explicitly coerce an int to a float by calling the float() function.
2. Unsurprisingly, you can also coerce a float to an int by calling int().
3. The int() function will truncate, not round.
4. The int() function truncates negative numbers towards 0. It’s a true truncate function, not a floor function.
5. Floating point numbers have limited precision: roughly 15 to 17 significant decimal digits, so the extra digits are rounded away.
6. Integers can be arbitrarily large.
☞ Python 2 had separate types for int and long. The int datatype was limited by
sys.maxint, which varied by platform but was usually 2³¹-1 on 32-bit systems. Python 3 has just one
integer type, which behaves mostly like the old long type from Python 2. See PEP
237 for details.
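The difference between truncating, flooring and rounding is easy to mix up, so here is a short sketch that puts them side by side. It uses only built-in functions and the standard math module; the variable names are just for this example.

```python
import math

# int() drops the fractional part, moving toward zero.
truncated = int(-2.5)

# math.floor() moves toward negative infinity instead.
floored = math.floor(-2.5)

# round() sends exact halves to the nearest even integer.
rounded_down = round(2.5)
rounded_up = round(3.5)

# Integers grow as large as they need to; there is no overflow.
big = 2 ** 100
```

If you want a particular rounding behaviour, call the function that provides it by name rather than relying on int().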
2.3.2. COMMON NUMERICAL OPERATIONS
You can do all kinds of things with numbers.
>>> 11 / 2     ①
5.5
>>> 11 // 2    ②
5
>>> -11 // 2   ③
-6
>>> 11.0 // 2  ④
5.0
>>> 11 ** 2    ⑤
121
>>> 11 % 2     ⑥
1
1. The / operator performs floating point division. It returns a float even if both the numerator and
denominator are ints.
2. The // operator performs a quirky kind of integer division. When the result is positive, you can think of it
as truncating (not rounding) to 0 decimal places, but be careful with that.
3. When integer-dividing negative numbers, the // operator rounds “up” to the nearest integer. Mathematically
speaking, it’s rounding “down” since −6 is less than −5, but it could trip you up if you were expecting it to
truncate to −5.
4. The // operator doesn’t always return an integer. If either the numerator or denominator is a float, it will
still round down to a whole number, but the actual return value will be a float.
5. The ** operator means “raised to the power of.” 11² is 121.
6. The % operator gives the remainder after performing integer division. 11 divided by 2 is 5 with a remainder
of 1, so the result here is 1.
☞ In Python 2, the / operator usually meant integer division, but you could make it
behave like floating point division by including a special directive in your code. In
Python 3, the / operator always means floating point division. See PEP 238 for details.
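The // and % operators are designed to fit together: the quotient times the divisor, plus the remainder, always gives back the number you started with. The sketch below checks that for the negative case, using the built-in divmod() function, which returns both results at once.

```python
# divmod(a, b) returns the pair (a // b, a % b).
quotient, remainder = divmod(-11, 2)

# Floor division and remainder always recombine into the original number.
rebuilt = quotient * 2 + remainder

# With a float operand, the result is a whole number stored as a float.
float_floor = 11.0 // 2
```

This is why −11 // 2 is −6 and not −5: with a remainder of 1, only −6 makes the arithmetic come out right.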
2.3.3. FRACTIONS
Python isn’t limited to integers and floating point numbers. It can also do all the fancy math you learned in
high school and promptly forgot about.
>>> import fractions                 ①
>>> x = fractions.Fraction(1, 3)     ②
>>> x
Fraction(1, 3)
>>> x * 2                            ③
Fraction(2, 3)
>>> fractions.Fraction(6, 4)         ④
Fraction(3, 2)
>>> fractions.Fraction(0, 0)         ⑤
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "fractions.py", line 96, in __new__
    raise ZeroDivisionError('Fraction(%s, 0)' % numerator)
ZeroDivisionError: Fraction(0, 0)
1. To start using fractions, import the fractions module.
2. To define a fraction, create a Fraction object and pass in the numerator and denominator.
3. You can perform all the usual mathematical operations with fractions. Operations return a new Fraction
object. 2 * (1/3) = (2/3)
4. The Fraction object will automatically reduce fractions. (6/4) = (3/2)
5. Python has the good sense not to create a fraction with a zero denominator.
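One reason to reach for fractions is that they are exact, where floating point numbers are not. The sketch below adds one tenth and two tenths both ways. The variable names are just for this example.

```python
from fractions import Fraction

# Exact rational arithmetic: 1/10 + 2/10 is exactly 3/10.
exact_sum = Fraction(1, 10) + Fraction(2, 10)

# Binary floating point cannot represent 0.1 or 0.2 exactly.
float_sum = 0.1 + 0.2

exact_matches = (exact_sum == Fraction(3, 10))
float_matches = (float_sum == 0.3)
```

The floating point comparison fails because the stored values are close approximations, not the decimal numbers you typed.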
2.3.4. TRIGONOMETRY
You can also do basic trigonometry in Python.
>>> import math
>>> math.pi                   ①
3.1415926535897931
>>> math.sin(math.pi / 2)     ②
1.0
>>> math.tan(math.pi / 4)     ③
0.99999999999999989
1. The math module has a constant for π, the ratio of a circle’s circumference to its diameter.
2. The math module has all the basic trigonometric functions, including sin(), cos(), tan(), and variants like
asin().
3. Note, however, that Python does not have infinite precision. tan(π / 4) should return 1.0, not
0.99999999999999989.
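Since the result may be off in the last digit or so, comparing it to 1.0 with == is unreliable. A minimal sketch of the usual workaround is to test for “close enough” with math.isclose(), which is part of the standard math module (Python 3.5 and later).

```python
import math

tangent = math.tan(math.pi / 4)

# Compare with a tolerance instead of demanding exact equality.
close_enough = math.isclose(tangent, 1.0)
```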
2.3.5. NUMBERS IN A BOOLEAN CONTEXT
You can use numbers in a boolean context, such as an
if statement. Zero values are false, and non-zero values
are true.
>>> def is_it_true(anything):              ①
...     if anything:
...         print("yes, it's true")
...     else:
...         print("no, it's false")
...
>>> is_it_true(1)                          ②
yes, it's true
>>> is_it_true(-1)
yes, it's true
>>> is_it_true(0)
no, it's false
>>> is_it_true(0.1)                        ③
yes, it's true
>>> is_it_true(0.0)
no, it's false
>>> import fractions
>>> is_it_true(fractions.Fraction(1, 2))   ④
yes, it's true
>>> is_it_true(fractions.Fraction(0, 1))
no, it's false
1. Did you know you can define your own functions in the Python interactive shell? Just press ENTER at the end
of each line, and ENTER on a blank line to finish.
2. In a boolean context, non-zero integers are true; 0 is false.
3. Non-zero floating point numbers are true; 0.0 is false. Be careful with this one! If there’s the slightest
rounding error (not impossible, as you saw in the previous section) then Python will be testing
0.0000000000001 instead of 0 and will return True.
4. Fractions can also be used in a boolean context. Fraction(0, n) is false for all values of n. All other
fractions are true.
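The rounding-error trap is easy to reproduce. In the sketch below, the difference “should” be zero, but a tiny leftover makes it true in a boolean context. One way around this, shown here as a suggestion and not the only option, is to test against a tolerance with math.isclose(); the tolerance value is an arbitrary choice for this example.

```python
import math

# Mathematically zero, but not in binary floating point.
difference = 0.1 + 0.2 - 0.3

# Any non-zero float is true, no matter how small.
is_truthy = bool(difference)

# Ask "is it near zero?" instead; abs_tol here is an arbitrary choice.
is_effectively_zero = math.isclose(difference, 0.0, abs_tol=1e-9)
```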
⁂
2.4. LISTS
Lists are Python’s workhorse datatype. When I say “list,” you might be thinking “array whose size I have to
declare in advance, that can only contain items of the same type, & c.” Don’t think that. Lists are much
cooler than that.
☞ A list in Python is like an array in Perl 5. In Perl 5, variables that store arrays always
start with the @ character; in Python, variables can be named anything, and Python
keeps track of the datatype internally.
☞ A list in Python is much more than an array in Java (although it can be used as one if
that’s really all you want out of life). A better analogy would be to the ArrayList
class, which can hold arbitrary objects and can expand dynamically as new items are
added.
2.4.1. CREATING A LIST
Creating a list is easy: use square brackets to wrap a comma-separated list of values.
>>> a_list = ['a', 'b', 'mpilgrim', 'z', 'example']  ①
>>> a_list
['a', 'b', 'mpilgrim', 'z', 'example']
>>> a_list[0]                                        ②
'a'
>>> a_list[4]                                        ③
'example'
>>> a_list[-1]                                       ④
'example'
>>> a_list[-3]                                       ⑤
'mpilgrim'
1. First, you define a list of five items. Note that they retain their original order. This is not an accident. A list
is an ordered set of items.
2. A list can be used like a zero-based array. The first item of any non-empty list is always a_list[0].
3. The last item of this five-item list is a_list[4], because lists are always zero-based.
4. A negative index accesses items from the end of the list counting backwards. The last item of any non-empty
list is always a_list[-1].
5. If the negative index is confusing to you, think of it this way: a_list[-n] == a_list[len(a_list) - n]. So
in this list, a_list[-3] == a_list[5 - 3] == a_list[2].
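You can check the negative-index rule for every position at once. This sketch reuses the list from above and compares a_list[-n] with a_list[len(a_list) - n] for each valid n.

```python
a_list = ['a', 'b', 'mpilgrim', 'z', 'example']

# n runs from 1 (the last item) to len(a_list) (the first item).
matches = [a_list[-n] == a_list[len(a_list) - n]
           for n in range(1, len(a_list) + 1)]

rule_holds = all(matches)
```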
2.4.2. SLICING A LIST
Once you’ve defined a list, you can get any part of it as a new list. This is called slicing the list.
>>> a_list
['a', 'b', 'mpilgrim', 'z', 'example']
>>> a_list[1:3]   ①
['b', 'mpilgrim']
>>> a_list[1:-1]  ②
['b', 'mpilgrim', 'z']
>>> a_list[0:3]   ③
['a', 'b', 'mpilgrim']
>>> a_list[:3]    ④
['a', 'b', 'mpilgrim']
>>> a_list[3:]    ⑤
['z', 'example']
>>> a_list[:]     ⑥
['a', 'b', 'mpilgrim', 'z', 'example']
1. You can get a part of a list, called a “slice”, by specifying two indices. The return value is a new list
containing all the items of the list, in order, starting with the first slice index (in this case a_list[1]), up to
but not including the second slice index (in this case a_list[3]).
2. Slicing works if one or both of the slice indices is negative. If it helps, you can think of it this way: reading
the list from left to right, the first slice index specifies the first item you want, and the second slice index
specifies the first item you don’t want. The return value is everything in between.
3. Lists are zero-based, so a_list[0:3] returns the first three items of the list, starting at a_list[0], up to
but not including a_list[3].
4. If the left slice index is 0, you can leave it out, and 0 is implied. So a_list[:3] is the same as a_list[0:3],
because the starting 0 is implied.
5. Similarly, if the right slice index is the length of the list, you can leave it out. So a_list[3:] is the same as
a_list[3:5], because this list has five items. There is a pleasing symmetry here. In this five-item list,
a_list[:3] returns the first 3 items, and a_list[3:] returns the last two items. In fact, a_list[:n] will
always return the first n items, and a_list[n:] will return the rest, regardless of the length of the list.
6. If both slice indices are left out, all items of the list are included. But this is not the same as the original
a_list variable. It is a new list that happens to have all the same items. a_list[:] is shorthand for making
a complete copy of a list.
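Two of these claims are worth testing for yourself: that a_list[:] really is a separate list, and that a_list[:n] and a_list[n:] always split the list cleanly. The sketch below does both with the list from above.

```python
a_list = ['a', 'b', 'mpilgrim', 'z', 'example']

# A full slice makes a new list with the same items.
a_copy = a_list[:]
a_copy.append('new')  # changes the copy only

# For every n, the two halves join back into the original list.
halves_rejoin = all(a_list[:n] + a_list[n:] == a_list
                    for n in range(len(a_list) + 1))
```

Keep in mind that the copy is shallow: the new list holds references to the same item objects, which matters once your list contains other lists.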
2.4.3. ADDING ITEMS TO A LIST
There are four ways to add items to a list.
>>> a_list = ['a']
>>> a_list = a_list + [2.0, 3]     ①
>>> a_list                         ②
['a', 2.0, 3]
>>> a_list.append(True)            ③
>>> a_list
['a', 2.0, 3, True]
>>> a_list.extend(['four', 'Ω'])   ④
>>> a_list
['a', 2.0, 3, True, 'four', 'Ω']
>>> a_list.insert(0, 'Ω')          ⑤
>>> a_list
['Ω', 'a', 2.0, 3, True, 'four', 'Ω']
1. The + operator concatenates lists to create a new list. A list can contain any number of items; there is no
size limit (other than available memory). However, if memory is a concern, you should be aware that list
concatenation creates a second list in memory. In this case, that new list is immediately assigned to the
existing variable a_list. So this line of code is really a two-step process — concatenation then
assignment — which can (temporarily) consume a lot of memory when you’re dealing with large lists.
2. A list can contain items of any datatype, and the items in a single list don’t all need to be the same type.
Here we have a list containing a string, a floating point number, and an integer.
3. The append() method adds a single item to the end of the list. (Now we have four different datatypes in the
list!)
4. Lists are implemented as classes. “Creating” a list is really instantiating a class. As such, a list has methods
that operate on it. The extend() method takes one argument, an iterable (here, a list), and appends each of its
items to the original list.
5. The insert() method inserts a single item into a list. The first argument is the index of the first item in the
list that will get bumped out of position. List items do not need to be unique; for example, there are now
two separate items with the value 'Ω': the first item, a_list[0], and the last item, a_list[6].
☞ a_list.insert(0, value) is like the unshift() function in Perl. It adds an item to
the beginning of the list, and all the other items have their positional index bumped
up to make room.
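One way to see that the + operator builds a new list, while extend() and insert() change the existing one, is to keep a second name pointing at the original list. This is only a sketch; the name alias is hypothetical.

```python
a_list = ['a']
alias = a_list                 # a second name for the same list object

a_list = a_list + [2.0, 3]     # builds a new list, then rebinds a_list
print(a_list)                  # ['a', 2.0, 3]
print(alias)                   # ['a']  (the original list is untouched)

a_list.extend(['four', 'Ω'])   # modifies the list in place
a_list.insert(0, 'Ω')          # also in place
print(a_list)                  # ['Ω', 'a', 2.0, 3, 'four', 'Ω']
```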
Let’s look closer at the difference between append() and extend().
>>> a_list = ['a', 'b', 'c']
>>> a_list.extend(['d', 'e', 'f'])
①
>>> a_list
['a', 'b', 'c', 'd', 'e', 'f']
>>> len(a_list)
②
6
>>> a_list[-1]
'f'
>>> a_list.append(['g', 'h', 'i'])
③
>>> a_list
['a', 'b', 'c', 'd', 'e', 'f', ['g', 'h', 'i']]
>>> len(a_list)
④
7
>>> a_list[-1]
['g', 'h', 'i']
1. The extend() method takes a single argument, an iterable such as a list, and adds each of its items
to a_list.
2. If you start with a list of three items and extend it with a list of another three items, you end up with a list
of six items.
3. On the other hand, the append() method takes a single argument, which can be any datatype. Here, you’re
calling the append() method with a list of three items.
4. If you start with a list of six items and append a list onto it, you end up with... a list of seven items. Why
seven? Because the last item (which you just appended) is itself a list. Lists can contain any type of data,
including other lists. That may be what you want, or it may not. But it’s what you asked for, and it’s what
you got.
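In current Python 3, extend() accepts any iterable, not only a list, while append() always adds exactly one item. A minimal sketch of the difference:

```python
a_list = ['a', 'b', 'c']

a_list.extend(('d', 'e'))       # a tuple: each item is added
a_list.extend('fg')             # a string: each character is added
a_list.extend(range(2))         # a range: each number is added
print(a_list)                   # ['a', 'b', 'c', 'd', 'e', 'f', 'g', 0, 1]

a_list.append('hi')             # append adds the whole string as one item
print(a_list[-1], len(a_list))  # hi 10
```

Note the string case: extending with 'fg' adds two one-character strings, which is rarely what you want. Use append() to add a string as a single item.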
2.4.4. SEARCHING FOR VALUES IN A LIST
>>> a_list = ['a', 'b', 'new', 'mpilgrim', 'new']
>>> a_list.count('new')
①
2
>>> 'new' in a_list
②
True
>>> 'c' in a_list
False
>>> a_list.index('mpilgrim')
③
3
>>> a_list.index('new')
④
2
>>> a_list.index('c')
⑤
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: list.index(x): x not in list
1. As you might expect, the count() method returns the number of occurrences of a specific value in a list.
2. If all you want to know is whether a value is in the list or not, the in operator is slightly faster than using
the count() method. The in operator always returns True or False; it will not tell you how many times
the value appears in the list.
3. Neither the in operator nor the count() method will tell you where in the list a value appears. If you need
to know where in the list a value is, call the index() method. By default it will search the entire list,
although you can specify an optional second argument of the (0-based) index to start from, and even an
optional third argument of the (0-based) index to stop searching.
4. The index() method finds the first occurrence of a value in the list. In this case, 'new' occurs twice in the
list, in a_list[2] and a_list[4], but the index() method will return only the index of the first
occurrence.
5. As you might not expect, if the value is not found in the list, the index() method will raise an exception.
Wait, what? That’s right: the index() method raises an exception if it doesn’t find the value in the list. This
is notably different from most languages, which will return some invalid index (like -1). While this may seem
annoying at first, I think you will come to appreciate it. It means your program will crash at the source of
the problem instead of failing strangely and silently later. Remember, -1 is a valid list index. If the index() method returned -1, that could lead to some not-so-fun debugging sessions!
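The optional start argument and the exception can both be put to work. The sketch below finds the second occurrence of a value and wraps index() in a small helper; find() is a hypothetical function written for this example, not part of Python.

```python
a_list = ['a', 'b', 'new', 'mpilgrim', 'new']

first = a_list.index('new')              # first occurrence
second = a_list.index('new', first + 1)  # start searching after the first hit

def find(items, value):
    """Hypothetical helper: return the index of value, or None if absent."""
    try:
        return items.index(value)
    except ValueError:
        return None

print(first, second)       # 2 4
print(find(a_list, 'c'))   # None
```

Returning None instead of -1 keeps a missing value from being mistaken for a valid index.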
2.4.5. REMOVING ITEMS FROM A LIST
Lists can expand and contract automatically. You’ve seen
the expansion part. There are several different ways to
remove items from a list as well.
>>> a_list = ['a', 'b', 'new', 'mpilgrim', 'new']
>>> a_list[1]
'b'
>>> del a_list[1]
①
>>> a_list
['a', 'new', 'mpilgrim', 'new']
>>> a_list[1]
②
'new'
1. You can use the del statement to delete a specific item from a list.
2. Accessing index 1 after deleting index 1 does not result in an error. All items after the deleted item shift
their positional index to “fill the gap” created by deleting the item.
Don’t know the positional index? Not a problem; you can remove items by value instead.
>>> a_list.remove('new')
①
>>> a_list
['a', 'mpilgrim', 'new']
>>> a_list.remove('new')
②
>>> a_list
['a', 'mpilgrim']
>>> a_list.remove('new')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: list.remove(x): x not in list
1. You can also remove an item from a list with the remove() method. The remove() method takes a value
and removes the first occurrence of that value from the list. Again, all items after the deleted item will have
their positional indices bumped down to “fill the gap.” Lists never have gaps.
2. You can call the remove() method as often as you like, but it will raise an exception if you try to remove a
value that isn’t in the list.
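If you want to remove every occurrence of a value without triggering the exception, one approach is to test with the in operator before each call to remove(). A minimal sketch:

```python
a_list = ['a', 'b', 'new', 'mpilgrim', 'new']

del a_list[1]                 # remove by position
print(a_list)                 # ['a', 'new', 'mpilgrim', 'new']

# Check with "in" first, so remove() never sees a missing value.
while 'new' in a_list:
    a_list.remove('new')      # removes the first remaining 'new'
print(a_list)                 # ['a', 'mpilgrim']
```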
2.4.6. REMOVING ITEMS FROM A LIST: BONUS ROUND
Another interesting list method is pop(). The pop() method is yet another way to remove items from a list,
but with a twist.
>>> a_list = ['a', 'b', 'new', 'mpilgrim']
>>> a_list.pop()
①
'mpilgrim'
>>> a_list
['a', 'b', 'new']
>>> a_list.pop(1)
②
'b'
>>> a_list
['a', 'new']
>>> a_list.pop()
'new'
>>> a_list.pop()
'a'
>>> a_list.pop()
③
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: pop from empty list
1. When called without arguments, the pop() list method removes the last item in the list and returns the value
it removed.
2. You can pop arbitrary items from a list. Just pass a positional index to the pop() method. It will remove
that item, shift all the items after it to “fill the gap,” and return the value it removed.
3. Calling pop() on an empty list raises an exception.
☞ Calling the pop() list method without an argument is like the pop() function in Perl.
It removes the last item from the list and returns the value of the removed item.
Perl has another function, shift(), which removes the first item and returns its
value; in Python, this is equivalent to a_list.pop(0).
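Together with append(), pop() lets a list act as a stack, and pop(0) takes items from the front instead. The sketch below uses the hypothetical names stack, queue, and drained, and tests the list before popping so the IndexError never occurs.

```python
stack = ['a', 'b', 'new']
stack.append('mpilgrim')      # push onto the end
top = stack.pop()             # pop from the end
print(top, stack)             # mpilgrim ['a', 'b', 'new']

queue = ['a', 'b', 'new']
front = queue.pop(0)          # remove from the front, like Perl's shift()
print(front, queue)           # a ['b', 'new']

# An empty list is false, so this loop stops before pop() can fail.
drained = []
while stack:
    drained.append(stack.pop())
print(drained)                # ['new', 'b', 'a']
```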
2.4.7. LISTS IN A BOOLEAN CONTEXT
You can also use a list in a boolean context, such as an if statement. Empty lists are false; all other lists are true.
>>> def is_it_true(anything):
...     if anything:
...         print("yes, it's true")
...     else:
...         print("no, it's false")
...
>>> is_it_true([])
①
no, it's false
>>> is_it_true(['a'])
②
yes, it's true
>>> is_it_true([False])
③
yes, it's true
1. In a boolean context, an empty list is false.
2. Any list with at least one item is true.
3. Even a list whose only item is False is true. The values of the items are irrelevant.
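In practice, this rule is why Python code often tests a list directly instead of comparing its length to zero. A minimal sketch, where describe() is a hypothetical helper:

```python
def describe(items):
    """Hypothetical helper that branches on whether the list is empty."""
    if not items:
        return 'empty'
    return '{0} item(s)'.format(len(items))

print(describe([]))          # empty
print(describe(['a']))       # 1 item(s)
print(describe([False]))     # 1 item(s)
print(describe([[]]))        # 1 item(s)
```

The last call shows that a list containing one empty list is still true, because it has one item.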
⁂
2.5. TUPLES
A tuple is an immutable list. A tuple cannot be changed in any way once it is created.
>>> a_tuple = ("a", "b", "mpilgrim", "z", "example")
①
>>> a_tuple
('a', 'b', 'mpilgrim', 'z', 'example')
>>> a_tuple[0]
②
'a'
>>> a_tuple[-1]
③
'example'
>>> a_tuple[1:3]
④
('b', 'mpilgrim')
1. A tuple is defined in the same way as a list, except that the whole set of elements is enclosed in parentheses
instead of square brackets.
2. The elements of a tuple have a defined order, just like a list. Tuple indices are zero-based, just like a list, so
the first element of a non-empty tuple is always a_tuple[0].
3. Negative indices count from the end of the tuple, just like a list.
4. Slicing works too, just like a list. When you slice a list, you get a new list; when you slice a tuple, you get a
new tuple.
The major difference between tuples and lists is that tuples cannot be changed. In technical terms, tuples
are immutable. In practical terms, they have no methods that would allow you to change them. Lists have
methods like append(), extend(), insert(), remove(), and pop(). Tuples have none of these methods.
You can slice a tuple (because that creates a new tuple), and you can check whether a tuple contains a
particular value (because that doesn’t change the tuple), and… that’s about it.
# continued from the previous example
>>> a_tuple
('a', 'b', 'mpilgrim', 'z', 'example')
>>> a_tuple.append("new")
①
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'tuple' object has no attribute 'append'
>>> a_tuple.remove("z")
②
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'tuple' object has no attribute 'remove'
>>> a_tuple.index("example")
③
4
>>> "z" in a_tuple
④
True
1. You can’t add elements to a tuple. Tuples have no append() or extend() method.
2. You can’t remove elements from a tuple. Tuples have no remove() or pop() method.
3. You can find elements in a tuple, since this doesn’t change the tuple.
4. You can also use the in operator to check if an element exists in the tuple.
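Immutability also rules out assigning to an index, which raises a TypeError. The sketch below checks that, and shows that you can still build a new, longer tuple from an existing one. The names changed and bigger are made up for illustration.

```python
a_tuple = ("a", "b", "mpilgrim", "z", "example")

try:
    a_tuple[0] = "new"            # item assignment is not allowed
    changed = True
except TypeError:
    changed = False

print(changed)                    # False
print(hasattr(a_tuple, "append")) # False
print(a_tuple.count("z"))         # 1
print(a_tuple.index("example"))   # 4

bigger = a_tuple + ("new",)       # concatenation creates a new tuple
print(len(a_tuple), len(bigger))  # 5 6
```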
So what are tuples good for?
• Tuples are somewhat faster to create than lists and use a little less memory. If you’re defining a constant set of values and all you’re ever going to do with it
is iterate through it, use a tuple instead of a list.
• It makes your code safer if you “write-protect” data that doesn’t need to be changed. Using a tuple instead
of a list is like having an implied assert statement that shows this data is constant, and that special thought
(and a specific function) is required to override that.
• Some tuples can be used as dictionary keys (specifically, tuples that contain immutable values like strings,
numbers, and other tuples). Lists can never be used as dictionary keys, because lists are not immutable.
☞ Tuples can be converted into lists, and vice-versa. The built-in tuple() function takes
a list and returns a tuple with the same elements, and the list() function takes a
tuple and returns a list. In effect, tuple() freezes a list, and list() thaws a tuple.
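The dictionary-key rule and the tuple()/list() conversions can be tried out together. This is a sketch with invented data; distances, frozen, and thawed are hypothetical names.

```python
# Tuples of immutable values work as dictionary keys.
distances = {('a', 'b'): 3, ('b', 'c'): 5}
print(distances[('a', 'b')])       # 3

# A list cannot be a key, because lists are not hashable.
try:
    distances[['a', 'c']] = 8
    list_key_ok = True
except TypeError:
    list_key_ok = False
print(list_key_ok)                 # False

# A tuple that contains a list is not hashable either.
try:
    hash(('a', ['b']))
    mixed_ok = True
except TypeError:
    mixed_ok = False
print(mixed_ok)                    # False

a_list = ['a', 'b']
frozen = tuple(a_list)             # "freeze" the list
thawed = list(frozen)              # "thaw" it again
print(frozen, thawed)              # ('a', 'b') ['a', 'b']
```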
2.5.1. TUPLES IN A BOOLEAN CONTEXT
You can use tuples in a boolean context, such as an if statement.
>>> def is_it_true(anything):
...     if anything:
...         print("yes, it's true")
...     else:
...         print("no, it's false")
...
>>> is_it_true(())
①
no, it's false
>>> is_it_true(('a', 'b'))
②
yes, it's true
>>> is_it_true((False,))
③
yes, it's true
>>> type((False))
④
<class 'bool'>
>>> type((False,))
<class 'tuple'>
1. In a boolean context, an empty tuple is false.
2. Any tuple with at least one item is true.
3. Even a tuple whose only item is False is true. The values of the items are irrelevant. But what’s that comma doing
there?
4. To create a tuple of one item, you need a comma after the value. Without the comma, Python just assumes
you have an extra pair of parentheses, which is harmless, but it doesn’t create a tuple.
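It is the comma, not the parentheses, that makes a one-item tuple. The empty tuple () is the one case that needs no comma. A quick sketch, with variable names chosen for illustration:

```python
one_item = (False,)      # the trailing comma makes this a tuple
just_a_bool = (False)    # parentheses alone change nothing
no_parens = False,       # still a tuple: the comma is what counts
empty = ()               # the empty tuple

print(type(one_item).__name__)     # tuple
print(type(just_a_bool).__name__)  # bool
print(type(no_parens).__name__)    # tuple
print(len(one_item), len(empty))   # 1 0
```

Forgetting the comma is an easy mistake to make, and Python will not warn you about it, since (False) is a perfectly valid expression.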