(The Organism Is Written in the Egg)
What lies at the heart of every living thing is not a fire, not warm breath, not a “spark of life.” It is information, words, instructions. If you want a metaphor, don’t think of fires and sparks and breath. Think, instead, of a billion discrete, digital characters carved in tablets of crystal.
—Richard Dawkins (1986)♦
SCIENTISTS LOVE THEIR FUNDAMENTAL PARTICLES. If traits are handed down from one generation to the next, these traits must take some primal form or have some carrier. Hence the putative particle of protoplasm. “The biologist must be allowed as much scientific use of the imagination as the physicist,” The Popular Science Monthly explained in 1875. “If the one must have his atoms and molecules, the other must have his physiological units, his plastic molecules, his ‘plasticules.’ ”♦
Plasticule did not catch on, and almost everyone had the wrong idea about heredity anyway. So in 1910 a Danish botanist, Wilhelm Johannsen, self-consciously invented the word gene. He was at pains to correct the common mythology and thought a word might help. The myth was this: that “personal qualities” are transmitted from parent to progeny. This is “the most naïve and oldest conception of heredity,”♦ Johannsen said in a speech to the American Society of Naturalists. It was understandable. If father and daughter are fat, people might be tempted to think that his fatness caused hers, or that he passed it on to her. But that is wrong. As Johannsen declared, “The personal qualities of any individual organism do not at all cause the qualities of its offspring; but the qualities of both ancestor and descendent are in quite the same manner determined by the nature of the ‘sexual substances’—i.e., the gametes—from which they have developed.” What is inherited is more abstract, more in the nature of potentiality.
To banish the fallacious thinking, he proposed a new terminology, beginning with gene: “nothing but a very applicable little word, easily combined with others.”♦ It hardly mattered that neither he nor anyone else knew what a gene actually was; “it may be useful as an expression for the ‘unit-factors,’ ‘elements,’ or ‘allelomorphs.’… As to the nature of the ‘genes’ it is as yet of no value to propose a hypothesis.” Gregor Mendel’s years of research with green and yellow peas showed that such a thing must exist. Colors and other traits vary depending on many factors, such as temperature and soil content, but something is preserved whole; it does not blend or diffuse; it must be quantized.♦ Mendel had discovered the gene, though he did not name it. For him it was more an algebraic convenience than a physical entity.
When Schrödinger contemplated the gene, he faced a problem. How could such a “tiny speck of material” contain the entire complex code-script that determines the elaborate development of the organism? To resolve the difficulty Schrödinger summoned an example not from wave mechanics or theoretical physics but from telegraphy: Morse code. He noted that two signs, dot and dash, could be combined in well-ordered groups to generate all human language. Genes, too, he suggested, must employ a code: “The miniature code should precisely correspond with a highly complicated and specified plan of development and should somehow contain the means to put it into action.”♦
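Schrödinger's arithmetic is easy to check. In What Is Life? he observed that dot and dash, in well-ordered groups of not more than four, allow thirty distinct signs. A minimal sketch of that counting (the function below is my own illustration, not anything from the period):

def ordered_groups(symbols, max_len):
    """Count all ordered groups of 1 to max_len symbols."""
    return sum(symbols ** n for n in range(1, max_len + 1))

print(ordered_groups(2, 4))   # 30: Schrodinger's dot-dash groups of up to four signs
print(ordered_groups(4, 3))   # 84: four bases in groups of up to three
print(4 ** 3)                 # 64: the triplets alone, as it later turned out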
Codes, instructions, signals—all this language, redolent of machinery and engineering, pressed in on biologists like Norman French invading medieval English. In the 1940s the jargon had a precious, artificial feeling, but that soon passed. The new molecular biology began to examine information storage and information transfer. Biologists could count in terms of “bits.” Some of the physicists now turning to biology saw information as exactly the concept needed to discuss and measure biological qualities for which tools had not been available: complexity and order, organization and specificity.♦ Henry Quastler, an early radiologist from Vienna, then at the University of Illinois, was applying information theory to both biology and psychology; he estimated that an amino acid has the information content of a written word and a protein molecule the information content of a paragraph. His colleague Sidney Dancoff suggested to him in 1950 that a chromosomal thread is “a linear coded tape of information”♦:
The entire thread constitutes a “message.” This message can be broken down into sub-units which may be called “paragraphs,” “words,” etc. The smallest message unit is perhaps some flip-flop which can make a yes-no decision.
In 1952 Quastler organized a symposium on information theory in biology, with no purpose but to deploy these new ideas—entropy, noise, messaging, differentiating—in areas from cell structure and enzyme catalysis to large-scale “biosystems.” One researcher constructed an estimate of the number of bits represented by a single bacterium: as much as 10¹³.♦ (But that was the number needed to describe its entire molecular structure in three dimensions—perhaps there was a more economical description.) The growth of the bacterium could be analyzed as a reduction in the entropy of its part of the universe. Quastler himself wanted to take the measure of higher organisms in terms of information content: not in terms of atoms (“this would be extremely wasteful”) but in terms of “hypothetical instructions to build an organism.”♦ This brought him, of course, to genes.
The whole set of instructions—situated “somewhere in the chromosomes”—is the genome. This is a “catalogue,” he said, containing, if not all, then at least “a substantial fraction of all information about an adult organism.” He emphasized, though, how little was known about genes. Were they discrete physical entities, or did they overlap? Were they “independent sources of information” or did they affect one another? How many were there? Multiplying all these unknowns, he arrived at a result:
that the essential complexity of a single cell and of a whole man are both not more than 10¹² nor less than 10⁵ bits; this is an extremely coarse estimate, but is better than no estimate at all.♦
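The flavor of such reckoning is easy to reproduce. A back-of-envelope sketch, using modern textbook figures rather than Quastler's own numbers:

import math

bits_per_base = math.log2(4)          # four bases, so 2 bits per base
e_coli_bases = 4.6e6                  # E. coli genome length in base pairs (rough)
human_bases = 3.0e9                   # human genome, likewise rough

print(bits_per_base * e_coli_bases)   # ~9.2e6 bits: order 10^7
print(bits_per_base * human_bases)    # ~6.0e9 bits: order 10^9 to 10^10

# Describing a cell atom by atom in three dimensions costs vastly more,
# which is how a figure like 10^13 bits for one bacterium can arise.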
These crude efforts led to nothing, directly. Shannon’s information theory could not be grafted onto biology whole. It hardly mattered. A seismic shift was already under way: from thinking about energy to thinking about information.
Across the Atlantic, an odd little letter arrived at the offices of the journal Nature in London in the spring of 1953, with a list of signatories from Paris, Zurich, Cambridge, and Geneva, most notably Boris Ephrussi, France’s first professor of genetics.♦ The scientists complained of “what seems to us a rather chaotic growth in technical vocabulary.” In particular, they had seen genetic recombination in bacteria described as “transformation,” “induction,” “transduction,” and even “infection.” They proposed to simplify matters:
As a solution to this confusing situation, we would like to suggest the use of the term “interbacterial information” to replace those above. It does not imply necessarily the transfer of material substances, and recognizes the possible future importance of cybernetics at the bacterial level.
This was the product of a wine-flushed lakeside lunch at Locarno, Switzerland—meant as a joke, but entirely plausible to the editors of Nature, who published it forthwith.♦ The youngest of the lunchers and signers was a twenty-five-year-old American named James Watson.
The very next issue of Nature carried another letter from Watson, along with his collaborator, Francis Crick. It made them famous. They had found the gene.
A consensus had emerged that whatever genes were, however they functioned, they would probably be proteins: giant organic molecules made of long chains of amino acids. Alternatively, a few geneticists in the 1940s focused instead on simple viruses—phages. Then again, experiments on heredity in bacteria had persuaded a few researchers, Watson and Crick among them, that genes might lie in a different substance, which, for no known reason, was found within the nucleus of every cell, plant and animal, and in the phages as well.♦ This substance was a nucleic acid, particularly deoxyribonucleic acid, or DNA. The people working with nucleic acids, mainly chemists, had not been able to learn much about it, except that the molecules were built up from smaller units, called nucleotides. Watson and Crick thought this must be the secret, and they raced to figure out its structure at the Cavendish Laboratory in Cambridge. They could not see these molecules; they could only seek clues in the shadows cast by X-ray diffraction. But they knew a great deal about the subunits. Each nucleotide contained a “base,” and there were just four different bases, designated as A, C, G, and T. They came in strictly predictable proportions. They must be the letters of the code. The rest was trial and error, fired by imagination.
What they discovered became an icon: the double helix, heralded on magazine covers, emulated in sculpture. DNA is formed of two long sequences of bases, like ciphers coded in a four-letter alphabet, each sequence complementary to the other, coiled together. Unzipped, each strand may serve as a template for replication. (Was it Schrödinger’s “aperiodic crystal”? In terms of physical structure, X-ray diffraction showed DNA to be entirely regular. The aperiodicity lies at the abstract level of language—the sequence of “letters.”) In the local pub, Crick, ebullient, announced to anyone who would listen that they had discovered “the secret of life”; in their one-page note in Nature they were more circumspect. They ended with a remark that has been called “one of the most coy statements in the literature of science”♦:
It has not escaped our notice that the specific pairing we have postulated immediately suggests a possible copying mechanism for the genetic material.♦
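The copying mechanism is, at bottom, a lookup rule: A pairs with T, C with G, and each strand fixes its partner. A minimal sketch of the rule (the function name is my own):

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement_strand(strand):
    """Return the complementary strand, read in the opposite direction."""
    return "".join(COMPLEMENT[base] for base in reversed(strand))

template = "ATGGCATTC"
partner = complement_strand(template)
print(partner)                      # GAATGCCAT
print(complement_strand(partner))   # ATGGCATTC: copying the copy recovers the original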
They dispensed with the timidity in another paper a few weeks later. In each chain the sequence of bases appeared to be irregular—any sequence was possible, they observed. “It follows that in a long molecule many different permutations are possible.”♦ Many permutations—many possible messages. Their next remark set alarms sounding on both sides of the Atlantic: “It therefore seems likely that the precise sequence of the bases is the code which carries the genetical information.” In using these terms, code and information, they were no longer speaking figuratively.
The macromolecules of organic life embody information in an intricate structure. A single hemoglobin molecule comprises four chains of polypeptides, two with 141 amino acids and two with 146, in strict linear sequence, bonded and folded together. Atoms of hydrogen, oxygen, carbon, and iron could mingle randomly for the lifetime of the universe and be no more likely to form hemoglobin than the proverbial chimpanzees to type the works of Shakespeare. Their genesis requires energy; they are built up from simpler, less patterned parts, and the law of entropy applies. For earthly life, the energy comes as photons from the sun. The information comes via evolution.
The DNA molecule was special: the information it bears is its only function. Having recognized this, microbiologists turned to the problem of deciphering the code. Crick, who had been inspired to leave physics for biology when he read Schrödinger’s What Is Life?, sent Schrödinger a copy of the paper but did not receive a reply.
On the other hand, George Gamow saw the Watson-Crick report when he was visiting the Radiation Laboratory at Berkeley. Gamow was a Ukrainian-born cosmologist—an originator of the Big Bang theory—and he knew a big idea when he saw one. He sent off a letter:
Dear Drs. Watson & Crick,
I am a physicist, not a biologist.… But I am very much excited by your article in May 30th Nature, and think that brings Biology over into the group of “exact” sciences.… If your point of view is correct each organism will be characterized by a long number written in quadrucal (?) system with figures 1, 2, 3, 4 standing for different bases.… This would open a very exciting possibility of theoretical research based on combinatorix and the theory of numbers!… I have a feeling this can be done. What do you think?♦
For the next decade, the struggle to understand the genetic code consumed a motley assortment of the world’s great minds, many of them, like Gamow, lacking any useful knowledge of biochemistry. For Watson and Crick, the initial problem had depended on a morass of specialized particulars: hydrogen bonds, salt linkages, phosphate-sugar chains with deoxyribofuranose residues. They had to learn how inorganic ions could be organized in three dimensions; they had to calculate exact angles of chemical bonds. They made models out of cardboard and tin plates. But now the problem was being transformed into an abstract game of symbol manipulation. Closely linked to DNA, its single-stranded cousin, RNA, appeared to play the role of messenger or translator. Gamow said explicitly that the underlying chemistry hardly mattered. He and others who followed him understood this as a puzzle in mathematics—a mapping between messages in different alphabets. If this was a coding problem, the tools they needed came from combinatorics and information theory. Along with physicists, they consulted cryptanalysts.
Gamow himself began impulsively by designing a combinatorial code. As he saw it, the problem was to get from the four bases in DNA to the twenty known amino acids in proteins—a code, therefore, with four letters and twenty words.♦ Pure combinatorics made him think of nucleotide triplets: three-letter words. He had a detailed solution—soon known as his “diamond code”—published in Nature within a few months. A few months after that, Crick showed this to be utterly wrong: experimental data on protein sequences ruled out the diamond code. But Gamow was not giving up. The triplet idea was seductive. An unexpected cast of scientists joined the hunt: Max Delbrück, an ex-physicist now at Caltech in biology; his friend Richard Feynman, the quantum theorist; Edward Teller, the famous bomb maker; another Los Alamos alumnus, the mathematician Nicholas Metropolis; and Sydney Brenner, who joined Crick at the Cavendish.
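Gamow's starting arithmetic is worth spelling out: with four letters, words of length one or two cannot name twenty amino acids, so three is the shortest word that can. In code:

bases, amino_acids = 4, 20
for word_length in (1, 2, 3):
    words = bases ** word_length
    print(word_length, words, words >= amino_acids)
# 1   4  False
# 2  16  False
# 3  64  True: enough, with forty-four codons to spare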
They all had different coding ideas. Mathematically the problem seemed daunting even to Gamow. “As in the breaking of enemy messages during the war,” he wrote in 1954, “the success depends on the available length of the coded text. As every intelligence officer will tell you, the work is very hard, and the success depends mostly on luck.… I am afraid that the problem cannot be solved without the help of electronic computer.”♦ Gamow and Watson decided to make it a club: the RNA Tie Club, with exactly twenty members. Each member received a woolen tie in black and green, made to Gamow’s design by a haberdasher in Los Angeles. The game playing aside, Gamow wanted to create a communication channel to bypass journal publication. News in science had never moved so fast. “Many of the essential concepts were first proposed in informal discussions on both sides of the Atlantic and were then quickly broadcast to the cognoscenti,” said another member, Gunther Stent, “by private international bush telegraph.”♦ There were false starts, wild guesses, and dead ends, and the established biochemistry community did not always go along willingly.
“People didn’t necessarily believe in the code,” Crick said later. “The majority of biochemists simply weren’t thinking along those lines. It was a completely novel idea, and moreover they were inclined to think it was oversimplified.”♦ They thought the way to understand proteins would be to study enzyme systems and the coupling of peptide units. Which was reasonable enough.
They thought protein synthesis couldn’t be a simple matter of coding from one thing to another; that sounded too much like something a physicist had invented. It didn’t sound like biochemistry to them.… So there was a certain resistance to simple ideas like three nucleotides’ coding an amino acid; people thought it was rather like cheating.
Gamow, at the other extreme, was bypassing the biochemical details to put forward an idea of shocking simplicity: that any living organism is determined by “a long number written in a four-digital system.”♦ He called this “the number of the beast” (from Revelation). If two beasts have the same number, they are identical twins.
By now the word code was so deeply embedded in the conversation that people seldom paused to notice how extraordinary it was to find such a thing—abstract symbols representing arbitrarily different abstract symbols—at work in chemistry, at the level of molecules. The genetic code performed a function with uncanny similarities to the metamathematical code invented by Gödel for his philosophical purposes. Gödel’s code substitutes plain numbers for mathematical expressions and operations; the genetic code uses triplets of nucleotides to represent amino acids. Douglas Hofstadter was the first to make this connection explicitly, in the 1980s: “between the complex machinery in a living cell that enables a DNA molecule to replicate itself and the clever machinery in a mathematical system that enables a formula to say things about itself.”♦ In both cases he saw a twisty feedback loop. “Nobody had ever in the least suspected that one set of chemicals could code for another set,” Hofstadter wrote.
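Gödel's trick can be shown in miniature. In the sketch below (the symbol numbering is invented for the demonstration), a string of symbols becomes a single integer, and unique prime factorization lets the integer be read back as the string:

SYMBOLS = {"0": 1, "S": 2, "=": 3, "+": 4}     # a toy alphabet
DECODE = {v: k for k, v in SYMBOLS.items()}
PRIMES = [2, 3, 5, 7, 11]

def godel_number(expr):
    """Encode a symbol string as 2^c1 * 3^c2 * 5^c3 * ..."""
    n = 1
    for p, ch in zip(PRIMES, expr):
        n *= p ** SYMBOLS[ch]
    return n

def read_back(n):
    """Recover the string by factoring out each prime in turn."""
    out = []
    for p in PRIMES:
        e = 0
        while n % p == 0:
            n //= p
            e += 1
        if e == 0:
            break
        out.append(DECODE[e])
    return "".join(out)

print(godel_number("0=0"))   # 270, that is, 2^1 * 3^3 * 5^1
print(read_back(270))        # 0=0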
Indeed, the very idea is somewhat baffling: If there is a code, then who invented it? What kinds of messages are written in it? Who writes them? Who reads them?
The Tie Club recognized that the problem was not just information storage but information transfer. DNA serves two different functions. First, it preserves information. It does this by copying itself, from generation to generation, spanning eons—a Library of Alexandria that keeps its data safe by copying itself billions of times. Notwithstanding the beautiful double helix, this information store is essentially one-dimensional: a string of elements arrayed in a line. In human DNA, the nucleotide units number more than a billion, and this detailed gigabit message must be conserved perfectly, or almost perfectly. Second, however, DNA also sends that information outward for use in the making of the organism. The data stored in a one-dimensional strand has to flower forth in three dimensions. This information transfer occurs via messages passing from the nucleic acids to proteins. So DNA not only replicates itself; separately, it dictates the manufacture of something entirely different. These proteins, with their own enormous complexity, serve as the material of a body, the mortar and bricks, and also as the control system, the plumbing and wiring and the chemical signals that control growth.
The replication of DNA is a copying of information. The manufacture of proteins is a transfer of information: the sending of a message. Biologists could see this clearly now, because the message was now well defined and abstracted from any particular substrate. If messages could be borne upon sound waves or electrical pulses, why not by chemical processes?
Gamow framed the issue simply: “The nucleus of a living cell is a storehouse of information.”♦ Furthermore, he said, it is a transmitter of information. The continuity of all life stems from this “information system”; the proper study of genetics is “the language of the cells.”
When Gamow’s diamond code proved wrong, he tried a “triangle code,” and more variations followed—also wrong. Triplet codons remained central, and a solution seemed tantalizingly close but out of reach. A problem was how nature punctuated the seemingly unbroken DNA and RNA strands. No one could see a biological equivalent for the pauses that separate letters in Morse code, or the spaces that separate words. Perhaps every fourth base was a comma. Or maybe (Crick suggested) commas would be unnecessary if some triplets made “sense” and others made “nonsense.”♦ Then again, maybe a sort of tape reader just needed to start at a certain point and count off the nucleotides three by three. Among the mathematicians drawn to this problem were a group at the new Jet Propulsion Laboratory in Pasadena, California, meant to be working on aerospace research. To them it looked like a classic problem in Shannon coding theory: “the sequence of nucleotides as an infinite message, written without punctuation, from which any finite portion must be decodable into a sequence of amino acids by suitable insertion of commas.”♦ They constructed a dictionary of codes. They considered the problem of misprints.
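The comma-free idea can be stated as a checkable property: choose triplets so that no out-of-frame overlap of two legitimate words is itself a legitimate word; then a reader may start anywhere and find only one parsing. A sketch of the test (my own formulation):

from itertools import product

def is_comma_free(code):
    """True if no shifted overlap of two code words is itself a code word."""
    for a, b in product(code, repeat=2):
        pair = a + b
        if pair[1:4] in code or pair[2:5] in code:
            return False
    return True

print(is_comma_free({"ACT", "GCT"}))   # True: CTA, TAC, CTG, TGC are all "nonsense"
print(is_comma_free({"AAA"}))          # False: AAA overlapped with itself reads AAA

Famously, the largest comma-free code of triplets on a four-letter alphabet holds exactly twenty words, the very number of the amino acids. The coincidence made the scheme nearly irresistible. It was elegant, and it was wrong.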
Biochemistry did matter. All the world’s cryptanalysts, lacking petri dishes and laboratory kitchens, would never have been able to guess the answer from among the universe of possibilities. When the genetic code was solved, in the early 1960s, it turned out to be full of redundancy. Much of the mapping from nucleotides to amino acids seemed arbitrary—not as neatly patterned as any of Gamow’s proposals. Some amino acids correspond to just one codon, others to two, four, or six. Particles called ribosomes ratchet along the RNA strand and translate it, three bases at a time. Some codons are redundant; some actually serve as start signals and stop signals. The redundancy serves exactly the purpose that an information theorist would expect. It provides tolerance for errors. Noise affects biological messages like any other. Errors in DNA—misprints—are mutations.
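The solved code, in action, looks like this. A minimal sketch of ribosomal reading, using a handful of entries from the standard codon table (the assignments shown are the real ones; the function is my own):

CODON_TABLE = {
    "AUG": "Met",                           # methionine; also the usual start signal
    "UGG": "Trp",                           # tryptophan: one codon only
    "UUU": "Phe", "UUC": "Phe",             # phenylalanine: twofold redundancy
    "GCU": "Ala", "GCC": "Ala",
    "GCA": "Ala", "GCG": "Ala",             # alanine: fourfold redundancy
    "UAA": None, "UAG": None, "UGA": None,  # stop signals
}

def translate(rna):
    """Step along the strand three bases at a time, as a ribosome does."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        residue = CODON_TABLE.get(rna[i:i + 3])
        if residue is None:                 # a stop codon (or an unlisted one) halts reading
            break
        protein.append(residue)
    return protein

print(translate("AUGUUUGCUGCGUGGUAA"))      # ['Met', 'Phe', 'Ala', 'Ala', 'Trp']
# A misprint turning GCU into GCC would be silent: both spell alanine.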
Even before the exact answer was reached, Crick crystallized its fundamental principles in a statement that he called (and is called to this day) the Central Dogma. It is a hypothesis about the direction of evolution and the origin of life; it is provable in terms of Shannon entropy in the possible chemical alphabets:
Once “information” has passed into protein it cannot get out again. In more detail, the transfer of information from nucleic acid to nucleic acid, or from nucleic acid to protein may be possible, but transfer from protein to protein, or from protein to nucleic acid is impossible. Information means here the precise determination of sequence.♦
The genetic message is independent and impenetrable: no information from events outside can change it.
Information had never been writ so small. Here is scripture at angstrom scale, published where no one can see, the Book of Life in the eye of a needle.
Omne vivum ex ovo. “The complete description of the organism is already written in the egg,”♦ said Sydney Brenner to Horace Freeland Judson, molecular biology’s great chronicler, at Cambridge in the winter of 1971. “Inside every animal there is an internal description of that animal.… What is going to be difficult is the immense amount of detail that will have to be subsumed. The most economical language of description is the molecular, genetic description that is already there. We do not yet know, in that language, what the names are. What does the organism name to itself? We cannot say that an organism has, for example, a name for a finger. There’s no guarantee that in making a hand, the explanation can be couched in the terms we use for making a glove.”
Brenner was in a thoughtful mood, drinking sherry before dinner at King’s College. When he began working with Crick, less than two decades before, molecular biology did not even have a name. Two decades later, in the 1990s, scientists worldwide would undertake the mapping of the entire human genome: perhaps 20,000 genes, 3 billion base pairs. What was the most fundamental change? It was a shift of the frame, from energy and matter to information.
“All of biochemistry up to the fifties was concerned with where you get the energy and the materials for cell function,” Brenner said. “Biochemists only thought about the flux of energy and the flow of matter. Molecular biologists started to talk about the flux of information. Looking back, one can see that the double helix brought the realization that information in biological systems could be studied in much the same way as energy and matter.…
“Look,” he told Judson, “let me give you an example. If you went to a biologist twenty years ago and asked him, How do you make a protein, he would have said, Well, that’s a horrible problem, I don’t know … but the important question is where do you get the energy to make the peptide bond. Whereas the molecular biologist would have said, That’s not the problem, the important problem is where do you get the instructions to assemble the sequence of amino acids, and to hell with the energy; the energy will look after itself.”
By this time, the technical jargon of biologists included the words alphabet, library, editing, proofreading, transcription, translation, nonsense, synonym, and redundancy. Genetics and DNA had drawn the attention not just of cryptographers but of classical linguists. Certain proteins, capable of flipping from one relatively stable state to another, were found to act as relays, accepting ciphered commands and passing them to their neighbors—switching stations in three-dimensional communications networks. Brenner, looking forward, thought the focus would turn to computer science as well. He envisioned a science—though it did not yet have a name—of chaos and complexity. “I think in the next twenty-five years we are going to have to teach biologists another language still,” he said. “I don’t know what it’s called yet; nobody knows. But what one is aiming at, I think, is the fundamental problem of the theory of elaborate systems.” He recalled John von Neumann, at the dawn of information theory and cybernetics, proposing to understand biological processes and mental processes in terms of how a computing machine might operate. “In other words,” said Brenner, “where a science like physics works in terms of laws, or a science like molecular biology, to now, is stated in terms of mechanisms, maybe now what one has to begin to think of is algorithms. Recipes. Procedures.”
If you want to know what a mouse is, ask instead how you could build a mouse. How does the mouse build itself? The mouse’s genes switch one another on and off and perform computation, in steps. “I feel that this new molecular biology has to go in this direction—to explore the high-level logical computers, the programs, the algorithms of development.…
“One would like to be able to fuse the two—to be able to move between the molecular hardware and the logical software of how it’s all organized, without feeling they are different sciences.”
Even now—or especially now—the gene was not what it seemed. Having begun as a botanist’s hunch and an algebraic convenience, it had been tracked down to the chromosome and revealed as molecular coiled strands. It was decoded, enumerated, and catalogued. And then, in the heyday of molecular biology, the idea of the gene broke free of its moorings once again.
The more was known, the harder it was to define. Is a gene nothing more or less than DNA? Is it made of DNA, or is it something carried in DNA? Is it properly pinned down as a material thing at all?
Not everyone agreed there was a problem. Gunther Stent declared in 1977 that one of the field’s great triumphs was the “unambiguous identification” of the Mendelian gene as a particular length of DNA. “It is in this sense that all working geneticists now employ the term ‘gene,’ ”♦ he wrote. To put it technically but succinctly: “The gene is, in fact, a linear array of DNA nucleotides which determines a linear array of protein amino acids.” It was Seymour Benzer, said Stent, who established that definitively.
Yet Benzer himself had not been quite so sanguine. He argued as early as 1957 that the classical gene was dead. It was a concept trying to serve three purposes at once—as a unit of recombination, of mutation, and of function—and already he had strong reason to suspect that these were incompatible. A strand of DNA carries many base pairs, like beads on a string or letters in a sentence; as a physical object it could not be called an elementary unit. Benzer offered a batch of new particle names: “recon,” for the smallest unit that can be interchanged by recombination; “muton,” for the smallest unit of mutational change (a single base pair); and “cistron” for the unit of function—which in turn, he admitted, was difficult to define. “It depends upon what level of function is meant,” he wrote—perhaps just the specification of an amino acid, or perhaps a whole ensemble of steps “leading to one particular physiological end-effect.”♦ Gene was not going away, but that was a lot of weight for one little word to bear.
Part of what was happening was a collision between molecular biology and evolutionary biology, as studied in fields from botany to paleontology. It was as fruitful a collision as any in the history of science—before long, neither side could move forward without the other—but on the way some sparks flared. Quite a few of them were set off by a young zoologist at Oxford, Richard Dawkins. It seemed to Dawkins that many of his colleagues were looking at life the wrong way round.
As molecular biology perfected its knowledge of the details of DNA and grew more skillful in manipulating these molecular prodigies, it was natural to see them as the answer to the great question of life: how do organisms reproduce themselves? We use DNA, just as we use lungs to breathe and eyes to see. We use it. “This attitude is an error of great profundity,”♦ Dawkins wrote. “It is the truth turned crashingly on its head.” DNA came first—by billions of years—and DNA comes first, he argued, when life is viewed from the proper perspective. From that perspective, genes are the focus, the sine qua non, the star of the show. In his first book—published in 1976, meant for a broad audience, provocatively titled The Selfish Gene—he set off decades of debate by declaring: “We are survival machines—robot vehicles blindly programmed to preserve the selfish molecules known as genes.”♦ He said this was a truth he had known for years.
Genes, not organisms, are the true units of natural selection. They began as “replicators”—molecules formed accidentally in the primordial soup, with the unusual property of making copies of themselves.
They are past masters of the survival arts. But do not look for them floating loose in the sea; they gave up that cavalier freedom long ago. Now they swarm in huge colonies, safe inside gigantic lumbering robots, sealed off from the outside world, communicating with it by tortuous indirect routes, manipulating it by remote control. They are in you and in me; they created us, body and mind; and their preservation is the ultimate rationale for our existence. They have come a long way, those replicators. Now they go by the name of genes, and we are their survival machines.♦
This was guaranteed to raise the hackles of organisms who thought of themselves as more than robots. “English biologist Richard Dawkins has recently raised my hackles,” wrote Stephen Jay Gould in 1977, “with his claim that genes themselves are units of selection, and individuals merely their temporary receptacles.”♦ Gould had plenty of company. Speaking for many molecular biologists, Gunther Stent dismissed Dawkins as “a thirty-six-year-old student of animal behavior” and filed him under “the old prescientific tradition of animism, under which natural objects are endowed with souls.”♦
Yet Dawkins’s book was brilliant and transformative. It established a new, multilayered understanding of the gene. At first, the idea of the selfish gene seemed like a trick of perspective, or a joke. Samuel Butler had said a century earlier—and did not claim to be the first—that a hen is only an egg’s way of making another egg. Butler was quite serious, in his way:
Every creature must be allowed to “run” its own development in its own way; the egg’s way may seem a very roundabout manner of doing things; but it is its way, and it is one of which man, upon the whole, has no great reason to complain. Why the fowl should be considered more alive than the egg, and why it should be said that the hen lays the egg, and not that the egg lays the hen, these are questions which lie beyond the power of philosophic explanation, but are perhaps most answerable by considering the conceit of man, and his habit, persisted in during many ages, of ignoring all that does not remind him of himself.♦
He added, “But, perhaps, after all, the real reason is, that the egg does not cackle when it has laid the hen.” Some time later, Butler’s template, X is just a Y’s way of making another Y, began reappearing in many forms. “A scholar,” said Daniel Dennett in 1995, “is just a library’s way of making another library.”♦ Dennett, too, was not entirely joking.
It was prescient of Butler in 1878 to mock a man-centered view of life, but he had read Darwin and could see that all creation had not been designed in behalf of Homo sapiens. “Anthropocentrism is a disabling vice of the intellect,”♦ Edward O. Wilson said a century later, but Dawkins was purveying an even more radical shift of perspective. He was not just nudging aside the human (and the hen) but the organism, in all its multifarious glory. How could biology not be the study of organisms? If anything, he understated the difficulty when he wrote, “It requires a deliberate mental effort to turn biology the right way up again, and remind ourselves that the replicators come first, in importance as well as in history.”♦
A part of Dawkins’s purpose was to explain altruism: behavior in individuals that goes against their own best interests. Nature is full of examples of animals risking their own lives in behalf of their progeny, their cousins, or just fellow members of their genetic club. Furthermore, they share food; they cooperate in building hives and dams; they doggedly protect their eggs. To explain such behavior—to explain any adaptation, for that matter—one asks the forensic detective’s question, cui bono? Who benefits when a bird spots a predator and cries out, warning the flock but also calling attention to itself? It is tempting to think in terms of the good of the group—the family, tribe, or species—but most theorists agree that evolution does not work that way. Natural selection can seldom operate at the level of groups. It turns out, however, that many explanations fall neatly into place if one thinks of the individual as trying to propagate its particular assortment of genes down through the future. Its species shares most of those genes, of course, and its kin share even more. Of course, the individual does not know about its genes. It is not consciously trying to do any such thing. Nor, of course, would anyone impute intention to the gene itself—tiny brainless entity. But it works quite well, as Dawkins showed, to flip perspectives and say that the gene works to maximize its own replication. For example, a gene “might ensure its survival by tending to endow the successive bodies with long legs, which help those bodies escape from predators.”♦ A gene might maximize its own numbers by giving an organism the instinctive impulse to sacrifice its life to save its offspring: the gene itself, the particular clump of DNA, dies with its creature, but copies of the gene live on. The process is blind. It has no foresight, no intention, no knowledge. The genes, too, are blind: “They do not plan ahead,”♦ says Dawkins. “Genes just are, some genes more so than others, and that is all there is to it.”
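The gene's-eye arithmetic can be made concrete with the standard haploid selection model (the fitness numbers below are invented for illustration): an allele that gives its carriers even a small edge spreads inexorably, with no foresight required.

def next_frequency(p, w_allele, w_rival):
    """One generation of selection between two competing alleles."""
    mean_fitness = p * w_allele + (1 - p) * w_rival
    return p * w_allele / mean_fitness

p = 0.01                        # the allele starts rare
for generation in range(200):
    p = next_frequency(p, w_allele=1.05, w_rival=1.0)
print(round(p, 3))              # 0.994: a 5 percent edge, compounded, has all but taken over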
The history of life begins with the accidental appearance of molecules complex enough to serve as building blocks—replicators. The replicator is an information carrier. It survives and spreads by copying itself. The copies must be coherent and reliable but need not be perfect; on the contrary, for evolution to proceed, errors must appear. Replicators could exist long before DNA, even before proteins. In one scenario, proposed by the Scottish chemist Alexander Graham Cairns-Smith, replicators appeared in sticky layers of clay crystals: complex molecules of silicate minerals. In other models the evolutionary playground is the more traditional “primordial soup.” Either way, some of these information-bearing macromolecules disintegrate more quickly than others; some make more or better copies; some have the chemical effect of breaking up competing molecules. Absorbing photon energy like the miniature Maxwell’s demons they are, molecules of ribonucleic acid, RNA, catalyze the formation of bigger and more information-rich molecules. DNA, ever so slightly more stable, possesses the dual capability of copying itself while also manufacturing another sort of molecule, and this provides a special advantage. It can protect itself by building a shell of proteins around it. This is Dawkins’s “survival machine”—first cells, then larger and larger bodies, with growing inventories of membranes and tissues and limbs and organs and skills. They are the genes’ fancy vehicles, racing against other vehicles, converting energy, and even processing information. In the game of survival some vehicles outplay, outmaneuver, and outpropagate others.
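A toy version of this world is easy to build (every parameter below is invented): strings copy themselves with occasional misprints, copies resembling a favored pattern replicate more often, and order accumulates blindly.

import random

random.seed(0)
BASES = "ACGT"
TARGET = "ACGTACGT"             # an arbitrary stand-in for "chemical staying power"

def copy_with_errors(seq, error_rate=0.01):
    """Replicate a sequence, with a small chance of a misprint at each position."""
    return "".join(random.choice(BASES) if random.random() < error_rate else base
                   for base in seq)

def fitness(seq):
    """Count positions matching the favored pattern."""
    return sum(a == b for a, b in zip(seq, TARGET))

population = ["".join(random.choice(BASES) for _ in range(8)) for _ in range(100)]
for generation in range(200):
    parents = random.choices(population, weights=[fitness(s) + 1 for s in population], k=100)
    population = [copy_with_errors(parent) for parent in parents]

best = max(population, key=fitness)
print(best, fitness(best))      # typically the favored pattern, or within a misprint of it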
It took some time, but the gene-centered, information-based perspective led to a new kind of detective work in tracing the history of life. Where paleontologists look back through the fossil record for skeletal precursors of wings and tails, molecular biologists and biophysicists look for telltale relics of DNA in hemoglobin, oncogenes, and all the rest of the library of proteins and enzymes. “There is a molecular archeology in the making,”♦ says Werner Loewenstein. The history of life is written in terms of negative entropy. “What actually evolves is information in all its forms or transforms. If there were something like a guidebook for living creatures, I think, the first line would read like a biblical commandment, Make thy information larger.”
No one gene makes an organism. Insects and plants and animals are collectives, communal vehicles, cooperative assemblies of a multitude of genes, each playing its part in the organism’s development. It is a complex ensemble in which each gene interacts with thousands of others in a hierarchy of effects extending through both space and time. The body is a colony of genes. Of course, it acts and moves and procreates as a unit, and furthermore, in the case of at least one species, it feels itself, with impressive certainty, to be a unit. The gene-centered perspective has helped biologists appreciate that the genes composing the human genome are only a fraction of the genes carried around in any one person, because humans (like other species) host an entire ecosystem of microbes—bacteria, especially, from our skin to our digestive systems. Our “microbiomes” help us digest food and fight disease, all the while evolving fast and flexibly in service of their own interests. All these genes engage in a grand process of mutual co-evolution—competing with one another, and with their alternative alleles, in nature’s vast gene pool, but no longer competing on their own. Their success or failure comes through interaction. “Selection favors those genes which succeed in the presence of other genes,” says Dawkins, “which in turn succeed in the presence of them.”♦
The effect of any one gene depends on these interactions with the ensemble and depends, too, on effects of the environment and on raw chance. Indeed, just to speak of a gene’s effect became a complex business. It was not enough simply to say that the effect of a gene is the protein it synthesizes. One might want to say that a sheep or a crow has a gene for blackness. This might be a gene that manufactures a protein for black pigment in wool or feathers. But sheep and crows and all the other creatures capable of blackness exhibit it in varying circumstances and degrees; even so simple-seeming a quality seldom has a biological on-off switch. Dawkins suggests the case of a gene that synthesizes a protein that acts as an enzyme with many indirect and distant effects, one of which is to facilitate the synthesis of black pigment.♦ Even more remotely, suppose a gene encourages an organism to seek sunlight, which is in turn necessary for the black pigment. Such a gene serves as a mere co-conspirator but its role may be indispensable. To call it a gene for blackness, however, becomes difficult. And it is harder still to specify genes for more complex qualities—genes for obesity or aggression or nest building or braininess or homosexuality.
Are there genes for such things? Not if a gene is a particular strand of DNA that expresses a protein. Strictly speaking, one cannot say there are genes for almost anything—not even eye color. Instead, one should say that differences in genes tend to cause differences in phenotype (the actualized organism). But from the earliest days of the study of heredity, scientists have spoken of genes more broadly. If a population varies in some trait—say, tallness—and if the variation is subject to natural selection, then by definition it is at least partly genetic. There is a genetic component to the variation in tallness. There is no gene for long legs; there is no gene for a leg at all.♦ To build a leg requires many genes, each issuing instructions in the form of proteins, some making raw materials, some making timers and on-off switches. Some of these genes surely have the effect of making legs longer than they would otherwise be, and it is those genes that we may call, for short, genes for long legs—as long as we remember that long-leggedness is not directly represented or encoded in the gene.
So geneticists and zoologists and ethologists and paleontologists all got into the habit of saying “a gene for X” instead of “a genetic contribution to the variation in X.”♦ Dawkins was forcing them to face the logical consequences. If there is any genetic variation in a trait—eye color or obesity—then there must be a gene or genes for that trait. It doesn’t matter that the actual appearance of the trait may depend on an unfathomable array of other factors, which may be environmental or even accidental. By way of illustration, he offered a deliberately extreme example: a gene for reading.
The idea seems absurd, for several reasons. Reading is learned behavior. No one is born able to read. If ever a skill depends on environmental factors, such as education, it is reading. Until a few millennia ago, the behavior did not exist, so it could not have been subject to natural selection. You might as well say (as the geneticist John Maynard Smith did, mockingly) that there is a gene for tying shoelaces. But Dawkins was undaunted. He pointed out that genes are about differences, after all. So he began with a simple counterpoint: might there not be a gene for dyslexia?
All we would need in order to establish the existence of a gene for reading is to discover a gene for not reading, say a gene which induced a brain lesion causing specific dyslexia. Such a dyslexic person might be normal and intelligent in all respects except that he could not read. No geneticist would be particularly surprised if this type of dyslexia turned out to breed true in some Mendelian fashion. Obviously, in this event the gene would only exhibit its effect in an environment which included normal education. In a prehistoric environment it might have had no detectable effect, or it might have had some different effect and have been known to cave-dwelling geneticists as, say, a gene for inability to read animal footprints.…
It follows from the ordinary conventions of genetic terminology that the wild-type gene at the same locus, the gene that the rest of the population has in double dose, would properly be called a gene “for reading.” If you object to that, you must also object to our speaking of a gene for tallness in Mendel’s peas.… In both cases the character of interest is a difference, and in both cases the difference only shows itself in some specified environment. The reason why something so simple as a one gene difference can have such a complex effect … is basically as follows. However complex a given state of the world may be, the difference between that state of the world and some alternative state of the world may be caused by something extremely simple.♦
Can there be a gene for altruism? Yes, says Dawkins, if this means “any gene that influences the development of nervous systems in such a way as to make them likely to behave altruistically.”♦ Such genes—these replicators, these survivors—know nothing about altruism and nothing about reading, of course. Whatever and wherever they are, their phenotypic effects matter only insofar as they help the genes propagate.
Molecular biology, in its signal achievement, had pinpointed the gene in a protein-encoding piece of DNA. This was the hardware definition. The software definition was older and fuzzier: the unit of heredity; the bearer of a phenotypic difference. With the two definitions uneasily coexisting, Dawkins looked past them both.
If genes are meant to be masters of survival, they can hardly be fragments of nucleic acid. Such things are fleeting. To say that a replicator manages to survive for eons is to define the replicator as all the copies considered as one. Thus the gene does not “grow senile,” Dawkins declared.
It is no more likely to die when it is a million years old than when it is only a hundred. It leaps from body to body down the generations, manipulating body after body in its own way and for its own ends, abandoning a succession of mortal bodies before they sink in senility and death.♦
“What I am doing,” he says, “is emphasizing the potential near-immortality of a gene, in the form of copies, as its defining property.” This is where life breaks free from its material moorings. (Unless you already believed in the immortal soul.) The gene is not an information-carrying macromolecule. The gene is the information. The physicist Max Delbrück wrote in 1949, “Today the tendency is to say ‘genes are just molecules, or hereditary particles,’ and thus to do away with the abstractions.”♦ Now the abstractions returned.
Where, then, is any particular gene—say, the gene for long legs in humans? This is a little like asking where is Beethoven’s Piano Sonata in E minor. Is it in the original handwritten score? The printed sheet music? Any one performance—or perhaps the sum of all performances, historical and potential, real and imagined?
The quavers and crotchets inked on paper are not the music. Music is not a series of pressure waves sounding through the air; nor grooves etched in vinyl or pits burned in CDs; nor even the neuronal symphonies stirred up in the brain of the listener. The music is the information. Likewise, the base pairs of DNA are not genes. They encode genes. Genes themselves are made of bits.
♦ He added: “Old terms are mostly compromised by their application in antiquated or erroneous theories and systems, from which they carry splinters of inadequate ideas, not always harmless to the developing insight.”
♦ In listing twenty amino acids, Gamow was getting ahead of what was actually known. The number twenty turned out to be correct, though Gamow’s list was not.