Nucleic Acids and Nucleotides
The nucleic acids, deoxyribonucleic acid (DNA) and ribonucleic acid (RNA), are the carriers and processors of a cell's genetic information. Coded in a cell's DNA is all the information that determines the nature of the cell, controls cell growth and division, and directs biosynthesis of the enzymes and other proteins required for all cellular functions.
Just as proteins are polymers of amino acid units, nucleic acids are polymers of individual building blocks called nucleotides linked together to form a long chain. Each nucleotide is composed of a nucleoside bonded to a phosphate group, and each nucleoside is composed of an aldopentose sugar joined through its anomeric carbon to the nitrogen atom of a heterocyclic amine base.
The sugar component in RNA is ribose, and the sugar in DNA is 2-deoxyribose. Recall that the prefix 2-deoxy means that oxygen is missing from C2 of ribose.
DNA contains four different heterocyclic amine bases. Two are substituted purines (adenine and guanine), and two are substituted pyrimidines (cytosine and thymine). Adenine, guanine, and cytosine also occur in RNA, but thymine is replaced in RNA by a different pyrimidine base called uracil.
In both DNA and RNA, the heterocyclic amine base is bonded to C1' of the sugar, and the phosphoric acid is bonded by a phosphate ester linkage to the C5' sugar position. (In referring to nucleic acids, numbers with a prime superscript refer to positions on the sugar, and numbers without a prime refer to positions on the heterocyclic amine base.) The complete structures of all four deoxyribonucleotides and all four ribonucleotides.
Although chemically similar, DNA and RNA differ in size and have different roles within the cell. Molecules of DNA are enormous. They have molecular weights of up to 150 billion and lengths of up to 12 cm when stretched out, and they are found mostly in the nucleus of cells. Molecules of RNA, by contrast, are much smaller (as low as 35,000 in molecular weight) and are found mostly outside the cell nucleus. We'll consider the two kinds of nucleic acids separately, beginning with DNA.
16.7 Structure of DNA
Nucleotides join together in DNA by forming a phosphate ester bond between the 5'-phosphate group on one nucleotide and the 3'-hydroxyl group on the sugar of another nucleotide (Figure 16.4). One end of the nucleic acid polymer thus has a free hydroxyl at C3' (the 3' end), and the other end has a phosphate at C5' (the 5' end).
Generalized structure of DNA.
Just as the structure of a protein depends on the sequence in which individual amino acids are connected, the structure of a nucleic acid depends on the sequence of individual nucleotides. To carry the analogy further, just as a protein has a polyamide backbone with different side chains attached to it, a nucleic acid has an alternating sugar-phosphate backbone with different amine bases attached.
The sequence of nucleotides in a chain is described by starting at the 5' end and identifying the bases in order of occurrence, using the abbreviations A for adenine, G for guanine, C for cytosine, and T for thymine (or U for uracil in RNA). Thus, a typical sequence might be written as TAGGCT.
16.8 Base Pairing in DNA: The Watson-Crick Model
Samples of DNA isolated from different tissues of the same species have the same proportions of heterocyclic bases, but samples from different species can have greatly different proportions of bases. Human DNA, for example, contains about 30% each of A and T and about 20% each of G and C. The bacterium Clostridium perfringens, however, contains about 37% each of A and T and only 13% each of G and C. Note that in both examples, the bases occur in pairs; A and T are usually present in equal amounts, as are G and C. Why should this be?
In 1953, James Watson and Francis Crick made their now classic proposal for the secondary structure of DNA. According to the Watson-Crick model, DNA consists of two polynucleotide strands coiled around each other in a double helix like the handrails on a spiral staircase. The strands run in opposite directions and are held together by hydrogen bonds between specific pairs of bases. Adenine (A) forms strong hydrogen bonds to thymine (T) but not to G or C. Similarly, G and C form strong hydrogen bonds to each other but not to A or T. The nature of this hydrogen bonding is particularly apparent in electrostatic potential maps, which show the alignment of electron-rich and electron-poor regions along the edges of the bases.
The two strands of the DNA double helix are not identical; rather, they're complementary because of hydrogen bonding. Whenever a G occurs in one strand, a C occurs opposite it in the other strand. When an A occurs in one strand, a T occurs in the other strand. This complementary pairing of bases explains why A and T are always found in equal amounts, as are G and C. A full turn of the DNA double helix is shown in Figure 16.6. The helix is 2.0 nm (20 Å) wide, there are 10 base pairs per turn, and each turn is 3.4 nm (34 Å) in length.
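The pairing rule can be tried out in a few lines of Python. The sketch below is only an illustration: the function names are invented for this example, and the input is the sample sequence TAGGCT used earlier. It builds the partner strand and then counts bases over both strands, which shows why A and T (and G and C) come out equal.

```python
# Watson-Crick pairing rules for DNA, as described in the text.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complementary_strand(strand: str) -> str:
    """Return the partner strand, written 5'->3'.

    The two strands run in opposite directions, so the paired bases are
    reversed to keep the 5'->3' writing convention.
    """
    return "".join(PAIRS[base] for base in reversed(strand))


def duplex_composition(strand: str) -> dict:
    """Count each base over both strands of the double helix."""
    both = strand + complementary_strand(strand)
    return {base: both.count(base) for base in "ATGC"}


strand = "TAGGCT"                      # sample sequence from the text
partner = complementary_strand(strand)
counts = duplex_composition(strand)

print(partner)   # AGCCTA
print(counts)    # A and T counts match, as do G and C counts
```

Whatever single strand is supplied, the counts for the duplex pair up, because every A on one strand puts a T on the other.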
The two strands of the double helix coil in such a way that two kinds of "grooves" result, a major groove 1.2 nm (12 Å) wide and a minor groove 600 pm (6 Å) wide. The major groove is slightly deeper than the minor groove, and both are lined by potential hydrogen bond donors and acceptors. As a result, a variety of flat, polycyclic aromatic molecules are able to insert sideways, or intercalate, between the stacked bases. Many cancer-causing and cancer-preventing agents function by interacting with DNA in this way.
16.9 Nucleic Acids and Heredity
The genetic information of an organism is stored as a sequence of deoxyribonucleotides strung together in the DNA chain. For the information to be preserved and passed on to future generations, a mechanism must exist for copying DNA. For the information to be used, a mechanism must exist for decoding the DNA message and implementing the instructions it contains.
What Crick called the "central dogma of molecular genetics" says that the function of DNA is to store information and pass it on to RNA. The function of RNA, in turn, is to read, decode, and use the information received from DNA to make proteins. By decoding the right bit of DNA at the right time, an organism uses genetic information to synthesize the thousands of proteins necessary for functioning.
Three fundamental processes take place in the transfer of genetic information:
• Replication is the process by which identical copies of DNA are made so that genetic information can be preserved and handed down to succeeding generations.
• Transcription is the process by which the genetic messages are read and carried out of the cell nucleus to ribosomes, where protein synthesis occurs.
• Translation is the process by which the genetic messages are decoded and used to synthesize proteins.
16.10 Replication of DNA
DNA replication is an enzyme-catalyzed process that begins by a partial unwinding of the double helix. As the strands separate and bases are exposed, new nucleotides line up on each strand in a complementary manner, A to T and C to G, and two new strands begin to grow. Each new strand is complementary to its old template strand, and two new DNA double helices are produced (Figure 16.7). Since each of the new DNA molecules contains one old strand and one new strand, the process is described as semiconservative replication.
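As a rough illustration of what "semiconservative" means, the following Python sketch tags each strand as old or new and copies a duplex once. The data layout (a pair of sequence/label tuples) and the function names are assumptions made for this example; the real process involves many enzymes that are not modeled here.

```python
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def new_strand_for(template: str) -> str:
    """Build the strand complementary to a template, written 5'->3'."""
    return "".join(PAIRS[base] for base in reversed(template))


def replicate(duplex):
    """Copy one duplex semiconservatively.

    A duplex is a pair of (sequence, label) strands. Each daughter duplex
    keeps one parental strand and gains one newly made strand.
    """
    (seq_a, label_a), (seq_b, label_b) = duplex
    daughter_1 = ((seq_a, label_a), (new_strand_for(seq_a), "new"))
    daughter_2 = ((new_strand_for(seq_b), "new"), (seq_b, label_b))
    return daughter_1, daughter_2


parent = (("TAGGCT", "old"), ("AGCCTA", "old"))
d1, d2 = replicate(parent)
print(d1)
print(d2)
```

Both daughters carry the same two sequences as the parent, and each contains exactly one strand labeled "old".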
The process by which the individual nucleotides are joined to create new DNA strands involves many steps and many different enzymes. Addition of new nucleotide units to the growing chain takes place in the 5' → 3' direction and is catalyzed by the enzyme DNA polymerase. The key step is the addition of a 5'-mononucleoside triphosphate to the free 3'-hydroxyl group of the growing chain as the 3'-hydroxyl attacks the triphosphate and expels a diphosphate leaving group.
FIGURE 16.7 Schematic representation of DNA replication. The original double-stranded DNA partially unwinds, bases are exposed, nucleotides line up on each strand in a complementary manner, and two new strands begin to grow.
The magnitude of the replication process is staggering. The nucleus of a human cell contains 46 chromosomes (23 pairs), each of which consists of one very large DNA molecule. Each chromosome, in turn, is made up of several thousand DNA segments called genes, and the sum of all genes in a human cell (the genome) is estimated to be approximately 3 billion base pairs. Despite the size of these massive molecules, the base sequence is faithfully copied during replication, with an error occurring only about once each 10 to 100 billion bases.
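The fidelity figures can be turned into an expected number of errors per genome copy. The short calculation below is a sketch under a simplifying assumption: it counts 3 billion bases copied per pass, so the result would double if both new strands were counted separately.

```python
GENOME_BASES = 3e9   # genome size quoted in the text

# Fidelity quoted in the text: one error per 10 to 100 billion bases.
BASES_PER_ERROR_WORST = 10e9
BASES_PER_ERROR_BEST = 100e9


def expected_errors(bases_copied: float, bases_per_error: float) -> float:
    """Average number of copying errors for a given number of bases."""
    return bases_copied / bases_per_error


worst = expected_errors(GENOME_BASES, BASES_PER_ERROR_WORST)
best = expected_errors(GENOME_BASES, BASES_PER_ERROR_BEST)
print(f"Expected errors per genome copy: {best:.2f} to {worst:.2f}")
```

On these numbers, most copies of the genome are made with less than one error on average.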
RNA is structurally similar to DNA. Both are sugar-phosphate polymers, and both have heterocyclic bases attached. Chemically, the differences are that RNA contains ribose rather than 2-deoxyribose and uracil rather than thymine. Uracil in RNA forms strong hydrogen bonds to its complementary base, adenine, just as thymine does in DNA. In addition, RNA molecules are much smaller than DNA, and RNA remains single-stranded rather than double-stranded.
There are three major kinds of ribonucleic acid, each of which serves a specific function.
• Messenger RNA (mRNA) carries genetic messages from DNA to ribosomes, where protein synthesis occurs.
• Ribosomal RNA (rRNA) provides the physical makeup of ribosomes.
• Transfer RNA (tRNA) transports specific amino acids to the ribosomes, where they are joined together to make proteins.
The conversion of the information in DNA into proteins begins in the nucleus of cells with the synthesis of mRNA by the process of transcription. Several turns of the DNA double helix unwind, forming a "bubble" and exposing the bases of the two strands. Ribonucleotides line up in the proper order by hydrogen bonding to their complementary bases on DNA, bond formation occurs in the 5' → 3' direction, and the growing RNA molecule unwinds from DNA.
Synthesis of RNA using a DNA segment as template.
Unlike what happens in DNA replication, where both strands are copied, only one of the two DNA strands is transcribed into mRNA. The strand that contains the gene is called the coding strand, or sense strand, and the strand that gets transcribed is called the template strand, or antisense strand. Since the template strand and the coding strand are complementary, and since the template strand and the RNA molecule are also complementary, the RNA molecule produced during transcription is a copy of the coding strand. The only difference is that the RNA molecule has a U everywhere the DNA coding strand has a T.
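The relationship among the coding strand, the template strand, and the mRNA can be checked with a small Python sketch. The function names are invented for illustration, and the coding strand is just the sample sequence TAGGCT rather than a real gene.

```python
DNA_PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}
# Pairing of ribonucleotides against a DNA template: A on DNA pairs with U.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def template_strand(coding: str) -> str:
    """Template (antisense) strand for a coding strand, written 5'->3'."""
    return "".join(DNA_PAIRS[base] for base in reversed(coding))


def transcribe(template: str) -> str:
    """mRNA (5'->3') made by pairing ribonucleotides with the template."""
    return "".join(DNA_TO_RNA[base] for base in reversed(template))


coding = "TAGGCT"
template = template_strand(coding)
mrna = transcribe(template)

print(template)  # AGCCTA
print(mrna)      # UAGGCU
```

The mRNA matches the coding strand base for base, with U in place of T, because it is the complement of the complement.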
Transcription of DNA by the process just discussed raises many questions. How does the DNA know where to unwind? Where along the chain does one gene stop and the next one start? How do the ribonucleotides know the right place along the template strand to begin lining up and the right place to stop? The picture that has emerged is that a DNA chain contains specific base sequences called promoter sites that lie at positions 10 base pairs and 35 base pairs upstream from the coding region and signal the beginning of a gene. Similarly, there are other base sequences near the end of the gene that signal a stop.
16.12 RNA and Protein
The primary cellular function of RNA is to direct biosynthesis of the thousands of diverse peptides and proteins required by an organism. The mechanics of protein biosynthesis are directed by mRNA and take place on ribosomes, small granular particles in the cytoplasm of a cell that consist of about 60% rRNA and 40% protein. On the ribosome, mRNA serves as a template to pass on the genetic information it has transcribed from DNA.
The specific ribonucleotide sequence in mRNA forms a message that determines the order in which different amino acid residues are to be joined. Each "word," or codon, along the mRNA chain consists of a sequence of three ribonucleotides that is specific for a given amino acid. For example, the series UUC on mRNA is a codon directing incorporation of the amino acid phenylalanine into the growing protein. Of the 4³ = 64 possible triplets of the four bases in RNA, 61 code for specific amino acids (most amino acids are specified by more than one codon) and 3 code for chain termination. Table 16.3 shows the meaning of each codon.
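The codon arithmetic can be verified by enumerating all triplets. The sketch below encodes the standard genetic code in a compact form often used in databases (one-letter amino acid codes listed in U, C, A, G order, with "*" marking a stop). That layout is a convention adopted for this example, not the format of Table 16.3.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code: one-letter amino acid codes in UCAG order for the
# first, second, and third codon positions; "*" marks chain termination.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    "".join(triplet): amino_acid
    for triplet, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

stop_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
coding_codons = [c for c, aa in CODON_TABLE.items() if aa != "*"]
codons_per_amino_acid = Counter(aa for aa in CODON_TABLE.values() if aa != "*")

print(len(CODON_TABLE))        # 64 triplets in all
print(len(coding_codons))      # 61 specify an amino acid
print(stop_codons)             # the 3 termination codons
print(CODON_TABLE["UUC"])      # F, the one-letter code for phenylalanine
```

`codons_per_amino_acid` shows the redundancy mentioned in the text: the 61 coding triplets are shared among only 20 amino acids.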
The message carried by mRNA is read by tRNA in a process called translation. There are 61 different tRNAs, one for each of the 61 codons in Table 16.3 that specifies an amino acid. A typical tRNA is roughly the shape of a cloverleaf, as shown in Figure 16.9. It consists of about 70 to 100 ribonucleotides and is bonded to a specific amino acid by an ester linkage through the 3'-hydroxyl on ribose at the end of the tRNA. Each tRNA also contains in its chain a segment called an anticodon, a sequence of three ribonucleotides that is complementary to the codon sequence. For example, the codon sequence UUC present on mRNA is read by a phenylalanine-bearing tRNA having the complementary anticodon sequence AAG. [Remember that nucleotide sequences are written in the 5' → 3' direction, so the sequence in an anticodon must be reversed. That is, the complement to (5')-UUC-(3') is (3')-AAG-(5'), which is written as (5')-GAA-(3').]
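The reversal step in the bracketed note is easy to get wrong by hand, so it may help to see it spelled out. In this minimal sketch (the function name is made up for the example), the codon is paired base by base and the result is then rewritten in the 5' → 3' convention.

```python
RNA_PAIRS = {"A": "U", "U": "A", "G": "C", "C": "G"}


def anticodon(codon: str) -> str:
    """Anticodon for an mRNA codon, with both written 5'->3'."""
    paired_3_to_5 = "".join(RNA_PAIRS[base] for base in codon)  # reads 3'->5'
    return paired_3_to_5[::-1]                                  # rewrite 5'->3'


print(anticodon("UUC"))  # GAA
```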
As each successive codon on mRNA is read, appropriate tRNAs bring the correct amino acids into position for enzyme-mediated transfer to the growing peptide. When synthesis of the proper protein is completed, a "stop" codon signals the end, and the protein is released from the ribosome. The entire process of protein biosynthesis is illustrated schematically in Figure 16.10.
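Reading codons one at a time until a stop codon appears can be sketched as a simple loop. The table below is deliberately a small subset of the genetic code, chosen only for this example, and the mRNA string is hypothetical. The sketch also reads from the first base onward and ignores how the ribosome finds its starting point.

```python
# Illustrative subset of the genetic code; None marks a stop codon.
CODON_TABLE = {
    "UUC": "Phe",
    "GGA": "Gly",
    "AUG": "Met",
    "UAA": None,
    "UAG": None,
    "UGA": None,
}


def translate(mrna: str) -> list:
    """Read an mRNA three bases at a time until a stop codon is reached."""
    peptide = []
    for i in range(0, len(mrna) - 2, 3):
        residue = CODON_TABLE[mrna[i:i + 3]]  # KeyError if codon not in subset
        if residue is None:                   # stop codon: release the chain
            break
        peptide.append(residue)
    return peptide


peptide = translate("UUCGGAAUGUAAUUC")   # hypothetical message
print("-".join(peptide))                 # Phe-Gly-Met
```

The final UUC in the message is never read, because the UAA before it ends the chain.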
Structure of a tRNA molecule. The tRNA is a roughly cloverleaf-shaped molecule containing an anticodon triplet on one "leaf" and a covalently attached amino acid residue at its 3' end. The example shown is a yeast tRNA that carries phenylalanine. The nucleotides not specifically identified are chemically modified analogs of the four usual nucleotides.
A schematic representation of protein biosynthesis. The mRNA containing codon base sequences is read by tRNA containing complementary anticodon base sequences. Transfer RNAs assemble the proper amino acids into position for incorporation into the peptide.
16.13 Sequencing DNA
One of the greatest scientific revolutions in history is now occurring in molecular biology as scientists are learning how to manipulate and harness the genetic machinery of organisms. None of the extraordinary advances of the past decade would have been possible, however, were it not for the discovery in 1977 of methods for sequencing immense DNA chains.
The first step in DNA sequencing is to cleave the enormous chain at predictable points to produce smaller, more manageable pieces, a task accomplished by the use of enzymes called restriction endonucleases. Each different restriction enzyme, of which more than 200 are available, cleaves a DNA molecule at a well-defined point in the chain wherever a specific base sequence occurs. For example, the restriction enzyme AluI cleaves between G and C in the four-base sequence AG-CT. Note that the sequence is a palindrome, meaning that it reads the same from left to right and right to left; that is, the sequence (5')-AG-CT-(3') is identical to its complement, (3')-TC-GA-(5'). The same is true for other restriction endonucleases.
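Both ideas, the palindromic recognition site and cleavage at a fixed point within it, can be illustrated in Python. The sketch below uses the AG-CT site described above; the DNA string being cut and the function names are hypothetical, and only one strand is tracked.

```python
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Complementary strand, rewritten in the 5'->3' direction."""
    return "".join(PAIRS[base] for base in reversed(seq))


def is_palindromic(site: str) -> bool:
    """True if the site reads the same 5'->3' on both strands."""
    return site == reverse_complement(site)


def cleave(dna: str, site: str, offset: int) -> list:
    """Cut one strand at each occurrence of site, offset bases into it."""
    fragments, start = [], 0
    position = dna.find(site)
    while position != -1:
        fragments.append(dna[start:position + offset])
        start = position + offset
        position = dna.find(site, position + 1)
    fragments.append(dna[start:])
    return fragments


print(is_palindromic("AGCT"))                  # True
fragments = cleave("TTAGCTGGAAGCTCC", "AGCT", 2)
print(fragments)                               # ['TTAG', 'CTGGAAG', 'CTCC']
```

Joining the fragments back together regenerates the original string, so nothing is lost in the cut.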
If the original DNA molecule is cut with another restriction enzyme having a different specificity for cleavage, still other segments are produced whose sequences partially overlap those produced by the first enzyme. Sequencing of all the segments, followed by identification of the overlapping regions, then allows complete DNA sequencing.
Two methods of DNA sequencing are in general use. Both operate along similar lines, but the Maxam-Gilbert method uses chemical techniques, while the Sanger dideoxy method uses enzymatic reactions. The Maxam-Gilbert method is used in specialized instances, but it is the Sanger method that has allowed the sequencing of the entire human genome of 3 billion base pairs. The dideoxy method used in commercial sequencing instruments begins with a mixture of the following:
• The restriction fragment to be sequenced
• A small piece of DNA called a primer, whose sequence is complementary to that on the 3' end of the restriction fragment
• The four 2'-deoxyribonucleoside triphosphates (dNTPs)
• Very small amounts of the four 2',3'-dideoxyribonucleoside triphosphates (ddNTPs), each of which is labeled with a fluorescent dye of a different color. (A 2',3'-dideoxyribonucleoside triphosphate is one in which both 2' and 3' -OH groups are missing from ribose.)
DNA polymerase enzyme is then added to this mix, and a strand of DNA complementary to the restriction fragment begins to grow from the end of the primer. Most of the time, only normal deoxyribonucleotides are incorporated into the growing chain, but every so often, a dideoxyribonucleotide is incorporated. When that happens, DNA synthesis stops because the chain end no longer has a 3'-hydroxyl group for adding further nucleotides.
After reaction is complete, the product consists of a mixture of DNA fragments of all possible lengths, each terminated by one of the four dye-labeled dideoxyribonucleotides. When this product mixture is then submitted to electrophoresis (Section 15.2), each fragment migrates at a rate that depends on the number of negatively charged phosphate groups (the number of nucleotides) it contains. Smaller pieces move rapidly, and larger pieces move more slowly. The technique is so sensitive that up to 1100 DNA fragments, differing in size by only one nucleotide, can be separated.
After separation by electrophoresis according to size, the identity of the terminal dideoxyribonucleotide in each piece, and thus the sequence of the restriction fragment, is determined simply by noting the color with which it fluoresces. Figure 16.11 shows a typical result.
The sequence of a restriction fragment determined by the Sanger dideoxy method can be read simply by noting the colors of the dye attached to each of the various terminal nucleotides.
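The logic of the dideoxy readout can be mimicked with a toy simulation. In the sketch below, the dye colors, the function names, and the six-base fragment are all assumptions made for illustration, and the primer is ignored. The simulation generates one terminated chain for every possible stopping point, shuffles them to imitate the unordered reaction product, and then recovers the sequence by sorting on length and reading the terminal dye.

```python
import random

PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}
# Hypothetical dye colors; the text says only that each ddNTP has its own.
DYE = {"A": "green", "T": "red", "G": "yellow", "C": "blue"}
BASE_FOR_DYE = {color: base for base, color in DYE.items()}


def reverse_complement(seq: str) -> str:
    return "".join(PAIRS[base] for base in reversed(seq))


def terminated_fragments(new_strand: str) -> list:
    """One chain-terminated product for every possible stopping point."""
    return [new_strand[:n] for n in range(1, len(new_strand) + 1)]


def read_by_size(fragments: list) -> str:
    """Sort fragments shortest first and read the dye on each end base."""
    ordered = sorted(fragments, key=len)
    colors = [DYE[fragment[-1]] for fragment in ordered]
    return "".join(BASE_FOR_DYE[color] for color in colors)


template = "TAGGCT"                          # fragment to be sequenced
new_strand = reverse_complement(template)    # strand grown by the polymerase
mixture = terminated_fragments(new_strand)
random.Random(0).shuffle(mixture)            # reaction product is unordered

read = read_by_size(mixture)
print(read)                        # AGCCTA, the newly made strand
print(reverse_complement(read))    # TAGGCT, the original fragment
```

Sorting by length stands in for electrophoresis here: because the fragments differ by one nucleotide each, the order of terminal dyes spells out the new strand.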
So efficient is the automated dideoxy method that sequences up to 1100 nucleotides in length can be rapidly sequenced with 98% accuracy. After a decade of work, preliminary sequence information for the entire human genome of 3 billion base pairs was announced early in 2001. Remarkably, our genome appears to contain only 30,000 to 40,000 genes, about one-third the generally predicted number and only twice the number found in the common roundworm.
16.14 The Polymerase Chain Reaction
Once a gene sequence is known, obtaining an amount of DNA large enough for study is often the next step. The method used is the polymerase chain reaction (PCR), which has been described as being to genes what Gutenberg's invention of the printing press was to the written word. Just as the printing press produces multiple copies of a book, PCR produces multiple copies of a given DNA sequence. Starting from less than 1 picogram of DNA with a chain length of 10,000 nucleotides (1 pg = 10⁻¹² g; about 100,000 molecules), PCR makes it possible to obtain several micrograms (1 μg = 10⁻⁶ g; about 10¹¹ molecules) in just a few hours.
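The molecule counts quoted here can be checked to order of magnitude with Avogadro's number. The calculation below is a rough sketch that assumes an average mass of about 330 g/mol per nucleotide; that value is a common approximation supplied for this example and does not come from the text.

```python
AVOGADRO = 6.022e23            # molecules per mole
AVG_NUCLEOTIDE_MASS = 330.0    # g/mol per nucleotide (assumed average)


def molecule_count(mass_grams: float, chain_length: int) -> float:
    """Approximate number of DNA chains in a sample of the given mass."""
    molar_mass = chain_length * AVG_NUCLEOTIDE_MASS
    return mass_grams / molar_mass * AVOGADRO


in_one_picogram = molecule_count(1e-12, 10_000)
in_one_microgram = molecule_count(1e-6, 10_000)
print(f"{in_one_picogram:.1e} molecules per picogram")
print(f"{in_one_microgram:.1e} molecules per microgram")
```

With this assumed nucleotide mass, the results fall in the same range as the rounded figures in the text, and going from a picogram to a microgram is a millionfold increase.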
The key to the polymerase chain reaction is the discovery of Taq DNA polymerase, a heat-stable enzyme isolated from the thermophilic bacterium Thermus aquaticus found in a hot spring in Yellowstone National Park. Taq polymerase is able to take a single strand of DNA and, starting from a short "primer" piece that is complementary to one end of the chain, finish constructing the entire complementary strand. The overall process takes three steps, as shown schematically in Figure 16.12.
STEP 1 The double-stranded DNA to be amplified is heated in the presence of Taq polymerase, Mg²⁺ ion, the four deoxyribonucleotide triphosphate monomers (dNTPs), and a large excess of two short DNA primer pieces of about 20 bases each. Each primer is complementary to the sequence at the end of one of the target DNA segments. At a temperature of 95°C, double-stranded DNA spontaneously breaks apart into two single strands.
STEP 2 The temperature is lowered to between 37°C and 50°C, allowing the primers, because of their relatively high concentration, to anneal to a complementary sequence at the end of each target strand.
STEP 3 The temperature is then raised to 72°C, and Taq polymerase catalyzes the addition of further nucleotides to the two primed DNA strands. When replication of each strand is finished, two copies of the original DNA now exist. Repeating the denature-anneal-synthesize cycle a second time yields four DNA copies, repeating a third time yields eight copies, and so on, in an exponential series.
PCR has been automated, and 30 or so cycles can be carried out in an hour, resulting in a theoretical amplification factor of 2³⁰ (~10⁹). In practice, however, the efficiency of each cycle is less than 100%, and an experimental amplification of about 10⁶ to 10⁸ is routinely achieved for 30 cycles.
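The gap between theoretical and observed amplification can be explored with a simple model in which each cycle multiplies the number of copies by (1 + efficiency). This per-cycle model is an assumption introduced for illustration, as are the function names, but it reproduces the doubling series described above when the efficiency is 100%.

```python
def amplification(cycles: int, efficiency: float = 1.0) -> float:
    """Copies per starting molecule if each cycle yields (1 + efficiency)x."""
    return (1.0 + efficiency) ** cycles


def efficiency_needed(target: float, cycles: int) -> float:
    """Per-cycle efficiency that gives the target amplification."""
    return target ** (1.0 / cycles) - 1.0


ideal = amplification(30)
low = efficiency_needed(1e6, 30)
high = efficiency_needed(1e8, 30)

print(f"Ideal factor after 30 cycles: {ideal:.3g}")
print(f"Efficiency for a 10^6-fold gain: {low:.0%}")
print(f"Efficiency for a 10^8-fold gain: {high:.0%}")
```

Under this model, the routinely observed amplification corresponds to per-cycle efficiencies of very roughly 60% to 85%, which shows how strongly a modest loss per cycle compounds over 30 cycles.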
The polymerase chain reaction. Double-stranded DNA is heated to 95°C in the presence of two short primer sequences, each of which is complementary to the end of one of the strands. After the DNA denatures, the temperature is lowered and the primer sequences anneal to the strand ends. Raising the temperature in the presence of Taq polymerase, Mg²⁺, and a mixture of the four deoxynucleotide triphosphates (dNTPs) effects strand replication, producing two DNA copies. Each further repetition of the sequence again doubles the number of copies.