Nucleic Acids and Nucleotides
The nucleic acids, deoxyribonucleic acid (DNA) and ribonucleic acid (RNA), are the carriers and processors of a cell's
genetic information. Coded in a cell's DNA is all the information that determines the nature of the
cell, controls cell growth and division, and directs biosynthesis of the
enzymes and other proteins required for all cellular functions.
Just as
proteins are polymers of amino acid units, nucleic acids are polymers of
individual building blocks called nucleotides
linked together to form a long chain. Each nucleotide is composed of a nucleoside bonded to a phosphate group,
and each nucleoside is composed of an aldopentose sugar joined through its
anomeric carbon to the nitrogen atom of a heterocyclic amine base.
The sugar
component in RNA is ribose, and the sugar in DNA is 2-deoxyribose. Recall that
the prefix 2-deoxy means that oxygen
is missing from C2 of ribose.
DNA contains four different heterocyclic amine bases. Two are substituted purines (adenine and guanine), and two are substituted pyrimidines (cytosine and thymine). Adenine, guanine, and cytosine also occur in
RNA, but thymine is replaced in RNA by a different pyrimidine base called
uracil.
In both DNA and RNA, the heterocyclic amine base is bonded to C1' of the sugar, and the phosphoric acid
is bonded by a phosphate ester linkage to the C5' sugar position. (In referring
to nucleic acids, numbers with a prime superscript refer to positions on the
sugar, and numbers without a prime refer to
positions on the heterocyclic amine base.) The complete structures of all four deoxyribonucleotides and all
four ribonucleotides.
Although chemically similar, DNA and RNA differ in size and have different roles
within the cell. Molecules of DNA are enormous. They have molecular weights of up to 150 billion and lengths of up to 12 cm when
stretched out, and they are found mostly in the nucleus of cells. Molecules of
RNA, by contrast, are much smaller (as low as 35,000 in molecular weight) and are found mostly outside the cell
nucleus. We'll consider the two kinds
of nucleic acids separately, beginning with DNA.
16.7 Structure of DNA
Nucleotides
join together in DNA by forming a phosphate ester bond between the 5'-phosphate
group on one nucleotide and the 3'-hydroxyl group on the sugar of another
nucleotide (Figure 16.4). One end of the nucleic acid polymer thus has a free hydroxyl at C3' (the 3' end), and the other end has a
phosphate at C5' (the 5' end).
Generalized structure of DNA.
Just as the structure of a protein
depends on the sequence in which individual amino acids are connected, the
structure of a nucleic acid depends on the
sequence of individual nucleotides. To carry the analogy further, just
as a protein has a polyamide backbone with different side chains attached to
it, a nucleic acid has an alternating sugar-phosphate backbone with different
amine bases attached.
The sequence of nucleotides in a
chain is described by starting at the 5' end and identifying the bases in order
of occurrence, using the abbreviations A for adenosine, G for guanosine, C for
cytidine, and T for thymidine (or U for uridine in RNA). Thus, a typical sequence
might be written as TAGGCT.
16.8 Base Pairing in DNA: The Watson-Crick Model
Samples of DNA isolated from different tissues of the same species have
the same
proportions of heterocyclic bases, but samples from different species can have
greatly different proportions of bases. Human DNA, for example, contains about 30% each of A and T
and about 20% each of G and C. The bacterium
Clostridium perfringens, however,
contains about 37% each of A and T
and only 13% each of G and C. Note that in both examples, the bases occur
in pairs; A and T are usually present in equal amounts, as are G and C. Why should this be?
In 1953, James Watson and Francis Crick made their now classic proposal for the
secondary structure of DNA. According to the Watson-Crick model, DNA consists of two
polynucleotide strands, running in opposite directions and coiled around each
other in a double helix like the
handrails on a spiral staircase. The two strands are
held together by hydrogen bonds between specific pairs of bases. Adenine (A) forms strong hydrogen bonds to
thymine (T) but not to G or C. Similarly, G and C
form strong hydrogen bonds to each other but not to A or T. The nature of this hydrogen bonding
is particularly apparent in electrostatic potential maps, which show the
alignment of electron-rich and electron-poor
regions along the edges of the bases.
The two strands
of the DNA double helix are not identical; rather, they're complementary because of hydrogen bonding. Whenever a G occurs in
one strand, a C occurs opposite it in the other strand. When an A occurs in one strand, a T occurs in the other strand.
This complementary pairing of bases explains why A and T are always
found in equal amounts, as are G and C. A
full turn of the DNA double helix is shown in Figure 16.6. The helix is
2.0 nm (20 Å) wide, there are 10 base pairs per turn, and each turn is 3.4 nm (34 Å) in length.
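The pairing rules can be made concrete with a short computation. The Python sketch below is an illustration only: the function names `complementary_strand` and `duplex_composition` are invented for this example. It builds the partner of a given strand and tallies the base composition over both strands of the resulting double helix.

```python
from collections import Counter

# Watson-Crick pairing rules from the text: A pairs with T, G pairs with C.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complementary_strand(strand: str) -> str:
    """Return the partner strand, written 5' -> 3'.

    The two strands run in opposite directions, so the paired bases
    are taken in reverse order before being written out.
    """
    return "".join(PAIR[base] for base in reversed(strand))


def duplex_composition(strand: str) -> dict:
    """Percentage of each base counted over both strands of the helix."""
    counts = Counter(strand + complementary_strand(strand))
    total = sum(counts.values())
    return {base: 100 * counts[base] / total for base in "ATGC"}


partner = complementary_strand("TAGGCT")
print(partner)                       # AGCCTA
print(duplex_composition("TAGGCT"))
```

Whatever sequence is supplied, the percentages of A and T come out equal, as do those of G and C, because every base on one strand is matched by its partner on the other.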
The two strands
of the double helix coil in such a way that two kinds of "grooves" result, a major
groove 1.2 nm (12 Å) wide and a minor
groove 600 pm (6 Å) wide. The major groove is slightly deeper than
the minor groove, and both are lined by potential hydrogen bond donors and
acceptors. As a result, a variety of flat,
polycyclic aromatic molecules are able to insert sideways, or intercalate, between the stacked bases.
Many cancer-causing and cancer-preventing
agents function by interacting with DNA in this way.
16.9 Nucleic Acids and Heredity
The genetic
information of an organism is stored as a sequence of deoxyribonucleotides strung together in the DNA chain. For the information
to be preserved and passed on to future generations, a mechanism must exist for copying DNA. For the information
to be used, a mechanism must exist
for decoding the DNA message and implementing the instructions it
contains.
What Crick called the "central dogma of molecular genetics"
says that the function of DNA is to store information and pass it on to RNA. The
function
of RNA, in turn, is to read, decode, and use the information received from DNA to
make proteins. By decoding the right bit of DNA at the right time, an
organism uses genetic information to synthesize the thousands of proteins
necessary for functioning.
Three fundamental processes take
place in the transfer of genetic information:
• Replication is the process by which identical copies of DNA
are made so that genetic
information can be preserved and handed down to
succeeding generations.
• Transcription is the process by which the genetic messages are
read and carried out of the cell nucleus to
ribosomes, where protein synthesis
occurs.
• Translation is the process by which the genetic messages are decoded and used to synthesize proteins.
16.10 Replication of DNA
DNA replication is an enzyme-catalyzed
process that begins by a partial unwinding of the double helix. As the strands separate and bases are exposed, new nucleotides line up on each strand in
a complementary manner, A to T and C
to G, and two new strands begin to grow. Each new strand is complementary to its old template strand, and
two new DNA double helices are
produced (Figure 16.7). Since each of the new DNA molecules contains one old strand and one new strand, the
process is described as semiconservative replication.
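The bookkeeping behind semiconservative replication can be sketched in a few lines of Python. This is a simplified model, not a description of the enzymatic machinery: strands are stored in pairing order (position i of one strand pairs with position i of the other), strand direction is ignored, and the labels "old" and "new" as well as the function names are assumptions made for the example.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def new_partner(template: str) -> str:
    """Line up a complementary base opposite every base of the template."""
    return "".join(PAIR[base] for base in template)


def replicate(duplex):
    """Copy a duplex given as two (label, sequence) tuples.

    Each original strand is kept and acts as the template for one
    freshly made strand, so two daughter duplexes are returned.
    """
    (label_a, seq_a), (label_b, seq_b) = duplex
    daughter_1 = ((label_a, seq_a), ("new", new_partner(seq_a)))
    daughter_2 = (("new", new_partner(seq_b)), (label_b, seq_b))
    return daughter_1, daughter_2


parent = (("old", "TAGGCT"), ("old", "ATCCGA"))
for daughter in replicate(parent):
    print(daughter)
```

Both daughters carry the same pair of sequences as the parent, and each contains exactly one strand labeled "old" and one labeled "new".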
The process by which the individual nucleotides are joined to create new DNA
strands involves many steps and many different enzymes. Addition of new
nucleotide units to the growing chain takes place in the 5' -> 3' direction
and is catalyzed by the enzyme DNA
polymerase. The key step is the addition of a 5'-mononucleoside triphosphate
to the free 3'-hydroxyl group of the growing chain as the 3'-hydroxyl attacks the triphosphate
and expels a diphosphate leaving group.
FIGURE 16.7
Schematic representation of DNA
replication. The original double-stranded DNA partially unwinds, bases are exposed, nucleotides
line up on each strand in a complementary manner, and two new strands begin to grow.
The magnitude of the replication process is
staggering. The nucleus of a human cell
contains 46 chromosomes (23 pairs), each of which consists of one very large DNA molecule.
Each chromosome, in turn, is made up of several thousand DNA segments called genes, and the sum of all genes in a
human cell (the genome) is estimated
to be approximately 3 billion base pairs. Despite the size of these massive
molecules, the base sequence is faithfully copied during replication, with an
error occurring only about once every 10 billion to
100 billion bases.
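A rough calculation puts these figures in perspective. The sketch below simply divides the genome size by the quoted error interval. It assumes one copied base per base-pair position, so counting both new strands separately would double the result. The variable and function names are chosen for illustration.

```python
GENOME_BASE_PAIRS = 3_000_000_000      # approximate human genome size from the text
BASES_PER_ERROR_WORST = 10e9           # one error per 10 billion bases
BASES_PER_ERROR_BEST = 100e9           # one error per 100 billion bases


def expected_errors(bases_copied: float, bases_per_error: float) -> float:
    """Average number of copying errors for a given number of bases."""
    return bases_copied / bases_per_error


worst = expected_errors(GENOME_BASE_PAIRS, BASES_PER_ERROR_WORST)
best = expected_errors(GENOME_BASE_PAIRS, BASES_PER_ERROR_BEST)
print(f"{best:.2f} to {worst:.2f} errors per genome copy")
```

Under these assumptions the estimate is about 0.03 to 0.3 errors per copy of the genome, that is, fewer than one mistake on average each time the full 3 billion base pairs are replicated.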
16.11 Structure and Synthesis of RNA: Transcription
RNA is
structurally similar to DNA. Both are sugar-phosphate polymers, and both have
heterocyclic bases attached. The only differences are that RNA contains ribose
rather than 2-deoxyribose and uracil rather than thymine. Uracil in RNA forms strong hydrogen bonds to its complementary base, adenine, just as thymine does in DNA. In
addition, RNA molecules are much smaller than DNA, and RNA remains
single-stranded rather than double-stranded.
There are
three major kinds of ribonucleic acid, each of which serves a specific
function.
• Messenger RNA (mRNA) carries genetic
messages from DNA to ribosomes, where protein synthesis occurs.
• Ribosomal RNA (rRNA)
provides the physical makeup of ribosomes.
• Transfer RNA (tRNA) transports specific
amino acids to the ribosomes, where they are joined together to make
proteins.
The
conversion of the information in DNA into proteins begins in the nucleus of
cells with the synthesis of mRNA by the process of transcription. Several
turns of the DNA double helix unwind, forming a "bubble" and exposing the
bases of the two strands. Ribonucleotides line up in the proper order by
hydrogen bonding to their complementary bases on DNA, bond formation
occurs in the 5' -> 3' direction,
and the growing RNA molecule unwinds from DNA.
Synthesis of RNA using a DNA
segment as template.
Unlike what
happens in DNA replication, where both strands are copied, only one of the two
DNA strands is transcribed into mRNA. The strand that contains the gene is
called the coding strand, or sense strand, and the strand that gets
transcribed is called the template
strand, or antisense strand. Since the template strand and the coding strand are complementary, and since the template
strand and the RNA molecule are also complementary, the
RNA molecule produced during transcription is a copy of the coding strand. The
only difference is that the RNA molecule has a U everywhere the DNA coding strand has
a T.
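The relationship among coding strand, template strand, and mRNA can be checked with a small Python sketch. It is a simplified illustration: the two DNA strands are stored aligned base for base, so the template reads 3' -> 5' while the coding strand and the RNA read 5' -> 3'. The function names and the sample sequence are hypothetical.

```python
DNA_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}
# Base placed in the RNA opposite each base of the DNA template.
RNA_OPPOSITE = {"A": "U", "T": "A", "G": "C", "C": "G"}


def template_strand(coding: str) -> str:
    """Template strand aligned under the coding strand (reads 3' -> 5')."""
    return "".join(DNA_PAIR[base] for base in coding)


def transcribe(template: str) -> str:
    """RNA assembled along the aligned template (reads 5' -> 3')."""
    return "".join(RNA_OPPOSITE[base] for base in template)


coding = "ATGTTCTAA"                   # hypothetical coding-strand segment
mrna = transcribe(template_strand(coding))
print(mrna)                            # AUGUUCUAA
print(mrna == coding.replace("T", "U"))
```

Because complementing twice returns the original base, the RNA matches the coding strand at every position except that U appears wherever the coding strand has T.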
Transcription of DNA by the process just discussed raises many questions. How does the DNA know where
to unwind? Where along the chain does one
gene stop and the next one start? How do the ribonucleotides know the
right place along the template strand to begin lining up and the right place to
stop? The picture that has emerged is that a DNA chain contains specific base
sequences called promoter sites that
lie at positions 10 base pairs and 35 base pairs upstream from the coding
region and signal the beginning of a gene.
Similarly, there are other base sequences near the end of the gene that
signal a stop.
16.12 RNA and Protein Biosynthesis: Translation
The primary
cellular function of RNA is to direct biosynthesis of the thousands of diverse
peptides and proteins required by an organism. The mechanics of protein
biosynthesis are directed by mRNA and take place on ribosomes, small granular particles in the cytoplasm of a cell that
consist of about 60% rRNA and 40% protein.
On the ribosome, mRNA serves as a template to pass on the genetic
information it has transcribed from DNA.
The specific
ribonucleotide sequence in mRNA forms a message that determines the order in
which different amino acid residues are to be joined.
Each "word," or codon, along the mRNA chain consists of a sequence of three ribonucleotides
that is specific for a given amino acid. For example, the series UUC on mRNA is a codon directing
incorporation of the amino acid
phenylalanine into the growing protein. Of the 4^3 = 64 possible
triplets of the four bases in RNA,
61 code for specific amino acids (most amino acids are specified by more
than one codon) and 3 code for chain termination. Table 16.3 shows the meaning of each codon.
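The codon count can be confirmed by enumeration. The Python sketch below lists every triplet of the four RNA bases and sets aside the three chain-termination codons of the standard genetic code (UAA, UAG, and UGA). The variable names are chosen for this illustration.

```python
from itertools import product

RNA_BASES = "UCAG"
# The three chain-termination triplets of the standard genetic code.
STOP_CODONS = {"UAA", "UAG", "UGA"}

# Every ordered triplet of the four bases: 4 * 4 * 4 possibilities.
all_codons = ["".join(triplet) for triplet in product(RNA_BASES, repeat=3)]
sense_codons = [codon for codon in all_codons if codon not in STOP_CODONS]

print(len(all_codons), len(sense_codons), len(STOP_CODONS))   # 64 61 3
```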
The message
carried by mRNA is read by tRNA in a process called translation. There are 61 different tRNAs, one for each of the
61 codons in Table 16.3 that
specify an amino acid. A typical tRNA is roughly the shape of a cloverleaf, as shown in Figure 16.9. It
consists of about 70 to 100 ribonucleotides
and is bonded to a specific amino acid by an ester linkage through the 3'-hydroxyl on ribose at the end of the tRNA.
Each tRNA also contains in its chain a segment called an anticodon, a sequence of three ribonucleotides
that is complementary to the codon sequence. For example, the codon sequence UUC present on mRNA is read by a
phenylalanine-bearing tRNA having the
complementary anticodon sequence GAA. [Remember that nucleotide sequences are written in the 5' -> 3'
direction, so the sequence in an
anticodon must be reversed. That is, the complement to (5')-UUC-(3') is (3')-AAG-(5'), which is written as (5')-GAA-(3').]
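The reversal step is easy to get wrong by hand, so a short Python sketch may help. It assumes strict A-U and G-C pairing at all three positions, and the function name `anticodon` is invented for the example.

```python
RNA_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def anticodon(codon: str) -> str:
    """Anticodon for an mRNA codon, with both written 5' -> 3'.

    Pairing is antiparallel, so the codon is read backward while
    each base is replaced by its complement.
    """
    return "".join(RNA_PAIR[base] for base in reversed(codon))


print(anticodon("UUC"))   # GAA
```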
As each successive codon on mRNA is read, appropriate tRNAs bring the correct amino acids into
position for enzyme-mediated transfer to the growing peptide. When synthesis of
the proper protein is completed, a "stop"
codon signals the end, and the protein is released from the ribosome. The
entire process of protein biosynthesis is illustrated schematically in Figure 16.10.
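The reading of successive codons can be imitated with a lookup table. The Python sketch below is a bare-bones model of translation, not of the ribosome. It uses the standard genetic code in one-letter amino acid symbols, with `*` marking a stop codon. It also assumes that reading begins at the first base of the sequence supplied, and the sample mRNA is hypothetical.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one-letter amino acid symbols, "*" = stop.
# Entries follow the order UUU, UUC, UUA, UUG, UCU, ... GGG.
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = dict(
    zip(("".join(t) for t in product(BASES, repeat=3)), AMINO_ACIDS)
)


def translate(mrna: str) -> str:
    """Read the mRNA three bases at a time until a stop codon appears."""
    peptide = []
    for start in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[start:start + 3]]
        if amino_acid == "*":          # stop codon: release the chain
            break
        peptide.append(amino_acid)
    return "".join(peptide)


print(translate("AUGUUCGGAUAA"))       # MFG (Met-Phe-Gly)
```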
Structure of a tRNA molecule. The tRNA is a
roughly cloverleaf-shaped molecule containing an anticodon triplet on one "leaf" and a covalently
attached amino acid residue at its
3' end. The example shown is a yeast tRNA that codes for phenylalanine. The nucleotides not specifically
identified are chemically modified analogs of the four usual nucleotides.
A schematic representation of
protein biosynthesis. The mRNA containing codon base sequences is read by tRNA
containing complementary anticodon base sequences. Transfer RNAs assemble the proper amino acids into position
for incorporation into the peptide.
16.13 Sequencing DNA
One of the greatest scientific revolutions in history is now occurring
in molecular biology as scientists are learning how to manipulate and harness
the genetic machinery of
organisms. None of the extraordinary advances of the past decade would have been possible, however, were it not for the
discovery in 1977 of methods for
sequencing immense DNA chains.
The first step
in DNA sequencing is to cleave the enormous chain at predictable points to
produce smaller, more manageable pieces, a task accomplished by the use of enzymes called restriction endonucleases. Each different restriction
enzyme, of which more than 200 are available, cleaves a DNA molecule at a well-defined point in the
chain wherever a specific base sequence occurs. For example, the restriction
enzyme AluI cleaves between G and C in the four-base sequence
AG-CT. Note that the sequence is a palindrome,
meaning that it reads the same from left to right and right to left; that
is, the sequence (5')-AG-CT-(3') is
identical to its complement, (3')-TC-GA-(5'). The same is true for other
restriction endonucleases.
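Both ideas in this paragraph, cleavage at a recognition sequence and the palindrome property, can be expressed in a short Python sketch. It models only one strand of the DNA, and the function names and the sample sequence are hypothetical. The default site and cut position follow the AG-CT example above.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Complementary strand written 5' -> 3'."""
    return "".join(PAIR[base] for base in reversed(seq))


def is_palindromic(site: str) -> bool:
    """True if the site reads the same on both strands, 5' -> 3'."""
    return site == reverse_complement(site)


def digest(seq: str, site: str = "AGCT", cut_offset: int = 2) -> list:
    """Cut seq wherever site occurs, cut_offset bases into the site."""
    fragments, start = [], 0
    position = seq.find(site)
    while position != -1:
        cut = position + cut_offset
        fragments.append(seq[start:cut])
        start = cut
        position = seq.find(site, position + 1)
    fragments.append(seq[start:])      # piece after the last cut
    return fragments


print(is_palindromic("AGCT"))          # True
print(digest("TTAGCTGGAGCTAA"))        # ['TTAG', 'CTGGAG', 'CTAA']
```

Joining the fragments in order regenerates the original sequence, which is why overlapping digests with a second enzyme can be used to work out how the pieces fit together.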
If the original
DNA molecule is cut with another restriction enzyme having a different specificity for cleavage, still other segments are
produced whose sequences partially overlap those produced by the first
enzyme. Sequencing of all the segments,
followed by identification of the overlapping regions, then allows complete DNA sequencing.
Two methods of DNA sequencing are in general use. Both operate along similar lines, but the Maxam-Gilbert method uses chemical
techniques, while the Sanger dideoxy method uses enzymatic reactions. The Maxam-Gilbert method is used in specialized
instances, but it is the Sanger method that has allowed the sequencing
of the entire human genome of 3 billion base pairs. The dideoxy method used in
commercial sequencing instruments begins
with a mixture of the following:
• The restriction fragment to be
sequenced
• A small
piece of DNA called a primer, whose
sequence is complementary to that on the 3' end of the restriction fragment
• The four
2'-deoxyribonucleoside triphosphates (dNTPs)
• Very small
amounts of the four 2',3'-dideoxyribonucleoside triphosphates (ddNTPs), each
of which is labeled with a fluorescent dye of a different color. (A
2',3'-dideoxyribonucleoside triphosphate is one in which both 2' and 3' -OH
groups are missing from ribose.)
DNA polymerase enzyme is then added to this mix, and a strand of DNA complementary to the restriction
fragment begins to grow from the end of the
primer. Most of the time, only normal deoxyribonucleotides are incorporated
into the growing chain, but every so often, a dideoxyribonucleotide is incorporated. When that happens, DNA
synthesis stops because the chain end no longer has a 3'-hydroxyl group for
adding further nucleotides.
After reaction
is complete, the product consists of a mixture of DNA fragments of all possible
lengths, each terminated by one of the four dye-labeled dideoxyribonucleotides.
When this product mixture is then submitted to electrophoresis (Section 15.2),
each fragment migrates at a rate that depends on the number of negatively
charged phosphate groups (the number of nucleotides) it contains. Smaller
pieces move rapidly, and larger pieces move
more slowly. The technique is so sensitive that up to 1100 DNA fragments,
differing in size by only one nucleotide, can be separated.
After
separation by electrophoresis according to size, the identity of the terminal
dideoxyribonucleotide in each piece—and thus the sequence of the restriction
fragment—is identified simply by noting the color with which it fluoresces. Figure 16.11 shows a typical result.
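The logic of reading a sequence from dye-terminated fragments can be sketched in Python. This is an idealized model: every possible chain length is assumed to be present exactly once, the primer is omitted, and the color assigned to each base is hypothetical, since the text specifies only that the four dyes differ. The function names are also invented for the example.

```python
import random

PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}
# Hypothetical dye colors, one per dideoxynucleotide.
DYE = {"A": "green", "T": "red", "G": "yellow", "C": "blue"}


def chain_terminated_fragments(new_strand: str) -> list:
    """All prefixes of the growing strand, each ending in a ddNTP."""
    return [new_strand[:n] for n in range(1, len(new_strand) + 1)]


def read_colors(fragments: list) -> list:
    """Sort by length (short pieces migrate fastest), then report the
    dye color carried by the terminal nucleotide of each piece."""
    return [DYE[piece[-1]] for piece in sorted(fragments, key=len)]


def colors_to_sequence(colors: list) -> str:
    base_for_color = {color: base for base, color in DYE.items()}
    return "".join(base_for_color[color] for color in colors)


new_strand = "GATTACAGC"               # strand grown from the primer, 5' -> 3'
mixture = chain_terminated_fragments(new_strand)
random.shuffle(mixture)                # the reaction mixture has no order
read = colors_to_sequence(read_colors(mixture))
print(read)                            # GATTACAGC

# The restriction fragment itself is the complement of the strand read.
print("".join(PAIR[base] for base in reversed(read)))
```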
FIGURE 16.11
The sequence of a restriction
fragment determined by the Sanger dideoxy method can be read simply by noting
the colors of the dye attached to each of the various terminal nucleotides.
So efficient is the automated dideoxy
method that sequences up to 1100 nucleotides in length can be rapidly sequenced
with 98% accuracy. After a decade of work, preliminary sequence information for
the entire human genome of 3 billion base pairs was announced early in 2001. Remarkably, our genome appears to contain only
30,000 to 40,000 genes, about one-third the generally predicted number
and only twice the number found in the common roundworm.
16.14 The Polymerase Chain Reaction
Once a gene
sequence is known, obtaining an amount of DNA large enough for study is often
the next step. The method used is the polymerase
chain reaction (PCR), which has been described as being to genes what
Gutenberg's invention of the printing press was to the written word. Just as
the printing press produces multiple copies of a book, PCR produces multiple copies of a given DNA sequence.
Starting from less than 1 picogram of DNA with a chain length of 10,000 nucleotides (1 pg = 10^-12
g; about 100,000 molecules), PCR
makes it possible to obtain several micrograms (1 μg = 10^-6
g; about 10^11 molecules) in just a few hours.
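The molecule counts quoted here can be checked to order of magnitude. The Python sketch below assumes an average mass of about 330 g/mol per nucleotide, a commonly used approximation that is not given in the text, and treats the DNA as a single chain of 10,000 nucleotides. The function name is hypothetical.

```python
AVOGADRO = 6.022e23                    # molecules per mole
AVG_NUCLEOTIDE_MASS = 330.0            # g/mol, assumed average per nucleotide


def molecule_count(mass_g: float, chain_length: int) -> float:
    """Approximate number of DNA chains in a sample of the given mass."""
    molar_mass = chain_length * AVG_NUCLEOTIDE_MASS
    return mass_g / molar_mass * AVOGADRO


print(f"{molecule_count(1e-12, 10_000):.1e}")   # 1 pg: on the order of 10^5
print(f"{molecule_count(1e-6, 10_000):.1e}")    # 1 μg: on the order of 10^11
```

With these assumptions, 1 pg corresponds to roughly 2 x 10^5 molecules and 1 μg to roughly 2 x 10^11, in line with the rounded figures in the paragraph above.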
The key to
the polymerase chain reaction is the discovery of Taq DNA polymerase, a heat-stable enzyme isolated from the thermophilic
bacterium Thermus aquaticus found in a
hot spring in Yellowstone National Park. Taq
polymerase is able to take a single strand of DNA and, starting from a
short "primer"
piece that is complementary to one end of the chain, finish constructing
the entire complementary strand. The overall process takes three steps, as
shown schematically in Figure 16.12.
STEP 1 The
double-stranded DNA to be amplified is heated in the presence of Taq polymerase, Mg2+ ion, the
four deoxyribonucleoside triphosphate monomers (dNTPs), and a large excess of two short DNA
primer pieces of about 20 bases each. Each
primer is complementary to the sequence at the end of one of the target
DNA segments. At a temperature of 95°C, double-stranded DNA spontaneously
breaks apart into two single strands.
STEP 2 The temperature is lowered to between 37°C and
50°C, allowing the primers, because of
their relatively high concentration, to anneal to a complementary
sequence at the end of each target strand.
STEP 3 The
temperature is then raised to 72°C, and Taq
polymerase catalyzes the addition of further nucleotides to the two primed DNA
strands. When replication of each strand is finished, two copies of the original DNA now exist. Repeating the
denature-anneal-synthesize cycle a second time yields four DNA copies, repeating a
third time yields eight copies, and so on, in an exponential series.
PCR has been
automated, and 30 or so cycles can be carried out in an hour, resulting in a theoretical amplification factor of 2^30
(~10^9). In practice, however,
the efficiency of each cycle is less than 100%, and an experimental amplification of about 10^6 to 10^8
is routinely achieved for 30 cycles.
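The effect of imperfect cycles can be estimated with a simple growth model. The Python sketch below assumes that each cycle multiplies the number of copies by (1 + efficiency), where an efficiency of 1.0 means perfect doubling. This model and the function names are assumptions made for illustration, not something stated in the text.

```python
def amplification(cycles: int, efficiency: float = 1.0) -> float:
    """Copies produced per starting molecule after the given cycles."""
    return (1.0 + efficiency) ** cycles


def efficiency_for(target: float, cycles: int) -> float:
    """Per-cycle efficiency needed to reach a target amplification."""
    return target ** (1.0 / cycles) - 1.0


print(f"{amplification(30):.3g}")                 # 2^30, about 1.07e9
print(f"{efficiency_for(1e6, 30):.2f}")           # about 0.58
print(f"{efficiency_for(1e8, 30):.2f}")           # about 0.85
```

Under this model, the observed amplification of 10^6 to 10^8 after 30 cycles corresponds to an average per-cycle efficiency of roughly 60% to 85%.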
FIGURE 16.12
The polymerase chain reaction.
Double-stranded DNA is heated to 95°C in the presence of two short primer sequences, each of
which is complementary to the end of one of the strands. After the DNA
denatures, the temperature is lowered and the primer sequences anneal to the strand ends. Raising the temperature
in the presence of Taq polymerase, Mg2+,
and a mixture of the four
deoxyribonucleoside triphosphates (dNTPs) effects strand replication, producing two
DNA copies. Each
further repetition of the sequence again doubles the number of copies.