Department of Commerce
National Oceanic and Atmospheric Administration

A Primer on Molecular Biology

Some Benefits of Coral Genomics

Coral reef ecosystems, with their incredible biodiversity and richness of corals and other invertebrates, fishes, reptiles, plants, algae and protists, constitute vast reservoirs of genetic resources with great medical potential. While at least one-half of all therapeutic drugs on the current market are now derived from terrestrial organisms, we can expect many new drugs to be developed from marine organisms in the coming years. These drugs will be used as pharmaceuticals, nutritional supplements, biocides, cosmetics and other life-saving and life-enhancing products (Bruckner, 2002). Coral reef species (e.g., algae, sponges, soft corals, sea slugs) have already been used in the development of anti-cancer and anti-tumor drugs, painkillers, and anti-inflammatory agents.

Only a small percentage of coral reef biodiversity is known, and only small fractions of the known species have been explored as sources of biomedical compounds. With this in mind, the following primer on molecular biology is intended to help nonexperts understand the task of sequencing a genome, and why it is important to do so.

The Genome

DNA molecule: A (adenine), T (thymine), C (cytosine), G (guanine), S (deoxyribose), P (phosphate) (Credit: National Human Genome Research Institute)

A genome is defined as all of the genetic material (DNA) in the chromosomes of a particular organism, including all of its genes. Genes are specific sequences of bases that encode instructions on how to make proteins. Genes comprise only about two percent of the human genome; the remainder consists of noncoding regions, whose functions may include providing chromosomal structural integrity and regulating where, when, and in what quantity certain proteins are made.

Chromosomes are thread-like bodies located in the nucleus of the cells of most organisms, or dispersed throughout the cytoplasm of primitive cells that do not have distinct nuclei (for example, bacteria). In cells with nuclei, DNA confined within the nucleus (nuclear DNA or nDNA) is distinguished from mitochondrial DNA (mtDNA), which is located outside the nucleus. Mitochondria are cellular organelles ("small organs" within cells) that play a key role in releasing cellular energy. Mitochondria cannot be produced by cells de novo; instead, they are self-replicating, arising by the division of preexisting mitochondria as instructed by the mtDNA.

Genes are linear segments along the DNA molecule that are responsible for transmitting hereditary information from generation to generation. Genes control all of the chemical reactions that occur continuously within cells, and thus, they control all of the cells' activities. Genes accomplish this by precisely directing the synthesis of proteins.

Proteins are complex molecules, each composed of a specific linear sequence of the 20 amino acids that are the basic constituents of all proteins. Protein molecules can form long chains that contain thousands of amino acids. The order of the amino acids is determined by the genetic code for the particular protein. In addition to providing most of the structural components of a cell, proteins form enzymes, which are organic catalysts that increase the rates of chemical reactions. Imperfectly formed enzymes are responsible for many of the cellular malfunctions that lead to genetic disorders and diseases.

A gene may also be described as a segment of the DNA molecule that contains the encoded information that directs the formation of a particular protein. Genomics is the study of the sequence, structure, and function of the genome. Sequencing the genome is the determination of the order of nucleotides in a DNA or RNA molecule.

Structure of the Genome

To better understand the importance of sequencing the genome, let’s briefly examine the structure of DNA and how it directs the formation of proteins.

DNA is a double helix whose two strands are held together by weak hydrogen bonds between four nitrogenous bases: adenine (A), thymine (T), cytosine (C), and guanine (G). Each base, together with a phosphate molecule and a sugar molecule (deoxyribose in DNA, and ribose in the other nucleic acid, RNA), forms a nucleotide. Nucleotides are repeated many times over in various sequences. These sequences combine into genes that govern the production of proteins.


If the DNA molecule were to “untwist”, it would resemble a ladder. The sugar, deoxyribose, together with an attached phosphate (PO4), would constitute a section of the rails of the ladder, and each nitrogenous base would constitute one-half of a rung. The bases join in pairs (base pairs) to form a complete rung, with adenine (A) always bonding with thymine (T), and cytosine (C) always bonding with guanine (G). Thus, we have the base pairs AT and CG, held together by weak hydrogen bonds. A base attached to deoxyribose and a PO4 group constitutes a nucleotide. The double-stranded DNA molecule, therefore, is composed of a linear sequence of nucleotides that are repeated many times over in various sequences (for example, ATTCCGGAGTC). These sequences combine into genes that spell out the exact molecular instructions required to synthesize proteins, which direct the activities of cells.
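The pairing rule can be made concrete with a few lines of Python. This is only an illustrative sketch: the names PAIRS and complement are invented here, and the input is the example sequence from the paragraph above. Given the bases along one rail of the ladder, it returns the base that sits across each rung on the other rail.

```python
# Pairing rules from the text: A always bonds with T, and C always bonds with G.
# PAIRS and complement are illustrative names, not part of any library.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the base found opposite each base of `strand`, rung by rung."""
    return "".join(PAIRS[base] for base in strand.upper())


example = "ATTCCGGAGTC"          # example sequence from the text
partner = complement(example)
print(example)
print(partner)                   # TAAGGCCTCAG
```

Because each base has exactly one partner, applying complement twice returns the original sequence, which is why one strand is enough to reconstruct the other.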

RNA (ribonucleic acid) is the other nucleic acid found in the nucleus and cytoplasm of cells. RNA plays an important role in protein synthesis and other chemical activities of the cell. Its structure is similar to that of DNA, although RNA is single-stranded and the base uracil (U) replaces thymine. DNA stores genetic information as particular sequences of nucleotides in the nucleus of cells, and RNA carries that coded information to other parts of the cell, where it is converted into proteins.
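As a rough illustration of the thymine-to-uracil substitution, the sketch below rewrites a DNA sequence in the RNA alphabet. It assumes, for simplicity, that the DNA sequence supplied is the strand whose wording the RNA copy reproduces; the function name to_rna is made up for this example.

```python
def to_rna(dna: str) -> str:
    """Rewrite a DNA sequence in the RNA alphabet (uracil replaces thymine).

    Simplifying assumption: `dna` is the strand whose sequence the RNA copy
    matches, so only the T -> U substitution is needed.
    """
    return dna.upper().replace("T", "U")


print(to_rna("ATTCCGGAGTC"))     # AUUCCGGAGUC
```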

The total of the proteins produced from all the genes of a genome in a cell is called the proteome, which changes from instant to instant in response to thousands of intra- and extracellular chemical signals. Protein chemistry and behavior are specified by gene sequences, by the number and kinds of other proteins simultaneously synthesized in the same cell, and by the interrelationships among them.

The “central dogma of molecular biology” is the principal statement of the molecular basis of gene action. Genetic information is stored in and transmitted as DNA. Genes are expressed by being copied (transcription) as RNA, which is processed into mRNA (messenger RNA). The information in mRNA is then translated (translation) into a protein sequence using the genetic code, which interprets each sequential triplet of nucleotides (a codon) as an instruction to add one of 20 amino acids or to stop translation. More simply put, DNA carries the genetic information that is transcribed to RNA and subsequently translated to protein.
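The translation step can be sketched in Python as follows. The codon table here is deliberately a small excerpt of the standard genetic code, just enough for the demonstration, and the mRNA string is a made-up example rather than a real gene. The names CODON_EXCERPT and translate are likewise invented for this illustration.

```python
from itertools import product

# Four bases read three at a time give 4 * 4 * 4 = 64 possible codons,
# more than enough to specify 20 amino acids plus a stop signal.
ALL_CODONS = ["".join(triplet) for triplet in product("ACGU", repeat=3)]

# A small excerpt of the standard genetic code (NOT the full 64-entry table).
CODON_EXCERPT = {
    "AUG": "Met", "UUU": "Phe", "GGC": "Gly", "AAA": "Lys",
    "GCU": "Ala", "UGG": "Trp",
    "UAA": "Stop", "UAG": "Stop", "UGA": "Stop",
}


def translate(mrna: str) -> list[str]:
    """Read `mrna` one codon at a time until a stop codon or the end."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        residue = CODON_EXCERPT[codon]   # KeyError if codon is not in the excerpt
        if residue == "Stop":
            break
        protein.append(residue)
    return protein


print(len(ALL_CODONS))                   # 64
print(translate("AUGUUUGGCAAAUAA"))      # ['Met', 'Phe', 'Gly', 'Lys']
```

Reading stops at the first stop codon, so any bases after it do not contribute to the protein.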


The Human Genome

Sequencing of the human genome was completed in 2003. The following are a few interesting highlights from the first Department of Energy publications analyzing the sequence:

  • The human genome contains 3 billion chemical nucleotide bases (A, C, T, and G).
  • The average gene consists of 3,000 bases, but sizes vary greatly, with the largest known human gene being dystrophin at 2.4 million bases.
  • The total number of genes is estimated at 30,000 to 35,000.
  • The functions are unknown for more than 50% of discovered genes.
  • The human genome sequence is almost (99.9%) exactly the same in all people.
  • About 2% of the genome encodes instructions for the synthesis of proteins.
  • Repeat sequences that do not code for proteins (“junk DNA”) make up at least 50% of the human genome.
  • Repeat sequences are thought to have no direct functions, but they shed light on chromosome structure and dynamics. Over time, these repeats reshape the genome by rearranging it, thereby creating entirely new genes or modifying and reshuffling existing genes.
  • The human genome has a much greater portion (50%) of repeat sequences than the mustard weed (11%), the worm (7%), and the fly (3%).
  • During the past 50 million years, a dramatic decrease seems to have occurred in the rate of accumulation of repeats in the human genome.
  • Over 40% of the predicted human proteins share similarity with fruit-fly or worm proteins.
  • Genes appear to be concentrated in random areas along the genome, with vast expanses of noncoding DNA between.
  • Chromosome 1 (the largest human chromosome) has the most genes (2,968), and the Y chromosome has the fewest (231).
  • Genes have been pinpointed, and particular sequences in those genes have been associated with numerous diseases and disorders, including breast cancer, muscle disease, deafness, and blindness.
  • Scientists have identified about 3 million locations where single-base DNA differences occur in people. This information promises to revolutionize the process of finding DNA sequences associated with such common diseases as cardiovascular disease, diabetes, arthritis, and cancers.
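
The percentages in the highlights above are easier to appreciate as base counts. The short Python sketch below converts a few of them, using only the rounded figures quoted in the list; the helper name bases_for is invented for this example, and the results are back-of-the-envelope estimates rather than measured values.

```python
GENOME_BASES = 3_000_000_000     # "3 billion chemical nucleotide bases"


def bases_for(parts_per_thousand: int, total: int = GENOME_BASES) -> int:
    """Number of bases in a given share of the genome (in parts per thousand)."""
    return total * parts_per_thousand // 1000


coding = bases_for(20)           # about 2% encodes proteins
repeats = bases_for(500)         # at least 50% is repeat sequence
differing = bases_for(1)         # 100% - 99.9% = 0.1% differs between people

print(f"protein-coding bases:    {coding:,}")      # 60,000,000
print(f"repeat-sequence bases:   {repeats:,}")     # 1,500,000,000
print(f"bases that may differ:   {differing:,}")   # 3,000,000
```

The last figure, roughly 3 million bases, is of the same order as the approximately 3 million single-base difference locations mentioned in the final highlight, which serves as a rough consistency check on the rounded numbers.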