The Genetic Code and Symbol Position Budgets
Applying the bit budget principles to the genetic code
The four symbols of the genetic code are:
Code Letters and Nucleotides Represented

| | | | | |
|---|---|---|---|---|
| DNA nucleotide bases | A | G | T | C |
| mRNA nucleotides | A | G | U | C |
| | purines | purines | pyrimidines | pyrimidines |
Question: how could Watson and Crick (in 1953) anticipate a requirement for triplet (three-position) code units (or codons) for DNA?
Information at hand:
| Base | Complement |
|---|---|
| A | U |
| G | C |
| T or U | A |
| C | G |
Thus the complement of the mRNA codon AUG would be the anticodon UAC (carried by the matching tRNA and paired position by position).
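The complement table above can be expressed as a simple lookup. The following is a minimal sketch in Python; the names `RNA_COMPLEMENT` and `rna_complement` are illustrative choices, not part of any standard library, and the mapping is copied directly from the table (so a DNA `T` is treated like `U` and pairs with `A`).

```python
# Complement lookup taken from the Base/Complement table.
# "T" is included because the table lists "T or U" as pairing with A.
RNA_COMPLEMENT = {"A": "U", "G": "C", "U": "A", "T": "A", "C": "G"}


def rna_complement(sequence):
    """Return the position-by-position RNA complement of a sequence."""
    return "".join(RNA_COMPLEMENT[base] for base in sequence.upper())


print(rna_complement("AUG"))  # UAC
```

The function pairs each position independently and does not reverse the sequence, which matches the way the codon and anticodon are written in the text.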
The code must be able to represent 20 different entities (the 20 standard amino acids).
If we have one symbol position, we can represent (in a 4-value system) four different single values. If we have two positions, we can represent 4 × 4 = 16 unique combinations of two values (not quite enough, and no room for control characters!). So we need at least three symbol positions, giving us 4 × 4 × 4 = 64 unique combinations of three values.
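The same budget arithmetic can be checked mechanically. This is a small Python sketch, assuming a hypothetical helper named `min_positions` that finds the smallest number of symbol positions whose combinations cover the required number of entities.

```python
def min_positions(alphabet_size, n_entities):
    """Smallest number of positions such that alphabet_size ** positions >= n_entities."""
    positions = 1
    while alphabet_size ** positions < n_entities:
        positions += 1
    return positions


# Capacity of one, two, and three positions in a 4-symbol system.
for positions in (1, 2, 3):
    print(positions, 4 ** positions)  # 1 4, then 2 16, then 3 64

print(min_positions(4, 20))  # 3
```

For comparison with a binary bit budget, `min_positions(2, 20)` gives 5, since 2^4 = 16 is too few and 2^5 = 32 is enough.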
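In reality, once the code was understood, it turned out that one amino acid can be represented by 1, 2, 3, 4, or 6 codons (this is called a degenerate code, since it does not offer unique equivalences). Examples: methionine (AUG) and tryptophan (UGG) have one codon each; isoleucine has three (AUU, AUC, AUA); leucine, serine, and arginine have six each.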
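The degeneracy can be tallied directly from the standard genetic code. The sketch below is one way to do it in Python; it assumes the widely used compact encoding of the standard table (bases in the order U, C, A, G, one-letter amino acid codes, `*` for stop), and the variable names are illustrative.

```python
from collections import Counter
from itertools import product

# Standard genetic code in compact form: codons enumerated with bases in
# the order U, C, A, G (first position varies slowest); "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Count how many codons map to each amino acid, ignoring stop codons.
codons_per_amino_acid = Counter(aa for aa in CODON_TABLE.values() if aa != "*")
degeneracy_classes = sorted(set(codons_per_amino_acid.values()))

print(len(codons_per_amino_acid))  # 20
print(degeneracy_classes)          # [1, 2, 3, 4, 6]
```

Of the 64 combinations, 61 are spent on the 20 amino acids and 3 are left over as stop codons.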
Three codons turned out to be "nonsense," "terminal," or "stop" codons with no amino acid translation: UAA, UAG, and UGA (→ compare the role of control characters in computer codes, and stop bits and flags in data communications). Start sequences have also been identified, involving codons that also represent amino acids (e.g., AUG, which codes for Met = methionine in multicellular animals) (→ compare headers, start bits, and flags in data communications).
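The framing role of start and stop codons resembles reading a data frame between a start flag and a stop flag. The following Python sketch illustrates that idea under simplifying assumptions: it scans for the first AUG, reads three symbols at a time, and halts at the first stop codon. The function name `translate` and the sample sequence are made up for illustration, and real translation initiation involves more context than a bare AUG.

```python
from itertools import product

# Standard genetic code in compact form (bases ordered U, C, A, G; "*" = stop).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
START_CODON = "AUG"


def translate(mrna):
    """Translate from the first AUG up to (not including) the first stop codon."""
    start = mrna.find(START_CODON)
    if start == -1:
        return ""  # no start "flag" found, nothing to read
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # UAA, UAG, or UGA ends the message
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical sequence: two leading bases, then AUG GCU UAA, then trailing bases.
print(translate("GGAUGGCUUAAGC"))  # MA
```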
DNA is even divided into sections of code that are used to code for protein segments (exons) and sections (originally presumed to be "nonsense") that are not used to code for protein segments (introns*). After transcription, RNA splicing recognizes the intron sections and removes them as part of the process of deriving mature mRNA, joining the exons together (→ compare HTML documents with embedded SQL: the embedded SQL is "nonsense" as far as HTML interpretation is concerned. This analogy is limited, as the significance of the introns in DNA remains a topic of conjecture).
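As a rough illustration of exon splicing, the Python sketch below joins exon segments from a transcript and discards everything in between. It assumes the exon boundaries are already known and supplied as (start, end) index pairs; the cell itself locates introns from sequence signals, which this sketch does not model. The function name `splice` and the sequences are hypothetical.

```python
def splice(pre_mrna, exons):
    """Join the exon segments, given as (start, end) half-open index pairs."""
    return "".join(pre_mrna[start:end] for start, end in exons)


# Made-up transcript: exon 1, an intron, then exon 2.
exon_1 = "AUGGCU"
intron = "GUAAGUCCAG"
exon_2 = "UGGUAA"
pre_mrna = exon_1 + intron + exon_2

# Exon coordinates are assumed to be known in advance.
exons = [(0, 6), (16, 22)]

print(splice(pre_mrna, exons))  # AUGGCUUGGUAA
```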
Recommended reading on this topic:
· For current understanding of introns, see: http://www.panspermia.org/introns.htm and http://post.queensu.ca/~forsdyke/introns.htm and http://www.ndsu.edu/pubweb/~mcclean/plsc731/transcript/transcript4.htm.
· When this page was first set up in 2003, this was current information: see W. Wayt Gibbs, “The Unseen Genome: Gems among the Junk,” Scientific American 289, 5 (November 2003): 46-53. See also an NIH site on non-coding RNA genes. For the concept of epigenetics, please see W. Wayt Gibbs, “The Unseen Genome: Beyond DNA,” Scientific American 289, 6 (December 2003): 106-113.
See also: i3450DNAComputing.htm
Thanks to Dr. Ordetta Mendoza, Head of the Department of Bioinformatics at
Valerie J. H. Powell, R.T.(R), Ph.D., C&IS Department (Wheatley Center), Robert Morris University