A Code Within the Codons?

How studying patterns in the genetic code is beginning to uncover the origins of biological information and of life itself. By Tom Dubois.

How do we define life? Where can we draw the line between biotic (living) and abiotic (non-living) systems? One of the key differences between biological systems and purely chemical processes is information. Biological information can best be described as the meaning of DNA sequences, which are transcribed into RNAs, which in turn can be translated into proteins [1]. The information encoded by DNA enables cells to persist and to replicate. Without it, there would be no heredity and therefore no natural selection. Understanding how this information evolved is thus key to, and effectively synonymous with, understanding the origin of life.

The genetic code is a fundamental example of information in biology. Of the 64 possible codons (sequences of three RNA nucleotides), 61 encode an amino acid. The remaining three are stop codons, which terminate translation. This code is conserved across almost all life, with few exceptions. Biologists have long attempted to understand the meaning and the origin of this code by looking for codes within the codons. Hypotheses have identified links between codons, biosynthetic pathways, and the hydrophobicity and size of the corresponding amino acids. However, these hypotheses have limitations. Most assume a heterotrophic metabolism at the origin of life, in which amino acids and other compounds were already present on the primordial Earth and were simply used by primitive life forms, which had to do little biosynthesis of their own [2]. This makes co-evolution of the genetic code and biosynthetic pathways unlikely. Others fail to define a specific autotrophic metabolism, through which primitive life forms would have fixed their own carbon, converting gaseous CO2 into organic molecules, including amino acids. Failing to define a specific form of metabolism is problematic, as the co-evolution of metabolism and the genetic code depends strictly on the type of metabolism at the origin of life.
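The codon arithmetic above can be checked with a minimal Python sketch. It assumes the standard genetic code, written as the usual compact 64-letter string in U, C, A, G order with `*` marking a stop codon; the names `CODON_TABLE`, `SENSE_CODONS` and `STOP_CODONS` are illustrative only.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def build_codon_table():
    """Pair every three-base combination with its amino acid letter."""
    codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]
    return dict(zip(codons, AMINO_ACIDS))


CODON_TABLE = build_codon_table()
SENSE_CODONS = [codon for codon, aa in CODON_TABLE.items() if aa != "*"]
STOP_CODONS = [codon for codon, aa in CODON_TABLE.items() if aa == "*"]

if __name__ == "__main__":
    # Four bases at three positions give 4 ** 3 = 64 codons.
    print(len(CODON_TABLE), len(SENSE_CODONS), STOP_CODONS)
    # prints: 64 61 ['UAA', 'UAG', 'UGA']
```

The few organisms and organelles that deviate from this table would need a different 64-letter string.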

The first step towards understanding the patterns in the code is therefore to construct a proto-metabolism: a series of plausible prebiotic chemical reactions from which life and the genetic code could have emerged. UCL’s Professor Nick Lane and his team, who have researched the origin of life for the last 15 years, argue that to understand proto-metabolism, one must use life as a ‘guide’ by considering that prebiotic pathways reflect highly conserved extant metabolism [3]. Phylogenetics, the branch of biology that studies the evolutionary relationships between organisms, suggests that the last universal common ancestor (LUCA) was a chemiosmotic autotroph, fixing CO2 with H2 through the acetyl-CoA pathway and an incomplete reverse Krebs cycle [4]. This supports the theory that the first life forms were autotrophic. Indeed, experimental evidence has shown that core metabolic pathways can occur spontaneously in the absence of genes and enzymes, even producing nucleobases, with metal ions acting as catalysts in place of enzymes [5]. Autotrophic proto-metabolism could thus have led to the synthesis of short sequences of RNA which initially lacked genetic information.

Using life as a guide to its own origins, Prof. Lane’s team reconstructed plausible prebiotic amino acid synthesis pathways from highly conserved extant pathways documented in the MetaCyc database. They found a clear pattern between the first position of the codon and the distance from carbon fixation, with the earliest set of amino acids encoded by guanosine at the first position. This indicates a temporal patterning in the recruitment of amino acids into the genetic code, meaning that the genetic code evolved progressively as proto-metabolic pathways developed. The first amino acids synthesised were assigned to guanosine and, as the flux of metabolism increased, amino acids needing further reaction steps were successively assigned to adenosine, cytidine and finally uridine. The team then wondered why specific bases were temporally assigned to specific sets of amino acids. For instance, was guanosine assigned to the amino acids requiring the fewest reaction steps because it too requires the fewest reaction steps? This appears not to be the case, as they were unable to identify a temporal ordering of the cognate nucleotides in line with that of the amino acids. However, a positive feedback loop between the production of glycine and purine nucleotides may explain why early amino acids are encoded by guanosine at the first codon position. Nevertheless, the reason for the specific ordering remains unclear.
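To see which amino acids fall into each first-position group, the standard genetic code can be sorted by the first base of each codon. The Python sketch below is only an illustration: it assumes the standard code, the helper `amino_acids_by_first_base` is a made-up name, and it says nothing about distance from carbon fixation, which would need the pathway data the team took from MetaCyc.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(
    zip(("".join(triplet) for triplet in product(BASES, repeat=3)), AMINO_ACIDS)
)


def amino_acids_by_first_base(table):
    """Group one-letter amino acid codes by the first base of their codons."""
    groups = {}
    for codon, amino_acid in table.items():
        if amino_acid != "*":  # skip stop codons
            groups.setdefault(codon[0], set()).add(amino_acid)
    return groups


GROUPS = amino_acids_by_first_base(CODON_TABLE)

if __name__ == "__main__":
    # G, A, C, U is the recruitment order described in the article.
    for base in "GACU":
        print(base, "".join(sorted(GROUPS[base])))
```

The guanosine group comes out as alanine, aspartate, glutamate, glycine and valine (A, D, E, G, V). Some amino acids, such as leucine, serine and arginine, appear under more than one first base because their codons are spread across several groups.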

The team also found that stereochemical interactions between nucleotides and their cognate amino acids were associated with the assignment of the last two codon positions. The hydrophobicity of an amino acid corresponds to the hydrophobicity of the base at the second position of its anticodon. Finally, they found that the third codon position corresponds to size in non-redundant codons. When sister codons (codons that share their first two positions) encode different amino acids, the larger amino acid is encoded by a large purine base at the third position. Using these three rules, Prof. Lane’s team were able to assign amino acids to sequences and create a ‘first draft’ code that was very similar to the actual genetic code.
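The sister-codon rule only applies where a shared two-base prefix is split between amino acids. As a rough illustration, the Python sketch below lists the prefixes in the standard genetic code whose pyrimidine-ending (U, C) and purine-ending (A, G) codons differ in what they encode. The function name `split_boxes` is hypothetical, and the sketch does not test the size claim itself, which would need amino acid size data not given here.

```python
from itertools import product

BASES = "UCAG"
PYRIMIDINES = "UC"
PURINES = "AG"
# Standard genetic code, one letter per codon in UCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(
    zip(("".join(triplet) for triplet in product(BASES, repeat=3)), AMINO_ACIDS)
)


def split_boxes(table):
    """Map each two-base prefix to (pyrimidine-ending, purine-ending) meanings.

    Only prefixes where the two sets differ are returned.
    """
    result = {}
    for first, second in product(BASES, repeat=2):
        prefix = first + second
        pyrimidine_set = {table[prefix + base] for base in PYRIMIDINES}
        purine_set = {table[prefix + base] for base in PURINES}
        if pyrimidine_set != purine_set:
            result[prefix] = (pyrimidine_set, purine_set)
    return result


SPLIT_BOXES = split_boxes(CODON_TABLE)

if __name__ == "__main__":
    for prefix, (pyrimidine_set, purine_set) in SPLIT_BOXES.items():
        print(prefix, sorted(pyrimidine_set), sorted(purine_set))
```

Eight of the sixteen prefixes are split in this way. For example, GA encodes aspartate with a pyrimidine at the third position and the larger glutamate with a purine, and AA likewise separates asparagine from lysine.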

Further work is being done to confirm these rules using nuclear magnetic resonance spectroscopy and molecular dynamics simulations. The hypothesis must also be developed further to explain the emergence of mRNA, tRNA and ribosomes. Nevertheless, the scenario presents a framework that explains patterns found in the genetic code and offers a plausible explanation for the origin of biological information. Autotrophic proto-metabolism could have led to the synthesis of short RNA aptamers (RNA oligonucleotides capable of enzymatic activity and of specific binding to target molecules). These molecules initially carried no genetic information but may have contributed to protocell growth through their enzymatic activity. As non-random interactions started occurring between random RNA sequences and amino acids, the sequence of amino acids in the first peptides would have been templated by the RNA sequence. Deterministic chemistry could thus have given rise to genetic information, eventually leading to the origin of life.


  1. Koonin EV. The meaning of biological information. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences. 2016;374(2063):20150065.
  2. Fani R. The Origin and Evolution of Metabolic Pathways: Why and How did Primordial Cells Construct Metabolic Routes? Evolution: Education and Outreach. 2012;5(3):367-81.
  3. Harrison SA, Lane N. Life as a guide to prebiotic nucleotide synthesis. Nature Communications. 2018;9(1).
  4. Harrison SA, Palmeira RN, Halpern A, Lane N. A biophysical basis for the emergence of the genetic code in protocells. Biochimica et Biophysica Acta - Bioenergetics. 2022;1863(8).
  5. Yi J, Kaur H, Kazöne W, Rauscher SA, Gravillier LA, Muchowska KB, et al. A Nonenzymatic Analog of Pyrimidine Nucleobase Biosynthesis. Angewandte Chemie International Edition. 2022;61(23).