Shannon Information Entropy in the Genetic Code
American Physical Society March Meeting, Los Angeles, California, March 5-9, 2018
The canonical genetic code is the nearly universal language for translating the information stored in DNA into proteins, and has evolved a considerable measure of robustness to single-letter mutations. Shannon entropy measures the expected information value of messages. As with thermodynamic entropy, Shannon entropy is defined only within a system that specifies at the outset which collections of possible messages, analogous to microstates, will be treated as indistinguishable macrostates. This fundamental insight is applied here to amino acid alphabets, which group the twenty common amino acids into families based on chemical and physical similarities. By calculating the normalized mutual information, which measures the reduction in Shannon entropy conveyed by single-nucleotide messages, groupings that best leverage the fault tolerance of the code are identified. The relative importance of properties related to protein folding, such as hydrophobicity and size, and to function, including side-chain acidity, can also be estimated. This approach allows for the quantification of the average information value of nucleotide positions, which can shed light on the severity of hereditary and de novo genetic mutations.
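The calculation described above can be sketched in a few lines of Python. This is an illustrative reconstruction, not the author's code: the three-family grouping (hydrophobic, polar, charged) is one hypothetical amino acid alphabet among many, and codon usage is assumed uniform over the 61 sense codons for simplicity. Normalized mutual information between a codon position and the family label is computed as I(family; base) / H(family).

```python
from collections import Counter
from math import log2

# Standard genetic code, codons ordered by first/second/third base in TCAG order
BASES = "TCAG"
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AA[16*i + 4*j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

# One hypothetical coarse-grained amino acid alphabet (an assumption for illustration)
FAMILY = {}
for aa in "AVLIMFWPG": FAMILY[aa] = "hydrophobic"
for aa in "STNQYC":    FAMILY[aa] = "polar"
for aa in "DEKRH":     FAMILY[aa] = "charged"

def entropy(counts):
    """Shannon entropy (bits) of a Counter of outcome frequencies."""
    total = sum(counts.values())
    return -sum((n/total) * log2(n/total) for n in counts.values() if n)

# Assume all 61 sense codons are equally likely (simplifying assumption)
sense = {c: aa for c, aa in CODON_TABLE.items() if aa != "*"}
H_family = entropy(Counter(FAMILY[aa] for aa in sense.values()))

def mutual_information(position):
    """I(family; base at position) = H(family) - H(family | base)."""
    H_cond = 0.0
    for base in BASES:
        subset = Counter(FAMILY[aa] for c, aa in sense.items()
                         if c[position] == base)
        H_cond += (sum(subset.values()) / len(sense)) * entropy(subset)
    return H_family - H_cond

for pos in range(3):
    nmi = mutual_information(pos) / H_family
    print(f"codon position {pos + 1}: normalized MI = {nmi:.3f}")
```

Under this sketch, positions whose identity most strongly constrains the amino acid family carry the most information, which is the sense in which one grouping can "leverage the fault tolerance" of the code better than another.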
Nemzer, Louis R., "Shannon Information Entropy in the Genetic Code" (2018). Chemistry and Physics Faculty Proceedings, Presentations, Speeches, Lectures. 267.