Saturday, August 26, 2017

Information and Chemical Diversity


Another Nautilus article has gotten me thinking about chemical complexity and the origin of life. The article itself is not related to either topic. How Information Got Re-Invented is “the story behind the birth of the information age”. It is a selective biography of Claude Shannon and the influences that led him to his famous paper, “A Mathematical Theory of Communication”. If you’re a physical chemist like me, you’ve heard of Shannon entropy and its similarities to Boltzmann entropy.

The article does a great job tracing the work of Harry Nyquist and Ralph Hartley, both engineers who contributed significant insights that led to Shannon’s breakthrough work. There are also anecdotal stories from friends of Shannon during his time at Bell Labs. (I highly recommend The Idea Factory by Jon Gertner about the golden age of innovation thanks to the remarkable setup of Bell Labs.) The article also clearly lays out the basics of information transmission theory and introduces the bit as a measure of information content.

More importantly, the authors of the article, Jimmy Soni and Rob Goodman, do a great job uncovering the counter-intuitive definition of quantified information. Here’s an excerpt:

“What does information really measure? It measures the uncertainty we overcome. It measures our chances of learning something we haven’t yet learned. Or, more specifically: when one thing carries information about another – just as a meter reading tells us about a physical quantity, or a book tells us about a life – the amount of information it carries reflects the reduction in uncertainty about the object. The messages that resolve the greatest amount of uncertainty – that are picked from the widest range of symbols with the fairest odds – are the richest in information.  But where there is perfect certainty, there is no information: There is nothing to be said.”
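
To make that concrete for myself (a minimal sketch of my own, not from the article), Shannon's measure works out to H = −Σ p·log₂ p bits per symbol: a message drawn from many equally likely symbols resolves a lot of uncertainty, while a foregone conclusion resolves none.

```python
# A minimal sketch (my own illustration, not from the article): Shannon's
# entropy H = -sum(p * log2(p)) in bits. A symbol set with many equally
# likely options ("the fairest odds") carries the most information per
# symbol; a near-certain outcome carries almost none.
from math import log2

def shannon_entropy(probabilities):
    """Entropy in bits of a discrete probability distribution."""
    return sum(-p * log2(p) for p in probabilities if p > 0)

# Four symbols with the fairest odds (uniform): maximum uncertainty.
print(shannon_entropy([0.25, 0.25, 0.25, 0.25]))   # 2.0 bits per symbol

# Four symbols, but one is almost certain: little uncertainty to resolve.
print(shannon_entropy([0.97, 0.01, 0.01, 0.01]))   # ~0.24 bits per symbol

# Perfect certainty: there is nothing to be said.
print(shannon_entropy([1.0]))                      # 0.0 bits
```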

The article goes on to discuss languages and code-breaking. The English language, for example, has many redundancies, including in the letters of the alphabet such as vowels. The authors quote an example from Shannon: “MST PPL HV LTTL DFFCLTY N RDNG THS SNTNC.” Code-breakers exploit these redundancies to their advantage. In fact, “every human language is highly redundant.” “From the dispassionate perspective of the information theorist, the majority of what we say – whether out of convention, or grammar, or habit – could just as well go unsaid.”

Having attempted to learn languages with more or less redundancy in my adult life gives me more appreciation and patience for people speaking in a non-native tongue and making “grammatical errors”. Many of these aren’t errors per se, at least in terms of communication. You can understand what they are saying – it just doesn’t sound “right” to your native ears. But rightness in this case is simply the current convention a native speaker uses. Languages do evolve over time.
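
Out of curiosity, here is a rough back-of-the-envelope sketch (my own, not from the article) that estimates English redundancy from single-letter frequencies alone. It is a crude lower bound, since it ignores all the spelling and grammar constraints that make Shannon's own estimates of redundancy so much higher.

```python
# A rough sketch (my own, not from the article): estimate the redundancy of
# a text from single-letter frequencies alone. This ignores spelling and
# grammar constraints, so it understates redundancy considerably.
from collections import Counter
from math import log2

def letter_redundancy(text):
    letters = [c for c in text.lower() if c.isalpha()]
    counts = Counter(letters)
    total = len(letters)
    h = -sum((n / total) * log2(n / total) for n in counts.values())
    h_max = log2(26)  # if all 26 letters were equally likely
    return 1 - h / h_max

sample = "most people have little difficulty in reading this sentence"
print(f"single-letter redundancy: {letter_redundancy(sample):.2f}")
```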

Why the redundancy? It turns out that “every signal is subject to noise. Every message is liable to corruption, distortion, scrambling.” The speed at which a message can be propagated depends on how the message is encoded or packaged – Shannon proved there is a “point of maximum compactness”. So redundancy is important for communication or propagation. In the case of our genetic material, DNA, copying isn’t perfect – there are errors – but they are mitigated by redundancy and by evolved error-correcting helper molecules. Darwinian evolution takes advantage of such errors. It allows for variation – I think of it as creativity in exploring biological space.
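
Here is a toy illustration (again my own; the three-fold repetition code is the crudest error-correcting scheme, nothing like what Shannon constructed or what biology actually uses) of why redundancy buys robustness: send bits through a channel that randomly flips about 10% of them, with and without repetition plus a majority vote.

```python
# A toy sketch (my own, not Shannon's construction): send bits through a
# noisy channel with and without redundancy. Repeating each bit three times
# and taking a majority vote corrects most single-flip errors.
import random

random.seed(1)

def noisy(bits, p_flip=0.1):
    """Flip each bit with probability p_flip."""
    return [b ^ 1 if random.random() < p_flip else b for b in bits]

def encode3(bits):
    """Repeat each bit three times."""
    return [b for b in bits for _ in range(3)]

def decode3(bits):
    """Majority vote over each group of three received bits."""
    return [int(sum(bits[i:i + 3]) >= 2) for i in range(0, len(bits), 3)]

def count_errors(sent, received):
    return sum(a != b for a, b in zip(sent, received))

message = [random.randint(0, 1) for _ in range(1000)]
bare = noisy(message)
coded = decode3(noisy(encode3(message)))

print("errors without redundancy:", count_errors(message, bare))
print("errors with 3x repetition:", count_errors(message, coded))
```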

Chemistry operates in the same way. The riddle of origin-of-life chemistry has less to do with making a large variety of complex molecules – it’s about why life picks out only a select few and uses them over and over. I study the oligomerization of small molecules. Starting from a single substance such as formaldehyde (CH2O), a plethora of molecules can be formed, including polyethers, oxanes, and a whole range of sugars. Now add a second substance into the mix and the diversity of molecules explodes exponentially.
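
A crude counting exercise makes the point, even though it ignores real chemistry (reaction feasibility, branching, rings, stereochemistry): just tally the linear sequences you could string together from one, two, or three monomer types.

```python
# A crude counting sketch (my own, ignoring real chemistry such as reaction
# feasibility, branching, rings, and stereochemistry). Counting only linear
# sequences of monomers already shows how fast the space grows when a second
# building block is added to the pot.
def linear_oligomers(n_monomer_types, max_length):
    """Number of distinct linear sequences of 1 to max_length units."""
    return sum(n_monomer_types ** length for length in range(1, max_length + 1))

for k in (1, 2, 3):
    print(f"{k} monomer type(s), up to 12 units: {linear_oligomers(k, 12):,} sequences")
```

One building block gives only a dozen chains up to twelve units long; two building blocks give over eight thousand; three give nearly eight hundred thousand. Real reaction networks branch far more wildly than these bare sequence counts suggest.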

Now if indeed we humans have evolved to be information guzzlers, as suggested by Gazzaley and Rosen, and there is a thermodynamic law that favors the increase of information akin to entropy, there should be a way to quantify this in terms of the information carried in molecules. But what is this information? Number of elements? Number of atoms? Number of bonds? Number of adjacent reaction types? Number of downstream cascades? There are also likely to be constraints that increase certainty and decrease information – I’m thinking of thermodynamic sinks here. I’m reminded of Jeffrey Wicken’s book, Evolution, Thermodynamics and Information. It's on my bookshelf. I read it six years ago when I got interested in origin-of-life research and didn’t understand a lot of it – let’s just say it was very information-dense. I should revisit the book, but my summer is almost over! Maybe it will be my project next summer.
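
Just to see what such bookkeeping might look like, here is a speculative toy (entirely my own framing, not Wicken’s measure or anyone else’s published one) that tallies a few of the candidate quantities above for a hand-written, hydrogen-suppressed sketch of glyceraldehyde.

```python
# A speculative toy (my own framing, not a published measure): given a
# molecule as an atom list and bond list, tally a few candidate quantities,
# plus the Shannon entropy of its elemental composition.
from collections import Counter
from math import log2

# Glyceraldehyde (C3H6O3), a C3 sugar relevant to formose-type chemistry,
# written out by hand; hydrogens are omitted, so counts cover heavy atoms only.
atoms = ["C", "C", "C", "O", "O", "O"]
bonds = [(0, 1), (1, 2), (0, 3), (1, 4), (2, 5)]  # heavy-atom skeleton only

composition = Counter(atoms)
total = len(atoms)
element_entropy = -sum((n / total) * log2(n / total) for n in composition.values())

print("distinct heavy elements:", len(composition))
print("heavy atoms:            ", total)
print("heavy-atom bonds:       ", len(bonds))
print(f"composition entropy:     {element_entropy:.2f} bits per atom")
```

None of these numbers obviously deserves to be called the information content of the molecule, which is exactly the problem.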

The Nautilus article reminded me that in thinking about quantifying information, we should concentrate on the symbols that weren’t used, the words that weren’t said but could have been. A couple of speakers at the recent ISSOL conference made essentially the same point. We as researchers shouldn’t just focus on how we got to the current molecules of life; we should be exploring adjacent chemistries – closely related molecular systems that will give us clues about why extant life uses what it uses chemically. Many groups are already doing this, and our chemical community is the richer for it.
