Monday, February 17, 2020

Periodic Table of Language


What is the elemental basis of language? Are there words which all (known) languages share? How might one go about answering such questions?

It turns out that a team of linguists has been working on creating a meta-language made up of primes – the elemental words of any language. It’s called “natural semantic metalanguage” or NSM. Forty years down the road, the most up-to-date list consists of 65 primes, fewer than the number of chemical elements in our periodic table. Using only the 65 primes, one can define any other word in any language and tease out the (often subtle) differences between closely related words in different languages. It is cumbersome to do so, but it is elemental. And it works.

I learned about the NSM project in David Shariatmadari’s Don’t Believe a Word, a book I alluded to in my last blog post. There are many fun tidbits about words and languages, but not just at a superficial level; the author dives into linguistic details, yet keeps his prose breezy and light. I’m not a linguist, yet I found his book readable, engaging and delightful.

Are some languages better than others? Well, what do you mean by better? It’s been claimed that Sanskrit is the most efficient language and that NASA has endorsed it – the endorsement isn’t quite true, but there has been some analysis comparing artificial intelligence programming and Sanskrit. Or if you’re looking for an efficient script, perhaps you should consider Korean. It’s certainly well-designed and compact. Turns out there’s a study comparing information density in seven widely spoken languages covering the three main varieties (isolating, fusional, agglutinative). Mandarin turns out to be the most informationally dense, and Spanish the least. But… There’s always a But.

The point of language is to communicate. Although some languages might seem more ‘complex’ or dense than others, they seem just as effective in communicating – at least if you’re a native speaker, using the language day in and day out. Many of us don’t have that skill set or practice. In the last five years, I’ve been learning Spanish and Mandarin as sort of “third” languages. I’m terrible at both, and they are indeed very different languages, but my communicative capacity in both is surprisingly similar; I can communicate fine with kindergarteners, and have very limited ability to converse with adults.

But there are differences. When listening to Spanish, I have trouble catching the key words. For Mandarin, I’m still translating the first three words in my head and miss the next three – due to its high information density. Spanish, on the other hand, being of lower density, is spoken more quickly by native speakers. My reading comprehension of Spanish is much better than my listening. For Mandarin, I have trouble telling some of the characters apart, or recognizing when they are used in different contexts (with alternate meanings).

Shariatmadari explains why.

All languages do the job we need them to do: allow us to communicate effectively. There is… a fairly consistent ‘rate of information transmission’. If this dipped too far, the language would fail to perform the tasks required of it – using it would be like fumbling in a second language. If the rate went up too high, it would exceed our psychological and cognitive capacities (it would be impossible for our tongues and brains to keep up with). In other words, languages cluster around a communicative sweet spot.

Sci-fi could have a field day exploring how a cognitively superior alien race* might structure and speak its own language, possibly exceeding our ability to comprehend. Since I study the origin of life, this made me think of the different informational systems of biochemistry. We think of DNA (or RNA), with its four-letter alphabet, as the “informational” molecule. Nucleic acid sequences are translated into proteins, which have a twenty-letter alphabet. Sugars have their own alphabet too, with more variation between distantly related organisms. And helping the crosstalk between these systems is a larger (yet still small) group of metabolites and co-factors that facilitate “speech” or signaling. Chemicals rather than words are communicated. There’s a physicality to it. Like our sense of smell, communicated by molecules wafting through the air, in contrast to soundwaves formed by spoken words.
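For a rough, back-of-the-envelope feel for the alphabet sizes mentioned above, here is a minimal Python sketch. It assumes (unrealistically) that every letter is equally likely, so the numbers are only upper bounds on information per symbol, and it brings in one fact not stated above: in the standard genetic code, three nucleotides (a codon) specify one amino acid. The function and variable names are my own, purely for illustration.

```python
import math


def bits_per_symbol(alphabet_size: int) -> float:
    """Upper bound on information per symbol, assuming equally likely symbols."""
    return math.log2(alphabet_size)


# Four-letter nucleic acid alphabet (A, C, G, T/U).
dna_bits = bits_per_symbol(4)

# Twenty-letter protein alphabet (the standard amino acids).
protein_bits = bits_per_symbol(20)

# Assumption from the standard genetic code: one codon = three nucleotides.
codon_bits = 3 * dna_bits

print(f"nucleotide: {dna_bits:.2f} bits")
print(f"amino acid: {protein_bits:.2f} bits")
print(f"codon:      {codon_bits:.2f} bits")
```

Under these assumptions a codon can carry more bits than a single amino acid needs, which is one crude way of seeing why several codons can map to the same amino acid. It says nothing about sugars or metabolites, whose “alphabets” are much harder to pin down.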

Do the different biochemical languages have different information density? Or complexity? Or communicative efficiency? It’s hard to quantify these in different systems. What scale do we choose to measure these different systems against each other? Test-tube chemistry might be called “simple” – reactants collide and react – it’s raw and direct. Biochemistry, on the other hand, is heavily mediated chemistry. One system talks to another system mediated by translators. This made me think of Shariatmadari’s description of the seeming “continuum” between German and Dutch as one examines the language of the bordering communities, seemingly in-between. He provides another dramatic example:

Inhabitants of Slovenia, which borders Italy, might find it hard to understand their fellow Slavs in Bulgaria, which borders Turkey. But they’re only a couple of steps away from each other. Get a Serb and a Macedonian to stand in between them, and you’ve assembled the perfect linguistic relay team. These areas of overlap, of links in an unbroken chain, are called ‘dialect continuums’, delicate structures, which… have been eroded by both globalization and nationalism.

Chemistry has elements. Does it have a language? At its base, is it akin to a pidgin, where over time as biochemistry evolved, it turned into a creole? Were LUCA and its cousins the seemingly pre-Babel-babble? Perhaps I should be collaborating with a linguist to consider these questions!

*While Arrival (the movie) is not mentioned in Shariatmadari’s book, he does discuss the debunking of the Sapir-Whorf hypothesis, at least in broad terms.
