What is the elemental basis of language? Are there
words which all (known) languages
share? How might one go about answering such questions?
It turns out that a team of linguists has been
working on creating a meta-language made up of primes – the elemental words of
any language. It’s called “natural semantic metalanguage” or NSM. Forty
years down the road, the most up-to-date list consists of 65 primes, fewer than
the number of chemical elements in our periodic table. Using only the 65
primes, one can define any other word in any language and tease out the (often
subtle) differences between closely related words in different languages. It is
cumbersome to do so, but it is elemental. And it works.
I learned about the NSM project in David Shariatmadari’s
Don’t Believe a Word, a book I
alluded to in my last blog post. There are many fun tidbits about words
and languages, but not just at a superficial level; the author dives into
linguistic details, yet keeps his prose breezy and light. I’m not a linguist,
yet I found his book readable, engaging and delightful.
Are some languages better than others? Well, what
do you mean by better? It’s been claimed that Sanskrit is the most efficient
language and that NASA has endorsed it – the endorsement isn’t quite true, but
there has been some analysis comparing artificial intelligence programming and
Sanskrit. Or if you’re looking for an efficient script, perhaps you should
consider Korean. It’s certainly well-designed and compact. Turns out
there’s a study comparing information density in seven widely spoken languages
covering the three main varieties (isolating, fusional, agglutinative).
Mandarin turns out to be the most informationally dense, and Spanish the least.
But… There’s always a But.
The point of language is to communicate. Although
some languages might seem more ‘complex’ or dense than others, they seem just
as effective in communicating – at least if you’re a native speaker, using the
language day-in and day-out. Many of us don’t have that skill set or practice.
In the last five years, I’ve been learning Spanish and Mandarin as sort of
“third” languages. I’m terrible at both, and they are indeed very different
languages, but my communicative capacity in both is surprisingly similar; I can
communicate fine with kindergarteners, but have very limited ability to converse
with adults.
But there are differences. When listening to
Spanish, I have trouble catching the key words. For Mandarin, I’m still
translating the first three words in my head and miss the next three – due to
its high information density. Spanish, on the other hand, being of lower
density, is spoken more quickly by native speakers. My reading comprehension of
Spanish is much better than my listening. For Mandarin, I have trouble telling
some of the characters apart, or recognizing them when they are used in different
contexts (with alternate meanings).
Shariatmadari explains why.
All
languages do the job we need them to do: allow us to communicate effectively.
There is… a fairly consistent ‘rate of information transmission’. If this
dipped too far, the language would fail to perform the tasks required of it –
using it would be like fumbling in a second language. If the rate went up too
high, it would exceed our psychological and cognitive capacities (it would be
impossible for our tongues and brains to keep up with). In other words,
languages cluster around a communicative sweet spot.
Sci-fi could have a field day exploring how a
cognitively superior alien race* might structure and speak its own language,
possibly exceeding our ability to comprehend. Since I study the origin of life,
this made me think of the different informational systems of biochemistry. We
think of the four-letter alphabet DNA (or RNA) as the “informational” molecule.
Nucleic acid sequences are translated into proteins, which have a twenty-letter
alphabet. Sugars have their own alphabet too, with more variation between
distantly related organisms. And helping the crosstalk between these systems is
a larger (yet still small) group of metabolites and co-factors that facilitate
“speech” or signaling. Chemicals rather than words are communicated. There’s a
physicality to it. Like our sense of smell, communicated by molecules wafting
through the air, in contrast to soundwaves formed by spoken words.
Do the different biochemical languages have
different information density? Or complexity? Or communicative efficiency? It’s
hard to quantify these in different systems. What scale do we choose to measure
these different systems against each other? Test-tube chemistry might be called
“simple”: reactants collide and react – it’s raw and direct. Biochemistry, on
the other hand, is heavily mediated chemistry. One system talks to another
system mediated by translators. This made me think of Shariatmadari’s
description of the seeming “continuum” between German and Dutch as one examines
the language of the bordering communities, seemingly in-between. He provides
another dramatic example:
Inhabitants
of Slovenia, which borders Italy, might find it hard to understand their fellow
Slavs in Bulgaria, which borders Turkey. But they’re only a couple of steps
away from each other. Get a Serb and a Macedonian to stand in between them, and
you’ve assembled the perfect linguistic relay team. These areas of overlap, of
links in an unbroken chain, are called ‘dialect continuums’, delicate
structures, which… have been eroded by both globalization and nationalism.
Chemistry has elements. Does it have a language? At
its base, is it akin to a pidgin, where over time as biochemistry evolved, it
turned into a creole? Were LUCA and its cousins the seemingly pre-Babel-babble? Perhaps I should be collaborating with a linguist to consider these
questions!
*While Arrival (the movie) is not mentioned in Shariatmadari’s book, he does discuss the
debunking of the Sapir-Whorf hypothesis, at least in broad terms.