• Count Timothy von Icarus
    2k
    A thought occurred to me recently while reading Borges' short story "The Library of Babel." I had also been doing a lot of reading on information science lately.

    The story, which is worth a read, posits a community of librarians living in a massive (but finite!) library. Each book in the library has a given number of characters per page and a given number of pages. The library doesn't repeat; it simply contains every possible combination of characters per book, in different books. This comes to 10^4677 books. For comparison, estimates of the number of protons in the visible universe are around 10^78 to 10^82.

    From the standpoint of what can be put into language, or any human code, this Library of Babel likely represents the maximum entropy of any possible code.

    You can take a look at the library here: https://libraryofbabel.info/About.html

    An interesting point Borges makes in the story is that even the pages of meaningless character strings, which are far more likely to occur than real words, all actually mean something. Somewhere in the library is a book, or series of books, that explains a code through which any given book can be translated into a meaningful message. Even a repeating string of "XLLLX XLLLX" could have multiple meanings, if we take each block to modify the ones before or after it.

    This led me to a few thoughts about information theory that, despite some desperate searching, I could not find answers to anywhere in the literature (perhaps because I don't know the literature and its terms well enough, or because my ideas won't prove out).

    I've introduced the Library because it is a useful stand-in for "all possible messages."

    Looking at the Library of Babel a few things stand out:

    1. The total number of meaningful messages is less than the total number of possible messages. The proof of this is that the same message can be sent using different codes, such that, once transcribed into meaning by the receiver, it is the same message. As an example, we can imagine whole books of English where every letter is simply shifted one space down: A becomes B, Z becomes A, etc.
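    To make the shifted-alphabet example concrete, here is a minimal Python sketch (the function name and sample text are mine, just for illustration). Two distinct "books," one plain and one shifted, decode to the same message:

```python
# Two different encodings of the same underlying message: a plain text
# and a shifted copy (A -> B, ..., Z -> A). Once the receiver undoes
# the shift, the two "books" are identical in meaning.

def shift(text, k):
    """Shift each letter k places through the alphabet, wrapping around."""
    out = []
    for ch in text:
        if ch.isalpha():
            base = ord('A') if ch.isupper() else ord('a')
            out.append(chr((ord(ch) - base + k) % 26 + base))
        else:
            out.append(ch)
    return ''.join(out)

plain = "THE LIBRARY CONTAINS EVERY BOOK"
encoded = shift(plain, 1)     # a distinct string, hence a distinct book
decoded = shift(encoded, -1)  # the receiver undoes the shift

print(encoded)           # UIF MJCSBSZ DPOUBJOT FWFSZ CPPL
print(decoded == plain)  # True: two distinct books, one message
```

    The shifted string is a perfectly valid, distinct volume in the Library, yet once the receiver knows the code it adds no new meaning.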

    2. The maximum Shannon entropy of a message is the log of the number of its possible configurations (so a three-bit code, with 8 outcomes, has at most 3 bits of entropy), and we often describe entropy as the amount of surprise in a message. For example, an all-lower-case text message with only spaces has 27 possible outcomes per character, so there are 27^n possible messages of length n, and the maximum entropy is n * log2(27) bits. If there are limits on what a message can contain, surprise is reduced.
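    One caveat worth making explicit: the number of possible messages is 27^n, while the Shannon entropy is the log of that count. A quick sketch (the function name is mine):

```python
import math

# Maximum entropy of a message is log2 of the number of equally likely
# outcomes, not the raw count: a 3-bit code has 8 outcomes but 3 bits
# of entropy, and a 27-symbol alphabet (a-z plus space) carries
# log2(27) bits per character, so an n-character message has at most
# n * log2(27) bits.

def max_entropy_bits(alphabet_size, length):
    """Entropy in bits of `length` independent, uniform symbols."""
    return length * math.log2(alphabet_size)

print(max_entropy_bits(2, 3))             # 3.0 bits for a 3-bit code
print(round(max_entropy_bits(27, 1), 2))  # 4.75 bits per character
```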

    Now let's look at synonyms. Imagine you're getting a message from your doctor as a follow-up to an exam. They're very pressed for time, so they just have two buttons for sending you a complete message: two possible outcomes. However, the amount of surprise from the message differs depending on whether the two buttons send the messages "you need to see an ophthalmologist" and "you need to see a doctor who specializes in eyes," versus "you need to see an ophthalmologist" and "you need to see a neurologist." In fact, in the first case, given that we take ophthalmologist and eye specialist to be perfect synonyms, the surprise is zero: only one possible message can be sent in terms of meaning. Synonymous messages don't provide additional information. Messages can also be more or less synonymous, obviously.

    The above example is fairly odd, but you can also look at binary code. A three-digit binary code has 8 outcomes. However, suppose we code our transmissions with an incredibly feeble encryption method: if the first bit is a 1, flip all remaining 1s to 0s and 0s to 1s; if it is a 0, read the rest as-is. Then we actually only have four possible messages. I am well aware that by the mathematical definition of surprise/entropy this still represents 8 potential outcomes, which I will return to, but the general point is that the maximum entropy of meaning for a message can be less than the total number of possible outcomes (i.e., less than x^n).
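    The feeble flip-bit cipher can be enumerated directly. This little sketch (names mine) shows 8 transmittable strings collapsing into 4 distinct meanings:

```python
from itertools import product

# Toy version of the "feeble encryption" above: in a 3-bit string, a
# leading 1 means "invert the remaining bits", a leading 0 means "read
# them as-is". Eight transmittable strings collapse into four meanings.

def decode(bits):
    head, body = bits[0], bits[1:]
    if head == '1':
        body = ''.join('1' if b == '0' else '0' for b in body)
    return body

strings = [''.join(p) for p in product('01', repeat=3)]
meanings = {s: decode(s) for s in strings}
print(meanings)
print(len(set(meanings.values())))  # 4 distinct meanings from 8 strings
```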

    In reality, letters in English don't follow a uniform random distribution, so an English message has far lower surprise (surprisal is simply a function of the probability p of an event x). Automatic text generators further reduce entropy, and get more accurate at producing believable text, by using words as the unit of analysis and then grouping words into strings in which they are likely to occur. Modern text generators do one better by generating their text from a sample you give them, so that they mimic the author's style and content.
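    One way to see the reduced surprise is to measure the empirical per-character entropy of an English-ish sample and compare it with the uniform maximum. A rough sketch (the sample text and function name are mine):

```python
import math
from collections import Counter

# Empirical per-character entropy of a text versus the uniform maximum.
# Because English characters are far from uniformly distributed, the
# measured entropy falls below log2(27).

def entropy_bits(text):
    """Shannon entropy in bits per character, from empirical frequencies."""
    counts = Counter(text)
    n = len(text)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

sample = ("the quick brown fox jumps over the lazy dog and then "
          "the lazy dog sleeps in the warm afternoon sun")
print(round(entropy_bits(sample), 2))  # measured entropy, below the max
print(round(math.log2(27), 2))         # uniform maximum, 4.75
```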

    The issue I see here is that every standard measure of surprise I've seen that looks at letter or word distributions fails to take account of synonymity between messages when calculating surprise.

    3. Certain information cannot reach some receivers because of the way it is encoded. A good example is a series of text messages displayed by a projector. If the projector uses two shades of red that the human eye can't tell apart, one for the background and one for the text, someone watching just sees an unchanging red square. Essentially, the error rate for our channel is 1. If you flip the text to a green wavelength, the message gets across fine, although some wavelength pairings that work for most people won't work for people who are colorblind.

    This means that functional synonymity can depend on the recipient of a message. This is a different synonymity from two messages referring to identical ideas however.

    Now what does this all have to do with complexity? Complexity is a poorly defined concept. Complex systems are systems that cannot be too chaotic or too orderly. They are generally robust (removing one part doesn't destroy the system), adaptive, have emergent properties, and are networked.

    The problem is that these definitions aren't particularly firm. There are tons of examples of complex systems that seem to fail some of these tests, or non-complex systems that pass them. The brain is a common example of a complex system, but it isn't particularly redundant. Sure, you can remove neurons, but you can also scrape flakes of metal off a clock gear and still have it work. The fact is, the brain is much more like our uncomplex clock when you start talking about removing functional components, with the most common result being the death of the system when one part is removed.

    This has a good overview of the definition of complexity and its problems: https://arxiv.org/pdf/1502.03199

    Another, "What is a Complex System?" (hosted on CORE, core.ac.uk), poses one possible measure of complexity: statistical complexity. More statistically complex systems are those with more patterns and less randomness.

    One I really liked was "What is complexity?" from the Adami Lab (adamilab.msu.edu). It looks at the complexity of an organism as a function of how much information it holds about its environment.

    ---

    So, my insight, or perhaps confusion, is that the degree of relative synonymity between various possible messages in a system seems to be highly analogous to complexity.

    How is this?


    To recap:
    1. Codes of similar sizes can hold perfectly or roughly synonymous messages.

    2. In terms of information producing meaning (or in thinking of information in the physical sciences, causing an effect), the important thing is the entropy of meaning in a message, not necessarily the number of possible messages or the probability of each individual message.

    3. The degree of synonymity is determined by the receiver of the information.

    The tie in to complexity is that, in complex systems, possible messages have very low rates of synonymity. Information in the form of light waves is far less synonymous for a human eye than it is for a rock.

    To use the brain in a bit of a grisly example, the possible states of my brain while I am sitting in a blank room, thinking, are far fewer than the possible states if you throw my brain in a blender. Entropy, as measured by possible states, within the complex system of a functioning brain is orders of magnitude less, because there are far fewer states that represent a functioning brain than there are possible configurations of all the parts that make up the system. Just as coherent passages in the Library are far less common.

    However, we have very little ability to measure thought by looking at the brain, or to predict behavior. Meanwhile, if we use our blender for a bit, we will have an extremely effective model for telling us exactly where certain lipids and proteins will eventually settle in the mush we have left. If we vaporize it, we can create a quite accurate model of what will happen to the components, despite exponentially higher entropy.

    The thing that springs to my mind is that, from an information perspective, this is because the synonymity between various components of the system goes up as the system is destroyed. In studying hydrogen gas, a single proton's behavior is interchangeable with any other proton's. By contrast, the components of a neuron can't be changed much at all without the information it transmits becoming significantly less synonymous.

    By way of another example, we can imagine a benzo molecule floating around the blood, bouncing into red blood cells, not doing a whole lot. It doesn't do much when it crosses the blood-brain barrier either. Its physical contact with most areas of most cells is mostly synonymous with that of a plethora of other molecules. But then it bounces into a GABA receptor, and transmits significantly more information.

    Synonymity explains how surprise can go up while entropy goes down. It also explains interconnectedness, since changes in key system components are less synonymous in more complex systems than they are in less complex ones. In a random series of characters, any new character is unlikely to change the meaning of a message, whereas in a novel, a plot twist at the end of the book can affect the meaning of all preceding words. The choice of words on the last pages becomes less likely to be synonymous as the complexity of the preceding text increases (this makes sense given that our measure of surprise is context-dependent).

    It's certainly not as though hydrogen molecules in a container don't interact, nor are they unconnected. Equalized pressure is an emergent property arising from interactions across the volume of the gas. You also have "tipping points" in phase transitions and fusion, another property of "complexity." The equilibrium is also more robust, existing through a range of temperatures and through variations in gravity, persisting through degrees of variance in both that would eradicate more complex systems, and the system is also highly adaptive to environmental changes, reaching new equilibriums quite quickly. And indeed, these systems are considered "complex," just less complex than things like ecosystems and brains. The problem is that, by common definitions of complexity, gas equilibrium should be MORE complex than a brain.

    And yet this can't be true if our definition of complexity is to have pragmatic meaning, because predicting gas equilibriums is fairly easy, while predicting changes in economies seems hopeless.

    It seems apparent to me that the reason it is easier to model gas dynamics, despite much higher entropy, a larger number of interactions, more adaptability, etc. is because the components share a great deal of synonymity. Information exchange is greater in terms of Shannon Entropy, but lower in terms of meaning entropy. This is what allows you to study a small quantity of hydrogen gas and deduce laws that apply for massive volumes, while studying a single person gets you almost nowhere in determining human social behaviors.

    ----

    Perhaps there isn't too much to this line of thinking, because synonymity isn't easy to model mathematically. After all, there are lots of good definitions of complexity; they just lack rigor.

    However, it seemed to stand out to me when I thought about the intersection of information science and complexity.

    I'd also love any recommendations in information science that deal with synonymity, I haven't found anything. The closest idea I've found is error, but they really aren't the same thing.
  • Count Timothy von Icarus
    2k
    By the way, another interesting thing about the library is that there should be a size at which every possible thought or occurrence is expressed. Borges' library seems much larger than this requirement, but it could be made larger still by increasing the characters used, the characters per page, or the number of pages per book. His library already seemingly has enough space to code the exact position of every particle in the universe in relationship to every other, in multiple code variants, but it could be larger still. On the small end, all possible books over a 60-character alphabet with just two characters per book yields a tiny 3,600 books. There is, of course, the fact that a smaller library cannot code all the possible configurations of a larger one, but at a certain point you have to hit maximum synonymity for ideas other than the contents of libraries of larger entropy than the one you already have. That is, there will cease to be relevance outside entropy itself.
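    The size arithmetic here is just exponentiation: an alphabet of a characters with n character slots per book gives a^n books. A sketch of the small-end example (function name mine):

```python
# Size of a complete library: every possible string over an alphabet of
# size a, with exactly n character slots per book, gives a**n books.
# The "tiny" library above (60-character alphabet, two characters per
# book) comes out to 3,600 volumes.

def library_size(alphabet_size, chars_per_book):
    """Number of distinct books in a complete, non-repeating library."""
    return alphabet_size ** chars_per_book

print(library_size(60, 2))  # 3600
print(library_size(2, 3))   # 8, the three-bit code from earlier
```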

    At what point does the library reach a size where it begins only producing duplicates of meaning? If time is infinitely divisible, you can't code the universe without an infinite library, but said library must also have infinite numbers of duplicates. If you take a minimum unit of time, then there is a finite code representing the universe (provided the universe is finite).

    Would a library containing all possible codes and permutations of human thought contain all possible information, or would it only contain a part? Is it possible there is information that resists being coded as such, at least for man? Essentially this would be akin to the messages shown in indistinguishable wavelengths discussed earlier. Ideas that are different, but seem irrevocably synonymous to our minds no matter how they are expressed?
  • Agent Smith
    9.5k
    A very common utterance in scientific circles is "the brain is the most complex thing in the universe."

    The yardstick, to my reckoning, for such a pronouncement is combinatorics - how many possible permutations of the neurons and synapses are possible.

    It [the brain] has 100 billion neurons and 10- to 50-fold more glial cells — PNAS

    In a human, there are more than 125 trillion synapses just in the cerebral cortex alone. That's roughly equal to the number of stars in 1,500 Milky Way galaxies. — Smith (ScienceDaily)

    Now, what are synonyms? As far as I can tell, if two or more symbols have the same referent, the symbols are synonymous. In effect, synonyms are redundancies. One reason a language ends up like that is that it has more than enough capacity to handle the situation, if you know what I mean: for instance, if your office is over-staffed, you might have 2 secretaries.

    How does that relate to complexity? Does the brain actually need 125 trillion synapses or 100 billion neurons to do what it does? If we assume a binary model for the brain,

    1. Permutations possible with 100 billion neurons = :scream:

    2. Permutations possible with 125 trillion synapses = :scream: :scream:

    On the face of it, complexity and redundancy seem to go hand in hand.
  • Count Timothy von Icarus
    2k


    I get your point here, but the thing is, for that same brain, there are orders of magnitude more possible permutations of how the matter that composes it could be arranged.

    If we had a magical blender that randomized the particles for us into all possible configurations, only an exceedingly small number would be functioning brains, most would be a soup.

    So while I get that, just looking at synapses, you have an enormous number of permutations, this is nonetheless a small minority of the potential permutations for the system's components. And of course, fundamental particles interacting still represents information exchange, so the entropy of our blended brain is higher than that of the functioning one. It's just that a huge number of permutations in the blended version are effectively analogous soups.

    Otherwise, it seems we have to assume that the amount of information/entropy for individual particles somehow increases as an emergent phenomenon of their interactions. This would be like saying a mashup of random words has more entropy than an equally sized mashup of random characters, which it clearly does not.
  • Agent Smith
    9.5k
    Does the following make sense to you?

    Start off with a 2 lettered (binary) language with characters/letters X, Y.

    Perform a permutation and get 4 words.

    4 > 2

    Permutations with words > permutations with letters/characters.

    Indeed, for any set S, the subsets of S are more numerous than members of S. — John Stillwell
  • Count Timothy von Icarus
    2k
    Permutations for calculating entropy would be the number of possible outcomes of the message. So a 2-bit configuration like you described would have four outcomes, not two. Entropy for binary is just additive in bits (log2 of the number of outcomes).

    Words represent an extremely marginal share of the possible combinations of letters. With English, 26 letters ^ 8 for an 8-letter word represents about 209 billion possible combinations. English has about 80,148 8-letter words. Adding proper nouns might (optimistically) get you up to 320,000, which means the permutations of the string that equate to actual words are about 1.5 ten-thousandths of 1% of all possible combinations. The same would hold for particle configurations in relationship to each other in any system. I may have phrased that ambiguously above, but that's what I mean.
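    For what it's worth, here's that arithmetic spelled out (taking the optimistic 320,000 vocabulary figure as an assumption):

```python
# 26**8 possible 8-letter strings versus roughly 320,000 recognized
# 8-letter words (an optimistic figure including proper nouns). The
# meaningful fraction is on the order of ten-thousandths of one percent.

total = 26 ** 8   # possible 8-letter strings
words = 320_000   # assumed vocabulary size, including proper nouns
fraction = words / total

print(total)                     # 208827064576, about 209 billion
print(f"{fraction:.2e}")         # 1.53e-06
print(f"{fraction * 100:.6f}%")  # 0.000153% of all possible strings
```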
  • Agent Smith
    9.5k


    P(x/y) = Probability of x given y

    B = Permutations of the atoms of a brain
    M = Permutations of the atoms of a brain such that a mind exists

    B >> M, such that

    P(M/Q) = the probability of a mind (M) given a permutation of the atoms of brain matter (Q) ≈ 0



    N = Permutations of letters of the English language
    W = Permutations of letters of the English language that have meaning

    N >> W, such that

    P(W/R) = the probability of a meaningful word (W) given a string of characters (R) in a language ≈ 0



    Given a configuration of atoms (of brain matter), the likelihood of it being mind-endued is near-zero.

    Given a string of characters (of a language), the chances of it being a meaningful word is (also) near-zero.

    Suppose now that I find a permutation of atoms of brain matter and I discover it is endowed with a mind. It would be shocking (very high surprisal index). Put simply, a brain that comes with a mind has, in the limit, infinite bits of information: I(M) = -log2 P(M/Q) → ∞ as P(M/Q) → 0.

    Likewise, if I'm given a string of characters in a language and I find out it has meaning, it would be just as startling (very high surprisal index). Here too, a meaningful string of characters (a word) has, in the limit, infinite bits of information: I(W) = -log2 P(W/R) → ∞ as P(W/R) → 0.
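    In standard terms this is the surprisal (self-information) of an event, -log2(p) bits, which does grow without bound as p approaches zero. A quick sketch (function name mine):

```python
import math

# Surprisal (self-information) of an event with probability p is
# -log2(p) bits. As p shrinks toward zero, the surprisal grows without
# bound, which is the sense in which an astronomically unlikely
# "mind-endued" configuration carries near-infinite surprise.

def surprisal_bits(p):
    """Self-information of an event with probability p, in bits."""
    return -math.log2(p)

for p in (0.5, 1e-6, 1e-300):
    print(p, surprisal_bits(p))  # 1.0, ~19.93, ~996.6 bits
```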

    If we take a combinatorics approach to complexity we can say that the surprisal index varies linearly with complexity: more permutations possible means any subset of permutations is going to be (very) unlikely.

    You take it from here...
  • SophistiCat
    2.2k
    1. The total number of meaningful messages is less than the total number of possible messages. The proof of this is that the same message can be sent using different codes, such that, once transcribed into meaning by the receiver, it is the same message. For an example, we can imagine whole books of English where every letter is simply shifted one space down, A becomes B, Z becomes A, etc. — Count Timothy von Icarus

    This is assuming that there is a strict mapping from code to meaning (a surjection in this case). In reality, of course, the interpretation of a text ("code") is not unambiguous, which is to say that the same code can generate multiple meanings for different receivers (readers) or even for the same receiver.
  • T Clark
    13k
    The story, which is worth a read, posits a bunch of librarians living in a massive (but finite!) library. Each book in the library has a given number of characters per page, and a given number of pages. The library doesn't repeat, it simply contains every possible combination of characters per book, in different books. This represents 10^4677 books. For comparison, estimates of the number of protons in the visible universe are around 10^78-82. — Count Timothy von Icarus

    The theme of incomprehensibly huge libraries, either finite or infinite, is pretty common in fantasy books. The problem usually comes down to how to find the information you are looking for. Sometimes you do it by magic, sometimes it's just luck, and sometimes it's impossible or practically impossible and the library is useless.