Why are LLMs so effective? Because language is thought
Dragana and I experienced telepathy. Twice.
It was not a fluke. It was not our imagination. We experienced cold hard telepathy. This happened on two occasions where, for a brief period of time, we knew exactly what the other person was thinking. We became deeply connected by the exact same train of thought; the same sequence of words traversed our minds. These thoughts were triggered by situations only possible given our shared experiences: our history as friends, our immigrant upbringings, our AI-related jobs, and English as our common language. The first time, we simply laughed it off. The second time, we realized that this could have only happened given our shared understanding of reality, powered by a level of specificity that only language can convey. This realization is at the heart of the current AI revolution and it is what makes Large Language Models (LLMs) work so scarily well.
Part 1 - Language as a substrate of thought
Languages carry a bottomless depth of meaning. From a large but still finite number of words, we can create an infinite number of meaningful and extremely specific ideas that we can transmit to others. An illustration of this is how, when a relationship dies, a dialect dies with it. The words and phrases each couple uses carry such profound meaning that they become part of the couple's shared personality. Douglas Hofstadter, a great inspiration for this post, calls this phenomenon a low-resolution copy of each other's mind in his book "I Am a Strange Loop" (Chapter 17: "How We Live in Each Other"). An example: Dragana's native language is Serbian and she is married to R, whose mother tongue is Dutch. Their relationship has, from its start, been built in English, which is their second language. And even when Dragana becomes fluent in Dutch, they will still speak English to each other, because they have developed their love-layer on top of the English language, specific and unique to only the two of them. The English language is not only a means of communication but also part of the relationship itself! The love-layer example shows a foundational principle of languages, which can then be extended to siblings, friends, family units, communities, cultures, ethnic groups and beyond. The infinite layers of specificity and meaning are created by shared semantic composition among different groups of people. This is also why the Scots have 421 words for concepts around snow, and why accurately translating between languages is extremely hard. I (Paolo) read the Spanish version of Hofstadter's opus, Gödel, Escher, Bach, a long time ago; a book about language, music, minds and how self-reference gives rise to consciousness. The book was originally written in English but, since it relies heavily on wordplay, Hofstadter spends 8 pages giving translators guidance on how to carry over its full meaning.
Later on, he argued that using AI as a tool for translation should be criminal because it erodes what it is to be human. Take the Spanish-adopted word apapachar (originally from the Náhuatl language, and primarily used in Mexico and in Paolo's household). It can be roughly translated as to hug with the soul, but this translation is completely devoid of the beauty and impact of what an apapacho feels like. Certainly, direct human-to-human communication is much more than just the words in a language. There's the tone, both written and spoken, the body movements, the emojis you use, and even the subtle facial expressions that our brains are so primed to detect. This is precisely why language alone can only get you so far: it's just one of the many layers that make up communication.
Our first telepathic encounter involved our shared friend Y. We were having drinks at his place when he pulled out some fancy French Vanilla ice cream. Paolo said, "Wow man, French Vanilla.", and we both thought the exact same thing: Y is very bougie. And only I (Dragana) understood what this conveyed: both a true appreciation of a friend sharing high-quality ice cream with us and a tiny hint of acknowledgement that our friend demands and enjoys certain standards. I looked at Paolo and we both started laughing frantically. We were not only thinking the exact same thing, but we knew that the other person was thinking it. That was the telepathic moment. We drew from the same database of experiences, inside jokes and mutual observations. Paolo's "Wow man, French Vanilla." triggered the same shared meaning. We had built up enough shared semantic composition that minimal language conveyed maximal meaning.
The examples above show why human language is such an extraordinary cognitive tool that, in a way, it is our thought. We can probably think ideas and feel feelings, but those materialize when we can describe them using words. Language formalizes our thinking process. Take your own emotions as an example, and how damn hard it is to precisely pin them down using words. A great example is the wheel of emotions, a map that lets you hone in on your emotions by expanding your vocabulary. Clearly, we are not even close to being the first to discover this phenomenon. Similarly, it's been shown that there's more to thought than just the languages we use. However, there's increasing evidence that language does shape it. More importantly, we needed a clickbaity title. The formal theory that underpins these claims is called linguistic relativity, or the Whorfian hypothesis. The strong version of the theory, that language is thought, has been mostly discredited. Yet there's empirical evidence for a weaker version, which holds that languages quite heavily influence a speaker's perception of reality. See Dedre Gentner's "Language serves as a cognitive tool kit."
Part 2 - Why do LLMs work so well?
Now, let’s completely switch context from psychology and linguistics to the world of mathematics and artificial intelligence. For centuries, mathematicians have been trying to represent and abstract reality with symbols, equations and diagrams. From geometry to calculus, math has been able to accurately describe phenomena, and it’s all worked quite well! At times it seems that math is baked into reality. As an abstraction, math is quite useful, underpinning most, if not all, of humanity’s technology. From the wheel to the transistor, there’s some mathy abstraction that can describe its workings. (This is a gross oversimplification of the history of science. Dragana only allowed me (Paolo) to fit a tenth of what I wanted to write here. It was definitely for the best.) When reality became too complex to describe exactly, mathematicians invented probability and statistics. These disciplines allowed us to quantify uncertainty and capture general trends in data, synthesizing it into actionable insights, unfortunately for our day-to-day jobs. Fast forward to today and our data-rich world. Can we also find trends in language data? Well, it turns out that if you add a couple of teraflops of compute power and stack layers and layers of linear regressions, boom, magic happens. Suddenly, your computer starts speaking back to you.
The connection between language and AI sits at the core of Large Language Models (LLMs). At the heart of ChatGPT, Claude, Gemini and all the recent vaguely-Englishman-named AI products, there is a Large Language Model. An LLM, as its name suggests, is a model, a representation of reality, that takes language (a piece of text) as its input and tries to autocomplete it, i.e. it also outputs a piece of text. To achieve this feat, LLMs first break down the input text into tokens; think: a word, a piece of a word, or even a punctuation mark. As an example, the word fishing is very different from fish and the -ing suffix by themselves; the latter two, fish and -ing, are the tokens. Next, these tokens are converted into long lists of numbers called vectors. These lists of numbers are not random at all: their positions relative to each other are what allows them to retain semantics and context. This means that tokens with similar meanings have vectors that are close together. This is what we nerds call embeddings. You should click here for a cool 3D visualization of this fact. Then, these embedded vectors are passed through a series of mathematical operations called a deep learning network. You can think of this network as a machine, and its parameters as finely tuned knobs, elegantly combined to process text in a way that seems to understand it. (The secret sauce of the LLMs’ machinery is the transformer architecture. Its attention heads allow the LLM to focus on the right pieces of information surrounding a word, so that when you refer to an it, the LLM can retrieve the context around what it is.) So, if we input an incomplete sentence, “I like fishing in the ____”, the LLM will try to complete it. A few likely next words could be river, sea, lake or pond; however, the word rock would be unlikely, and therefore meaningless. Last, the LLM draws one of the likely words at random, filling in the blank in a statistically coherent manner.
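For the nerds, the token-and-embedding idea above can be sketched in a few lines of Python. Everything here is invented for illustration: real tokenizers are learned from data, and real embeddings have hundreds or thousands of learned dimensions, not three hand-written ones. Still, the sketch shows the key property: tokens with similar meanings get vectors that point in similar directions.

```python
import math

# Toy "vocabulary" of tokens mapped to hand-made 3-number vectors.
# The values are invented for this sketch; in a real LLM they are learned.
toy_vocab = {
    "fish":  [0.9, 0.1, 0.0],
    "-ing":  [0.0, 0.2, 0.9],
    "river": [0.8, 0.2, 0.1],
    "lake":  [0.7, 0.3, 0.1],
    "rock":  [0.1, 0.9, 0.2],
}

def cosine(a, b):
    """Cosine similarity: 1.0 means same direction, ~0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# "fishing" is broken into the tokens "fish" and "-ing":
tokens = ["fish", "-ing"]
vectors = [toy_vocab[t] for t in tokens]

# Semantically close tokens sit close together in vector space:
print(cosine(toy_vocab["river"], toy_vocab["lake"]))  # high similarity
print(cosine(toy_vocab["river"], toy_vocab["rock"]))  # much lower
```

This closeness is exactly what the 3D visualization linked above shows, just in a space with far more dimensions than a human can picture.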
This randomness is, paradoxically, what makes LLMs’ texts interesting and human-like. (This randomness is also what enables hallucinations, the formal term for what happens when your AI chatbot makes up facts. It is intrinsic to the architecture.)
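The weighted draw at the end can be sketched with Python’s standard library. The probabilities below are made up for the fishing example; a real model produces a distribution over its whole vocabulary, but the mechanism is the same: likely words get most of the probability mass, and the model samples rather than always picking the single top word.

```python
import random

# Hypothetical next-token distribution for "I like fishing in the ____".
# These numbers are invented for illustration, not taken from any model.
next_token_probs = {
    "river": 0.35,
    "lake":  0.30,
    "sea":   0.20,
    "pond":  0.14,
    "rock":  0.01,  # unlikely, therefore (almost) never chosen
}

words = list(next_token_probs)
weights = list(next_token_probs.values())

# Draw one word at random, weighted by probability. Run this a few
# times and you'll mostly see river/lake/sea, and almost never rock.
choice = random.choices(words, weights=weights, k=1)[0]
print(f"I like fishing in the {choice}")
```

This sampling step is why the same prompt can yield different completions on different runs, which is the randomness, and the humanity, discussed above.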
And that is how LLMs work; very, very, VERY roughly. However, why they work is far more subtle. It turns out that the fundamental principles that make language so profoundly rich can be found within all of humanity’s written work. It’s only natural, because language is part of the cognitive toolkit that has allowed humanity to build a shared understanding of reality. The inflections that make Italian sound so significant. The combinations of nouns and adjectives that writers wield to make a text provoke emotions in you. And yes, even some of the cultural context that the word apapachar conveys. In the process of training the LLM, we fine-tune these knobs to make the output make sense given an input. When we use all of humanity’s written work as input-output combinations, the fundamental principles of language are captured by the machinery of the LLM. The same intricacies and depth that make language the substrate of your thoughts are then approximated scarily well by LLMs. While we don’t know exactly how this works, we know that it’s an emergent property of the training process. Even things like tone and style, both paradoxically outrageously hard to describe in words, are modeled by LLMs. This means that yes, unfortunately, if you were to fine-tune an LLM on all of your writing and instant messaging conversations, the LLM would totally be able to replicate your style. It’s the same principle behind why you can easily detect that your wife is angry at you from a one-line SMS. Another interesting example is how people are starting to develop parasocial relationships with AI chatbots, sometimes with dire consequences.
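The knob-tuning above can be caricatured with a single knob. This is an extreme simplification we made up for illustration: one parameter, one training example, and a nudge rule that shrinks the error a little each step. Real LLM training does the same kind of nudging for billions of knobs at once, over trillions of tokens.

```python
# A cartoon of training: one knob w, tuned so the model's output
# matches a target for the input x = 1.0. All numbers are invented.
w = 0.0        # the knob, starting untrained
x = 1.0        # a single "input"
target = 3.0   # the "right" output for that input
lr = 0.1       # learning rate: how hard we turn the knob each step

for step in range(100):
    output = w * x            # the model's guess
    error = output - target   # how wrong the guess is
    w -= lr * 2 * error       # turn the knob to shrink the squared error

print(round(w, 3))  # prints 3.0: the knob has converged
```

Stack enough of these knobs in the right architecture, feed them humanity’s written work instead of one toy target, and the emergent result is the language modeling described above.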
The second time we experienced telepathy, Paolo and I were having drinks, again, but this time at my place. We were discussing how R and I have English as our love-layer, our first example. Similarly, Paolo was sharing how he and his friends speak a version of Spanish that is specific to the few blocks of Mexico City where he grew up. There was not a single phrase that triggered us, but similar to our first telepathic encounter, we reached exactly the same conclusion: “language is crazy and it has so much depth!” Which then devolved into “Wow, language IS thought”, and eventually morphed into “… and that’s why LLMs work so well!” Call it what you want: the ramblings of two friends, or simply a narrative vehicle to hook you into reading this essay. However, it is true. We both thought it and we both knew that the other person was thinking it. For a second time, our shared understanding of reality, the content we were discussing and, unfortunately, our jobs as Product Managers working with LLMs led us to exactly the same train of thought. This was the second telepathic moment, and we really wish for everyone to experience this level of semantic connection.
We allowed our thoughts to wander for a bit. Despite not having whiteboards to prove it, our minds were blown a few more times that night. What if language is just one of the many substrates that can describe some aspect of reality? As we already saw, math can do it quite well. But what about code? What about sounds, music, light, images and colors? Each substrate is a kind of representation which describes, at different levels, a different aspect of reality. The vibes are totally real. However, we will leave these exploding thoughts for some other time. For now, we can just hope that this essay has been able to telepathically transmit to you, dear reader, our core idea: that LLMs work because they have been able to model language, one of the core components of our thoughts, in all of its richness and depth.
Seal of Organic, AI-free human thinking. We vouch that we did not use AI to write the content of this essay. As tools, we did use LLMs to do research, circle around ideas and perhaps find the perfect word for a sentence. Ultimately, that’s what LLMs are great at! However, we believe that our thoughts and ideas are what make us human, and offloading the entire thinking process to an LLM leaves the writing completely devoid of them. Plus, it’s you, the human writer / reader, that gives it meaning, versus the 5-page essay that ChatGPT could have written with a simple prompt. (In this Nature paper, the author argues that writing is literally thinking. Similarly, here is a seminar that helped us craft this sentence.)
Who are we, and what are our credentials to dare write this?
Dragana is a Technical Product Manager at Amazon working on the Kindle Direct Publishing team. She is currently building a zero-to-one LLM-powered product. She is also a bit of a polyglot, speaking 4 languages, a crystal-clear thinker, and a six-pager-writing menace.
Paolo sits at the intersection of math & business. He once described himself as a datamancer, an AI-maximalist and an electronic music enthusiast. He is currently a GenAI Subject Matter Expert at Amazon, where he is empowering a very large org. with AI tools. He’s also surprisingly persistent at getting us to finish this essay.