The Limits of AI

How Language describes the World

May 14, 2026

Can we use systems that master language to build intelligent systems? Training LLMs to predict the next word in a piece of text has allowed us, almost miraculously, to capture some kind of intelligence in a language prediction system.

But what is such a system capable of exactly?

Since we use language to describe the world in a formalised way, it seems plausible that a system with some understanding of how our language works should be able to use it to perform a kind of 'linguistic reasoning': to write down a long chain of thoughts that continually checks itself, and to produce new output that genuinely describes how the world works in ways that can be validated and reproduced.

What is Language? And how does it describe the World?

But what is language?

We use language to communicate and to describe facts about the world. It feels as though a sentence can capture a piece of reality the way an image or a map does: by reproducing some of the structural relations of the thing it represents.

Ancient cultures like the Greeks and Egyptians took this idea very seriously. They believed there was an objective world out there — Plato's allegory of the cave is the most famous expression of this — and that language could capture its structure the way an image or a map captures a place. They revered language as a kind of magical medium through which the order and structure of the world became understandable and shareable.

The Ancient Greeks had a word that captured how they perceived this: logos. It is difficult to translate. It does not mean only 'word'. It can also mean 'speech', 'reason', 'pattern', 'order', and 'intelligibility'. They used it to refer to the principle by which the world becomes ordered, understandable, and intelligible to us through language.

Wittgenstein's Intuition on Language

The philosopher Ludwig Wittgenstein (1889–1951) was obsessed with working out, in a more formal way, how language captures such a description of the world.

In his early work, the Tractatus Logico-Philosophicus, he attempted to show the relationship between language, thought, and the world. His idea was that every meaningful series of words must express some kind of abstract statement about the world, and that language can then be used in a consistent and coherent way to relate these different statements, or to show how one statement follows from another.

This is how many of us intuitively come to think about the nature of language.

And if this were true, it would mean that an LLM could use language, as a series of symbols obeying logical rules, to reason about the world in an entirely consistent and objective manner. Given enough time and compute, it might even derive new truths from text alone, and then explain them back to us in language we already understand.

Language as use

But as Wittgenstein continued working on this idea, he found that this intuitive, simple, and clean picture of language as an objective representation of the world fails on closer inspection. The meaning of a word does not cleanly map onto one single well-defined underlying logical essence.

In his later work, especially the Philosophical Investigations, Wittgenstein took his own early picture apart from the inside.

Take a simple example: "Pick out the red object." At first, this seems straightforward. The word red appears to refer to a colour.

But how does a word like red get its meaning?

The intuitive answer is that red refers to a private inner experience. But Wittgenstein kept pushing: how do I check, the next time I use the word, that I am still using it in the 'correct' way?

Not by discovering a private mental object called redness. We learned what the word refers to from other people, through training, repetition, correction, and shared judgement. They point to things. Others agree or disagree. Over time, the community's use of the word becomes stable enough that we can use it and be understood.

Language, Wittgenstein argued, is not a mirror of the world in any simple sense. Language is a human activity.

Language does not work because we all have the same inner experiences of the world in our minds. To understand a word is not to attach it to an inner image. Being able to use language is much more nuanced than simply agreeing on dictionary meanings.

We agree, most of the time, on how to use a word within the different contexts in which our society uses it. To understand a word is to know how to use it among a group of people who share a similar way of living. Language works because we share ways of reacting, grouping, correcting, and responding.

Many philosophical questions and debates arise, Wittgenstein thought, because we wander away from the contexts in which our words have their use, and we mistake the resulting confusion for intellectual depth.

For example: we notice that soul is a noun, and we start asking where it is located. We notice that consciousness names something and we start asking how much of it a system has.

We can construct grammatically correct sentences that resemble well-formed questions (what was there before the Big Bang?, or why is there something rather than nothing?) without checking whether these words still mean anything once they are detached from the practices that normally give them meaning.

Philosophy, Wittgenstein wrote, is a battle against the bewitchment of our intelligence by our language.

Consequences for Artificial Intelligence

All of this has direct consequences for how we think about AI systems, and about large language models (LLMs) in particular.

Language is not something we use to record our intelligence on paper in well-defined statements. Our intelligence (whatever it is that goes on in our head) is something we try to express to other people outside of our own private experience, through language (a medium that is itself not perfectly well-defined).

Does training an LLM on language allow an AI system to manipulate and generate language in a manner that is useful to the community of people that uses that language?

Yes — but this also raises a series of more subtle questions:

1. Can LLMs perform 'cognitive' actions that are advanced and useful?

Yes, and often surprisingly well. The textual fragments that these models are trained on encode how concepts relate to each other, how arguments are built, how we reason, and how we connect different topics. A system that masters these patterns can use them coherently enough to perform useful cognitive work of a kind that used to require human intelligence.

2. Can we solve the problem of hallucinations by training them on more text?

Not entirely. LLMs take part in the manipulation of language without inhabiting the world it points to. From this very limited perspective, severed from embodiment and perception, it is difficult for a model to grasp the often subtle and contradictory nuances and practices within human language.

A well-known example: a user asks an LLM-based agent 'what is the distance to the nearest car wash?' The agent searches the web and concludes it is only 1 km away. The user then asks 'should I walk or should I drive?', and the agent replies that the distance is short enough to walk, completely missing the point that there is no use in going to the car wash without the car.

As a result, LLMs still often generate text that is fluent but wrong, or correct but missing the point. More training data sharpens the patterns that are stored in these models, but through text-based data they inherit not only our concepts and definitions but also our metaphors and our confusions.

As long as these models do not also have access to the same amount of data that humans do, there may always be aspects of meaning that text alone cannot fully convey. Humans possess a wealth of additional data through the lived experience that grounds their understanding of language in the world that our language refers to.

Multimodal systems may eventually reduce this gap by training on more channels of input, giving them access to more data, and getting a closer understanding of the human experience on which language is based. But we will still need to ground these models in our linguistic understanding of the world in order to communicate with them at all.

3. Can LLMs come up with new truths from text alone?

Probably at best in a very limited and narrowly defined way. Recall that the meaning of language derives from a community that uses words consistently. A single LLM has no such community to disagree with it about whether a new word is being applied correctly. Generating new knowledge, and the new language to describe it, is a collective project. LLMs can rearrange existing language in surprising ways, and they may turn out to be powerful instruments inside such a community.

But extending language and knowledge itself remains something that requires more than one single system. This is an extremely interesting topic, and we will come back to this in the next sections.

Language as a boundary of our Knowledge

This does not make LLMs unimpressive. On the contrary, it makes them more interesting. Artificial intelligence reveals something strange about us: much of what we call thinking is deeply entangled with our use of language, and the picture of the world that it implies to us.

We used to think language was a tool used by intelligence. But LLMs show us that language is more like an environment in which intelligence forms.

As humans, we are creatures born into speech, our understanding of the world shaped by stories. As early hominins we were surrounded by primitive speech patterns. As our capacity for intelligence grew, our capacity to form more complex language evolved with us as well. At every point in our modern history as a species we inhabited a world already filled with speech and stories that shaped our understanding of it. Slowly we collectively corrected our speech patterns to reflect the world as well as possible, while natural selection slowly enabled us as a species to use more complex language patterns that captured more complex relationships in the world in more efficient ways. This evolution of our collective intelligence as a species operates within language.

The LLM is not a mind. It is more like a mirror made of our own language. And when we look into it, we see the strange and messy machinery of how intelligence and language co-evolved looking back at us. LLMs cannot step outside language and grasp the essence of reality in a purely objective way. But then again, neither do we.

Continue reading: Generating New Knowledge with AI