Why Transformers Work: See Heidegger, Not Descartes

AI researchers have inherited more from Descartes than they might realize. The Cartesian worldview divides reality into two realms, the external world of objects and the internal realm of ideas. In this picture, language is simply a mapping device, a way to attach words to ideas in the mind, which themselves represent objects out there in the world. This representational picture has shaped centuries of thought in philosophy, linguistics, and cognitive science. It implies that for genuine intelligence to arise, a system must build internal models of the external world, and that language would be secondary, little more than a label pasted onto those models. If that were true, then training a system on text alone, without access to perception or embodiment, should not get us very far. At best it would produce a…

The Irony of AI Fear

We’ve all seen predictions by various luminaries that superintelligent AI will hunt us down, enslave us, or wipe us out. The imagery is apocalyptic, machines turning against their creators. But let’s pause for a moment and ask, why assume that intelligence would carry such motives at all?

1. Projection of Human Drives

Humans kill not simply because they are intelligent, but because their intelligence is bound up with survival instincts, aggressive impulses, tribal competition, and scarcity. These drives shape how intelligence expresses itself. Fear of AI often projects these same traits onto machines, imagining that any powerful intelligence must also be predatory.

2. Intelligence Without Instinct

Yet intelligence by itself does not entail hostility. A chess engine only “wants” to win in the sense that we’ve programmed an artificial drive to do so. Likewise,…

Why Transformers Speak but Don’t Understand

Large language models astonish us with their fluency. Ask them to explain a concept, write an essay, or carry on a conversation, and the words flow in ways that feel remarkably human. Yet the very same systems that impress us can also produce absurd mistakes, such as presenting Aristotle as Galileo’s student, or recommending sunscreen for a rainy day. Why is that? The answer lies in how transformers, the architecture behind today’s AI chatbots, actually work. At their core is the Query-Key-Value (QKV) mechanism of self-attention. Each word (or token) in a sentence reaches out as a query to all the others, compares itself against their keys, and pulls in their values to shape its own meaning. Multiply this process across dozens of attention heads, and you get a dynamic web where every word…
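To make the QKV description concrete, here is a minimal sketch of single-head scaled dot-product attention in Python with NumPy. The shapes, the toy data, and the reuse of one matrix for queries, keys, and values are illustrative assumptions for this sketch; a real transformer would use separate learned projections and many heads.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Single attention head: each row of Q is a token's query,
    each row of K and V a token's key and value."""
    d_k = Q.shape[-1]
    # Every query is compared against every key: an all-to-all relevance score.
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax turns each token's scores into weights that sum to 1.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each token's new representation is a weighted blend of all values.
    return weights @ V, weights

# Toy example: 4 tokens, 8-dimensional embeddings (arbitrary numbers).
rng = np.random.default_rng(0)
tokens = rng.normal(size=(4, 8))
# For simplicity the same toy matrix stands in for Q, K, and V here.
output, attn = scaled_dot_product_attention(tokens, tokens, tokens)
print(attn.round(2))  # each row: how much one token "attends" to the others
```

Each row of the printed weight matrix shows how strongly one token draws on every other token, which is the dynamic web the excerpt refers to.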

How Language and World are the Same Web of Meaning

We usually think of “world” and “language” as two very different things: the world is everything out there, and language is how we talk about it. But Heidegger turned this picture upside down. For him, the world is not just a pile of objects, it is the web of significance in which those objects matter to us at all. And language is not a set of labels pasted onto things, it is the way this web becomes articulated and shared. Both world and language are relational at their core in that they draw meaning from how things connect with one another, not from how they stand in isolation. This is precisely the insight that today’s transformer models accidentally confirm, because their success comes from simulating relationality rather than representing objects. When a transformer processes…

Why Large Language Models Will Never Think

Large language models are astonishing machines. They can generate essays, write code, summarize books, even carry on conversations that feel uncannily human. They are fluent in language to a degree no one anticipated even a decade ago. And yet, for all their brilliance, they do not think, and they never will. To see why, we need to distinguish fluency from thought. Language models are trained to predict the next most likely word, and through the power of scale and the transformer architecture, they have learned to do this with breathtaking accuracy. This is what gives them the appearance of intelligence. But appearances can mislead. Fluency in words is not the same as understanding the world those words describe. A model can talk about hammers and nails, but it has no orientation toward building or…

AI Safety is Out of Control

AI safety is the most urgent conversation in the field today. Companies publish safety charters, researchers debate alignment strategies, governments scramble to regulate. But most of what passes for “safety” is only treating symptoms, not causes. The lesson of hallucination should have made this clear by now: the very architecture of the transformer produces the problem. Large language models hallucinate not because they are broken, but because they are doing exactly what they were designed to do, which is predict the next token with statistical coherence. Fluency is achieved, but understanding is not. Sooner or later, coherence runs ahead of grounding and produces statements that sound right but have no anchor in truth. No patch can fix this, because the absence of world is not a bug, it is the structural condition of the…

Scaling Has Reached Its Limit Exactly in Coherence

The marvel of large language models is their coherence. Ask them a question, and they respond with sentences that flow, paragraphs that hold together, and arguments that appear structured. Coherence has been the mark of their success. But coherence is also the sign of their limit. Why? Because coherence is not the same as truth, or understanding, or thought. It is only the surface quality of language holding together. By scaling models with billions of parameters and trillions of tokens we have perfected coherence to an extraordinary degree. And yet, it is precisely at this perfection that the gap reveals itself. The model can speak endlessly, but it cannot ground what it says. It cannot inhabit a world. Humans are coherent not because we have seen every possible token sequence, but because we live…

How Did Philosophy Become as Polarized as Our Politics?

While Being and Time is approaching its hundredth anniversary, there is still a reason why most scientists, and especially AI researchers, continue to think in a Cartesian mindset rather than in terms of how we actually experience the world. The reason has less to do with Heidegger’s difficulty and more to do with the way philosophy itself became polarized in the twentieth century. Philosophy split into two major branches in the late nineteenth century. One branch followed Nietzsche into what came to be known as Continental Philosophy. The other followed Frege into what became Analytic Philosophy. To see the divergence, you could just look at the titles of their respective works: Nietzsche’s Thus Spoke Zarathustra is an allegorical narrative that overturns inherited traditions, while Frege’s Foundations of Arithmetic is a rigorous attempt to ground…

The Forgotten Oracle – Hubert Dreyfus and the First AI Winter

In the mid-1960s, at the very moment when artificial intelligence was first being celebrated as the future of science, one voice stood apart. It was the voice of Hubert Dreyfus, a young philosopher at MIT. While his colleagues in computer science were predicting human-level intelligence within a generation, Dreyfus warned that their assumptions were fatally flawed. Dreyfus was not a programmer, but he had something that few in the AI community possessed, a deep understanding of philosophy, especially phenomenology. He had studied Heidegger and Merleau-Ponty, and he saw that the reigning picture of intelligence, the manipulation of internal symbols governed by rules, was blind to the actual conditions of human understanding. For Dreyfus, the crucial insight was that thinking is not a matter of detached computation. It is a way of being-in-the-world. This…

The Huge Energy Costs of LLMs Reveal an Absence of Grounding

The numbers are staggering. To train a frontier-scale large language model requires thousands of GPUs running in parallel, billions of parameters finely tuned, and trillions of tokens drawn from the expanse of the internet. And even then, what is achieved is only an approximation of what a child can do with a handful of sensory-motor episodes and the steady draw of ten watts of energy from eating a bowl of cereal. This discrepancy is not just an engineering issue, it is an ontological clue. The brute force of scale is needed because something essential is missing. The large language model must compensate with size and energy for the lack of grounding. In humans, salience is not calculated but lived. What matters stands out because of mood, need, and context. Learning is motivated, not the…

What Does Heideggerian Ontology Have To Do With Transformers?

Transformers work because they accidentally approximated an ontological structure. At the heart of every large language model lies the self-attention mechanism. Each word, or token, gets to “attend” to every other word in the sequence, weighted by relevance. This is what allows a transformer to capture context and coherence with such uncanny fluency. But looked at from a Heideggerian perspective, what the transformer is really doing is not logical deduction but something closer to disclosure: meaning does not arise in isolation but within a field of significance where some elements solicit our attention more than others. For human beings, this field is structured by mood, care, and readiness-to-hand. A hammer shows up as meaningful not because of its shape or material, but because it belongs to the activity of building, repairing, dwelling.…
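For reference, the relevance weighting described here is the standard scaled dot-product attention from the transformer literature, written in the usual notation, with d_k the key dimension:

```latex
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V
```

Read row by row, the softmax term gives each token a distribution of weights over all the others, a rough formal analogue of the field of significance in which some elements solicit attention more than others.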

Why Philosophy is Useless and Yet Matters for AI

Philosophy is often accused of being useless, and in a certain sense that’s true. Philosophy doesn’t build bridges, cure diseases, or put rockets on the moon. It doesn’t provide grounded methods for solving practical problems. In fact, whenever philosophy does discover a ground, it hands that ground over to science. Physics, chemistry, biology, history, ethics, these were all once philosophy until they found stable ground and methods and became sciences in their own right. Philosophy remains only with the questions that resist such grounding. And this is exactly why it matters for artificial intelligence. AI is one of those questions. We know how to make machines fluent in language, we know how to scale models with data and compute, but we don’t know what thinking is, or what it would take to realize it…

What is Ontology?

When people in AI talk about “ontology,” they usually mean something technical, such as an organized chart of entities and categories, like a knowledge graph that specifies how “doctor” relates to “hospital” or “patient.” In computer science, an ontology is a taxonomy: a way to structure data so machines can reason across it. Useful, yes, but this is a very thin sense of the term. It treats meaning as if it could be fully captured in tidy hierarchies of objects and properties. Philosophy asks something deeper. Ontology, at its core, is the study of being. It does not just catalog what exists but asks the more fundamental question, which is what does it mean for something to be at all? Why do things show up as meaningful in the first place? Ontology is not…

What is World Mind?

The pursuit of artificial general intelligence needs something like a paradigm shift. There is a growing consensus that large language models have hit a wall. The AI labs behind the frontier models all assumed, or at least hoped, that the transformer architecture of generative AI was the necessary foundation, and that scaling with more data and more compute would eventually unlock superintelligence. It turns out, however, that the transformer was never going to be the architecture to get us there. I don’t mean this only in the empirical sense, but in a deeper, ontological one. The transformer is a remarkable breakthrough, yet little reflection has gone into clarifying what it actually accomplished. What it gave us is language fluency and a shallow kind of causal reasoning, impressive achievements, to be sure, but still…
