The numbers are staggering. Training a frontier-scale large language model requires thousands of GPUs running in parallel, billions of parameters finely tuned, and trillions of tokens drawn from the expanse of the internet. And even then, the result is only an approximation of what a child can do with a handful of sensory-motor episodes and the steady draw of ten watts of energy from eating a bowl of cereal.
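To make the gap concrete, here is a rough back-of-envelope comparison. Every figure in it is an illustrative assumption, not a measurement: the ten-watt brain comes from the text above, while the GPU count, per-GPU power, and training duration are round numbers of the kind reported for frontier runs.

```python
# Back-of-envelope energy comparison.
# All figures are illustrative assumptions, not measured values.

CHILD_BRAIN_WATTS = 10             # assumed steady metabolic draw, per the text
YEARS_TO_FLUENCY = 5               # assumed years of childhood language learning
SECONDS_PER_YEAR = 365 * 24 * 3600

child_joules = CHILD_BRAIN_WATTS * YEARS_TO_FLUENCY * SECONDS_PER_YEAR

# Hypothetical training run: 10,000 GPUs at ~1 kW each for ~90 days.
GPUS = 10_000
WATTS_PER_GPU = 1_000
TRAINING_DAYS = 90

training_joules = GPUS * WATTS_PER_GPU * TRAINING_DAYS * 24 * 3600

print(f"child:    {child_joules:.2e} J")
print(f"training: {training_joules:.2e} J")
print(f"ratio:    {training_joules / child_joules:,.0f}x")
```

Under these assumptions the training run consumes on the order of fifty thousand times the energy the child does, and that is before a single inference is served.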
This discrepancy is not just an engineering issue; it is an ontological clue. The brute force of scale is needed because something essential is missing: the large language model must compensate with size and energy for its lack of grounding.
In humans, salience is not calculated but lived. What matters stands out because of mood, need, and context. Learning is motivated, not the result of exhaustive exposure to every possible sequence of words. Language arises within practices such as building, playing, persuading, and caring that are already world-embedded. A child doesn’t encounter words as tokens but as signs situated in activities and relationships.
Transformers, by contrast, compute salience statistically. They learn blindly from oceans of decontextualized text. Their language is generated without the background of a world. This is why they are so resource-hungry: they must cover every pattern because they lack the structures that confer relevance in advance. Where human intelligence is selective and efficient, machine intelligence in this form is exhaustive and wasteful.
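"Computing salience statistically" is not a metaphor here: in a transformer, what a token attends to is literally a normalized dot-product score, with no appeal to mood, need, or situation. A minimal sketch of scaled dot-product attention makes the point; the names and shapes are illustrative, not those of any particular model.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention weights: salience as pure statistics.

    Q, K, V: (seq_len, d) arrays of query, key, and value vectors.
    Each token attends to every other token; relevance is nothing
    but a normalized similarity score, learned from text alone.
    """
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)  # pairwise similarity between tokens
    # Row-wise softmax: each row is one token's "salience distribution"
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

# Illustrative usage: 4 tokens, 8-dimensional embeddings.
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
out, w = scaled_dot_product_attention(Q, K, V)
print(w.round(2))  # each row sums to 1: salience, computed not lived
```

The point is what the formula omits: nothing in these weights is grounded in need or context, so relevance must be recovered, expensively, from the statistics of an enormous corpus.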
So while transformers are brilliant mimics, they are deeply inefficient approximators of meaning. Their energy demands are not a sign of power alone but a symptom of absence. They reveal that fluency without grounding is costly, in computation and in understanding alike. If intelligence is to move beyond mimicry, it will not come from multiplying GPUs but from giving machines a share in what humans already have: a world that makes sense before a single token is ever predicted.
This is where World Mind points in a different direction. Instead of compensating for the absence of grounding with ever more scale, it asks how machines might be given structures of salience, care, and world from the start. Efficiency will not come from bigger engines but from finding the right ground. That is the challenge ahead, and the promise of a new paradigm for AI.
