Over the past few weeks, almost unintentionally and between Christmas lunches and dinners, I’ve found myself reading quite a lot about robotics and artificial intelligence. Not as a way to disconnect from the tech world, but as a way to keep a finger on the pulse of something that seems to be moving beneath the usual noise.
And there’s one feeling that keeps coming back, again and again.
We’re not facing that grand, headline-grabbing moment of “this will change everything tomorrow.” We’re at a much more interesting, and also more uncomfortable, point: realizing that, in a few years, we probably won’t recognize things that feel completely normal today.
Much of that feeling revolves around three acronyms that, until recently, barely featured in my vocabulary: LLM, VLA, and WLM.
When language is no longer enough
Large language models (LLMs) are already familiar to us. They’ve changed how we interact with technology and turned natural language into an almost universal interface. But when we try to take that intelligence out of software and into the physical world, problems begin to appear.
For years, we’ve had extremely precise robots operating in controlled environments. On an assembly line, they work beautifully. The problem arises when the environment stops being predictable.
Outside the factory, nothing is exactly where you expect it to be. Conditions change, humans give imprecise instructions, and context matters far more than it seems. Programming a robot to cover all those variations simply isn’t realistic.
That’s why we’re still surrounded by brilliant robots in factories… but none that are particularly useful in everyday environments. It’s not a question of power. It’s a question of understanding.
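What does “covering all those variations” actually look like in code? Something like the deliberately naive sketch below, where every name is invented for illustration: a pile of hand-written branches, each one a frozen assumption about the environment.

```python
# Deliberately naive, with invented names: rule-based manipulation, where
# every branch is a hand-coded assumption about the environment.

def grasp(object_type: str, position_mm: tuple) -> str:
    if object_type == "cup" and position_mm == (120, 340, 0):
        return "run trajectory_cup_A"   # pre-recorded motion
    if object_type == "cup" and position_mm == (125, 340, 0):
        return "run trajectory_cup_B"   # another pre-recorded motion
    # ...one branch per variation somebody remembered to anticipate...
    raise RuntimeError("unhandled situation: stop and wait for a human")

# On the assembly line, the cup is always exactly where we said it would be.
print(grasp("cup", (120, 340, 0)))

# In a kitchen, someone nudged it by three millimetres.
print(grasp("cup", (123, 341, 0)))      # raises RuntimeError
```

No amount of extra branches fixes this; the approach itself is the limit.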
The first leap: starting to understand the environment
This is where vision-language-action (VLA) models come into play. Put very simply, the idea is that the robot doesn’t just execute an instruction, but combines what it sees, what it’s told, and what it’s capable of doing.
Not to react mechanically, but to adapt — at least minimally — to what’s in front of it.
If it learns how to pick up a cup, it can infer how to pick up a glass. If it knows how to open one door, it can deal with another that isn’t exactly the same. Not because it thinks like a human, but because it has seen enough examples to generalize.
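In code, the contract is strikingly simple, even if the model behind it is anything but. The sketch below is purely illustrative: ToyVLAPolicy, Observation, and Action are invented names standing in for a large pretrained network, not any real library.

```python
from dataclasses import dataclass
from typing import Sequence

# Purely illustrative; every name here is invented. In a real VLA system,
# ToyVLAPolicy would be a large pretrained vision-language-action network.

@dataclass
class Observation:
    image: bytes                   # camera frame (stand-in for a tensor)
    instruction: str               # the natural-language command
    joint_state: Sequence[float]   # proprioception: where the arm is now

@dataclass
class Action:
    joint_deltas: Sequence[float]  # how to move each joint
    close_gripper: bool

class ToyVLAPolicy:
    """The VLA contract: one model, three kinds of input, one action out."""
    def act(self, obs: Observation) -> Action:
        # A real model fuses vision and language embeddings and decodes an
        # action; this stub only shows the shape of the interface.
        wants_grasp = "pick up" in obs.instruction.lower()
        return Action(joint_deltas=[0.0] * len(obs.joint_state),
                      close_gripper=wants_grasp)

# Same policy, different objects, no reprogramming: the cup and the glass
# go through exactly the same interface.
policy = ToyVLAPolicy()
for thing in ("cup", "glass"):
    obs = Observation(image=b"", instruction=f"Pick up the {thing}",
                      joint_state=[0.0] * 6)
    print(thing, "->", policy.act(obs))
```

The interesting part isn’t the stub, of course, but the shape of the contract: one model, three modalities in, one action out.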
This has led many companies to start taking seriously the idea of robotics outside hyper-controlled environments. Not so much to build more spectacular robots, but because of the possibility that they might begin to operate with some degree of fluency in an imperfect world.
It’s very similar to what happens when we talk about automation in complex infrastructures: the winner isn’t the most sophisticated system, but the one that adapts best to reality. It’s an idea we’ve already explored on the blog when discussing automation and critical infrastructures.
Understanding the task is not understanding the world
Over time, a fairly clear limit started to emerge.
A robot can see an object, understand an instruction, and know which action to execute… and still fail. Because it understands the task, but not how the world behaves.
It doesn’t anticipate what happens when it applies a certain force, how a liquid moves, what balance implies, or the consequences of a small mistake. And in real systems, that kind of ignorance isn’t trivial. It’s exactly the kind of thing that causes failures.
Teaching machines how reality works
This is where WLMs — so-called world models — start to make sense.
The idea isn’t to program rigid rules, but to give the machine an internal representation of how things work. A kind of statistical intuition about the physical behavior of its environment.
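If I had to sketch what that “statistical intuition” means in practice, it would be something like this: a learned transition function that lets the machine imagine outcomes before committing to them. Again a toy with invented names; a real world model would be a trained network, not a hand-written stub.

```python
# Invented names, hand-written rules: a real world model would learn all of
# this from data. The point is the shape: predict first, act second.

class ToyWorldModel:
    def predict(self, state: dict, action: dict) -> tuple[dict, float]:
        """Return (predicted next state, estimated risk of that action)."""
        next_state = {**state, "applied_force": action["force"]}
        # We hard-code the intuition a trained model would learn:
        # squeezing a paper cup too hard ends badly.
        fragile = state["object"] == "paper cup"
        risk = 1.0 if fragile and action["force"] > 0.8 else 0.1
        return next_state, risk

def choose_action(model: ToyWorldModel, state: dict, candidates: list) -> dict:
    """Imagine each outcome *before* acting; keep the least risky option."""
    return min(candidates, key=lambda a: model.predict(state, a)[1])

model = ToyWorldModel()
state = {"object": "paper cup", "applied_force": 0.0}
grasps = [{"force": f} for f in (0.2, 0.5, 0.9)]
print(choose_action(model, state, grasps))  # -> {'force': 0.2}, the gentle one
```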
It’s the difference between knowing the rules and understanding the situation. Something that, broadly speaking, also happens in Data Center operations: having data and procedures isn’t enough if you don’t understand the context in which they’re applied.
We touched on this before when talking about the gap between data and real decision-making: How your Data Center changes when data takes control.
Expectations, reality, and that uncomfortable phase
When people talk about all this, the narrative quickly jumps to big promises. Domestic robots, automated hospitals, fully flexible factories.
Reality, as usual, is far less epic.
The technology exists, the demos work, but bringing all of this into stable production is still hard. Nothing new for anyone who’s worked with complex systems. We see it constantly in critical infrastructures, where the gap between ideal design and real operation is enormous: The biggest risk in a Data Center: operations.
Why all of this makes me cautious… and curious
Maybe that’s why this evolution gives me a strange mix of caution and fascination. Caution, because history has taught us that these transitions are slow, difficult, and painful in the last mile.
And fascination, because for the first time in a long while, the focus seems to be in the right place. Not on making systems faster or stronger, but on making systems that better understand the environment they operate in.
And, as almost always, we end up looking at the Data Center
And here an inevitable question arises.
If this more versatile robotics matures, could it become a real foundation for Data Center operations that are more automated and genuinely manageable from a distance? Could it help us understand complex systems before intervening in them?
Nothing settled, nothing definitive (for now)
I don’t know how all of this will end. I don’t think anyone truly does, even if some speak with great confidence.
What does seem clear is that something is changing in how we try to make machines relate to the world. It’s no longer enough for them to respond correctly. They need to understand context and consequences.
I don’t know whether this will turn out to be a quiet revolution or a well-told bubble.
But I’d rather watch it closely, calmly and with a healthy dose of skepticism, than ignore it and realize too late that the change has already happened.