
From PUE to AI per Watt: how the Data Center industry learned to measure what it produces, not what it consumes

21 May 2026 by Oscar Rojas Badilla

For years, the conversation around efficiency in Data Centers was centered on one very specific question: how much energy was being lost before reaching computing systems.

It was a logical concern. In the early 2000s, a large portion of a data center’s energy consumption was dedicated to cooling, power distribution, and auxiliary systems. Infrastructure was complex, expensive to operate, and, above all, difficult to measure consistently.

In that context, the emergence of PUE (Power Usage Effectiveness) marked a turning point for the industry.


The beginning of everything, PUE


There was a time when the most important question in a Data Center was how much energy was lost before reaching the servers.


It was a reasonable question. The digital infrastructure buildings of the 1990s and early 2000s were huge, clumsy thermal machines: oversized cooling systems, backup generators consuming power while idle, transformers with losses nobody accounted for. For every watt that reached a server, another watt or more evaporated along the way. No one measured it because no one had defined how to do it.


In 2006, The Green Grid, a consortium of technology companies, published a metric that changed the conversation. It was called PUE, Power Usage Effectiveness. The formula was simple: the total energy entering the building divided by the energy effectively reaching the IT equipment. A PUE of 2.0 meant that for every useful watt, another watt was spent on overhead. A PUE of 1.0 was the impossible ideal: a building with no losses.
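The ratio described above can be written out in a few lines. This is a minimal sketch, not a metering tool: the function names and the monthly meter readings are hypothetical and chosen only to reproduce the PUE of 2.0 from the example.

```python
def pue(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Power Usage Effectiveness: total facility energy / IT equipment energy."""
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    if total_facility_kwh < it_equipment_kwh:
        raise ValueError("total facility energy cannot be below IT energy")
    return total_facility_kwh / it_equipment_kwh


def overhead_per_it_watt(pue_value: float) -> float:
    """Watts spent on cooling, distribution, etc. for each watt delivered to IT."""
    return pue_value - 1.0


# Hypothetical meter readings for one month (illustrative numbers only).
legacy = pue(total_facility_kwh=2_000_000, it_equipment_kwh=1_000_000)
print(legacy)                        # 2.0
print(overhead_per_it_watt(legacy))  # 1.0
```

A result of 2.0 means one watt of overhead for every useful watt, which is exactly the reading given in the text.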



It was a genuine breakthrough. It gave the industry a common language, a way to compare facilities, and competitive pressure toward efficiency. Hyperscalers began publishing their PUEs as engineering credentials. Google reached 1.08 in some of its Data Centers. The global average dropped from 2.5 to 1.58 in less than two decades.


PUE worked. And that is precisely why it is no longer enough.


A metric that measures the container but ignores the content


To understand why PUE aged, it is necessary to understand the world it was designed for.


In 2006, a server was simply a server. It ran business applications, databases, email systems. It consumed between 200 and 400 watts. Workloads were relatively predictable, rack densities were manageable, and the variation between workloads, in energy terms, was modest. In that context, building inefficiency was the dominant problem. If you managed to get more energy to the servers, you made the system more efficient. Straightforward logic.


What PUE could not anticipate was the question nobody had formulated yet: what are those servers actually doing with the energy they receive?


For years, that question did not matter much because the answer was more or less uniform. A server processing invoices and a server sending emails consumed similar amounts of energy and generated value that was difficult to compare, but nobody attempted to measure that. Efficiency belonged to the building, not to the workload.


The problem appeared when the nature of the workload changed.


The replacement that changes everything


Starting in 2016, with the first wave of deep learning, and explosively from 2022 onward with large language models, Data Centers began hosting a radically different type of workload. GPUs replaced CPUs as the dominant computing unit for AI. A rack that once consumed 10 kilowatts began consuming 40, then 100, then, with NVIDIA’s Blackwell architecture in 2025, 140 kilowatts. Energy density multiplied by more than a factor of ten in less than a decade, accelerating a transformation that is already redefining the evolution of Data Centers.


In this new context, a Data Center with a PUE of 1.1 hosting 140 kW racks consumes amounts of energy that would have seemed absurd in 2006. The building is efficient. The scale is different.
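A quick calculation makes the point concrete. The sketch below assumes the simplest possible model, facility draw equals IT load multiplied by PUE, and pairs the article's rack figures with PUE values in a way that is illustrative rather than historical. The function name is hypothetical.

```python
def facility_power_kw(it_load_kw: float, pue: float) -> float:
    """Total facility draw needed to deliver it_load_kw to the IT equipment."""
    if pue < 1.0:
        raise ValueError("PUE cannot be below 1.0")
    return it_load_kw * pue


# Illustrative pairing: a 10 kW rack in a building with a PUE of 2.0,
# against a 140 kW rack in a building with a PUE of 1.1.
legacy_rack_kw = facility_power_kw(it_load_kw=10, pue=2.0)
ai_rack_kw = facility_power_kw(it_load_kw=140, pue=1.1)

print(f"legacy rack: {legacy_rack_kw:.1f} kW at the meter")
print(f"AI rack:     {ai_rack_kw:.1f} kW at the meter")
print(f"ratio:       {ai_rack_kw / legacy_rack_kw:.1f}x")
```

Under these assumptions the far more efficient building still draws roughly 154 kW per rack against 20 kW for the legacy one. PUE improved; absolute consumption did not.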


But the real disruption was not how much power the equipment consumes. It was how much it produces.


A GPU running AI inference generates measurable output: tokens, responses, completed inferences. That output has direct economic value — it is what the Data Center customer is actually buying. And that output varies enormously depending on hardware, software, and configuration. Two facilities with identical PUEs can produce radically different amounts of intelligence using the same energy.


PUE cannot see what is being produced. For PUE, a watt reaching the server is a watt well used, regardless of whether it produces anything useful or not.


The industry has already identified it and is naming it clearly


In December 2025, Schneider Electric published an analysis that articulated the limitation precisely: a facility with excellent PUE running poorly optimized AI workloads wastes more energy per unit of value produced than a facility with mediocre PUE running efficient workloads. The building metric and the business metric had stopped correlating.
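The decoupling can be illustrated with two made-up facilities. This is a sketch under stated assumptions: the `Facility` class, its field names, and every number in it are hypothetical and are not taken from the Schneider Electric analysis. It assumes that output per joule at the meter is the IT-level output per joule divided by PUE.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Facility:
    name: str
    pue: float                  # building efficiency
    tokens_per_it_joule: float  # workload efficiency (hypothetical values)

    def tokens_per_facility_joule(self) -> float:
        """Tokens produced per joule drawn at the facility meter."""
        return self.tokens_per_it_joule / self.pue


# Excellent building, poorly optimized workload.
a = Facility("A", pue=1.1, tokens_per_it_joule=2.0)
# Mediocre building, well optimized workload.
b = Facility("B", pue=1.5, tokens_per_it_joule=4.0)

for f in (a, b):
    print(f"{f.name}: PUE {f.pue}, "
          f"{f.tokens_per_facility_joule():.2f} tokens per facility joule")
```

With these numbers, facility B delivers more output per joule at the meter even though its PUE is visibly worse, which is the situation the analysis describes.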


The alternative the industry began adopting had a direct name: tokens per watt. How many units of AI output the system produces for every watt of power consumed. Not the theoretical peak under laboratory conditions. Real-world performance, under real workloads, including all inefficiencies along the way.
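One way to compute the figure from measurements, rather than from a datasheet, is sketched below. It assumes power is sampled at a fixed interval while a real workload runs, and it reads "tokens per watt" as tokens per second per average watt, which is the same quantity as tokens per joule. The function name and the sample readings are hypothetical.

```python
def tokens_per_watt(tokens_generated: int,
                    power_samples_w: list[float],
                    interval_s: float) -> float:
    """Tokens per second per average watt (equivalently, tokens per joule).

    power_samples_w holds power readings taken every interval_s seconds
    over the same window in which tokens_generated were produced.
    """
    if not power_samples_w:
        raise ValueError("at least one power sample is required")
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    duration_s = len(power_samples_w) * interval_s
    avg_power_w = sum(power_samples_w) / len(power_samples_w)
    if avg_power_w <= 0:
        raise ValueError("average power must be positive")
    tokens_per_second = tokens_generated / duration_s
    return tokens_per_second / avg_power_w


# Hypothetical 4-second window, one reading per second, measured at the wall.
samples_w = [1000.0, 1200.0, 800.0, 1000.0]
print(tokens_per_watt(tokens_generated=8000,
                      power_samples_w=samples_w,
                      interval_s=1.0))  # 2.0
```

Where the power is measured matters: readings taken at the facility meter include cooling and distribution losses, while readings taken at the server do not.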


The metric had the advantage of being understandable to non-engineers. A token is a unit of generated text, approximately three-quarters of an English word. Tokens per watt is, essentially, how much thought each unit of energy produces. A formula that connects physics with economics without intermediate steps.


In March 2026, Jensen Huang made that metric the centerpiece of his keynote at GTC in San José, California. The formula he presented to CEOs and investors was deliberately simple:



He did not talk about hardware specifications. He talked about energy as a scarce resource and efficiency as a revenue multiplier. It was the moment when a technical metric became business language.


The timing was not accidental. By 2026, energy had ceased to be a predictable operating cost and had become the strategic bottleneck of the sector, something that is already beginning to define the real limit of artificial intelligence.


Electricity demand from AI-oriented Data Centers increased by 50% in 2025 alone, according to the IEA report Key Questions on Energy and AI published that year. The high-voltage transformers required to power new facilities have delivery lead times of two to three years. In several regions across Europe and the United States, utilities have paused new grid connections or conditioned them on multi-year waiting periods.


In that context, the operational question stopped being “how do I build more capacity?” and became “how do I extract more value from the energy budget I already have?” And that question only has a useful answer if you know how to measure how much value you are producing per watt.


The contrast between hardware generations illustrates the scale of the change. According to analyses by NVIDIA and SemiAnalysis using real benchmark data, GPUs based on the Blackwell architecture produce more than 50 times as many tokens per watt as the Hopper generation from just two years earlier.


In practical terms: the same energy budget can now support fifty times more inference capacity than in 2023, if the right hardware is used.
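The arithmetic behind that statement is short. In the sketch below only the factor of 50 comes from the text; the 1 MW budget and the baseline of one token per joule are hypothetical placeholders, and the function names are illustrative.

```python
def sustained_tokens_per_second(power_budget_w: float,
                                tokens_per_joule: float) -> float:
    """Throughput a fixed power budget can sustain at a given efficiency."""
    return power_budget_w * tokens_per_joule


def power_needed_w(target_tokens_per_second: float,
                   tokens_per_joule: float) -> float:
    """Power required to sustain a target throughput at a given efficiency."""
    return target_tokens_per_second / tokens_per_joule


POWER_BUDGET_W = 1_000_000       # hypothetical 1 MW of IT power
BASELINE_TOKENS_PER_JOULE = 1.0  # hypothetical older-generation efficiency
GENERATION_GAIN = 50             # the "more than 50 times" cited above

old = sustained_tokens_per_second(POWER_BUDGET_W, BASELINE_TOKENS_PER_JOULE)
new = sustained_tokens_per_second(
    POWER_BUDGET_W, BASELINE_TOKENS_PER_JOULE * GENERATION_GAIN)

print(f"older generation: {old:,.0f} tokens/s")
print(f"newer generation: {new:,.0f} tokens/s")
print(f"power for the old output on new hardware: "
      f"{power_needed_w(old, BASELINE_TOKENS_PER_JOULE * GENERATION_GAIN):,.0f} W")
```

Read the other way around, the same output that once needed the whole budget would need one fiftieth of it, which is what makes the gain matter when grid connections are the constraint.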


For Data Center operators, this changes the argument for hardware renewal. It is no longer just about more capacity in the same space. It is about more intelligence per watt — and in a world where watts are the scarce resource, that is the metric that determines who can grow and who cannot.


From the building to the workload: three decades of learning


The evolution of efficiency metrics in digital infrastructure is not just a technical story. It is a story about the questions an industry dares to ask itself.


In the 1990s, the question was: is it running? Availability was the only criterion. A Data Center that worked was considered efficient, regardless of how much energy it consumed.


In the 2000s, with PUE, the question became: how much is lost along the way? It was real progress. It focused attention on support infrastructure (cooling, electrical distribution, backup systems) and generated two decades of genuine improvements in building efficiency.


In the 2010s, under sustainability pressure, the question expanded: where does that energy come from? Renewable versus non-renewable energy entered corporate reporting. Carbon-per-watt metrics emerged. Companies began signing agreements with renewable energy producers to clean their footprint.


In the 2020s, with AI as the dominant workload, the nature of the question changed again: what does that energy produce? Not the building, not the source, but the output. Tokens per watt is the industry’s answer to that question.


Each transition expanded the perimeter of what was measured. Each expansion revealed that the previous perimeter was insufficient. And in every case, the industry took between five and ten years to massively adopt the new metric — not because it was difficult to understand, but because it changed who had to be accountable for what.


PUE placed responsibility on the building operator. Tokens per watt places it on the workload operator.


It is a shift with contractual, organizational, and commercial consequences that the industry is still processing, especially in environments moving toward more autonomous operating models.


These are the concrete steps that shift implies today:


   Incorporating tokens per watt as an operational KPI. Not as an aspirational marketing metric, but as an indicator measured, reported, and managed with the same seriousness as PUE or availability.


   Separating the conversation about the building from the conversation about the workload. PUE remains valid for measuring support infrastructure efficiency. But it now needs a counterpart that measures what happens inside the servers.


   Revising commercial agreements. If you sell or buy inference capacity, efficiency per watt should be part of the contract. It is the metric that connects technical performance with real energy cost and customer value delivered.


   Evaluating hardware renewals with the correct metric. Acquisition cost per unit of compute is no longer the central argument. Real inference efficiency, measured in tokens per watt, can justify renewals that traditional analysis would reject.


   Preparing the conversation with customers. AI infrastructure users are going to start asking about this metric. Operators who can measure it and explain it will have an advantage in that conversation.


   Looking at AI agents as demand multipliers. AI workflows chaining multiple reasoning processes generate far more operations per query than a simple response. Efficiency per watt becomes critical when workload volume per user grows non-linearly.


   Communicating the metric upward within the organization. Tokens per watt is not just a technical indicator. It is a business argument: how much AI productivity we obtain with the available energy budget. Translating that is the responsibility of those operating the infrastructure.
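Several of the points above (commercial agreements, hardware renewals, agent-driven demand) come down to translating tokens per watt into cost. The sketch below is one possible way to do that, assuming energy is the only cost considered and that facility energy equals IT energy multiplied by PUE. All function names, prices, and efficiency figures are hypothetical.

```python
JOULES_PER_KWH = 3_600_000


def energy_cost_per_million_tokens(tokens_per_it_joule: float,
                                   pue: float,
                                   price_per_kwh: float) -> float:
    """Energy cost of producing one million tokens, measured at the meter."""
    it_joules = 1_000_000 / tokens_per_it_joule
    facility_kwh = it_joules * pue / JOULES_PER_KWH
    return facility_kwh * price_per_kwh


def tokens_per_request(tokens_per_step: int, chained_steps: int) -> int:
    """Tokens generated by one user request that chains several model calls."""
    return tokens_per_step * chained_steps


# Hardware renewal: same building, same tariff, two hypothetical efficiencies.
current = energy_cost_per_million_tokens(2.0, pue=1.2, price_per_kwh=0.10)
renewed = energy_cost_per_million_tokens(20.0, pue=1.2, price_per_kwh=0.10)
print(f"current hardware: {current:.4f} per million tokens")
print(f"renewed hardware: {renewed:.4f} per million tokens")

# Agents as demand multipliers: one answer versus a 20-step workflow.
print(tokens_per_request(tokens_per_step=500, chained_steps=1))   # 500
print(tokens_per_request(tokens_per_step=500, chained_steps=20))  # 10000
```

A figure like cost per million tokens is also the form in which the metric can be written into a contract or presented upward, since it needs no knowledge of the hardware behind it.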


Closing


PUE was an honest tool for the problem that existed when it was designed. It measured what needed to be measured, generated the improvements it could generate, and served the industry for twenty years.


What changed is not that PUE was wrong. What changed was the problem.


When energy was abundant and predictable, measuring the building made sense. When energy became the sector’s scarce resource and workloads began producing measurable output per unit of energy, the question had to evolve.


From how much is lost along the way to how much is produced at the end.


From the building to the workload. From the container to the content. From the watt that enters to the thought that comes out.


That is the story of three decades of metrics in Data Centers. And the metric that defines the sector today is, at its core, the most honest version of a question that has always been there: what is all that energy actually for?