Technology · Analysis
What are the latest developments in AI chips and computing infrastructure?
The AI chip market is rapidly evolving with specialized processors designed for training and inference, while data centers face unprecedented power and cooling demands that are reshaping energy infrastructure.
Stake & Paper Editorial Team · April 27, 2026
The latest developments in AI chips center on specialization: Google has announced the eighth generation of its Tensor Processing Units (TPUs), which for the first time includes two distinct chips—the TPU 8t for training and the TPU 8i for inference—engineered specifically for the agentic era.
NVIDIA has launched its Rubin platform with six new chips designed to deliver unprecedented efficiency and performance, with Rubin-based products becoming available from major cloud providers in the second half of 2026.
Meta has agreed to use Amazon's general-purpose Graviton chips in a multiyear deal, while also signing deals worth a combined $48 billion with CoreWeave and Nebius for GPU access.
These developments reflect a fundamental shift: the industry is moving away from one-size-fits-all processors toward specialized chips optimized for specific AI workloads.
Key Points
- Every major cloud player is either developing or buying AI chips, confirming that demand is here to stay.
- Custom silicon is taking the lead in AI, offering better performance, efficiency, and cost for specific tasks.
- The monolithic GPU as the default AI compute platform is ending; the diversity of AI workloads has made specialization economically attractive at scale, with chiplet architectures allowing companies to mix and match compute, memory, and I/O components from different sources.
- As density rises, power availability, not compute, is emerging as the limiting factor, with operators increasingly relying on on-site generation to accelerate deployment.
- Global electricity consumption by data centers is projected to roughly double to around 945 TWh by 2030, with electricity consumption by accelerated servers projected to grow by 30% annually and to account for almost half of the net increase in global data center electricity use (see the growth sketch after this list).
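As a quick sanity check on that last point, the sketch below compounds a 30% annual growth rate in accelerated-server electricity use. The six-year horizon (2024 to 2030) is an illustrative assumption, not part of the projection itself.

```python
# Minimal compounding sketch: how fast demand grows at 30% per year.
# The six-year horizon (assumed 2024 -> 2030) is an illustrative choice.
growth_rate = 0.30
years = 6

multiplier = (1 + growth_rate) ** years
print(f"30% annual growth over {years} years is a ~{multiplier:.1f}x increase")
# A roughly 4.8x rise in accelerated-server consumption is why this segment alone
# can account for nearly half of the net increase in data center electricity use.
```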
Understanding AI Chip Architecture
AI chips differ fundamentally in how they process information.
Graphics Processing Units (GPUs), originally developed for gaming and graphics applications, have become indispensable for AI because they can run enormous numbers of computations in parallel. That makes them well suited to processing large datasets in applications such as image and video analysis, natural language processing, and machine learning model training, and they are widely available, cost-effective, and relatively easy to program.
Application-Specific Integrated Circuits (ASICs) are custom-designed chips tailored to specific AI tasks, prioritizing speed and energy efficiency. Unlike GPUs, which offer general-purpose parallel processing, ASICs are optimized for particular workloads, such as running deep learning inference models or powering edge AI devices, which lets them achieve exceptional performance and efficiency in large-scale deployments.
A GPU is a general-purpose parallel processor, while a TPU is a domain-specific design: it strips away general-purpose overhead and computes with a systolic array, a grid of processing elements through which data flows rhythmically, much as blood flows through a heart.
Because of its systolic array, a TPU drastically reduces the number of reads and writes to high-bandwidth memory: weights and partial results stay inside the array and are reused many times before anything returns to memory.
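To make that concrete, here is a minimal, non-cycle-accurate sketch of the weight-stationary dataflow idea behind systolic arrays. It is an illustration of the general technique, not Google's actual TPU design: each weight is fetched from main memory once and pinned in a processing element while activations stream through, which sharply cuts the simulated memory reads compared with a naive loop.

```python
import numpy as np

def naive_matmul(a, w):
    """Inner-product loop that re-fetches operands from memory for every multiply."""
    m, k = a.shape
    _, n = w.shape
    c = np.zeros((m, n))
    reads = 0
    for i in range(m):
        for j in range(n):
            for p in range(k):
                c[i, j] += a[i, p] * w[p, j]   # one activation read + one weight read
                reads += 2
    return c, reads

def weight_stationary_matmul(a, w):
    """Simplified weight-stationary dataflow: weights loaded into the PE grid once."""
    m, k = a.shape
    _, n = w.shape
    pe_grid = w.copy()         # each weight read from memory exactly once, then held on-chip
    reads = k * n
    c = np.zeros((m, n))
    for i in range(m):
        activations = a[i, :]  # each activation read once as it streams into the array
        reads += k
        c[i, :] = activations @ pe_grid  # partial sums accumulate inside the array
    return c, reads

a = np.random.rand(8, 16)
w = np.random.rand(16, 4)
c_naive, naive_reads = naive_matmul(a, w)
c_sys, sys_reads = weight_stationary_matmul(a, w)
assert np.allclose(c_naive, c_sys)
print(f"naive memory reads: {naive_reads}, weight-stationary reads: {sys_reads}")
```

A real systolic array pipelines these multiply-accumulate steps across a physical grid of units every clock cycle; the toy version only illustrates the gap in memory traffic.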
How It Works
The evolution of AI infrastructure involves three key developments:
- Chip Specialization: The TPU 8t is designed for high-throughput training workloads, delivering nearly 3x higher compute performance than previous generations to shrink training timelines for massive models. The TPU 8i is engineered for the ultra-low latency required by agentic workflows, tripling on-chip SRAM to 384 MB and increasing high-bandwidth memory to 288 GB to break the memory wall (a rough sketch of the memory wall follows this list).
- System-Level Integration: To meet the needs of the largest AI challenges, companies are redesigning every layer of the cloud infrastructure stack, composing new approaches across silicon, servers, networks, and data centers so that software and hardware are optimized as one purpose-built system.
- Infrastructure Scaling: With Virgo Network and TPU 8t, companies can connect 134,000 TPUs into a single fabric within one data center, and more than one million TPUs across multiple data center sites into a training cluster, essentially turning globally distributed infrastructure into one seamless supercomputer.
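The memory wall lends itself to a back-of-the-envelope check. The sketch below uses illustrative assumptions, not published TPU 8i specifications: a 70B-parameter model stored as 8-bit weights and roughly 4 TB/s of HBM bandwidth. At batch size 1, each generated token must stream the full weight set from HBM, so bandwidth, not peak compute, caps decode speed.

```python
# Illustrative memory-wall estimate for autoregressive decoding; every figure is an
# assumption chosen for round numbers, not a vendor specification.
params = 70e9              # assumed model size: 70 billion parameters
bytes_per_param = 1        # assumed 8-bit quantized weights
hbm_bandwidth_bps = 4e12   # assumed HBM bandwidth: 4 TB/s

weight_bytes = params * bytes_per_param
max_tokens_per_sec = hbm_bandwidth_bps / weight_bytes  # every token re-reads all weights
print(f"Bandwidth-bound decode ceiling: ~{max_tokens_per_sec:.0f} tokens/s per chip at batch 1")
# Larger on-chip SRAM and more HBM capacity keep more of the working set (weights,
# KV cache) close to the compute units, which is how designs like the TPU 8i aim to
# push past this ceiling through batching and data reuse.
```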
Why It Matters
The shift toward specialized AI chips has profound implications for energy infrastructure and computing economics.
AI data centers use 60+ kilowatts of power per server rack, whereas more standard data centers typically use 5-10 kilowatts per rack.
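Scaled up to a whole facility, those per-rack figures translate into very different grid requirements. The sketch below assumes a hypothetical 1,000-rack site and a power usage effectiveness (PUE) of 1.3; both are illustrative assumptions, not figures from the article.

```python
# Rough facility-level power sizing from the per-rack densities above.
# Rack count and PUE are illustrative assumptions.
racks = 1_000
ai_kw_per_rack = 60        # dense AI rack density (from the article)
std_kw_per_rack = 8        # midpoint of the 5-10 kW conventional range
pue = 1.3                  # total facility power / IT power

ai_grid_mw = racks * ai_kw_per_rack * pue / 1_000
std_grid_mw = racks * std_kw_per_rack * pue / 1_000
print(f"AI campus grid draw:        ~{ai_grid_mw:.0f} MW")
print(f"Conventional facility draw: ~{std_grid_mw:.1f} MW")
# Same number of racks, roughly an order of magnitude more power, which is why grid
# interconnection rather than floor space has become the gating resource.
```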
As density rises, power availability—not compute—is emerging as the limiting factor, with operators increasingly relying on on-site generation to accelerate deployment.
Goldman Sachs Research forecasts global power demand from data centers will increase 50% by 2027 and by as much as 165% by the end of the decade compared with 2023.
This unprecedented demand is forcing a fundamental redesign of how data centers are built and powered.
Unlike traditional phased buildouts, many AI campuses are now deployed in large increments, with infrastructure and compute coming online in tighter synchronization. The traditional data center model is under strain: AI is not just increasing demand, it is changing the shape of that demand, introducing new constraints in power, cooling, and time.
Related Terms
Tensor Processing Unit (TPU): A custom chip specifically designed for Google's TensorFlow framework, a symbolic math library used for machine learning applications such as neural networks.
Graphics Processing Unit (GPU): A processor originally designed for graphics rendering that excels at parallel processing, making it well-suited for AI training and inference workloads.
Application-Specific Integrated Circuit (ASIC): A custom-designed chip tailored for specific AI tasks, prioritizing speed and energy efficiency, optimized for particular workloads such as running deep learning inference models.
Inference: The ongoing use of trained models to produce outputs, i.e., what happens after users submit prompts, as distinct from training.
Agentic AI: A system in which a single intent triggers a chain reaction: a primary AI agent decomposes the goal into specific tasks for a fleet of specialized agents, which then collaborate, preserve state, and use reinforcement learning to deliver outcomes in real time.
Frequently Asked Questions
Why are companies developing custom AI chips instead of just using GPUs?
Custom silicon is taking the lead in AI because it offers better performance, efficiency, and cost for specific tasks.
Depending on the use case, TPUs range from roughly 25-30% better to nearly 2x better than GPUs, reflecting the difference between a highly custom design built to do one task extremely well and a more general-purpose design.
What is the biggest challenge facing AI infrastructure expansion?
Power availability—not compute—is emerging as the limiting factor.
Data center supply has been constrained over the past 18 months due to the inability of utilities to expand transmission capacity because of permitting delays, supply chain bottlenecks, and infrastructure that is both costly and time-intensive to upgrade.
How much water do AI data centers consume?
In 2023, U.S. data centers directly consumed about 17 billion gallons of water, with hyperscale and colocation facilities using the lion's share (84%).
Each 100-word AI prompt is estimated to use roughly one bottle of water (about 519 milliliters), and while that may not sound like much, billions of prompts are entered into systems like ChatGPT every day.
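To put the per-prompt figure in context, the sketch below scales it to one million prompts; the prompt count is an arbitrary illustrative assumption.

```python
# Scaling the article's ~519 mL-per-prompt estimate; the prompt count is an
# arbitrary illustrative assumption.
ml_per_prompt = 519
prompts = 1_000_000

liters = ml_per_prompt * prompts / 1_000
gallons = liters / 3.785               # 1 US gallon is about 3.785 liters
print(f"{prompts:,} prompts: ~{liters:,.0f} liters (~{gallons:,.0f} US gallons)")
# At the scale of billions of prompts per day, small per-prompt amounts add up to a
# meaningful share of a data center's overall water footprint.
```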
Last updated: April 27, 2026. For the latest energy news and analysis, visit stakeandpaper.com.