Our eighth generation TPUs: two chips for the agentic era

184

xxnx•about 4 hours ago•104 comments

https://cloud.google.com/blog/products/compute/tpu-8t-and-tp...

Discussion (104 Comments)

himata4113•about 2 hours ago

I already felt that gemini 3 proved what is possible if you train a model for efficiency. If I had to guess the pro and flash variants are 5x to 10x smaller than opus and gpt-5 class models.

They produce drastically fewer tokens to solve a problem, but they don't seem to have put enough effort into refining their reasoning and execution: they produce broken tool calls and generally struggle with 'agentic' tasks. For raw problem solving without tools or search, though, they match opus and gpt while presumably being a fraction of the size.

I feel like google will surprise everyone with a model that will be an entire generation beyond SOTA at some point in time once they go from prototyping to making a model that's not a preview model anymore. All models up till now feel like they're just prototypes that were pushed to GA just so they have something to show to investors and to integrate into their suite as a proof of concept.

onlyrealcuzzo•about 2 hours ago

> They produce drastically fewer tokens to solve a problem, but they don't seem to have put enough effort into refining their reasoning and execution: they produce broken tool calls and generally struggle with 'agentic' tasks. For raw problem solving without tools or search, though, they match opus and gpt while presumably being a fraction of the size.

Agreed, Gemini-cli is terrible compared to CC and even Codex.

But Google is clearly prioritizing having the best AI to augment and/or replace traditional search. That's their bread and butter. They'll be in a far better place to monetize that than anyone else. They've got a 1B+ user lead on anyone - and even adding in all LLMs together, they still probably have more query volume than everyone else put together.

I hope they start prioritizing Gemini-cli, as I think they'd force a lot more competition into the space.

JeremyNT•about 1 hour ago

> Agreed, Gemini-cli is terrible compared to CC and even Codex.

Using it with opencode I don't find the actual model to cause worse results with tool calling versus Opus/GPT. This could be a harness problem more than a model problem?

I do prefer the overall results with GPT 5.4, which seems to catch more bugs in reviews that Gemini misses and produce cleaner code overall.

(And no, I can't quantify any of that, just "vibes" based)

asah•about 1 hour ago

also, for incorporating into gsuite, youtube, maps, gcp and their other winning apps and behind-the-scenes infra...

Iulioh•33 minutes ago

Not only that, google has an advantage because they don't need to always generate a response.

When a lot of people ask the same thing they can just index the questions, like results on the search engine, and recalculate them only every so often.
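The idea of serving repeated questions from an index and only recomputing occasionally can be sketched as a small time-limited cache. This is a minimal illustration, not a description of how Google actually serves answers: `AnswerCache`, `normalize`, and the `generate` callback are hypothetical names, and real systems would need far fuzzier matching than lowercasing and whitespace collapsing.

```python
import time
from typing import Callable, Dict, Tuple


def normalize(query: str) -> str:
    """Collapse case and whitespace so near-identical questions share a key."""
    return " ".join(query.lower().split())


class AnswerCache:
    """Serve a stored answer for repeated questions; regenerate after a TTL."""

    def __init__(
        self,
        generate: Callable[[str], str],   # hypothetical stand-in for a model call
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._generate = generate
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: Dict[str, Tuple[float, str]] = {}
        self.model_calls = 0  # counts how often the expensive path ran

    def answer(self, query: str) -> str:
        key = normalize(query)
        now = self._clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # fresh enough: no generation needed
        self.model_calls += 1
        text = self._generate(key)
        self._store[key] = (now, text)
        return text
```

With a cache like this, a thousand people asking the same question within the TTL cost one generation instead of a thousand; the trade-off is that answers can be stale for up to `ttl_seconds`.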

UncleOxidant•23 minutes ago

IIRC when Gemini 3 Pro came out it was considered to be just about on par with whatever version of Claude was out then (4?). Now Gemini 3 is looking long in the tooth. Considering how many Chinese models have been released since then, and at least 2 or 3 versions of Claude, it's starting to look like Google is kind of sitting still here. Maybe you're right and they'll surprise us soon with a large step improvement over what they currently have. Note: I do realize that there's been a Gemini 3.1 release, but it didn't seem like a noticeable change from 3.

ALLTaken•about 2 hours ago

My friend at google calmly shared that they had access to GPT-type AI 5 years earlier, but internally only. They deemed it too powerful to release; I'll add: to plebs like us.

This experience makes me believe they have highly advanced AI internally and see no reason, and have no will, to share it. OpenAI and Claude FORCED them to release what they can, just to stay relevant.

The TPUs are damn awesome and I would love to fab them at small scale for myself. But it's fully closed source I'm afraid. Also Google is known to hate the customer, more or less.

neonstatic•about 1 hour ago

I'm really struggling with terrible bloating today, but I deemed it too dangerous to release.

_boffin_•about 1 hour ago

Is your friend on the JAX team?

pmb•about 3 hours ago

At this point, when you are doing big AI you basically have to buy it from NVidia or rent it from Google. And Google can design their chips and engine and systems in a whole-datacenter context, centralizing some aspects that are impossible for chip vendors to centralize, so I suspect that when things get really big, Google's systems will always be more cost-efficient.

(disclosure: I am long GOOG, for this and a few other reasons)

Keyframe•about 3 hours ago

As others have been capturing news cycle eyes, seems to me Google has been going from strength to strength quietly in the background capturing consumer market share and without much (any?) infrastructure problems considering they're so vertically integrated in AI since day one? At one point they even seemed like a lost cause, but they're like a tide.. just growing all around.

boringg•about 3 hours ago

Do they really have consumer market share? I assume that their summaries on google search are what they are using for their usage numbers. It's a great access point for lower quality AI needs.

It's kind of impressive how much they dropped the ball in the google brain era and earlier AI run up to see them be able to fight back.

baq•about 3 hours ago

you've never tried to use gemini 3 I guess - that thing was so unreliable it might as well not be offered; there's also a reason why everybody here is excited for claude and codex, but not really for antigravity.

that said, I actually agree: google IMHO silently dominates the 'normie business' chatbot area. gemini is low key great for day to day stuff.

WarmWash•about 2 hours ago

What's interesting to note, as someone who uses Gemini, ChatGPT, and Claude, is that Gemini consistently uses drastically fewer tokens than the other two. It seems like gemini is where it is because it has a much smaller thinking budget.

It's hard to reconcile this because Google likely has the most compute and at the lowest cost, so why aren't they gassing the hell out of inference compute like the other two? Maybe all the other services they provide are too heavy? Maybe they are trying to be more training heavy? I don't know, but it's interesting to see.

magicalhippo•about 2 hours ago

I've been trying Gemini Pro using their $20-ish Google One subscription for a couple of months, and I also find it consistently does fewer web searches to verify information than say ChatGPT 5.4 Pro which I have through work.

I was planning on comparing them on coding but I didn't get the Gemini VSCode add-in to work so yeah, no dice.

The Android and web apps are also riddled with bugs, including ones that make you lose your chat history from the threads if you switch between them, not cool.

I'll be cancelling my Google One subscription this month.

WarmWash•about 2 hours ago

I don't sweat sources and almost never check them. I usually prefer to manually check information after it's provided, to prevent the model from borking its context trying to find sources that justify its already computed output. Almost all the knowledge is already baked into the latent space of the model, so citing sources generally is a backwards process.

I see it like going to the doctor and asking them to cite sources for everything they tell me. It would be ridiculous and totally make a mess of the visit. I much prefer just taking what the doctor said on the whole, and then verifying it myself afterwards.

Obviously there is a lot of nuance here, areas with sparse information and certainly things that exist post knowledge cut-off. But if I am researching cell structure, I'm not going to muck up my context making it dig for sources for things that are certainly already optimal in the latent space.

someguyiguess•about 2 hours ago

They have to have SOME competitive advantage. What reason is there to use Gemini over Claude or ChatGPT? It's not producing nearly the quality of output.

WarmWash•about 2 hours ago

I recently did my taxes using all three models (My return is ~50 pages, much more than a standard 1040).

GPT (codex) was accurate on the first run and took 12 minutes

Gemini (antigravity) missed 1 value because it didn't load the full 1099 pdf (the laziness), but corrected it when prompted. However it only spent 2 minutes on the task.

Claude (CC) made all manner of mistakes after waiting overnight for it to finish because it hit my limit before doing so. However claude did the best on the next step of actually filling out the pdf forms, but it ended up not mattering.

Ultimately I used gemini in chrome to fill out the forms (freefillableforms.com), but frankly it would have been faster to manually do it copying from the spreadsheets GPT and Gemini output.

I also use antigravity a lot for small greenfield projects (<5k LOC). I don't notice a difference between gemini and claude, outside usage limits. Besides that I mostly use gemini for its math and engineering capabilities.

magicalhippo•about 2 hours ago

Well comparing Gemini 3.1 Pro vs ChatGPT 5.4 Pro, it's much faster at replying. Of course, if it actually thinks less then that helps a lot towards that. For most of my personal and work use-cases, I prefer waiting a bit longer for a better answer.

RationPhantoms•about 2 hours ago

They just released their enterprise agentic platform today so my expectation is that might be the gravity well for the Fortune 500's to park their inference on.

fulafel•about 3 hours ago

"TPU 8t and TPU 8i deliver up to two times better performance-per-watt over the previous generation" sounds impressive especially as the previous generation is so recent (2025).

Interesting that there's separate inference and training focused hardware. Do companies using NV hardware also use different hardware for each task or is their compute more fungible?

burnte•2 minutes ago

> Interesting that there's separate inference and training focused hardware. Do companies using NV hardware also use different hardware for each task or is their compute more fungible?

Dedicated hardware will usually be faster, which is why as certain things mature, they go from being complicated and expensive to being cheap and plentiful in $1 chips. This tells me Google has a much better grasp on their stack than people building on NVidia, because Google owns everything from the keyboard to the silicon. They've iterated so much they understand how to separate out different functions that compete with each other for resources.

mrlongroots•about 1 hour ago

That training is compute-bound and inference is memory-bound is well-known, but I don't think Nvidia deployments typically specialize for one vs the other.
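The compute-bound versus memory-bound distinction can be made concrete with a simple roofline estimate. The sketch below is a rough illustration under stated assumptions: the peak throughput and bandwidth are hypothetical round numbers, not the specs of any real chip, and the intensity model counts only weight traffic (each weight read once per step, activations ignored).

```python
def matmul_intensity(batch_tokens: int, bytes_per_weight: int = 2) -> float:
    """FLOPs per byte of weight traffic for a weight-matrix multiply.

    Simplification: each weight is read from memory once per step and used in
    one multiply-add (2 FLOPs) for every token in the batch.
    """
    return 2 * batch_tokens / bytes_per_weight


def attainable_flops(peak_flops: float, mem_bandwidth: float, intensity: float) -> float:
    """Roofline bound: the lower of the compute roof and bandwidth * intensity."""
    return min(peak_flops, mem_bandwidth * intensity)


def bound_by(peak_flops: float, mem_bandwidth: float, intensity: float) -> str:
    """Report which roof limits the workload at this arithmetic intensity."""
    return "compute" if mem_bandwidth * intensity >= peak_flops else "memory"


# Hypothetical accelerator: 1 PFLOP/s peak, 4 TB/s memory bandwidth.
PEAK_FLOPS = 1e15
MEM_BANDWIDTH = 4e12

for label, batch in [("decode, 1 token per step", 1),
                     ("training, 4096 tokens per step", 4096)]:
    intensity = matmul_intensity(batch)
    limit = bound_by(PEAK_FLOPS, MEM_BANDWIDTH, intensity)
    flops = attainable_flops(PEAK_FLOPS, MEM_BANDWIDTH, intensity)
    print(f"{label}: {intensity:g} FLOP/byte, {limit}-bound, {flops:.3g} FLOP/s attainable")
```

Under these assumptions, single-token decoding sits far below the point where compute becomes the limit, while a large training batch saturates the compute roof, which is the usual argument for giving the two workloads different memory-to-compute ratios.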

One reason is that most clouds/neoclouds don't own workloads, and want fungibility. Given that you're spending a lot on H200s and whatnot, it's good to also spend on the networking to make sure you can sell them to all kinds of customers. The Groq LPU in Vera Rubin is an inference-specific accelerator, and Cerebras is also inference-optimized so specialization is starting to happen.

electroly•about 1 hour ago

I can't answer for NVIDIA but AWS has its own training and inference chips, and word on the street is the inference chips are too weak, so some companies are running inference on the training chips.

zozbot234•about 2 hours ago

The "training" chips will probably be quite usable for slower, higher-throughput inference at scale. I expect that to be quite popular eventually for non-time-sensitive uses.

dataking•about 3 hours ago

Vera Rubin will have Groq chips focused on fast inference so it points toward a trend. Also, with energy needs so high, why not reach for every feasible optimization?

xnx•about 3 hours ago

Nvidia said in March that they're working on specialized inference hardware, but they don't have any right now. You can do inference from Nvidia's current hardware offerings, but it's not as efficient.

FuriouslyAdrift•about 2 hours ago

AMD has been doing inference chips for many years and is the leader for HPC.

https://www.amd.com/en/products/accelerators/instinct.html

TheMrZZ•about 3 hours ago

> A single TPU 8t superpod now scales to 9,600 chips and two petabytes of shared high bandwidth memory, with double the interchip bandwidth of the previous generation. This architecture delivers 121 ExaFlops of compute and allows the most complex models to leverage a single, massive pool of memory.

This seems impressive. I don't know much about the space, so maybe it's not actually that great, but from my POV it looks like a competitive advantage for Google.

kamranjon•about 2 hours ago

It's interesting that, of the large inference providers, Google has one of the most inconvenient policies around model deprecation. They deprecate models exactly 1 year after releasing them and force you to move onto their next generation of models. I had assumed, because they are using their own silicon, that they would actually be able to offer better stability, but the opposite seems to be true. Their rate limiting is also much stricter than OpenAI for example. I wonder how much of this is related to these TPUs, vs just strange policy decisions.

gordonhart•about 2 hours ago

It's frustrating how cavalier they are about killing old Gemini releases. My read is that once a new model is serving >90% of volume, which happens pretty quickly as most tools will just run the latest+greatest model, the standard Google cost/benefit analysis is applied and the old thing is unceremoniously switched off. It's actually surprising that they recently extended the EOL date for Gemini 2.5. Google has never been a particularly customer-obsessed company...

surajrmal•about 2 hours ago

What benefit is there to sticking on older models? If the API is the same, what are the switching costs?

kamranjon•about 1 hour ago

Consistency: new models don't behave the same on every task as their predecessors. So you end up building pipelines that rely on specific behavior, but now you find that the new model performs worse with regards to a specific task you were performing, or just behaves differently and needs prompt adjustments. They also can fundamentally change the default model settings during new releases, for example Gemini 2.5 models had completely different behavior with regards to temperature settings than previous models. It just creates a moving target that you constantly have to adjust and rework instead of providing a platform that you and by extension your users can rely on. Other providers have much longer deprecation windows, so they must at least understand this frustration.

gordonhart•about 1 hour ago

If you're trying to run repeatable workflows, stability from not changing the model can outweigh the benefits of a smarter new model.

The cost can also change dramatically: on top of the higher token costs for Gemini Pro ($1.25/mtok input for 2.5 versus $2/mtok input for 3.1), the newer release also tokenizes images and PDF pages less efficiently by default (>2x token usage per image/page) so you end up paying much much more per request on the newer model.
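To see how the two effects compound, here is a small sketch using the prices quoted above ($1.25 versus $2 per million input tokens) and a 2x token inflation for image or PDF input. The function names and the example workload (100 pages at 250 tokens each on the older model) are assumptions for illustration only.

```python
def input_cost_usd(tokens: float, usd_per_million_tokens: float) -> float:
    """Input cost of a request at a flat per-million-token price."""
    return tokens / 1_000_000 * usd_per_million_tokens


def cost_multiplier(old_price: float, new_price: float, token_inflation: float) -> float:
    """How much more the same image/PDF input costs on the newer model.

    token_inflation is the ratio of new-model tokens to old-model tokens
    for identical input (2.0 means twice as many tokens).
    """
    return (new_price / old_price) * token_inflation


# Hypothetical workload: 100 PDF pages at 250 tokens per page on the older model.
old_tokens = 100 * 250
new_tokens = old_tokens * 2  # ">2x token usage per image/page", taken as exactly 2x

old_cost = input_cost_usd(old_tokens, 1.25)
new_cost = input_cost_usd(new_tokens, 2.00)
print(f"old: ${old_cost:.4f}  new: ${new_cost:.4f}  "
      f"multiplier: {cost_multiplier(1.25, 2.00, 2.0):.1f}x")
```

For input dominated by images or PDF pages, the price increase (1.6x) and the token inflation (at least 2x) multiply to roughly 3.2x per request, before counting any change in output tokens.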

These are somewhat niche concerns that don't apply to most chat or agentic coding use cases, but they're very real and account for some portion of the traffic that still flows to older Gemini releases.

akelly•about 1 hour ago

I've heard GenAI.mil still has Gemini 2.5 only.

gordonhart•about 1 hour ago

Wouldn't surprise me. The best model you can get on AWS GovCloud is still Claude Sonnet 4.5.

jbellis•about 2 hours ago

Flash 2 isn't even at EOL until June but we started seeing ~90% error rates getting 429s over the weekend. (So we switched to GPT 5.4 nano.)

nsteel•about 3 hours ago

This link has more on the architecture: https://cloud.google.com/blog/products/compute/tpu-8t-and-tp...

ks2048•41 minutes ago

For how many times this article mentions "agentic" and "agents"... Am I correct to assume the hardware has nothing to do with "agents"? I assume it's just about a new generation of more efficient transformers / deep-learning layers.

cmptrnerd6•about 2 hours ago

Which company is building the silicon for Google? Is it tsmc? What node size? I didn't see it with a quick search, sorry if it was in the post.

wina•about 2 hours ago

tsmc through broadcom

geremiiah•38 minutes ago

TPUs are systolic arrays, right? So does that mean that Google is using a heterogeneous cluster comprising both GPUs and TPUs, for workloads that don't map well or at all on TPUs?

amazingamazing•about 3 hours ago

If ai ends up having a winner I struggle to see how it doesn’t end with Google winning because they own the entire stack, or Apple because they will have deployed the most potentially AI capable edge sites.

jjice•about 1 hour ago

I've been saying it, and I'll keep saying it (as someone who has an opinion backed by very little) - I think Google is incredibly well placed for the future with LLMs.

Owning your hardware and your entire stack is huge, especially these days with so much demand. Long term, I think they end up doing very well. People clowned so hard on Google for the first two years (until Gemini 2.5 or 3) because it wasn't as good as OpenAI or Anthropic's models, but Google just looked so good for the long game.

Another benefit for them: If LLMs end up being a huge bubble that ends up not paying the absurd returns the industry expects, they're not kaput. They already own so many markets that this is just an additional thing for them, whereas the big AI-only labs are probably fucked.

All that said: what the hell do I know? Who knows how all of this will play out. I just think Google has a great foundation underneath them that'll help them build and not topple over.

nickandbro•about 3 hours ago

I am curious what workloads Citadel Securities is running on these TPUs? Are you telling me they need the latest TPUs for market insights?

iandanforth•about 2 hours ago

Anyone know if these are already powering all of Gemini services, some of them, or none yet? It's hard to tell if this will result in improvements in speed, lower costs, etc, or if those will be invisible, or have already happened.

Aissen•about 1 hour ago

Interesting that TPU 8i is both for post-training and inference.

zshn25•about 2 hours ago

It would be interesting to benchmark a short training / inference run on the latest TPU vs. NVIDIA GPU on a per-cost basis

paulmist•about 3 hours ago

At $15/GB of HBM4 the 331.8TB of HBM4 per pod is about $5 million...
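That figure can be checked with a line of arithmetic. The sketch below takes the $15/GB price and 331.8 TB capacity from the comment as given and assumes decimal units (1 TB = 1000 GB); `hbm_cost_usd` is a hypothetical helper name.

```python
def hbm_cost_usd(capacity_tb: float, usd_per_gb: float, gb_per_tb: int = 1000) -> float:
    """Memory cost for a pod: capacity in TB times a flat per-GB price."""
    return capacity_tb * gb_per_tb * usd_per_gb


cost = hbm_cost_usd(331.8, 15.0)
print(f"${cost / 1e6:.2f} million")
```

The estimate scales linearly with the assumed price per GB, so if HBM costs several times more than commodity DRAM the total grows by the same factor.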

zozbot234•about 2 hours ago

$15/GB is retail price for DIMM sticks. Is HBM4 really that cheap?

selectodude•about 2 hours ago

HBM is just DRAM stacked directly next to the die. The expensive part is gluing it on there. The chips themselves are pretty much the same.

akelly•about 1 hour ago

HBM uses about twice as much DRAM silicon per GB due to all the space for interconnect

nsteel•about 3 hours ago

It's HBM3e

aliljet•about 3 hours ago

The real problem is that scientists doing this sort of early work more often than not want to burn hardware under their desks. Renting infrastructure in Google cloud isn't the only way...

NoiseBert69•about 3 hours ago

That cooling system looks crazy. What an unbelievable density.

jmyeet•about 2 hours ago

In recent discussions about Tim Apple [sic] moving on there was a discussion about whether Apple flopped on AI, which is my opinion. Of course you had the false dichotomy of doing nothing or burning money faster than the US military like OpenAI does.

IMHO that happy medium is Google. Not having to pay the NVidia tax will likely be a huge competitive advantage. And nobody builds data centers as cost-effectively as Google. It's kind of crazy to be talking ExaFLOPS and Tb/s here. From some quick Googling:

- The first MegaFLOPS CPU was in 1964

- A Cray supercomputer hit GigaFLOPS in 1988 with workstations hitting it in the 1990s. Consumer CPUs I think hit this around 1999 with the Pentium 3 at 1GHz+;

- It was the 2010s before we saw off-the-shelf TFLOPS;

- It was only last year where a single chip hit PetaFLOPS. I see the IBM Roadrunner hit this in 2008 but that was ~13,000 CPUs so...

Obviously this is near 10,000 TPUs to get to ~121 EFLOPS (FP4 admittedly) but that's still an astounding number. It means each one is doing ~12.6 PFLOPS (FP4).
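The per-chip number follows from dividing the pod total by the chip count. This sketch uses the 121 ExaFlops and 9,600 chips quoted from the announcement elsewhere in the thread; `per_chip_pflops` is a hypothetical helper name.

```python
EXA = 1e18
PETA = 1e15


def per_chip_pflops(pod_exaflops: float, chips: int) -> float:
    """Average throughput per chip, in PFLOPS, for a pod-level EFLOPS figure."""
    return pod_exaflops * EXA / chips / PETA


print(f"{per_chip_pflops(121, 9600):.1f} PFLOPS per chip")
```

This is a peak, pod-averaged figure at the quoted precision; sustained throughput on a real workload would be lower.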

I saw a claim that Claude Mythos cost ~$10B to train. I personally believe Google can (or soon will be able to) do this for an order of magnitude less at least.

I would love to know the true cost/token of Claude, ChatGPT and Gemini. I think you'll find Google has a massive cost advantage here.

someguyiguess•about 2 hours ago

Apple has not flopped on AI as you say. They are just focused on privacy and are likely waiting for the time when local models become efficient enough to run on iPhones (which is quickly becoming a reality).

Google could probably train models for orders of magnitude less money as you say, but they aren't. They are not capable of creating high quality models like OpenAI and Anthropic are. Their company is just too disorganized and chaotic.

Anecdotally, I don't know a single person who uses Gemini on purpose.

jmyeet•about 2 hours ago

The "waiting for local LLMs" came up re: Apple and IMHO that's too passive for a company where if someone else has a better AI assistant, it's going to be a huge problem.

What if somebody cracks the problem of splitting inference between local and remote? What if someone else manages to modularize learning so your local LLM doesn't need to have been trained on how to compute integrals? Obviously we can't dissect a current LLM and say "we can remove these weights because they do math" but there's no guarantee there isn't an architecture that will allow for that.

Apple could also be training an LLM Siri 2.0 that knows enough to do the things you want. Setting alarms, sending messages, etc. Apple would have all the information on what the major use cases are and where Siri is currently failing. They can increase Siri's capabilities as local LLM inference improves.

As for Google creating high quality models, I personally believe the models are going to be commoditized. I don't believe a single company is going to have a model "moat" to sustain itself as a trillion dollar company. I have two reasons for this:

1. At the end of the day, it's just software and software is infinitely reproducible and distributable. I mean we already saw one significant Anthropic leak this year; and

2. China is going to make sure we're not all dependent on one US tech company who "owns" AI. DeepSeek was just the first shot across the bow for that. It's going to be too important to China's national security for that not to happen.

And OpenAI's entire funding is predicated on that happening and OpenAI "winning".

knowaveragejoe•about 2 hours ago

> I saw a claim that Claude Mythos cost ~$10B to train.

Can you cite this? That seems absurd.

jmyeet•about 2 hours ago

I've seen various claims to this (eg [1][2][3]) but nobody really knows. These may all come from one unsubstantiated claim. It is I think widely accepted that Mythos is ~10T parameters.

I've seen figures that suggest GPT-4 was 1.8T parameters and cost upwards of $100 million to train (also unsubstantiated), in which case the Mythos figure might be inflated and also include development costs.

So who really knows?

[1]: https://www.softwarereviews.com/research/claude-mythos-previ...

[2]: https://x.com/duttasomrattwt/status/2041903600516133016

[3]: https://www.forrester.com/blogs/project-glasswing-the-10-con...

vibe42•about 3 hours ago

The pics of the cooling system are pretty good sci-fi / cyberpunk / steampunk inspo.

If the whole AI bubble spectacularly collapses, at least we got a lot of cool pics of custom hardware!

NitpickLawyer•about 3 hours ago

> If the whole AI bubble spectacularly collapses

Every other news story for the past month has been about lacking capacity. Everyone is having scaling issues with more demand than they can cover. Anthropic has been struggling for a few months, especially visible when EU tz is still up and US east coast comes online. Everything grinds to a halt. MS has been pausing new subscriptions for gh Copilot, also because of a lack of capacity. And yet people are still on bubble this, collapse that? I don't get it. Is it becoming a meme? Are people seriously seeing something I don't? For the past 3 years models have kept on improving, capabilities have gone from toy to actually working, and there's no sign of stopping. It's weird.

vibe42•about 2 hours ago

Both are possible; increasing demand and bubble collapse.

The way this could happen is if model commoditization increases - e.g. some AI labs keep publishing large open models that increasingly close the gap to the closed frontier models.

Also, if consumer hardware keeps getting better and models get so good that most people can get most of their usage satisfied by smaller models running on their laptop, they won't pay a ton for large frontier models.

hgoel•about 2 hours ago

There's a massive amount of demand at the current price point, this does not exclude a bubble considering that the current cost to consumers is lower than what capacity expansion costs.

Though nowadays it feels like the bubble is going to end up being mainly an OpenAI issue. The others are at least vaguely trying to balance expansion with revenue, without counting on inventing a computer god.

nicman23•about 2 hours ago

yeah but can you release the sdk for the pixel 10? it was one of the only reasons i bought this mid phone

SecretDreams•about 2 hours ago

They are missing a header to show the transition in discussion from TPU8t to 8i!

Thanks for posting otherwise.

Edit: actually, looks like the header got captured as a figure caption on accident.

varispeed•about 2 hours ago

I can't help but think we will be "laughing" at this in 10 years' time like we laugh at steam engines or the abacus.