Discussion (53 Comments) from HackerNews
1. https://www.investing.com/news/stock-market-news/openai-unve...
There are a lot of large tech companies that most of HN has never heard about that completely dominate entire segments.
Q2 is forecasted to be negative, partly because of RAM prices like you said, but for the most part this is something that only price-sensitive nerds care about. Broadcom sells a ton of server chips. Server sales are up 30% vs last year, so I highly doubt they're desperate to use their allocation.
Broadcom has become wealthy by being Google's TPU hardware partner, including sharing their TSMC capacity with Google, and evidently now they are doing the same thing with OpenAI. What a brilliant way to take advantage of the AI gold rush!
I wish they weren't using their piles of money to extort money out of the software industry like they are with VMware and Bitnami.
https://finance.yahoo.com/sectors/technology/articles/broadc...
Oh dear god. I'm actually feeling sorry for Google at that point. Good luck, you'll need it...
Kinda, but not exactly.
Broadcom cornered the enterprise infra and security market in the late 2010s and early 2020s after acquiring CA Technologies, Symantec, and VMware, and was able to build a strong cybersecurity story during the late 2010s cybersecurity and SaaS boom. (EDIT: Broadcom did NOT acquire BMC. It was considering it back in 2018 but decided against it and bought Symantec instead; KKR ended up acquiring BMC.)
That gave them plenty of cashflow that helped subsidize their hardware business when hardware was not viewed as hot as it is today.
Additionally, Broadcom is GCP's marquee customer and has been for a little under a decade, so they were able to make a sweetheart deal where all of the software businesses at Broadcom would exclusively use GCP, and in return GCP would work with Broadcom to design its silicon and source the infra needed for their DC buildouts.
Ironically, the DoJ blocking Broadcom's acquisition of Qualcomm was the best thing it ever could have done for Broadcom, because it gave Broadcom the dry powder to dominate Enterprise SaaS and build a strong niche in the cybersecurity space.
> piles of money to extort money out of the software industry
From personal experience, executives and leadership who started off in the electronics and hardware industry are much more vicious and cutthroat than their peers who started in software.
Working in an industry that historically had to deal with high commodification, low margins, and long tail sales leads to leadership that can execute. Additionally, no one climbs the leadership ladder without having spent years as a line-level engineer, but that's true for software as well to an extent.
Edit: can't reply
> Did they acquire also BMC?
Nope.
Broadcom was considering acquiring them in 2018 but decided not to go through with the opportunity and KKR jumped in.
> From personal experience, executives and leadership who started off in the electronics and hardware industry are much more vicious and cutthroat than their peers who started in software.
Only the Paranoid Survive [0] is quite a name for a management book. It implies surviving in the world you are speaking about.
[0] https://www.goodreads.com/book/show/66863.Only_the_Paranoid_...
However, based off first impressions, it seems like this is meant for inference side, and not training, which is also an interesting choice.
Nvidia is king of general purpose training chips. But inference can be specialized.
Yes? That’s why more money will be spent on inference than training?
I’m talking absolute cost. As the number of people using AI and burning tokens goes up the amount of spend on inference goes up.
I am fairly confident that Anthropic has way way more GPUs serving Claude Code to users than they have training models. They’ve got a lot of users!!
> API price is becoming more important than SOTA capability.
Also yes? This is why custom silicon for efficient inference makes sense!
I think we’re in total agreement here :)
We're starting to see what really matters here, and though this is hand-wavy, the TPU makes similar claims.
I think Google's memo about having no moat still stands (see: https://newsletter.semianalysis.com/p/google-we-have-no-moat... if you are unaware). It kind of makes sense that all of this is looking more like the 1960s to 1990s with IBM, DEC, Cray, Sun and the hardware race that happened then. History doesn't repeat but it often rhymes, and I suspect that these efforts will follow the same trajectory.
So after the IPO, and it will be featured heavily in the IPO sales brochure as a future promise?
I'm sceptical of any pre-IPO announcements.
IIRC the biggest cost they're "hiding" in their financials through creative accounting is inference (putting it into marketing and whatnot, in the billions). If they can't hide it in their S-1 then they have to rationalize it, either by a) increasing prices (not gonna happen; with token-based billing, orgs are already watching their Codex spend) or b) lowering inference costs. You can lower those by "soft optimizing" (dumbing down) your models, but then you have the other players breathing down your neck (see the quick rise of Claude), or by actually optimizing, in software and in hardware. We're like 5 years into the rise of LLMs; there's not THAT much left on the table unless you write to the metal you specifically designed for your models (and I'm pretty sure the lack of an "nvidia tax" would help cover most of the R&D costs of a custom solution, at least in the long term).
50% cheaper inference without losses in fidelity would unquestionably be a massive win for OpenAI.
No, the nonprofit org stays nonprofit, while the for-profit org it owns will become publicly traded.
See https://openai.com/index/evolving-our-structure/
I want a super fast LLM that is Opus 4.6+, like, in ability.
For inference workloads, it makes a lot more sense to optimize for prefill/ttft before maxing out memory bandwidth.
[1] https://taalas.com/products/
I know, it's nitpicking, but when people can just reach in and take services away, like Fable/Mythos, hardware is the only thing worth buying.
https://chatjimmy.ai
Cerebras etch memory onto the wafer alongside the processing elements, but AFAIK OpenAI are going to be using HBM memory and a conventional chiplet design.
Cerebras are addressing very specific use cases, not general purpose LLM serving, and OpenAI does already partner with them.
That sentence sounds weird to me. I can't really put my finger on why; maybe it's the combination of adverbs, or just the fact of stating the desire to scale as a company so directly. It feels (to me) like openly claiming their selfish goals. Or maybe I am just misinterpreting and they are referring to the whole of humanity as "We" (but knowing Broadcom's and, to a lesser extent, OpenAI's doings, I am not convinced).
Imagine when we can roar along at that speed, low power. Can just have the model reason for a while about anything and everything. It reminds me of the "race to idle" for mcus etc.
It's odd to me that I haven't heard anything about this approach (baking LLMs/weights into silicon directly) since. It seems almost common-sense that we're going to end up there eventually. And it feels like that point is drawing ever closer now that model capabilities, if not quite plateauing out, are at least getting to a "good enough" point for a LOT of use cases.
I wonder if it's being worked on in secret, if there's something about it that makes it infeasible, or if companies are really too nervous to lock in one model like that because the next one down the line could be a huge improvement. Re. infeasibility, I have heard that the Taalas demonstration chip ran Llama 3.1 8B (a pretty horrible model) and that even that took a massive amount of transistors / die area. So it might just be the case that the good models are too big to fit on silicon?
Taalas has a running demo here: https://chatjimmy.ai/
It's eye opening: generated an AVX-512 optimized Mersenne Twister in C in 0.076s, 13,706 tok/s. Too fast for the tok/s to be terribly accurate.
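As a quick sanity check on those numbers, here is a minimal back-of-the-envelope sketch in Python. It assumes the quoted 0.076 s covers the whole generation and that 13,706 tok/s is an average over that window; neither is confirmed in the thread, and the variable and function names are purely illustrative.

```python
# Figures quoted in the comment above.
elapsed_s = 0.076          # reported generation time, in seconds
rate_tok_per_s = 13_706    # reported throughput, in tokens per second

# Assumption: the rate is an average over the whole elapsed window,
# so tokens = rate * time.
implied_tokens = rate_tok_per_s * elapsed_s


def rate_for(tokens: float, seconds: float) -> float:
    """Average throughput in tokens per second."""
    return tokens / seconds


# Sensitivity: what the rate would be if the true time were 1 ms longer.
rate_if_1ms_slower = rate_for(implied_tokens, elapsed_s + 0.001)

print(f"implied tokens generated: {implied_tokens:.0f}")
print(f"rate if timing were 1 ms longer: {rate_if_1ms_slower:.0f} tok/s")
```

With a window this short, a one-millisecond measurement error already moves the computed rate by roughly 1%, which fits the caution that the tok/s figure may not be terribly accurate.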
The studies and efforts are ongoing and public, and there are technical hurdles to be faced - but the relevant works go back in time quite a lot and there is heightened interest in it now.
It seems that you simply took the "hyped headlines" for the whole of the work.
Well, yeah, that's what I'm saying. It's odd that there haven't been any major headlines (customer interest, competitors' announcements, etc) other than their initial demo. Good to hear it's being worked on though!
It has only been four months since they unveiled their first prototype. I don't understand your confusion. Chip development does not happen overnight...?
Their initial blog post laid out a roadmap, so theoretically they should have another thing to demonstrate this summer.
I guess that makes sense. Is this feasible, or does the added latency between chips kill any of the performance gains?
Make sure you all use that fancy ñ
9 months to production is completely impossible anyway.
9 months from design to early samples is probably impossible given that TSMC takes 3 months after tape out to produce them. Then it’s up to the customer to qualify and revise for production. TSMC doesn’t do that.
There’s no AI that makes this happen in 9 months.
"Jalapeño" is such a bad name, having an "ñ" already makes it difficult and annoying to deal with in so many little ways. Good luck with that.
But also, there's the sort of "yes, let's use Mexican-related things because we're California" thought that I just really hate. I don't know, it's like Corporate Memphis to me. You see a product like this, you know it's an uppity California-based firm that came up with it.
Jalapeño
Jalapeño
Really has a… ring to it