Discussion (35 Comments), originally on Hacker News
It cost about £190 in 2006.
Now we have GPUs that cost tens of thousands of pounds, with insane performance, but what would their price be without the AI and datacentre squeeze?
I'm air cooling, so I set -pl 450 to avoid running them all at the full 600 W.
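For anyone wanting to try the same cap, here is a rough sketch of what it works out to. The `-pl` (power limit) and `-i` (GPU index) flags are real `nvidia-smi` options; the helper names and the four-card count are assumptions for illustration only, and the commands are built but not executed, since applying them needs root and a real GPU.

```python
def power_cap_commands(gpu_count, limit_watts):
    """Build one `nvidia-smi` power-limit command per GPU index (not executed here)."""
    return [
        ["nvidia-smi", "-i", str(index), "-pl", str(limit_watts)]
        for index in range(gpu_count)
    ]


def total_draw_watts(gpu_count, per_gpu_watts):
    """Worst-case combined board power for the whole rig, in watts."""
    return gpu_count * per_gpu_watts


GPU_COUNT = 4       # assumption: the thread does not state the card count
STOCK_WATTS = 600   # full board power mentioned in the comment
CAPPED_WATTS = 450  # the -pl value mentioned in the comment

for command in power_cap_commands(GPU_COUNT, CAPPED_WATTS):
    print(" ".join(command))

stock_total = total_draw_watts(GPU_COUNT, STOCK_WATTS)
capped_total = total_draw_watts(GPU_COUNT, CAPPED_WATTS)
print(f"Stock: {stock_total} W, capped: {capped_total} W, "
      f"saved: {stock_total - capped_total} W")
```

Under those assumed numbers the cap takes 150 W per card off the worst-case heat load, which is the point when the cards are air cooled.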
Hint: when you have a piece of metal stuck with thermal goop to a lot of components, the force doesn’t “concentrate” on one of them. You need to detach it from each one with however much force is needed to detach it from that component.
The story is interesting but it’s hard to read because it’s hard to tell which parts are meaningful and which parts are filler.
E.g. “we pulled the card cold - straight from the rig to the workbench”. Okay, but why would going straight from the rig to the workbench make it cold? If anything it would be warm. But it turns out the temperature is meaningful in your story.
Those are SM120, so no tmem/tcgen05, and support in the main libraries is lacking (it's like everybody is focusing on B300/SM100).
For that money I'd buy a single B300: similar total AI TOPS, similar aggregate GPU bandwidth, only 25% less total memory (probably offset by lower implementation complexity), half the energy consumption...
Also by having all SMs local they have the special L1-level interconnect. SMs can collaborate on the same GEMM. And a bunch of other nice features.
Or, you know, rent it.
Something went wrong in manufacturing. The solder should have wicked to cover the entire pad, not just a small square, and there should be no (brown) discoloration.
Edit: reading fail on my part, nothing to see here.
> With 18× 140 mm of surface, the fans run quietly and the coolant Δt across the rads stays small
Signed, IPC-610 certified tech.
I picked up on it too. This wouldn't have been difficult to share, but it's far too verbose to be a real person's words.
The phrasing is very Claude-like:
"That cracked joint is the whole story. The card had passed initial bring-up and ran fine at light loads for a week."
"That sequencing matters — it’s why we have a story to tell. The pilot card failed, taught us a lesson, and the lesson is the reason the other three went on without incident."
"Driver swaps, CUDA reinstalls, and inference-engine theories were dead ends I spent hours on. The failure pattern itself told the story — listen to it earlier."