Discussion (119 Comments)

freediddyabout 3 hours ago
In the last year, I have bought an M3 Ultra Mac Studio with 512 GB, a Macbook Pro M5 MAX with 128 GB and an RTX 6000 Pro. I have spent around $25k so far, not including electricity. I figured worst case scenario I can sell them in the next year and only take a haircut as opposed to losing my entire investment.

In comparison, just paying for tokens would have been much cheaper and much, much faster. I've been running Gemma4:31b and Qwen3.5 and 3.6, getting local LLMs to solve AMC 8/10 math questions, and it's about 10-100x slower than just doing it online. When I tried it with ChatGPT late last year, it took about one night and $25 to solve about 1000 questions. Using my RTX 6000 and M3 Ultra with Gemma4:31b on both, drawing about 800 watts (600 for the RTX and 200 for the M3 Ultra), it answered about 40 questions in 7 hours, and I haven't checked how good the answers are yet.

At the very least I'm going to try to sell my M3 Ultra if I can find a reliable place to sell it without getting ripped off by scammers.
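
As a rough check on the figures in this comment, here is a minimal sketch that turns the quoted 800 W, 7 hours, and 40 questions into electricity cost per question and sets it next to the quoted $25 per 1000 questions. The electricity price is an assumption, not something stated in the thread, and hardware cost is left out entirely.

```python
# Figures quoted in the comment above.
power_watts = 800      # 600 W for the RTX 6000 plus 200 W for the M3 Ultra
hours = 7
questions = 40

# Assumption (not from the thread): a hypothetical residential electricity price.
price_per_kwh = 0.15   # USD per kWh

energy_kwh = power_watts * hours / 1000
local_cost_per_question = energy_kwh * price_per_kwh / questions

# Hosted run quoted in the comment: about $25 for about 1000 questions.
hosted_cost_per_question = 25 / 1000

print(f"energy used: {energy_kwh:.1f} kWh")
print(f"local electricity per question: ${local_cost_per_question:.3f}")
print(f"hosted cost per question:       ${hosted_cost_per_question:.3f}")
print(f"local throughput: {questions / hours:.1f} questions/hour")
```

Under that assumed price, electricity alone lands in the same range as the hosted per-question cost, before counting any of the hardware.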

tpurves10 minutes ago
>> find a reliable place to sell it without getting ripped off by scammers.

This is a real problem and why I've just about given up on ebay or fb marketplace, esp for computers. If you are in Canada though sellit9.com is a great solution to having to deal with sketchy buyers.

jon-woodabout 3 hours ago
I’m not usually one to ask this because learning to do a thing can be fun, but why exactly have you spent 25 thousand dollars on getting an LLM someone else made to answer maths exam questions?
nickthegreekabout 3 hours ago
The cost is obviously not as big a factor for OP as it might be for others. It's actually refreshing to hear the candid viewpoint that he expresses here.
freediddyabout 2 hours ago
25k is definitely a lot, but I did the risk analysis and figured that worst case I would lose $1000-2000 after a year of playing around with it, so I look at it more like renting (I'm going to keep the Macbook Pro no matter what since I needed a new one).
hnuser123456about 3 hours ago
Privacy and offline operation are valuable or non-negotiable in some cases, but the difference is pretty categorical between what can run on a single card and what can run on a DGX GB200 NVL72 cabinet. Doesn't mean it's not worth seeing how far local models can be pushed. Not every problem needs a senior engineer.
freediddyabout 3 hours ago
It's just a project I'm working on. I'm working on projects where AIs are processing and classifying large amounts of data that would be a lot of work for humans to do.
wutwutwatabout 2 hours ago
I think of LLMs as being well equipped for handling dynamic data or adapting to unforeseen circumstances (random code requests, websites' ever-changing layouts, typos, non-standard formatting in docs, grokking out important info, etc), but math problems are by definition a very specific set of instructions to run, so is the overhead and "thinking" aspect of an LLM/AI even needed here? I'm genuinely curious, btw, I'm not asking sarcastically. Can't these math problems just be yanked from some test file and rapid-fired directly at a GPU/compute unit?
iwontberude30 minutes ago
I’ve spent twice that on hosting movies and tv for Plex, so… I think they are worthy of my praise. What a healthy outlet for money.
root_axis22 minutes ago
You spent 50k for plex hosting? Why so expensive?
Retricabout 3 hours ago
That hardware is costing him ~1$/hour over 3 years. Presumably having it answer math questions was a tiny fraction of what he was using it for.
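
The "~1$/hour" figure can be reproduced with a short sketch, assuming the roughly $25k spend quoted upthread is amortized evenly over every hour of three years, with no resale value and no electricity included.

```python
hardware_cost_usd = 25_000   # spend quoted upthread
years = 3                    # amortization period used in the comment above

hours = years * 365 * 24     # every hour of the period, idle or not
cost_per_hour = hardware_cost_usd / hours

print(f"{hours} hours -> ${cost_per_hour:.2f} per hour")
```

Any resale value recovered at the end would lower the effective hourly figure.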
mountainriver30 minutes ago
Running LLMs on Macs is still terribly slow. They simply lack the optimizations other platforms have.

An RTX 6000 pro Blackwell is a pretty good card

speedgoose24 minutes ago
An M3 Ultra Mac Studio can run models that do not fit in similarly priced computers with multiple Nvidia GPUs. And it will use a lot less electricity while still having good enough performance. The exception is prefill performance, which is quite poor on the M3.
bethekindabout 3 hours ago
Which of these has been the most productive for you? Sounds like you've enjoyed the RTX6000 the most?
freediddyabout 3 hours ago
RTX 6000 is somewhat obviously my fastest card, but my biggest problem with the RTX 6000 is the immense heat. The GPU itself is almost 200F and the exhaust from the fans is over 150F. I'm worried that my hard drives are going to fail. I was told that the GDDR7 is even hotter than the GPU, which is surprising to me.

After my last run, I'm going to wait for the new case I ordered to come in and cannibalize my kid's PC that we built beginning of this year to form an entirely separate computer. And then figure out better ways to deal with the heat, especially with summer coming up. I'll have to play around with undervolting and running vents directly outside my house to see if that helps.

vladgurabout 2 hours ago
From my failed and expensive affair with GPU mining 5 years ago: you can get great heat dissipation by using an open case with a lot of directed fans, at the expense of a bit of dust and lots of noise.
ericdabout 2 hours ago
I take it this wasn't the half-wattage Max Q version with blower fan?
LarsDu88about 2 hours ago
Well if it makes you feel better those frontier LLMs are all technically taking a big loss, and they may all be in your shoes after a few years.
arjieabout 3 hours ago
All of these have appreciated in value. How much are you looking for the Ultra?
freediddyabout 2 hours ago
I've seen a lot of sales on eBay for over $20k, but I don't know if I believe it. Plus the lack of seller protection and the prevalence of scams on eBay make me too hesitant to actually want to risk it so I don't know what to do haha
arjieabout 2 hours ago
Haha, yeah, it's about $23k or so. Should be twice what you paid for it if you got it last year. Tbh I don't know why. The RAM is large but the bandwidth and the compute aren't nearly enough. You can fit DeepSeek V3 on it quantized but inference is like 10 tok/s. Honestly, you'll be able to sell it locally for that in cash, and I would in your place.

I saw your heat comments about the RTX 6000 Pro as well. I bought a few of them recently and I'm running 2 of them in a 2U case in a colo. You need a lot of active airflow to keep them cool. Mine range from 23 C to 80 C.

ahmadyanabout 1 hour ago
If you are in the bay area, i'm happy to buy that M3 Ultra from you, i've been unsuccessfully looking for one and can't find any.
iooiabout 2 hours ago
I'll buy your macbook if you're trying to get rid of it!
freediddyabout 2 hours ago
I'm keeping that one for sure, I love it!
plasticsopranoabout 3 hours ago
You'll probably make a profit by selling them today. I bought a M1 Max Studio with 64 GB last year off FB Marketplace for $1000 and today I'm seeing numerous 32 GB M1 Maxes for $1200-1500.
freediddyabout 2 hours ago
Yes the prices on eBay for the Mac Studio are all over the place, but I've seen sales for over $20k. I don't know if I believe it but there's enough to make me think if I can sell it for that price it would be worth it, but eBay has basically no seller protection so I'm not willing to take that chance.
CamperBob2about 3 hours ago
How do you use the RTX 6000 with the Macs? Exo? I would think that would be pretty snappy if configured properly.
freediddyabout 3 hours ago
This is on a separate Windows PC, I don't have it integrated with the Macs.
jmyeetabout 3 hours ago
I looked into the M3 Ultra 512GB Mac Studio before it was discontinued and, as best I could determine, it just wasn't worth it... yet. The GFLOPS and memory bandwidth just aren't there even though it can hold a much larger model in memory.

But the trend here is interesting. I think by 2030 you'll be able to buy fairly cheaply hardware that is currently $10k+. I don't know what this does to the trillions invested in AI data centers because the next NVidia architecture after Blackwell will essentially halve the value of purchased cards overnight.

I'm not convinced Apple has yet pivoted the Mac Studio line towards this market and the expected M5 Ultras in Q3 2026 will likely be an incremental improvement rather than big leap forward but I'd like to be proven wrong.

freediddyabout 2 hours ago
I agree that all these datacenter companies like Coreweave are investing billions in technology that has a very fast depreciation curve and I don't know how they will sustain income. The same goes for datacenters in space: what happens when those chips are obsolete? Will they send astronauts to replace them or will they let them burn up and send new ones into orbit every year?

I feel that the open weight models pale in comparison to the frontier models, and I believe that if the gap closes quickly, that the open weight vendors will stop releasing it for free.

cpard30 minutes ago
UPDATE: Launch was a success! 400K+ views, and multiple companies reached to use my IP. Read more here

It seems that he managed to get what he wanted from the hardware and I'm happy for them.

He said something interesting at the beginning of his post: he compared the cost of the hardware to the cost of his time based on his FAANG salary. That's an interesting way to think about this, but the rest of the article didn't make it clear to me whether in the end he saved money or time compared to just renting in the cloud.

Also, outside of the power cost, hardware has other costs too, you need to operate it, maintain it, set it up, etc. all that require time. I mean, even the process of figuring out if it had a good enough ROI compared to cloud, takes from your time (collecting data, analyzing data, etc etc).

m463about 1 hour ago
Other things people spend "too much money" on:

- muscle cars, with all the stuff, driven occasionally.

- boats, that don't get taken out much

- gamer x, where x=system or laptop or keyboard or mouse or desk or glasses or mousepad or speakers or ... usually with "> too much RGB"

- children

$48k for something constructive even if ai related? no problem, refreshing even.

datadrivenangelabout 3 hours ago
I did the math at least on a Macbook pro, and for inference it's definitely not worth it.

- https://www.williamangel.net/blog/2026/05/17/offline-llm-ene... - Discussion: https://news.ycombinator.com/item?id=48168198

jmyeetabout 3 hours ago
It's comparing laptops to dedicated GPUs in a server environment. The best comparison would be the Mac Studio but the current release is almost 2 years old at this point. We'll see what a likely M5 Ultra Mac Studio looks like, probably in Q3 this year.

But yes, for pure inference, the M5 Max Macbook Pros probably aren't there yet. They have other utility though of course. And you can get 64GB and 128GB MBPs at a discount. Micro Center currently will let you buy a 64GB M5 Max MBP for under $4k currently, for example.

joefourierabout 3 hours ago
Why didn't you take into account batching, input tokens, different costs of electricity, and the fact that a laptop can still hold a decent % of its resale value, and is useful for many other tasks than running an LLM?
bigyabaiabout 3 hours ago
> Why didn't you take into account [...] the fact that a laptop can still hold a decent % of its resale value, and is useful for many other tasks than running an LLM?

Because that wasn't what they claimed to research?

  >> for inference it's definitely not worth it.
It's entirely fine if you enjoy local LLMs on your computer, there are people doing horribly inefficient inference on smartphones now. But for pure inference tasks, it's pretty obvious why M5s and Mac Studios aren't replacing TPUs and GPUs.
joefourierabout 2 hours ago
Who is going to buy a $4299 M5 Max MBP with 64GB of RAM just to run Gemma 4 31b? Firstly you don't need 64GB for that model. Secondly if you want a machine that sits in the corner and does nothing but LLM inference, you don't buy a MacBook Pro, you buy some GPUs which are going to cost you a fraction of that (~$1k for ~64GB of VRAM is possible). The people buying Apple Silicon for inference generally aim for the Mac Studios with enormous amounts of RAM (128-512GB), to run very large models.

The idea is obviously to be running the LLM on your work laptop. As a developer I'd need a laptop with 24GB of RAM for work anyway, and 48GB, which is enough for a very good quant of Gemma, is just $400 extra.
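
The "you don't need 64GB for that model" point can be sketched with a back-of-the-envelope weight-size estimate. This assumes the "31b" in the model name means 31 billion parameters and counts the weights only; KV cache, activations, and runtime overhead are not included, so real memory use will be higher.

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (10^9 bytes)."""
    return params_billion * bits_per_weight / 8

# Assumption: 31 billion parameters, at a few common quantization widths.
for bits in (4, 8, 16):
    print(f"31B parameters at {bits}-bit: ~{weight_gb(31, bits):.1f} GB")
```

On this estimate a 4-bit or 8-bit quant of a 31B model leaves room to spare in 48GB, while unquantized 16-bit weights would not fit.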

rosmineabout 1 hour ago
Hi! Thank you so much for posting this! I got bad luck/timing when I tried, so happy it made it to the front page! (I am the author)
4chandailyabout 1 hour ago
I did this with used parts and cheaper consumer cards (3090s) and did much of the same calculations. I found it was way cheaper for me as well.

The main advantage, however, is that the friction of "this is going to cost me in tokens to even try" goes away. I was so much more willing to take chances and try new things on my own hardware than I would have been if I were paying API costs. I feel like this point isn't made clearly enough by those of us who run these absurd self-hosted inference systems.

Thanks for the write up, was a fun read. I spent an order of magnitude less, but I could relate to your story from beginning to end.

Epyc (Milan), 512gb ram, 4x 3090

dekhnabout 3 hours ago
I can't imagine spending $48K on a home GPU server, but I did just splurge and buy a PC with an RTX 5090, specifically to hold the largest models you can fit in 32GB. It's a top of the line PC with water cooled high end CPUs, 64GB RAM, RTX 5090 for $5K. To me the jury is still out whether this was a worthwhile investment, but I do expect to use this machine for a decade. I don't run it at 100% power (it's mostly idle, except for times when I'm training or doing batch inference). It has the nice property of being blackwell generation, similar to the machines we use at work.

It just scares me to own a box that is $48K in my house, especially if it breaks, or gets stolen.

phkahler44 minutes ago
>> To me the jury is still out whether this was a worthwhile investment, but I do expect to use this machine for a decade.

The high cost and power consumption are both signs of the death of Moore's law, so you are probably correct that this system will be near state of the art for some time.

rosmineabout 1 hour ago
Yes! It scared me too. I tried to insure it under my renter's insurance policy, but they not surprisingly refused. I had to get business insurance to cover it
throwatdem12311about 3 hours ago
Not even a single mention of gaming.

No wonder gamers hate AI bros.

orsornaabout 1 hour ago
I would probably hate someone if they were buying the same hardware as me but doing something actually useful with it. Any game worth playing doesn't require high specs anyway. There is such a large catalog of old games.
dekhnabout 2 hours ago
I have a second computer with an RTX 4090 for gaming (running Windows). I also used the new RTX 5090 running Linux to evaluate whether Proton/Wine allow me to run Windows games on linux (yes, it works, but the compatibility and frame rate issues make me stick to native Windows for now).
throwatdem12311 33 minutes ago
If you want a GPU that has comparable performance on Linux to Windows, you want AMD. NVIDIA drivers are notoriously bad. Many of my games run better on Linux with the open source AMD drivers. (CachyOS rolling rolling rolling).
fortysevenabout 1 hour ago
I wonder what's going wrong there? Personally I found compatibility and performance on Linux to be extremely good. And just keeps getting better. And that's not even just me, that's all kinds of benchmarks out there. Sorry to hear that. : ' (
th0rineabout 2 hours ago
Having built an almost identical rig earlier this year, I can promise at least one similarly-spec'd machine gets equal use between AI and gaming (both on Linux). Stupid-excited for the Steam Frame to finally come out.
m463about 1 hour ago
or crypto... what's old is new again.
echelonabout 1 hour ago
> No wonder gamers hate AI bros.

Personally, playing with AI models is way more fun than getting sucked into a game loop. Game loops feel like busy work hooked to an engineered dopamine drip. AI models are new frontiers and are exciting to build with, modify, lobotomize, and hack around with.

LandoCalrissianabout 1 hour ago
'If you google “plugging a PC into multiple outlets”, you get lots of warnings that if you even consider such a setup you will instantly burst into flames. So I hired a professional PC builder make sure it was safe.'

Not really sure how that makes it safe but OK!

mrandish40 minutes ago
I guess it was supposed to be a humorous aside, but it wasn't actually helpful because the relevant issue is when you pull more total amps from a single circuit than it's fused for (usually 15 or 20 amps in U.S. residences). The failure mode is usually tripping the circuit breaker.

That issue can often be addressed fairly easily by splitting the power draw between two adjacent circuits. You can have an electrician do it permanently or temporarily DIY it with an appropriately rated extension cord. The real issue was OP was in an apartment at the time so an electrician would have been difficult. I assume they decided to just have a system integrator build it because they didn't want to figure out how to segment and route the power rails in a dual power supply system, but it's not exactly rocket science. Problems are often more due to choosing power supplies that aren't up to their claimed spec, not pre-testing them under load or using incorrect or under-spec cables.
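
The single-circuit limit described above can be sketched numerically. This assumes 120 V nominal and the common rule of thumb that a continuous load should stay at or below 80% of the breaker rating; the 2500 W load is a hypothetical figure, not the actual draw of the rig in the article. It only counts watts and says nothing about how to wire two power supplies safely.

```python
import math

def usable_watts(breaker_amps: float, volts: float = 120.0, derate: float = 0.8) -> float:
    """Continuous power available from one circuit under the assumed derating."""
    return breaker_amps * volts * derate

def circuits_needed(load_watts: float, breaker_amps: float,
                    volts: float = 120.0, derate: float = 0.8) -> int:
    """How many separate circuits a continuous load would have to be split across."""
    return math.ceil(load_watts / usable_watts(breaker_amps, volts, derate))

load = 2500.0  # watts, hypothetical multi-GPU server load

for amps in (15, 20):
    print(f"{amps} A circuit: ~{usable_watts(amps):.0f} W continuous, "
          f"{circuits_needed(load, amps)} circuit(s) for a {load:.0f} W load")
```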

ttshaw136 minutes ago
I think the relevant issue is you could conceivably have a house with two outlets with opposite phases. Bussing them together in the PSU will then create a short
kccqzy44 minutes ago
Probably means hiring someone who has more knowledge about PSUs and especially about having two simultaneous PSUs. There are questions like: when you press the power button how do the two PSUs turn up and in what sequence? How do you deal with the PWR_OK signal? What if there are voltage differences between the two PSUs? What about power backfeeding?
4chandailyabout 1 hour ago
I read this as: the "professional PC builder" would carry some sort of insurance. So it isn't really "safer", but if something goes wrong, the investment is (potentially) still safe.

Just an assumption, though!

hmokiguess33 minutes ago
Stuff like this + OpenClaw with Mac Minis a while back is sort of exposing a probable local AI flywheel waiting to happen.

Someone needs to solve proper distribution of packaged GPUs with some Tesla-like wall connector for a consumer grade box that is plug and play.

Maybe John Ternus ends up doing that at Apple since they sit closer to this consumer profile.

Aurornisabout 3 hours ago
This is a difficult calculation to make because you wouldn't rent time on the exact same system in the cloud. Depending on what you're running, a bigger server with better inter-GPU interconnects in the cloud might complete the task so much faster that the additional per-hour expense is more than covered.
pehejeabout 2 hours ago
Agreed. And the gained time either goes toward 1) more experiments, or 2) leisure, which makes you sharper in the lab and happier overall. Not sure the "I saved $17,000 so far" framing is the most useful way to look at it, but it's a cool project and I love that people are doing this kind of thing.
gwbas1cabout 2 hours ago
FYI: If you're in a similar situation, think very carefully before you build your own. The $17000 might sound like a lot; but when you take into account your time and risk tolerance, renting might be a much better solution.
forsalebypwnerabout 2 hours ago
I think their retrospective at the end of the article is grounded and logical:

"If I were to do this again, I wouldn’t do a custom build like this. I would buy a standard datacenter server and rent space in a colocation center"

I'm sure there are use cases when renting makes sense, but it can get crazy expensive really fast if you're not careful.

janalsncmabout 2 hours ago
(For reference I’m talking about the DFT post from the same blog.) I love that ML is still in the “gentleman researcher” stage where relatively small amounts of startup capital can buy a ticket into frontier research.

For a lot of research questions 6 GPUs is even overkill.

It’s one of the reasons I’m skeptical of the “trillion dollar supercluster” idea [0]. I think what we need is more reasonably smart people investigating medium-sized problems. A “GPU middle class” you might say.

[0] https://situational-awareness.ai/racing-to-the-trillion-doll...

rosmineabout 1 hour ago
I agree :) Also, I heard Teknium trained the original Hermes model on 2x 4090. You can do a lot with a little compute
0xbadcafebeeabout 3 hours ago
So the answer is: "TBD if I can actually make money to pay this back"
Quarrelabout 3 hours ago
If nothing else, rosmine's DFT [1], which is what they were working on with this setup, seems like a worthwhile investigation.

While I'm skeptical that there is much of a moat, at least for the large players, it should at least hopefully set rosmine up for the next job :)

It does seem to fix the current biggest issues with using LLMs for writing at various publishers. If you're The Economist, you have a very specific house style and you have a decent corpus of articles written in that style. At least on my reading of it, rosmine can use DFT to get a model to closely match its outputs, in terms of the language quirks that are generated, to that of the corpus it is fine tuned on. ie it will very much match the house style, particularly as it is used in writing, vs giving a system prompt to an LLM that has some Economist articles in its vast training set, and telling it to write in that style- it will do an ok job, but still exhibit LLM language quirks despite itself. Even if you feed it the specific "style guide" that they give their authors, I dare say the reality of their writing is the best place to learn, and it sounds like DFT can ground the writing of a model in a specific corpus like that.

[1]: https://rosmine.ai/2026/05/18/fixing-llm-writing-with-distri...

vidarhabout 3 hours ago
Giving an LLM samples and telling it to apply the style in the samples works a lot better than just telling it to copy a style it may have seen, or giving it a list of rules.

They do it well enough that it'd take really good output to beat.

Quarrelabout 3 hours ago
They really don't.

If your goal is to say, write science fiction, their reversion to classic LLM-isms, is really distracting and is what makes people say from a glance that it was written by an LLM. You basically can't use them at the moment in any real "natural" long-form writing. Everyone will call "slop" pretty quickly on the current frontier models.

rosmine's DFT paper is worth a read.

janalsncmabout 2 hours ago
From the author’s POV it seems like they were going to do this research regardless, so this is asking what the most cost-effective way to do that research is.

Or, for a person who did have a great way to monetize the same workload they’d probably find a lot of value in reading this post.

hastegabout 3 hours ago
Just curious OP (if you're the one posting) -- what do you mean by independent researcher? What are you researching and are you making $$ from it or are you living off previous built up savings? Seems like an interesting path. What research have you looked into so far?
daemonologistabout 3 hours ago
They have a subsequent post (from Monday) about what they've been working on: https://rosmine.ai/2026/05/18/fixing-llm-writing-with-distri...

(I would assume they haven't made a lot of $ off of this, if nothing else because they've only just put out that post and demo. They do seem to have produced a model that doesn't sound very LLM-y to my ear, though it also seems rather weak for its size.)

bityardabout 3 hours ago
Shallow take: They made an LLM that uses fewer emdashes.

Cynical take: They made an LLM that can bypass existing AI slop detectors.

Realistic take: They found a research problem they found interesting, dumped a bunch of capital and sweat equity into and (claimed to have, at least) found a solution. Neat!

ryandrakeabout 2 hours ago
Or they just have lots of money and a hobby. Someone else might blow $48K to get an old Cessna and go have fun flying around. Not everything needs to have a purpose.
forsalebypwnerabout 2 hours ago
You were on the money with the Cynical take lol:

https://rosmine.ai/2026/05/18/fixing-llm-writing-with-distri...

exceptioneabout 3 hours ago
I am not the author, but he has been training/tuning? a model that produces text that mimics the source material in a more natural way. So getting the LLMs to produce fewer bland and boring LLMisms, according to the follow-up blog post.
hsuduebc2about 3 hours ago
citing from the article:

"I spent a long time trying high risk/high reward experiments and failing. But now I have something good. I’ve solved a major problem with LLMs. And I’m launching next Monday so we will soon see if it’s actually a breakthrough or just LLM psychosis "

Maybe ai companies today have some bounty program?

rib3yeabout 1 hour ago
> I thought that I could not get a standard datacenter server because my apartment wouldn’t let me upgrade the circuits, so I needed to have 2 power supplies plugged into different circuits.

Why didn't they just put a higher amp breaker in the box?

chaidhat38 minutes ago
It is unsafe for wires to carry more current than they are rated for, because the wires act like very low-ohm resistors. At a high enough I, you’re still going to be dissipating power P = I^2 R, which is mainly thermal, and melting the wires.
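
A small sketch of the P = I^2 R point: heat dissipated in the wiring grows with the square of the current, so a bigger breaker on the same wire allows far more heating. The wire resistance below is a hypothetical round number chosen for illustration, not a measured value for any real cable run.

```python
def wire_heat_watts(current_amps: float, resistance_ohms: float) -> float:
    """Power dissipated as heat in a wire: P = I^2 * R."""
    return current_amps ** 2 * resistance_ohms

resistance = 0.1  # ohms, hypothetical resistance of the wire run

for amps in (15, 20, 30):
    print(f"{amps} A through {resistance} ohm: {wire_heat_watts(amps, resistance):.1f} W of heat")
```

Doubling the current from 15 A to 30 A quadruples the heat in the same wire, which is why the breaker is sized to the wire and not the other way around.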
NicoHartmannabout 1 hour ago
Jensen Huang said 'the more you buy, the more you save,' and you actually took it personally.
jamesonabout 3 hours ago
The idea is similar to maintaining on-prem vs cloud

Cloud is optimized for development velocity, but because it is a high-margin business, on-prem eventually becomes the more attractive option

It could be too late, but if you have a business it might be worth looking into tax savings. Depreciation of an asset counts as a loss and may be deductible against your income. (I'm NOT a tax expert)

jmyeetabout 3 hours ago
Cloud servers have cheaper electricity, the scale of industrial-level cooling, no issues for you (as a user) with hardware failure (ie you just use a different server; it's not your problem) and can amortize their cost by running 24x7. I've seen H100 compute hours for as little as $2.

As the author notes, there are also electrical/wiring issues that cap how much compute gear you can run in a space not designed for it. I suspect a standard 20A 110V circuit can probably handle 2x RTX 6000 Pros. 15A probably can but that requires more research. Anything more than that and you're using multiple circuits, which has issues, or you need an upgraded circuit (eg 40A 240V) with all that entails (eg heavier duty cables, custom plug, etc).
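
For a sense of scale on rent versus buy, here is a rough break-even sketch using the $48K build cost and the $2 per H100-hour figure mentioned in this thread. It assumes six GPUs kept busy around the clock and ignores electricity, resale value, setup time, and the fact that a rented H100 is not the same card as the ones in the rig, so treat it as an illustration only.

```python
hardware_cost_usd = 48_000       # build cost quoted in the thread
rental_usd_per_gpu_hour = 2.0    # cheapest H100 rate quoted in the comment above
gpus = 6                         # assumption: six GPUs, all busy 24x7

breakeven_gpu_hours = hardware_cost_usd / rental_usd_per_gpu_hour
breakeven_days = breakeven_gpu_hours / gpus / 24

print(f"break-even after {breakeven_gpu_hours:.0f} rented GPU-hours")
print(f"about {breakeven_days:.0f} days with {gpus} GPUs fully utilized")
```

Lower utilization stretches the break-even point proportionally, which is the core of the "cloud amortizes by running 24x7" argument.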

MagicMoonlight20 minutes ago
Oh to be a rich person, pointlessly quitting their job and living off their millions while pretending to be a researcher.
tombertabout 4 hours ago
I have four old 24gb Nvidia cards. They're not great but they're not useless either. The problem is that I haven't really figured out a good way to actually use them.

Genuine question; would anyone here recommend any specific motherboard to best utilize these cards?

throwawaytea23 minutes ago
You could ask AI and get pretty far reading the answer.
mcianciaabout 3 hours ago
Depends what you want to do and which cards you have, but usually going with any older (3rd gen+) threadripper pro setup will give you a lot of pcie lanes.

I myself run with gigabyte trx40 aorus xtreme, but since it's regular threadripper (not pro) with 4 GPUs 2 of them will run at x16 and two of them at x8 speeds

timw4mailabout 2 hours ago
And here I felt like I was wasting money on an Intel B70 to run LLMs locally.
jmyeetabout 3 hours ago
So some things have changed since this rig was first built (2024). The most relevant is that $6800 RTX 6000 Ada 48GB has arguably been supplanted by the $9500 RTX 6000 Pro 96GB.

The Ada has a memory bandwidth of 960GB/s. The Pro has 1.8TB/s and about 40-50% better performance so is at least equivalent in processing power, much better in memory bandwidth (important for inference) and can hold larger models on a single card.

I've considered buying a rig with 1-2 6000 Pros for similar reasons but I want to see what happens with this year's Mac Studios with a likely M5 Ultra. Macs have a shared memory architecture whereas NVidia segments the market based on max memory, where the biggest consumer card (RTX 5090) has 32GB of VRAM but still excellent memory bandwidth (1.8TB/s). The conventional wisdom seems to be that an RTX 5090 rig will still trounce a Mac Studio. Despite Mac Studios being able to hold larger models and being chainable over TB5, their lower memory bandwidth (~900GB/s) and lower overall GFLOPS mean they still come out behind.
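
Why memory bandwidth matters for inference can be sketched with a crude upper bound: when generating one token at a time with a dense model, roughly all of the weights are read once per token, so tokens per second cannot exceed bandwidth divided by weight size. The bandwidth figures are the ones quoted in this comment; the 15.5 GB weight size is a hypothetical example, and the bound ignores compute limits, KV cache reads, batching, and mixture-of-experts sparsity.

```python
def decode_tok_per_s_upper_bound(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Bandwidth-bound ceiling on single-stream decode speed for a dense model."""
    return bandwidth_gb_s / weights_gb

weights_gb = 15.5  # hypothetical: roughly a 31B-parameter model at 4 bits per weight

quoted_bandwidth_gb_s = {
    "RTX 6000 Ada": 960,
    "RTX 6000 Pro / RTX 5090": 1800,
    "Mac Studio (as quoted)": 900,
}

for name, bandwidth in quoted_bandwidth_gb_s.items():
    bound = decode_tok_per_s_upper_bound(bandwidth, weights_gb)
    print(f"{name}: at most ~{bound:.0f} tok/s")
```

Real throughput will sit below these ceilings, but the ratio between cards tracks the bandwidth ratio fairly directly in this simplified picture.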

That being said, the current Mac Studios are relatively long in the tooth, being released in 2024.

I'm still not sure any of this is really worth it because things are still changing so fast. I think there's a decent chance of a number of large AI companies going bust in the next 2-3 years such that you'll be able to buy enterprise AI hardware at cents on the dollar, a bit like how Google bought data centers in the post-dot-com crash.

But anyway, nowadays I'd be looking at the RTX 6000 Pro as the sweet spot, having anywhere from 1-4 in a single server.

The electrical issues the author mentions are interesting. I hadn't really thought about the max amperage on a residential circuit. In a DC, these would typically operate on three phase power and much higher overall amperage. I wonder if there's a device you can buy that can combine multiple residential circuits into a single power source for a server this power hungry?

freediddyabout 3 hours ago
I have the Macbook M5 MAX with 128 GB of RAM. I put its performance at roughly equivalent to the RTX 5070 Ti. The M3 Ultra 512 GB for me is about half the performance of the RTX 5070 Ti but obviously it has the ability to do more because of the increased memory.

I don't think anything compares to the nVidia chips at all.

nextosabout 3 hours ago
I am also considering buying 3-4x RTX 6000 Pro 96GB plus some Ryzen workstation with a grant.

Is this the best general-purpose choice as of 2026 with $50k for training, fine-tuning and running large open models?

trevithickabout 3 hours ago
You would install a 240v circuit (in the US) like for an electric clothes dryer.

Edit: I now see the author was in an apartment and couldn't do this, so I concede this is not responsive here.

amarantabout 3 hours ago
The research that's presented in another article on the same site is way more interesting than the Betteridge's-law article linked here. It'll be very useful in my own latest project if this research is incorporated into some model I can rent by the token!
doctorpanglossabout 4 hours ago
> Because of this I got a motherboard with slow GPU interconnect. It’s good for running many small experiments in parallel (which is my main use case) but horrible for any models split across gpus.

:( you paid a professional pc builder and you weren't told this?

mcianciaabout 3 hours ago
I wonder why using 2 PSUs resulted in having slower interconnect.

There are no specs in this blog post regarding the CPU/motherboard choice, but if you go with Threadripper Pro, they have had 128 PCIe lanes for some time now, so using all GPUs at full speed shouldn't be a problem

shout5about 2 hours ago
> paid a professional pc builder

They did not. That's a mining rig not a workstation. It's visible from the photo and the chart showing multiple failures over a short period of time including the risers -- which are visibly very low quality -- failing twice.

You have 50K, you call a real expert like Puget Systems or Digital Storm.

zozbot234about 3 hours ago
If you split models using pipeline/layer parallelism you don't have to care about a slow interconnect, you're just slowed down a lot when running a single inference at a time as opposed to a fully pipelined minibatch. But tensor parallelism requires much faster interconnects than you could get in your average server, so I'm not sure that a different motherboard would help all that much.
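
A back-of-the-envelope sketch of why the two strategies stress the interconnect so differently. It assumes Megatron-style tensor parallelism (two all-reduces per transformer layer in the forward pass), a ring all-reduce, and fp16 activations; the model dimensions are hypothetical and not taken from the article.

```python
def pipeline_bytes_per_token(hidden_size: int, num_stages: int,
                             bytes_per_value: int = 2) -> int:
    """Bytes crossing stage boundaries per token: one activation vector per boundary."""
    return hidden_size * bytes_per_value * (num_stages - 1)


def tensor_parallel_bytes_per_token(hidden_size: int, num_layers: int, num_gpus: int,
                                    bytes_per_value: int = 2,
                                    allreduces_per_layer: int = 2) -> float:
    """Bytes each GPU sends per token, assuming a ring all-reduce."""
    payload = hidden_size * bytes_per_value
    ring_factor = 2 * (num_gpus - 1) / num_gpus  # per-GPU traffic for a ring all-reduce
    return payload * ring_factor * allreduces_per_layer * num_layers


# Hypothetical model: hidden size 8192, 80 layers, split across 4 GPUs.
pp = pipeline_bytes_per_token(hidden_size=8192, num_stages=4)
tp = tensor_parallel_bytes_per_token(hidden_size=8192, num_layers=80, num_gpus=4)
print(f"pipeline: {pp} bytes/token, tensor parallel: {tp:.0f} bytes/token per GPU")
```

Beyond the raw volume, the tensor-parallel case puts a synchronization on the critical path twice per layer, so link latency matters as much as bandwidth.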
m-hodgesabout 3 hours ago
what is a "professional pc builder" in 2026
ok_dadabout 3 hours ago
A guy on Facebook with more confidence and better insurance
CamperBob2about 4 hours ago
Consumer motherboards can still make sense even if you leave some performance on the table. Running an actual 8x GPU server is not something you'd want to do in an apartment. Imagine the old Lucasfilm "THX" trailer where an unearthly-sounding foghorn whine rises to a sweeping crescendo at reference level, only without the decay at the end.

At the time he put this rig together, there weren't a lot of open-weight LLMs that could run well on 6x48=288 GB, so it probably wasn't a huge loss. There still aren't, really.
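
As a rough illustration of the 6x48=288 GB budget, here is a minimal sketch of the weights-only arithmetic. The parameter counts, the bytes-per-parameter values, and the 80% headroom factor (leaving room for KV cache and activations) are all assumptions for illustration, not measurements.

```python
TOTAL_VRAM_GB = 6 * 48  # the rig discussed above


def weight_gb(params_billions: float, bytes_per_param: float) -> float:
    """Approximate weight footprint: 1e9 params times bytes each, expressed in GB."""
    return params_billions * bytes_per_param


def fits(params_billions: float, bytes_per_param: float,
         total_vram_gb: float = TOTAL_VRAM_GB, headroom: float = 0.8) -> bool:
    """True if the weights fit within an assumed fraction of total VRAM."""
    return weight_gb(params_billions, bytes_per_param) <= total_vram_gb * headroom


# Hypothetical model sizes and precisions.
for params, bpp, label in [(70, 2.0, "70B fp16"), (405, 2.0, "405B fp16"), (405, 0.5, "405B 4-bit")]:
    verdict = "fits" if fits(params, bpp) else "does not fit"
    print(f"{label}: {weight_gb(params, bpp):.1f} GB of weights, {verdict}")
```

This only checks capacity; with a slow interconnect, a model that fits across six cards may still run poorly.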

Right now I'm in the process of cramming Blackwell cards into an old DDR4-based Milan server, where the important thing is to be able to run large models at all. The GPU fans alone burn over 400 watts at full throttle.

storusabout 3 hours ago
Did you think about Max-Q cards? 300W and they aren't that noisy either, with 14% lower perf than the non-Max-Q card.
CamperBob2about 3 hours ago
That was an option, but having decided on a true server chassis for other reasons, it made sense to use server-edition cards to take advantage of all those fans. I downclock them to 300W anyway for longevity, but it's nice to have the option to go to 600W if needed.

The server is going to live in the garage, so I'm not that concerned with noise. But I had no idea what to expect when I flipped the switch for the first time. It sounds like something out of the Book of Revelation. No way, no how could something like this be used in an inhabited area.

ginkoabout 4 hours ago
Don't those Ada 6000 GPUs support NVLink? I think I can even see the cover for the connectors in OP's pic.

edit: Hm, finding mixed information online on whether that's still supported or not. Apparently it was removed in workstation GPUs.

mcianciaabout 4 hours ago
Nope, they don't support it. And afair even if they did, you would be limited to connecting only in pairs, not all 6 together
ryandrakeabout 2 hours ago
Honestly, I made the same mistake when I added a GPU to my (not $48K) existing homelab. I got an Ada 4000 for its slim form factor and low wattage, but realized after I bought it that it does not support NVLink, so I can't really effectively double it up later if I wanted to. Live and learn. I suppose you might research that a little before blowing that much money though LOL :)
pelasacoabout 3 hours ago
Out of curiosity, did you check how much it would cost to rent a cage in a colocation space? Having to power your computer from two different outlets sounds wild..
forsalebypwnerabout 2 hours ago
the very last line of the article:

"If I were to do this again, I wouldn’t do a custom build like this. I would buy a standard datacenter server and rent space in a colocation center. But then I would miss saying Hi to grumbl once in a while."

pelasacoabout 2 hours ago
Yes, I mean, he could rent a cage and run grumbl there. It doesn't have to be a standard datacenter server, even though a standard datacenter server would be better and cheaper.
gosub100about 4 hours ago
It doesn't cover risk. If one or more GPUs die, who pays for it? If you rent, you are insulated from this risk. But if you own, you might not have the best return policy from the vendor. And if you are actually at fault for breaking it, they have every right to deny a return. Or if your apartment is burglarized or catches fire (possibly from overloading the circuit), you are out the entire investment.
0xbadcafebeeabout 3 hours ago
Also a lightning strike or surge from the electric utility could fry the whole rig. Proper protection costs thousands, and even then it's not guaranteed to protect everything
mschuster91about 1 hour ago
> Proper protection costs thousands

Frankly that's something a landlord should provide. And there's insurance against losses from electrical issues.