RTX 5080 and RTX 3090 Setup: 80 Tok/s on Qwen 3.6 27B Q8

deng•28 minutes ago

I can understand the joy of running things yourself, and I can also see the privacy aspect. However, I pay ~$3 per 1M tokens for that model on OpenRouter, and it's not even quantized. A refurbished 3090 and a 5080 will set you back well over $2k, not to mention the electricity to run them...
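A rough break-even sketch using the figures in this thread: ~$3 per 1M tokens on OpenRouter, a hardware outlay of about $2,000 (the comment says "well over 2k", so this is a lower bound), and the 80 tok/s from the title. These are assumptions rather than measurements: electricity is deliberately left out, and the throughput is assumed to be sustained around the clock, which real workloads rarely are.

```python
# Rough break-even between renting tokens and buying GPUs.
# Assumptions (not measurements): $2,000 hardware, $3 per 1M tokens,
# 80 tokens/s sustained, electricity ignored.

def break_even_tokens(hardware_usd: float, usd_per_million: float) -> float:
    """Tokens you could buy via an API for the same money as the hardware."""
    return hardware_usd / usd_per_million * 1_000_000

def hours_to_generate(tokens: float, tokens_per_second: float) -> float:
    """Wall-clock hours of continuous generation needed to produce `tokens`."""
    return tokens / tokens_per_second / 3600

tokens = break_even_tokens(2000, 3.0)
hours = hours_to_generate(tokens, 80)
print(f"{tokens / 1e6:.0f}M tokens, {hours:.0f} h (~{hours / 24:.0f} days) at 80 tok/s")
# prints: 667M tokens, 2315 h (~96 days) at 80 tok/s
```

Under these assumptions the rig would have to generate non-stop for roughly three months before it beats the API on hardware cost alone; adding electricity only pushes that point further out.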

redfloatplane•19 minutes ago

> I pay ~$3 per 1M tokens for that model on OpenRouter

I think the thing is, there's an unspoken "for now" at the end of that sentence, and people running this locally are hedging against that "for now". Some people prefer to feel that they own the means rather than rent them, even if what they own is worse than what they can rent. That's especially true with today's Fable news and the harsh realisation that the "for now" depends on many unpredictable factors. The setup you have locally costs you capital today plus a relatively predictable run-rate (made more predictable with on-prem solar, for example), but it should otherwise work predictably forever.

I'm not saying that you're wrong to do what you're doing, just that many people have their own lines in the sand where renting vs buying makes sense, and it doesn't only boil down to a rational (or irrational) financial decision.

TSiege•24 minutes ago

It’s a personal hobby project, so why should we care how someone chooses to spend their free time and money? Lots of hobbies are expensive and pointless when you compare them with commercially available offerings. That’s why it’s a hobby and not a small business.

Der_Einzige•12 minutes ago

OpenRouter doesn't give you access to the model's internals, i.e. complete control of logprobs, the sampler stack, or any PEFTs.

OpenRouter fking sucks, and I don't know why people here act like it's so great. Stop using it if you care about local AI, and accept that you'll pay more per token than you would through any cloud provider. That's the price for privacy, control, and better quality via inference-time optimizations that otherwise aren't available.
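To make the "sampler stack" point concrete: with local inference you get the raw logits at each decoding step and can chain your own transforms before picking a token. The sketch below is a generic, hypothetical pipeline (log-softmax for logprobs, then temperature, top-k, and top-p/nucleus filtering) over a toy logit vector; the function names and default values are illustrative and are not any particular inference engine's API.

```python
import math
import random

def log_softmax(logits):
    """Convert raw logits to log-probabilities (numerically stable; -inf stays -inf)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def apply_temperature(logits, temperature):
    """Scale logits: <1 sharpens the distribution, >1 flattens it."""
    return [x / temperature for x in logits]

def top_k_filter(logprobs, k):
    """Keep the k most likely token ids; mask the rest with -inf."""
    ranked = sorted(range(len(logprobs)), key=lambda i: logprobs[i], reverse=True)
    keep = set(ranked[:k])
    return [lp if i in keep else -math.inf for i, lp in enumerate(logprobs)]

def top_p_filter(logprobs, p):
    """Keep the smallest set of tokens whose cumulative probability reaches p."""
    ranked = sorted(range(len(logprobs)), key=lambda i: logprobs[i], reverse=True)
    keep, total = set(), 0.0
    for i in ranked:
        keep.add(i)
        total += math.exp(logprobs[i])
        if total >= p:
            break
    return [lp if i in keep else -math.inf for i, lp in enumerate(logprobs)]

def sample(logits, temperature=0.8, k=3, p=0.9, rng=random):
    """Hypothetical stack: temperature -> logprobs -> top-k -> top-p -> draw."""
    lp = log_softmax(apply_temperature(logits, temperature))
    lp = log_softmax(top_k_filter(lp, k))  # renormalize over the top-k survivors
    lp = log_softmax(top_p_filter(lp, p))  # renormalize over the nucleus
    weights = [math.exp(x) for x in lp]
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]

toy_logits = [2.0, 1.0, 0.5, -1.0, -3.0]
print(log_softmax(toy_logits))                      # per-token logprobs
print(sample(toy_logits, rng=random.Random(0)))     # one sampled token id
```

Each stage is a plain function over a list, so reordering the stack, swapping in a different filter, or logging the logprobs at any point is a one-line change; that kind of step-by-step access is what a hosted API typically exposes only partially.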

avyeed_desa•28 minutes ago

I just bought a $25 Chinese 2x Oculink card and two Minisforum DEG1 docks, had some spare PSUs lying around, and just installed two cards on each. It works. I saw that there is also a 4x Oculink card, but I don't know if that will work too.

ComputerGuru•about 1 hour ago

I would have liked to see a bit more on the theory side of things, explaining optimal weight and inference splits, actual issues with existing drivers, etc., instead of what’s essentially just a recipe.
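As a starting point on the weight-split question, a layer-wise split across two cards is often sized in proportion to each card's VRAM. The sketch below assumes 16 GB on the RTX 5080, 24 GB on the RTX 3090, roughly 8.5 bits per weight for a Q8_0-style quantization, equally sized layers, and a hypothetical layer count. It ignores the KV cache, activations, and runtime overhead, all of which eat into the budget in practice, so treat the output as illustrative rather than as the article's actual settings.

```python
# Proportional layer split across two GPUs by VRAM.
# Assumptions: 16 GB (RTX 5080) + 24 GB (RTX 3090), ~8.5 bits/weight (Q8_0-style),
# equally sized layers, and a hypothetical layer count. KV cache and runtime
# overhead are ignored, so real headroom is smaller than printed.

def model_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight footprint in GB (1 GB = 1e9 bytes)."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

def split_layers(n_layers: int, vram_gb: list[float]) -> list[int]:
    """Assign whole layers to each GPU in proportion to its VRAM."""
    total = sum(vram_gb)
    counts = [int(n_layers * v / total) for v in vram_gb]
    # Hand leftover layers (lost to rounding down) to the largest cards first.
    by_size = sorted(range(len(vram_gb)), key=lambda i: vram_gb[i], reverse=True)
    for i in range(n_layers - sum(counts)):
        counts[by_size[i % len(by_size)]] += 1
    return counts

gpus = {"RTX 5080": 16.0, "RTX 3090": 24.0}
n_layers = 64  # hypothetical; check the model's config for the real value
per_layer = model_size_gb(27, 8.5) / n_layers
for (name, vram), layers in zip(gpus.items(), split_layers(n_layers, list(gpus.values()))):
    used = layers * per_layer
    print(f"{name}: {layers} layers, ~{used:.1f} GB weights, ~{vram - used:.1f} GB left")
```

With these assumptions the weights alone come to about 28.7 GB, leaving roughly 11 GB across both cards for the KV cache and everything else; a real split would usually shift layers off whichever card also drives the display or holds more of the cache.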

verdverm•44 minutes ago

I've been using https://spark-arena.com/leaderboard to glean this kind of information for the DGX Spark; it's a sort of recipe book. The Nvidia forum has people discussing the things you want to know. I see some of it on Discord, Reddit, et al., but it's less cohesive.

I've switched from using the Spark to run one model as well as it can to running several supporting models for the md kb I'm working on.

atlgator•16 minutes ago

Which "good quality PCIe 4 riser" did you buy?


Discussion (8 comments), originally on Hacker News