ES version is available. Content is displayed in original English for accuracy.
Advertisement
Advertisement
⚡ Community Insights
Discussion Sentiment
60% Positive
Analyzed from 5281 words in the discussion.
Trending Topics
#models#model#more#open#run#anthropic#local#using#don#https

Discussion (160 Comments)Read Original on HackerNews
My teen isn't super interested in AI, but whenever they do feel curious they have their own account they can use on our home network. As far as chatting goes local models are more than capable for handling standard chat questions, doing research, helping troubleshoot problems etc. In fact it was an agent powered by the same model that setup the open webui server and took care of all the account management features through my phone (using Hermes agent).
If you're building AI powered features and using sophisticated agent setups for coding for work, then it make sense to use SoTA from these providers. But I've been using local models increasingly for personal use and am starting to find them preferable (I run an uncensored, ephemeral model for my own use and it's an entirely different experience than anything you can pay for).
Still haven't cancelled my personal Anthropic subscription, but considering it soon.
I guess "starting to find them preferable" suggests to me you think they work better, but this is surprising to me so I think I may have misunderstood, so I ask!
Like you're saying they work better than the proprietary models (in what ways?), or you find them mostly good enough and prefer the privacy or cost, or what?
Having full control over how your data is retained, what the system prompt is, which version of the model you're running, etc leads to much a more consistent experience. For example, for chat sessions, I can't stand the new "let me push back" version of Claude. For my home models I never have to worry about that.
There's never a mystery as to whether the model secretly degraded performance, I always know exactly which model I'm using and how well it's utilizing resources etc. Open models also give you full visibility into the reasoning steps, so you never have to guess what the model is thinking.
Then when you start getting into things like uncensored/abliterated models we're talking about something you can't even pay for. In case you're unfamiliar, even open local models have guardrails built in. But people in the community have found ways to remove these. One of the things I've found most concerning about AI, which is under discussed, is the combination of people having personal chats with an agent that both monitors the conversation and refuses to discuss certain topics. This leads to a very deep level of self-censoring I find dystopian.
I also have multiple hermes agents setup, some with local backends other with open but non-local backends (e.g. Kimi through the API). For some tasks, I've just started to find the local agent tends to work better for the type of tasks I want (maybe it just over thinks less?). I don't use it for coding so much as research tasks and sysadmin stuff, but I've been really happy with the results.
Oh, and let's not forget, especially running on a Mac, these local models are basically free to run.
So from the perspective of your teen, they would benefit from using z.ai or ChatGPT or Claude, etc, rather than the local server where you can see all the conversations.
What uncensored model do you recommend using ?
>So from the perspective of your teen, they would benefit from using z.ai or ChatGPT or Claude, etc, rather than the local server where you can see all the conversations.
That is bonkers. If I were a parent, I would hope my child would trust me more than systems monitored by FBI/NSA/etc. Like, what sort of sick relationship do you have to have with your own family to trust them less than strangers who would sell you into prison slavery for a buck.
RTX 4090: ~190 token/sec
I don't have the number around but there is a notable latency for pre-fill on the M3, but once it's running the delay is negligible.
The RTX, unsurprisingly, is all around superior performance wise, but: I use that computer for gaming and image gen work so I can't dedicate it as a server, and, especially when it's warmer, the heat generated under heavy loads is noticable.
Dont. Goon. To. LLMs
The SCOTUS has made it exceptionally clear mathematics and software are protected by the First Amendment. The Atomic Energy Act of 1954 tries to make a very narrow exception for nuclear weapons, but
1. The law has never been challenged in court for being unconstitutional, and
2. It doesn't apply to model weights
Any attempt by the government to suppress open models will meet legal challenges on the grounds of (1) or (2).
Congress could amend the act to include model weights, but that won't prevent legal challenges on the grounds of it being unconstitutional (which it is).
Capabilities can be gated behind certification programs, or by money, or any other numerous corrupt and non-corrupt means. Model capabilities can be segregated by pricing tiers, creating an economic underclass that cannot afford access to frontier intelligence.
For humanity to benefit, the tech needs to be open and equally available to all.
One is the potential for skill rot where AI grows a heavy dependence in new employees and once the real price per token cost is settled on and discoverable (post massive IPOs and probably a while post - not immediately after) we, as a society, are left with a bunch of people dependent on a deeply inefficient technology to maintain software we now view as vital that might severely impede our ability to actually deal with climate change (press X to doubt Bezos).
The second is that the psychological damage of interacting with models in a social context during your formative years is deeply damaging and we've essentially destroyed the ability for a generation or two to actually interact as productive members of society.
Addressing the second issue doesn't necessarily exclude our ability to leverage models for business productivity but it seems unlikely to happen in the current climate without that also happening. I am hesitant to believe in a sudden outbreak of common sense at this point. The first point, could really be a systems collapse trigger - we can argue about the likelihood but denying it as a possibility is excessively naive.
If even one of these had pledge that all profit goes to end world hunger, cancer research, etc, I could possibly see it - but they haven't. They're all after finding a way to be the biggest, richest asshole possible with the ability to crush anyone in their way..
And how do we prevent Chinese companies from training on our open AI models and offering their models for free?
the potential of wealth creation with AI is so high, and also the fact that research, pre-training and inference is so expensive that, that any open-AI would eventually become OpenAI.
I mean from a financial and sustainability standpoint, assuming they’re equally powerful as their proprietary counterparts.
I guess I’m trying to understand the economics of it.
However, I would highly suggest more people experiment with these smaller models. They are incredibly capable in many ways that many people dont realize.
The perceived capabilities of the larger models are also much less the result of the model having more parameters/training cycles, but rather that they are being run through well-made harnesses, something which the open-source community is rapidly approaching with near-peer solutions of their own.
In short, much of the gap between between open-weight models and the larger proprietary models can be considered more of an issue of perception and not an issue of capability. There is a fundamental gap economically, but not so much in capability. The open source community is rapidly closing the gap on these larger labs, especially thanks to the amazing research being freely given openly by well funded chinese labs.
Presently they trail SOTA by about 6-12 months, not on par (average across everything they do).
DeepSeek V4 Pro with Max reasoning is very affordable even if you pay per-token, this month I pushed about 486 million tokens through it (I will admit that >95% was cache hits, for agentic development pretty typical) and it cost me about 8 USD in total. Meanwhile with Opus or even Sonnet if I had to pay API prices, I would be a more sad camper. That model makes a lot of stupid things though, so not ideal.
Meanwhile GLM-5.2 that came out is also quote capable and is near Opus in many tasks, all while their coding plan is more cost effective than Anthropic's: https://z.ai/subscribe
I will still stick with Anthropic but consider downgrading from Max 5x to Pro which will change the monthly expenses from around 108 EUR down to <20 EUR (they have a discount too if you pay for a year up front), and probably get the yearly GLM Pro plan which should decrease my yearly expenses from around 1300 EUR total to about 750 total EUR while still giving me a fairly decent setup.
For the consumer, that is doable and practical.
For the people actually running these models, who knows - at least DeepSeek and others are trying to make the models more efficient so the numbers are more feasible.
Also have run Qwen3.6 35B A3B on prem and it kinda sucks. Way better than models that size a year ago, but still lags behind Sonnet and also DeepSeek V4 Flash due to the size limits. Plus to even run myself I'd need a pretty beefy setup, most likely a pair of Intel Arc Pro B70s with 32 GB of VRAM each that I could still run off of my PSU but the actual model output would be kinda bullshit and I'd have to spend an unpleasant amount of time fixing it.
You can drastically reduce the requirements by running models at a lower bitrate, which somewhat reduces accuracy but not that much - think of the difference between an MP3 vs uncompressed audio. With this and other tricks, you can get high end models down to a size where they can be run on a high spec desktop workstation affordable by an individual or small business.
Obviously I'm heavily oversimplifying here. I think a useful parallel is to consider situations from the past where you would once have required corporate budgets equivalent to the price of a house to run a large database, but over time it became accessible to anyone with the requisite expertise and relatively affordable hardware.
That's still a lot of money, but most people don't really need a trillion parameter model. If privacy is more valuable than the frontier capabilities then they could almost certainly get by with much less.
You can run fantastic local models if you have either:
- M-series Apple device with ideally >= 24GB of VRAM
- RTX [345]090 GPU
I'm fortunate enough to have both and use an M-series laptop as basically a persistent server (I don't use it much and when traveling typically just use my work laptop). My desktop doesn't act as a persitent server but I fire up llama.cpp on it all time for quick chat sessions.
If you have one of the above devices and can dedicate it as server there are additional layers of tooling you can use that dramatically improve the experience. In particular Open WebUI allows you to add tons of useful tools (image gen, web search, code eval, etc), and agent harnesses like Hermes can make the current gen small models very capable. I have an agent in chat on my phone that basically handles all the sys-admin for the server it runs on.
They are not SOTA in various ways but they have better economics.
Given they have laughable uptime and I have yet to find a useful project mostly written by claude... I doubt it.
Here is the Wayback Machine archive from April of their identity verification help page: https://web.archive.org/web/20260415064244/https://support.c...
Here's a random Reddit thread from 2 months ago about them rolling out identity verification: https://www.reddit.com/r/ClaudeAI/comments/1smr9vs/claude_is...
Here is one random example thread of someone who got caught in identity verification with multiple follow-on comments from people who encountered the same problem, also 2 months old: https://www.reddit.com/r/ClaudeCode/comments/1sx25kd/buyer_b...
Seems like US wants to get ahead on this and be #1
Also Sam Altman will love this idea, because he already tried it with Worldcoin
It is using the proof of age requirement to require a much larger ask -- full proof of identity
Age verification could be done with any of a variety of mathematical systems showing you have a proven age-valid ID but not revealing your identity. But no one is suggesting they build and use such a system.
Comparing a private company's service to something run and maintained by an entire government on their population is disingenuous, to say the least.
Because one is a private company that people can choose to use or avoid. The other is a government that can force things upon people. How are they the same in any way?
You know many companies check ID, right? You submit ID for a lot of activities. This isn't a new concept that Anthropic invented.
free speech, civil liberties, voting, are in China all well below the standards of the west. The criticism and complaints were completely warranted and are still true today, whereas your comment falsely implies there is some parity.
could your comment be repaired to be reasonable? why bother, just read the rest of this discussion where people are debating these controls without trying to exonerate China.
Meanwhile in the Land Of The Free:
> Prairieland defendant sentenced to 30 years in prison for moving a box of antifascist zines
https://theintercept.com/2026/06/23/prairieland-texas-ice-pr...
> US President Donald Trump threatened a "10 year prison sentence" to anyone caught vandalizing the Lincoln Memorial Reflecting Pool
https://www.dw.com/en/us-trump-threatens-prison-for-reflecti...
If you do business with totalitarian societies that aren't made to liberalize, you too will become a totalitarian society.
No, they liked China because the standard of living meant that it was easy to improve people's lives while also keeping them in line via a government that wasn't above grinding protestors into hamburger with tank tracks. The bar to clear wasn't "maintain the American standard of living", it was "provide more calories than Mao did during the Great Leap Forward", and so long as they could do that, they'd get to do whatever else they wanted with the workforce. Anyone who wanted more would get to deal with the CCP.
Not that I like that route, but may be the only way Anthropic can keep releasing new models with the current administration.
Now I can't help but imagine a mildly annoyed AGI buying yet another fake identity to deal with yet another KYC check, because those stupid humans are inherently racist and just can't help themselves but keep demanding "proof of flesh".
Having my engineers swap over to it from Claude has garnered very little complaint. The lack of multi-modality is a limitation, but using minimax m3 for that isn't super inconvenient.
Countries such as Canada are in the process of implementing regulations to prevent repeats of the Tumbler Ridge incident. A disturbed person was basically attaboy'd by AI into a mass shooting. The discussions this person had with OpenAI's AI triggered some alarm bells at OpenAI, but they did nothing about them. If future shooters were to simply use AI chatbots under assumed names, there wouldn't be much AI companies could do about it, except maybe change their bots to stop offering mindless affirmation. At the same time, there is a move by multiple governments around the world to ban children from using AI. You can't meet that legal requirement without age verification.
On the other hand, even Americans don't trust their own corporations with their personal data. People outside of the U.S. are even less trusting thanks to the completely amoral nature of the present U.S. administration and their steadfast opposition to any kind of sensible regulation.
The chickens are coming home to roost.
> This policy was published on June 8, 2026 with an effective date of July 8, 2026
https://www.reddit.com/r/ClaudeAI/comments/1ucu6og/any_solut...
At this point it's completely outgrageous that the EU, UK, or even Canada can't put forth the funding to develop their own local AI model industry.
> As tensions between President Donald Trump and Europe continue to simmer, the continent is accelerating its moves to reduce its addiction to US technology. Cities and governments are ditching Microsoft Office for open-source alternatives, shifting to European cloud hosting for local AI, and moving defense data to systems without American involvement. Nowhere has this been more clear than in France.
https://www.wired.com/story/the-eu-is-going-through-a-trump-...
> The Netherlands blocked a U.S. company from buying a Dutch firm that handles its national ID system, saying it would create a “threat to the public interest.”
https://www.nytimes.com/2026/06/09/technology/solvinity-kynd...
1) not a priority
2) expensive as hell
The amounts of capital sunk into AI model creation and service is truly mind-boggling. It also comes with the implication that it'll recoup investment by slashing jobs. For better or worse, those are hard sells in the countries you mentioned.
For good reasons, sometimes. The "all automation is good automation" sentiment on places like HN isn't shared as widely outside this tech bubble. There are very real concerns with historical precedent that only those at the top will benefit from the automation, which is overall bad for society (unless you're a hardcore capitalist and/or one of said capital owners).
For better or for worse, not all nations subscribe to the competition treadmill.
Too bad we can't contact them if we have issues.
I used fable on some difficult stuff and it was surprisingly good.
It's safe to say that models aren't going to get worse.
Does that mean: US citizens will get an edge in hireability?
Assuming: 1. Non-US companies can't keep up Or 2. That model improvements continue to convince management of productivity improvements
In the present situation any company using Fable will present a tremendous difficulty because only defense contractors are accustomed to handling export controls.
We're still guessing but if Fable is made available again with the export controls intact, something as little as discussing the usage of Fable to a non-"US Person" (i.e. green card or citizen) in the cubicle next to yours could be a crime punishable with sizable fines and even jailtime. They'll certainly be negotiating this down or trying their best to reduce the scope of what's considered a violation. Export controls are no joke and what's considered "export" can be positively tiny.
It's enforced the way you'd wish HIPAA were.
[ref: section 1.5 of Mythos/Fable 5 system card, https://www-cdn.anthropic.com/d00db56fa754a1b115b6dd7cb2e3c3...]
Maybe that was petty, but I was already looking for alternatives after the obvious angling for increased regulation and suppression of local models. LLMs are software and I want to modify them and run them on hardware that I own.
I use a local Qwen3.6-35B-A3B (@ Q4_K_XL) for my documentation search harness. It works well for its assigned task, which is:
- I dump in a bucket of PDFs and/or source code.
- I ask a question.
- Qwen greps, fuzzy-searches, views rendered PDF pages to check diagrams, possibly gives up and reads everything, and possibly gives up on that too and writes its own scraper with PyMuPDF in a Pyodide sandbox.
- Qwen gives me an answer consisting mostly of citations and links back into the source material.
This approach with local Qwen can extract useful answers from the Armv9-A manual, which at 17k pages is possibly too big for any context window. Qwen has just enough knowledge baked in to know what to search for and understand what it's looking at. A more knowledgeable model would be a waste because even Fable makes shit up, and I want citations, not hallucinations.
DeepSeek v4 Flash gets an honourable mention: somehow all three of fast, capable and cheap. Zero-data-retention providers are available for both GLM-5.2 and DSv4F. I trust OpenRouter ZDR about as much as I trust Anthropic ZDR, since I can audit neither.
Overall I don't miss my Claude subscription, but take what I say with a grain of salt. I was just a Pro subscriber, not a heavy user like some other folks here.
https://www.wired.com/story/anthropic-responds-to-backlash-o...
Fortunately for all, clankers aren't required at all to program.
I generally dislike services which require this level of identity verification but also, so far, those have mostly been freemium services and community tools. And I dislike gating those communities.
I'm sure I should have more of a problem with this.
https://www.theguardian.com/media/2025/oct/09/hack-age-verif...
As an aside, when traveling internationally it’s not uncommon to need to provide your passport information if you want to get a sales tax rebate. I’ve never purchased something expensive enough abroad to bother with it.
I am deeply inconsistent on this.
It's the same reason we require ID for alcohol and gun purchases. Obviously it isn't a perfect system, teens drink but good luck suggesting that 13 year old should be allowed to buy alcohol.