
Discussion (231 Comments), originally on HackerNews
The spigot can be turned off at any time.
Until there's some sort of "community owned hardware", open weights models are always at risk of being discontinued.
And there will always be incentivised parties that release models. Nvda for one has every incentive to keep the nemotron line going, as they're directly profiting from people running this. And the models aren't really far from open SotA anyway.
Goog will probably continue to release the small models, since they'll use them for browser stuff anyway, and know that they'll leak. So for them it's a win-win to release the small models and gain some dev market share.
And the Chinese labs also have incentives to keep releasing models, and will likely continue to get government support to do so (yay commercial wars between nations).
Your right to 3d print whatever you want is about to be taken away (in California).
What software you can run on your computer can already be restricted.
Absolutely everything can be taken away. The simplest way to remove open models is probably to declare them a tool that terrorists could use. Crazy? Yes, the world is totally crazy these days.
Everything cannot, in fact, be taken away. Don't propagandize yourself. Some things, like information, are free. Not even China can prevent all its citizens from accessing Western internet. USGov simply does not have the resources to find and audit every hard drive and USB stick in the country for illegal files. The internet cannot be censored 100% without literally cutting every cable and confiscating every radio.
The software that runs on my computer cannot, in fact, be restricted. It can be declared illegal, but there literally is no mechanism by which it can be enforced other than a government goon standing over my shoulder 24/7.
Some freedoms really cannot be removed without utterly implausible amounts of effort. Arguing otherwise is helping to erode freedom. So stop it.
Yes, over my dead body.
But in a free and democratic society, there's an enormous difference between "the democratically chosen state powers may take something from me" and "a private entity takes away something from me on an inscrutable whim with no recourse".
Neither is good if you don't want the thing taken away. But removing the second mechanism is still a laudable goal.
Are laws that are inherently unenforceable even laws?
Their releases so far have been kind of lackluster compared to Qwen and other Chinese models. My suspicion is that Nvidia won't be releasing models that appear to compete with frontier models because that would upset their big customers.
This is pure speculation, but I have a hunch that the Nemotron line is intended as a shot across the bow, and that's why its capabilities have been strong but not quite open-frontier level.
In theory yes, but the average person can't really run the big open models.
This is already happening, try to find a provider that still hosts older, especially less popular or succeeded open models.
For me personally, I've been trying to access Kimi K2-0711. There seems to be only one provider left on OpenRouter (NovitaAI), and 3 out of 4 requests error out.
A model that writes code without knowledge of any language or library changes for half a decade is less useful. A 2021-era ChatGPT would be quite quaint in 2026.
Right now the Chinese labs might have incentives to release their models for free, and maybe Google is happy to release open weights today, but I'm sure there are already bean counters at Google salivating at the idea of having Gemini in Chrome as part of a Google AI monthly subscription just like YouTube premium and other Google subscriptions.
Correction: The capabilities and knowledge of that model can be improved via self-distillation, so the value of that model increases over time.
This is where I think self-distillation is the main way forward, and probably the second-best thing ever to happen to AI/LLMs after the transformer.
Based on self-distillation, the value of open-weights models will increase over time for sub-specialization through post-training and fine-tuning.
Please check these very promising recent works and results from MIT/ETH, UCLA and Apple [1], [2], [3]. For example, the MIT/ETH self-distillation approach was demonstrated on a single H200 GPU. Apple's approach is so simple that it is just called Simple Self-Distillation (SSD), pun intended.
[1] Self-Distillation Enables Continual Learning:
https://arxiv.org/abs/2601.19897
[2] Self-Distilled Reasoner: On-Policy Self-Distillation for Large Language Models:
https://arxiv.org/abs/2601.18734
[3] Embarrassingly Simple Self-Distillation Improves Code Generation:
https://arxiv.org/abs/2604.01193
I think this matters less than you think. If the spigot turns off, open LLM research is going to have a powerful incentive to focus on post-training to refresh stale base models. And post-training, in general, is so much cheaper and faster than pre-training anyway. I was pretty surprised to learn that GLM-5.2's entire RL training (the part that makes it reliable at agentic tasks) was completed in just TWO DAYS.
I realize that my amazing tool/system of local AI is out of date. I still very much like having it, and it is not at all a bad thing to have. Everyone in theory ought to have a local backup, just in case.
Take one example, albeit an extreme one: having this would most definitely matter in the event of a societal collapse. Not everyone will have it. Can those giant data centers run off a few solar panels like a desktop PC can?
For this one existential reason alone, I recommend everyone at least play around local enough to have a few models functional.
Not really.
it's my theory at least, the Hindenburg Research of AI
I think the future of open weights models will be similar to fabless chip design companies. There will be companies that can train models and they will licence those models to inference companies that manage the APIs.
The inference companies need much less capital and the training companies don't need to divert resources from training to inference.
Some of the Chinese model training companies are already doing this and licencing their models to inference providers.
I think at some point, Deepseek or Z or other AI training companies will sell their models for fees. I can imagine buying an LLM model for $499 one-time payment for personal use. Maybe buying software and owning it will come back. Some will make you subscribe so you get the latest models as they release them.
Of course, they will also license their models to inference providers like you said.
They steal the scroll/drag touch and turn into a nightmare if zooming / unzooming, and are squashed and unreadable when they first render.
And I think https://allenai.org/ has something like this, too.
I remain hopeful that we'll be able to democratize the entire tech stack for this tech.
Plus I am certain it makes financial sense. I am guessing here, but a user who fully utilizes a subscription's limits probably costs the operator more than the subscription brings in, which is why Anthropic is making such a big stink about the Chinese data harvesting. By releasing the weights, you relieve yourself of that burden: the competition does not need to hammer your subscription service, they can just download your model, analyze it and run it all day.
Also, for the largest models it makes no sense to run them yourself unless you are a major player. Renting the hardware is ludicrously more expensive than their subscription: tens of thousands of dollars. And buying the hardware to run them costs hundreds of thousands of dollars.
The most popular LLM product in China is Bytedance's Doubao. You probably haven't heard of them since they never released weights and don't benchmark particularly well, but Bytedance already had enough users on its other apps that they could directly advertise Doubao to.
Open-source and open-weights models are how you can harness the potential of all humans to keep developing and improving the SOTA of your model. Literally every student on the planet wants to play with and improve these models for their own use case.
Plus there's the ecosystem: once you have users in the ecosystem around your open-weight model, that is a giant leverage point in itself.
Right now, there is a shortage of talented researchers, and the attention that open models generate allows them to attract good hires. But this is a fragile dynamic that can break in the future. It's not very different from commercial open source work, except it's much more capital intensive and lower volume.
The hardware is already available for renting at reasonable prices. We need community funding. I wish people pooled a fraction of the money they burn on local GPU rigs on funding training/testing/etc.
A big problem is like in open source: it's way too atomized. Just one competitive ground-up community LLM would require tens of millions $. But who gets to pick?
IMHO the only chance is highly specialized and smaller LLMs instead. And this is still millions to train.
And remember LLMs are competitive for only a handful of months.
True. And it's possible that this has already happened at Alibaba Qwen - at least for the smaller models that people had a chance of running at home (122B and smaller).
It's highly unlikely we get another open Llama model though after the Llama 4 flop, even if their Muse Spark seems pretty good.
I don’t think we should describe these companies as simply releasing these highly capable open weight models out of the goodness of their hearts
And while I don't have a very positive view of the Chinese government, last I checked, they haven't been dropping bombs on innocent schoolchildren recently.
We can debate the semantics of whether “created by” or “subject to” means the same thing in regards to the Chinese government, but that is neither here nor there.
I’m happy to take your wording that they are obviously “subject to” the Chinese government. That logically means they are subject to carrying out the CCP’s long term strategy. And as you said “whose whims may change at any given moment”.
That directly relates to the OP’s fears, that these models could be taken away at any given moment. “The spigot can be turned off at any time” as they put it.
Or another possibility is they will never turn the spigot off, but they will engineer it in a way to best achieve their goals. My bet is that’s the more likely outcome.
I simply disagree with the OP’s description of the problem as “open weights models are the result of philanthropy by some private org”, I think the problem is much more complicated than that
Among other countries that are consistently at the top for gross national happiness are Finland, Denmark, Iceland, Switzerland, and the Netherlands. Among them, the current ability to release open models is observable.
The USA unfortunately continues to fall quickly in the World Happiness Report rankings, and that's not because many other countries made great progress.
Or until some bright people figure out drastically more efficient means of training.
I hope that we find ways of continuing to improve these models besides continuing to exponentially increase capex spend until all but one of your competitors falls away.
For instance, Facebook were able to optimize their core ads product for mobile, in a way that was much more difficult for Google.
[1] https://www.theinformation.com/articles/deepseek-using-banne...
Moreover, China has just demonstrated a supercomputer faster than any US supercomputer, which unlike the US supercomputers, which need GPUs, achieves its high computational throughput with custom CPUs designed in China (implementing an Armv9-A ISA with SME, i.e. the scalable matrix extension, and with BF16/INT8 operations for AI).
The CPUs used in that supercomputer can reach both a computational throughput and a memory bandwidth sufficiently high for training any LLMs (they have fast HBM memory). Their only disadvantage in comparison with the best NVIDIA GPUs is a slightly lower energy efficiency, but China has abundant cheap energy so this is not a serious disadvantage for them.
But consider the alternative. OpenAI and Anthropic can shut off your account or API key at any time for any reason. How is this better? You have way more security when you're running your own model.
Dunno why you'd want to though, considering v4 Pro (and even Flash) outpace it drastically
It’s sad to think that Mozilla spent years and millions doing virtual reality and AI, they would have been perfect to do this but let’s face it - who knows if Mozilla will be around even 5 years from now
> So maybe the open source apocalypse won’t happen yet.
Sorry I wasn't at the last doomer meeting, when did we decide good open source models are a harbinger for the apocalypse?
But hey, it’s open-model LLMs, the boogeyman! Can’t have that, it must be OpenAI or Anthropic safely controlling the market and calling all the shots.
[0] https://en.wikipedia.org/wiki/Ozone_depletion#Prospects_of_o...
I don’t recommend getting into the literature, it’s… depressing.
(Your point about the global, concerted effort to limit the “holes” in the ozone layer reminds me that we can, when pressed, come together to tackle serious issues.)
People becoming more and more neurotic by the day
For a (Chinese) open-weight model to surpass the (US lab) frontier models, this equation must flip, and the Chinese labs must entirely retool from harvesting frontier model data to building the data systems and efforts that produce novel data, as well as procuring latest-generation hardware en masse for this. This does not happen easily. Also, training a frontier-scale model is actually not such an unimaginable feat: doing all the inference with the teacher models is where the hardware goes.
You don't know what's happening at z.ai or Alibaba. And you don't know what's happening at Anthropic and OpenAI.
I don't know what they are all doing, but I find it extremely unlikely that they are not all collecting data from one another. I am confident Anthropic has a team going over the GLM 5.2 weights, even if it's just to see where the competition is.
Just because some labs are getting data from Anthropic does not mean they are not also doing their own research.
They were focused on optimization because they could not get the best hardware. The only reason their top labs are behind may be that they did not have H200s and MI350s. And now they do.
Plus you are discounting other risks, Anthropic is currently sitting on "the best" models in the world because they got in a pissing match with the US administration.
btw: This could be the case in China as well: their administration has been surprisingly open on AI exports and open-weight models, as far as we know. There is a very small but not trivial chance they are hogging a better version of GLM 5.2, for example, but no one is allowed to talk about it. Now I am not saying that is the case; I am saying the two cases (Chinese labs are 6 months behind, or they are forced to suppress their best models) are indistinguishable.
Even if your characterization is accurate, they could do this tomorrow and are not so myopic that they wouldn’t have thought about it. I don’t see this as a barrier, and I see a lot of the same underestimation of Asia that’s been happening for 50 years. There’s not some innate American advantage to building LLMs, and personally I think whatever head start the US has is going to be squandered on delays from the export control “too dangerous for release” LARPing we’re seeing.
Also I was responding to a claim about what will happen in less than 6 months (that’s about the edge of what you can meaningfully say too much about in this field).
These strategies take materially different resources; it’s not an overnight decision made by leadership. I suppose there is a natural experiment ongoing at Meta regarding this, it seems they recently moved a number of people into a division to produce such data overnight. So we will find out soon how quick they climb the leaderboards.
Distilling even with small amounts of data from a better model is still helpful, but not in the sense of transferring capabilities the raw internet-trained model doesn't have at all, but for identifying those capabilities that are compatible with the servile assistant persona and suppressing others that are undesirable (e.g. trolling). A primitive version of this were instruction-tuning datasets generated with ChatGPT, as used e.g. for Alpaca.
Without a clear target to emulate, competitors might have to rely more on human raters, but there are plenty of data labeling companies in China, so that's hardly a hurdle.
Distillation and copying are how they’ve bootstrapped their models, but that feels not so different than Anthropic and Meta torrenting millions of pirated books.
The Chinese labs are solving problems for a different set of constraints.
The use of US models for Chinese model training is part of the motivation of all of this.
But if they can stay on pace, within say 6 to 12 months of the bleeding edge of the American frontier models, that’s a huge problem.
If they can just piggyback on the Herculean efforts of Anthropic, OpenAI, Google etc., accept a little bit of lag, and save billions of dollars? Why wouldn’t they?
And for the end user, why would they pay a premium subscription price for something they can just wait six months for and run on their own hardware at home? In my opinion, this is the cat and mouse game that’s being played right now. And I suspect it’s intentional on the side of the open weight models. I would bet they are playing a war of attrition
They don't even need to 'win' in the sense of maxing the benchmark. They can be 20% worse/50% cheaper and many of us (and our managers who approve our token budgets) will be in.
DeepSeek is 30x cheaper for input and 75x cheaper for output than Sonnet on OpenRouter, and it's not a whole lot worse for many things.
It is enough to kneecap their pricing power to trigger the valuation reset by an order of magnitude and humble them a bit.
Plus there are always infrastructure and hardware providers who want to keep their share of profits and will squeeze Anthropic's margins to deflate their valuation (Nvidia, AWS, RAM manufacturers, etc.).
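To make the price gap quoted above concrete, here is a minimal sketch of how the 30x input and 75x output ratios combine for a given workload. Only the two ratios come from the comment. The function name, the token counts and the per-million-token baseline prices are hypothetical placeholders, not real list prices.

```python
def blended_cost_ratio(input_tokens, output_tokens,
                       input_price, output_price,
                       input_discount=30.0, output_discount=75.0):
    """Return how many times cheaper the discounted model is for one workload.

    input_price / output_price are the baseline model's prices per token
    (any consistent unit). The discounts are the 30x / 75x ratios quoted
    in the thread; everything else is an illustrative assumption.
    """
    baseline = input_tokens * input_price + output_tokens * output_price
    cheaper = (input_tokens * input_price / input_discount
               + output_tokens * output_price / output_discount)
    return baseline / cheaper


# Hypothetical workload: 1M input tokens, 200k output tokens,
# with placeholder baseline prices of 3 and 15 per million tokens.
ratio = blended_cost_ratio(1_000_000, 200_000, 3 / 1e6, 15 / 1e6)
print(f"The cheaper model costs about 1/{ratio:.0f} of the baseline for this workload")
```

The blended ratio always lands between the two per-direction ratios, and it moves toward 75x as a workload becomes more output-heavy.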
1. It's unclear if there is a law of diminishing returns with ever-larger models. They're more expensive to run and for many applications, you'll probably find smaller models are sufficient;
2. There's an inbuilt market for local LLMs. This is an effective limit on how large models can get. Case law hasn't been established yet on, for example, if a law firm using ChatGPT breaks privilege. Specifically, chat logs may be discoverable. Medical applications have this issue too and I think you'll find that financial firms are going to be leery about this as well;
3. Better, larger models will bleed into smaller, open source models. The chat logs themselves are training data. There's a whole market in China for Claude tokens around this;
4. China has a national security interest in not being beholden to US tech giants when it comes to AI. China has a history of being able to commit to large-scale long-term projects and Anthropic just won't be able to compete with a national project by one of the world's superpowers, if it comes down to it;
5. Winning doesn't necessarily mean being the best. Often it's just being good enough;
6. As an example of a national project, China is busy replicating EUV because of the US ban on ASML and NVidia exporting their best stuff. I don't think many in the West are prepared for how rapid this will be. I'm reminded of the policy debate in 1945 when many in American policy and military circles thought the USSR would never catch up on the atomic bomb or, if they did, it would take 20+ years. It took 4 years. For the hydrogen bomb, it took 1. The US hardware advantage is a lot more tenuous than many realize.
Kind of an oxymoron don’t you think.
If they could generate data that looked kind of real, why don’t they just generate that data on the fly during inference?
Nobody cares if your AGI is 100% made out of neural networks or if it's like 50% neural networks and 50% Perl scripts.
This makes sense, right? Coding is one of the most obvious short-term uses of models, it also has a readymade market willing to pay a lot for tokens, it has a huge corpus to work with, and a significant degree of validation is built into the problem domain...
China, an "authoritarian state", "the antonym of freedom", with a software industry that is especially capitalist, has produced all the competitive open-weight models.
It really is IRONIC.
Disclosure: I am Chinese, and I understand this strategy comes from being behind, using open source as an asymmetric way to compete and make up for missing compute by sharing the burden, etc. But still, it is very ironic.
> USA, a country that known for the land of freedom
The US might say it's the land of freedom, but it's been playing the game of economic protectionism for centuries. This is just the latest example.
If the closed models stop improving will the progress of open models slow?
Americans should wake up to reality. The idea, repeated continuously across Internet media, that the Chinese merely copy US technology and so will never be able to surpass it was true many years ago. It has been false for many years now, and there are many domains where the USA would have to copy Chinese technology if it does not want to fall behind.
Among other "sanctions", USA has forbidden the export to China of high-performance computing devices, but this has backfired as China has just demonstrated a supercomputer that is faster than any US supercomputer and which uses custom CPUs designed in China, apparently by Huawei, the company that was the main target of the US efforts to sabotage the Chinese competitors.
The US "sanctions" have hurt China for a few years, but they have convinced them that they must allocate resources to become able to make themselves everything that they previously bought from USA. The result is that now China has become stronger and USA weaker.
The USA should never have sold technology to China a quarter of a century ago; then the power relationship between the 2 countries would have been very different. But even 5 years ago it was already too late for any US "sanctions" to have lasting effects. Nowadays any hopes that US "sanctions" will keep China in the dark ages are pathetic.
With the kind of policies that are promoted by the US government, the chances that USA will keep its leading position in AI are minimal.
Some people in China surely know.
> Like if the closed models stop improving will all the closed models also stop improving?
Seems extremely unlikely, unless the models all hit some kind of wall soon. The Chinese companies may be behind the US in compute capacity, but they have excellent researchers [0] who are probably approximately as good as their US counterparts at the kind of problem generation and RL that is currently working so well.
I would be very surprised, though, if the models cannot continue to be improved rapidly in any area that allows a tight feedback loop like programming, at least up to the point where we puny humans lose the ability to define objective functions.
(And, conversely, I don’t expect magic in fields where the feedback is slow or expensive. A model is not about to reliably invent a wonderful medicine for the same reason that a large and extremely competent pharma company cannot: the evaluation process is extremely slow and it’s so expensive that the kind of utterly enormous corpus that is driving the current progress in coding is simply not available. Running RL on m iterations of n medication-development trajectories each is going to cost n*m times $10-100 million and take m years if it’s even possible at all.)
[0] The US advantage in this space will likely decline, since the brain drain from the rest of the world via the US university system to US labs is drying up.
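The n*m estimate in the parenthetical above can be worked through as a small sketch. The $10-100 million per trajectory range and the "m years" duration come from the comment. The function name, the one-year-per-iteration reading of "m years", and the example values of n and m are assumptions for illustration only.

```python
def rl_trial_budget(n_trajectories, m_iterations,
                    cost_low=10_000_000, cost_high=100_000_000,
                    years_per_iteration=1):
    """Rough cost and duration of RL over drug-development trajectories.

    Each of the m iterations runs n trajectories, and each trajectory is
    assumed to cost between cost_low and cost_high dollars (the $10-100M
    range from the comment). One year per iteration is an assumption
    implied by the comment's "take m years".
    """
    total_trajectories = n_trajectories * m_iterations
    return {
        "cost_low": total_trajectories * cost_low,
        "cost_high": total_trajectories * cost_high,
        "years": m_iterations * years_per_iteration,
    }


# Hypothetical run: 100 trajectories per iteration, 10 iterations.
budget = rl_trial_budget(n_trajectories=100, m_iterations=10)
print(f"${budget['cost_low'] / 1e9:.0f}B to ${budget['cost_high'] / 1e9:.0f}B "
      f"over {budget['years']} years")
```

Even this modest hypothetical run is far smaller than the number of rollouts used for coding RL, yet it already reaches tens of billions of dollars and a decade of wall-clock time.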
I think it's much more immediate/present: the weights and the information breach significant strategic controls on national data and posture, which can be back-derived from the models. If you can analyse a model, you can infer what structural inputs dictate it.
[1] The story: https://nob.cs.ucdavis.edu/classes/ecs153-2019-04/readings/s...
[2] Wikipedia: https://en.wikipedia.org/wiki/Superiority_(short_story)
Not the same thing.
It’s used correctly in the article’s body, but the title is misleading.
The name is bad, doesn’t even make any fucking sense and it gives open source a bad rep.
I gave up. No one cares. And no one will ever tell the truth about the training anyways.
Substantial and growing freedom beats zero freedom ever again.
On paper frontier models will be ahead of the curve, but I don't think anyone will be able to tell whether a piece of work, say a landing page, was created with Fable or GLM, and that is the point. Perceptible intelligence will reach a point beyond which it is no longer a consideration, except for some narrow use cases.
I think it's entirely the opposite. For narrow use cases, like web pages and crud/GUI, the open source models don't show much of a difference.
My impression is that the open-weight models have been drawing close-to-level at coding tasks, while Anthropic and OpenAI have been putting large amounts of effort into developing their models' abilities in other domains: legal, biomedical/science, etc. Anthropic (especially?) has also been putting more obvious resource behind optimising their harnesses - from Code to Cowork (which is kinda Code for normies), Design, etc.
In this case it may actually apply though, no? Open models get better from closed model distillation?
[0] https://en.wikipedia.org/wiki/Zeno%27s_paradoxes
LLMs are an undeniably valuable tool, and governments like to control those.
But what is impactless to the wider world will always be as significant as something that never existed.
The question is not whether they'll prohibit open-weight models better than the US ones, because we all know the obvious answer.
The unbearable cheapness of open weight models
https://news.ycombinator.com/item?id=48668255
gemma4-26B (#7)
qwen-3.6-27B (#9)
https://news.ycombinator.com/item?id=48640196
Certainly the gap is closing but I feel it still makes more sense to pay pennies to run the full sized open models hosted on much better hardware.
First, we cannot be sure the next release will remain open weights, as Qwen 3.7 has shown.
And second, they are all Chinese models. So instead of open weights, perhaps Chinese AI models is a better word choice.
Qwen has always alternated having an open release followed by a "max" release that isn't open weights.