Discussion (49 Comments)
For a (Chinese) open-weight model to surpass the (US lab) frontier models, this equation must flip: the Chinese labs must entirely retool from harvesting frontier-model data to building the data systems and teams that produce novel data, and must procure latest-generation hardware en masse to do so. This does not happen easily. Also, training a frontier-scale model is actually not such an unimaginable feat: doing all the inference with the teacher models is where the hardware goes.
Even if your characterization is accurate, they could do this tomorrow and are not so myopic that they wouldn’t have thought about it. I don’t see this as a barrier, and I see a lot of the same underestimation of Asia that’s been happening for 50 years. There’s not some innate American advantage to building LLMs, and personally I think whatever head start the US has is going to be squandered on delays from the export control “too dangerous for release” LARPing we’re seeing.
The spigot can be turned off at any time.
Until there's some sort of "community owned hardware", open weights models are always at risk of being discontinued.
And there will always be incentivised parties that release models. Nvda for one has every incentive to keep the Nemotron line going, as they're directly profiting from people running this. And the models aren't really far from open SotA anyway.
Goog will probably continue to release the small models, since they'll use them for browser stuff anyway, and know that they'll leak. So for them it's a win-win to release the small models and gain some dev market share.
And the Chinese labs also have incentives to keep releasing models, and will likely continue to get gov support to do so (yay commercial wars between nations).
Your right to 3D-print whatever you want is about to be taken away (in California).
What software you can run on your computer can already be restricted.
Absolutely everything can be taken away. The simplest way to remove open models is probably to declare them a tool that terrorists could use. Crazy? Yes, the world is totally crazy these days.
Everything cannot, in fact, be taken away. Don't propagandize yourself. Some things, like information, are free. Not even China can prevent all its citizens from accessing Western internet. USGov simply does not have the resources to find and audit every hard drive and USB stick in the country for illegal files. The internet cannot be censored 100% without literally cutting every cable and confiscating every radio.
The software that runs on my computer cannot, in fact, be restricted. It can be declared illegal, but there literally is no mechanism by which it can be enforced other than a government goon standing over my shoulder 24/7.
Some freedoms really cannot be removed without utterly implausible amounts of effort. Arguing otherwise is helping to erode freedom. So stop it.
A model that writes code without knowledge of any language or library changes for half a decade is less useful. A 2021-era ChatGPT would be quite quaint in 2026.
Right now the Chinese labs might have incentives to release their models for free, and maybe Google is happy to release open weights today, but I'm sure there are already bean counters at Google salivating at the idea of having Gemini in Chrome as part of a Google AI monthly subscription just like YouTube premium and other Google subscriptions.
Their releases so far have been kind of lackluster compared to Qwen and other Chinese models. My suspicion is that Nvidia won't be releasing models that appear to compete with frontier models because that would upset their big customers.
I remain hopeful that we'll be able to democratize the entire tech stack for this tech.
True. And it's possible that this has already happened at Alibaba Qwen - at least for the smaller models that people had a chance of running at home (122B and smaller).
It's highly unlikely we get another open Llama model, though, after the Llama 4 flop, even if their Muse Spark seems pretty good.
Or until some bright people figure out drastically more efficient means of training.
Not the same thing.
It’s used correctly in the article’s body, but the title is misleading.
The name is bad, doesn’t even make any fucking sense and it gives open source a bad rep.
I gave up. No one cares. And no one will ever tell the truth about the training anyways.
Substantial and growing freedom beats zero freedom ever again.
If the closed models stop improving will the progress of open models slow?
Some people in China surely know.
> Like if the closed models stop improving will all the open models also stop improving?
Seems extremely unlikely, unless the models all hit some kind of wall soon. The Chinese companies may be behind the US in compute capacity, but they have excellent researchers [0] who are probably approximately as good as their US counterparts at the kind of problem generation and RL that is currently working so well.
I would be very surprised, though, if the models cannot continue to be improved rapidly in any area that allows a tight feedback loop like programming, at least up to the point where we puny humans lose the ability to define objective functions.
(And, conversely, I don’t expect magic in fields where the feedback is slow or expensive. A model is not about to reliably invent a wonderful medicine for the same reason that a large and extremely competent pharma company cannot: the evaluation process is extremely slow and it’s so expensive that the kind of utterly enormous corpus that is driving the current progress in coding is simply not available. Running RL on m iterations of n medication-development trajectories each is going to cost n*m times $10-100 million and take m years if it’s even possible at all.)
[0] The US advantage in this space will likely decline, since the brain drain from the rest of the world via the US university system to US labs is drying up.
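To put rough numbers on the n*m estimate above, here is a back-of-envelope sketch. Only the $10-100 million per trajectory range and the "m years" figure come from the comment; the function name and the example values of n and m are made up for illustration, and "one iteration per year" is just a restatement of the comment's assumption, not a measured figure.

```python
def rl_campaign_cost(n_trajectories, m_iterations,
                     cost_low=10e6, cost_high=100e6):
    """Rough cost range (USD) and duration (years) for m RL iterations
    of n drug-development trajectories each.

    cost_low / cost_high: the $10-100 million per-trajectory range
    quoted in the comment. Duration assumes roughly one iteration per
    year, matching the comment's "m years".
    """
    total_runs = n_trajectories * m_iterations
    return total_runs * cost_low, total_runs * cost_high, m_iterations


# Hypothetical example: 100 trajectories per iteration, 10 iterations.
low, high, years = rl_campaign_cost(n_trajectories=100, m_iterations=10)
print(f"${low / 1e9:.0f}B to ${high / 1e9:.0f}B over about {years} years")
```

Even with these small illustrative values the total lands in the tens of billions of dollars, which is the point of the comparison with coding, where each feedback signal costs next to nothing.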
On paper, frontier models will be ahead of the curve, but I think hardly anyone will be able to tell whether a piece of work, say a landing page, was created with Fable or GLM, and that is the point. The perceptible intelligence will reach a point beyond which it is no longer a consideration, except for some narrow use cases.
Or is the idea that more advanced models will unlock more use cases?
In this case it may actually apply though, no? Open models get better from closed model distillation?
[0] https://en.wikipedia.org/wiki/Zeno%27s_paradoxes