Discussion (12 Comments)

CharlesW about 2 hours ago
Previously: https://news.ycombinator.com/item?id=48709744

https://swelljoe.com/post/will-it-mythos/: "Poor performer here, only found the one bug that almost every model found, despite its performance on other benchmarks being excellent for its size. […] It also performs poorly in a chat without tools, exhibiting an enthusiasm for hallucination. I’m currently working on a replication of this with full tool access, including bash/Python, which may allow this model to be competitive."

NitpickLawyer 36 minutes ago
> It also performs poorly in a chat without tools, exhibiting an enthusiasm for hallucination. I’m currently working on a replication of this with full tool access, including bash/Python, which may allow this model to be competitive.

How is that a serious phrase in '26? I mean I have no idea if this fine-tune is good, haven't tried it, but testing a (clearly) agentic model without tool access and expecting it to work is crazy, no? What was he even testing?!

vikingcat 22 minutes ago
Maybe expecting it to recognize its limitation without tools instead of hallucinating. But yeah, not wholly useful. Its performance (and proclivity to hallucinate) with tools is what really matters.

ricardobayes 27 minutes ago
This is the first Qwen fine-tune that is not immediately rejected by the local LLM community, and in some cases it is even being recommended. Based on my limited usage, it is good and gives creative solutions to coding problems. I don't expect 9-35B models to one-click create full apps. Most people who were complaining did so .

monkmartinez 2 minutes ago
> Most people who were complaining did so .

It has been this way since the beginning, unfortunately. There is certainly no harm in trying out local models on local workloads with modest guardrails.

As with most of these models (Qwen, Gemma, Llama, gpt-oss), finding all the little gotchas, like special tokens, prompt structure, and model preferences, is a PITA right now. The reward is really nice models that run exceptionally well in agentic harnesses tuned with the prompts and parameters you fought so hard to learn.

anana_ 20 minutes ago
They keep mentioning a 31B dense model, but there are no benchmarks or weights for it anywhere?

kennywinker about 2 hours ago
Can anyone explain what’s the story here? Is this just a re-skinned qwen? Who is deepreinforce-ai and why isn’t this model listed on their website?

How does it self-improve? Does the model change on disk, or does it just get better during a single context run?

simonw about 2 hours ago
It doesn't self-improve, that's a misleading headline.

As far as I can tell they trained it by running their own reinforcement learning on top of Qwen and Gemma 4 (not sure how they combined weights from both, or if they used Qwen as the basis and Gemma 4 to help train?) - so the "self-improving" is about their training process, not how you use the weights.

kamranjon about 2 hours ago
I think the 9B and 31B dense are Gemma models and the 35B-MoE and 397B-MoE are Qwen models, since these are the model sizes covered by each of them respectively.

kennywinker about 2 hours ago
Gotcha. That makes more sense. We ran the model to train the model -> “self-improving”.

S0y 39 minutes ago
These are simply benchmaxxed versions of either Qwen or Gemma 4.

jorisw 21 minutes ago
Citation needed