Discussion (26 Comments)
I'm glad to see more domain-focused SLMs, we need more of them! A programming focused MoE should work well across many languages.
It is a cheap specialist for closed-world, verifiable reasoning tasks like math, self-contained coding problems, and similar.
"Closed-world" means the needed information is already in the context. It is not a tool-using agent that can discover missing context. "Verifiable" means answers are hard to generate but easy to check.
So no open-ended research, repo-wide agent work, factual Q&A, or SVG generation. More of a compact reasoning module for bounded problems.
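To make the "hard to generate, easy to check" part concrete, here is a minimal Python sketch using subset sum as a stand-in problem. The problem choice and the names `verify` and `solve` are my own illustration, not anything from the model or its paper. Checking a candidate is a quick pass over the list, while finding one by brute force may have to walk through exponentially many subsets.

```python
from itertools import combinations

def verify(numbers, target, candidate):
    """Cheap check: candidate must only use available numbers and sum to target."""
    pool = list(numbers)
    for x in candidate:
        if x not in pool:
            return False
        pool.remove(x)  # each available number can be used only once
    return sum(candidate) == target

def solve(numbers, target):
    """Expensive search: try subsets of growing size until one verifies."""
    for r in range(1, len(numbers) + 1):
        for combo in combinations(numbers, r):
            if verify(numbers, target, combo):
                return list(combo)
    return None

numbers = [3, 34, 4, 12, 5, 2]
answer = solve(numbers, 9)
print(answer)                      # [4, 5]
print(verify(numbers, 9, answer))  # True
```

Everything `verify` needs is in its arguments, which is the "closed-world" half: nothing has to be looked up or discovered.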
Could you teach a 5 year old to drive a car? A 10 year old? A 12 year old? To drive a car requires being able to read, to have judgement about ice or rainy conditions, to anticipate a child running after a ball. By the time a human is in their mid-teens they have acquired the base knowledge...
Small models need to have enough base knowledge to be able to be good enough -- even in a seemingly narrow regime. Where is that? Obviously they don't need all the obscure knowledge of a frontier model but there is some base level which is probably more than it would first seem.
I would be interested to see a formal study of this. I say this out of nothing more than an observation: I think the only real blockers are a) judgement, and b) physical reflexes/strength. As a kid I was certainly aware of ice, snow, and rain, because I rode my bike year round and had low confidence in my own ability to control my bike on snowy or wet terrain, especially during season changes. That translated into learning to drive in northern Canada in the winter and applying those lessons to driving.
In an environment devoid of consequences, I have seen kids operate driving simulations (both real simulations, and video games) with a degree of precision that is shocking, including seeing several 9-11 year olds play the simulations and games with a much higher degree of confidence than adult drivers. Children have an awareness that the simulations are consequence free, unless given other motivation. Adults that are consistent drivers have muscle memory and preconceived expectations that govern the decisions they make when playing the game. I am curious about the level of training and exposure required for children to overcome their lack of awareness of the hard limits and consequences of driving and driver error, versus the amount of training and exposure required for expert drivers that are novice gamers to stop applying their learned experience to consequence free simulations.
This requires not only knowledge, but also the control systems that develop with the prefrontal cortex. LLMs don't do much control yet.
Sadly that's not how LLMs work, since all they do is "token prediction". At least the models we have today ...
> these findings motivate the Parametric Compression-Coverage Hypothesis, which views verifiable reasoning as compressible into compact reasoning cores, while open-domain knowledge and general-purpose competence require broad parameter coverage over facts, concepts, and long-tail scenarios.
These kinds of models might be more useful as tools to be used by larger orchestrator models, than being the orchestrators themselves.
It would look really dumb if someone asked it that, but that's fine. You're trying to make a model that is optimized for efficiency for a specific task. As much as possible, you should prune uncorrelated things.
https://swelljoe.com/post/will-it-mythos/
That's also more aligned to its leetcode style training data, the code under test is fully in the context window. It might be interesting to have a bigger tool use model go through the effort of collecting the context, and feeding it into this kind of model for analysis only. It becomes more of a thinking tool, instead of the orchestrator.
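A rough Python sketch of that split, with the big model doing context collection and the small one doing analysis only. All names here (`Task`, `gather_context`, `run`, `fake_specialist`) are hypothetical, and both models are replaced by stubs so the snippet runs on its own; a real setup would swap in actual model calls.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Task:
    question: str
    paths: List[str]  # files the orchestrator decided are relevant

def gather_context(task: Task, read_file: Callable[[str], str]) -> str:
    """Orchestrator step: resolve everything the specialist needs into plain text."""
    parts = [f"# {p}\n{read_file(p)}" for p in task.paths]
    return "\n\n".join(parts)

def run(task: Task, read_file: Callable[[str], str],
        specialist: Callable[[str], str]) -> str:
    """Hand the specialist a self-contained prompt; it never calls tools itself."""
    context = gather_context(task, read_file)
    prompt = f"{context}\n\nQuestion: {task.question}"
    return specialist(prompt)

# Stand-ins for a repository and for the small reasoning model (assumptions).
fake_repo = {"util.py": "def add(a, b):\n    return a - b\n"}

def fake_specialist(prompt: str) -> str:
    return "suspicious: add() subtracts" if "a - b" in prompt else "no issue found"

task = Task("Is add() correct?", ["util.py"])
result = run(task, fake_repo.__getitem__, fake_specialist)
print(result)
```

The point of the shape is that the specialist only ever sees a closed prompt, so whatever it is missing is the orchestrator's problem to fetch.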