Discussion (91 Comments) | Read Original on HackerNews
It did not perform as well as I expected. Qwen2.5-Coder-3B (2 years old) outperformed it by a wide margin, fixing ~50% of bugs whereas this model only fixed ~12%.
Granted, it's not a coder-specific model, but given its benchmark performance relative to Gemma models, that it's two years newer, and that it's an MoE with 8B total params, I expected it to be more competitive.
I would much rather not run the model on my local laptop hardware and offload that to some system sitting under my desk in my home office, accessible via VPN, than take the risk of using an unreliable and flaky tool for the convenience of having it on the same hardware on my lap.
I pay very little attention to 8 billion or whatever (or even much smaller) models these days and I don't feel like I'm missing much.
I think some of the folks in the local LLM social media communities are using them for things like company-hosted customer service chat bots, or purely English text writing stuff where Q4 will probably not cause a problem. For more discrete technical work I stick pretty much exclusively to Q8.
If I were going to use this model, I'd be looking to use it more as the primary chat interface of a larger system, and having it orchestrate & delegate tasks to other places via tool calls. It's not quite as exciting on the surface as a local "do it all" model, but it does enable some pretty neat use-cases, IMO.
I'm imagining a local agent that is super low latency, works entirely offline, and capable of queuing up complex tasks for larger/smarter cloud agents which execute them asynchronously.
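To make the idea above a bit more concrete, here is a minimal sketch of that split: a fast local step decides whether to answer directly or queue the request for a slower remote agent. Everything in it is an assumption for illustration: `route`, `cloud_worker`, and `handle` are hypothetical names, the keyword check only stands in for a tool call emitted by the small model, and the "cloud" call is a placeholder rather than any real API.

```python
import asyncio
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Task:
    description: str
    result: Optional[str] = None


def route(user_message: str) -> str:
    """Hypothetical stand-in for the local model's tool-call decision."""
    # Assumption: a real system would let the small model emit a tool call;
    # a keyword check plays that role here.
    heavy_keywords = ("refactor", "analyze", "research")
    if any(word in user_message.lower() for word in heavy_keywords):
        return "delegate"
    return "answer_locally"


async def cloud_worker(queue: asyncio.Queue) -> None:
    """Drains queued tasks; the sleep is a placeholder for a remote agent call."""
    while True:
        task = await queue.get()
        await asyncio.sleep(0)  # placeholder for network latency
        task.result = f"done: {task.description}"
        queue.task_done()


async def handle(messages: List[str]) -> Tuple[List[str], List[Task]]:
    """Replies immediately to every message and queues the heavy ones."""
    queue: asyncio.Queue = asyncio.Queue()
    worker = asyncio.create_task(cloud_worker(queue))
    replies: List[str] = []
    delegated: List[Task] = []
    for message in messages:
        if route(message) == "delegate":
            task = Task(message)
            delegated.append(task)
            queue.put_nowait(task)
            replies.append("queued")
        else:
            replies.append("local reply")
    await queue.join()  # wait until the worker has processed everything
    worker.cancel()
    try:
        await worker
    except asyncio.CancelledError:
        pass
    return replies, delegated


if __name__ == "__main__":
    print(asyncio.run(handle(["hi", "Refactor the parser"])))
```

The user-facing reply never waits on the queue, so the chat stays low latency while the delegated work finishes in the background.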
Two of the other responses speak about it being abysmal at tool calling.
Overall, I'm pretty impressed a model this small can find/fix ~12% of bugs with crappy context - even if they're about as easy as possible to fix.
I just assumed it would perform better, given all the advancements in the space.
It's possible 1B active parameters is just not enough - even if it has 8B params of knowledge to reason through bugs.
Playing around with the context I fed it, it was able to fix up to ~34% of bugs vs ~46% for Qwen2.5-Coder-3B and ~54% for Qwen2.5-Coder-7B.
Edit: Is it this? https://github.com/Liquid4All/cookbook/tree/main/examples/lo...
FYI: Opencode is very well tuned for Qwen models, but I haven’t found it that rare for niche models to perform badly in it.
But even against E4B it's shaky, which is surprising given how many tokens they trained on. I guess it was on a lot of synthetic data.
[0] - https://huggingface.co/Qwen/Qwen2.5-3B-Instruct
Answer: . . . . So, unless you have a compelling reason not to, walk to the car wash.
> Is my desire to make my car clean related to a car wash in any way? To driving to it?
> Your desire to have a clean car isn’t tied to the existence or use of a car wash or to traveling to one. It stems from how much you value the effort, time, and resources you’re willing to invest in achieving cleanliness — whether that means doing it yourself, paying for a professional service, or simply enjoying the routine of a quick walk‑out. Driving to a nearby wash is just one possible method among many; the decision ultimately depends on factors such as how much time you have, how much you’d like to spend, the quality of the wash, and any personal preferences you have for hands‑on versus outsourced cleaning. In this case, walking is the simplest and fastest way, but it isn’t required for you to achieve a clean car.
Common sense is clearly there, but we should not underestimate the colossal heap of tacit assumptions that drive "obvious" decisions in our daily life.
>The main reasons to drive such a short distance would be if you're bringing the car specifically to be washed, carrying something heavy, or the weather or walking conditions make it impractical.
>If your goal is to get your car washed, you'll need the car there—so driving makes sense. If you're just going to talk to someone at the car wash or check it out, walking is probably faster.
But then I'm supposed to give it access to write code in my repositories. Sorry, what are you trying to get at here?
What if you're the car wash owner? Or a maintenance technician? Pretty easy to just walk over there if you're just 50ft away.
The whole twist here is that to wash your car, you need your car, so you cannot go by foot.
You could conceivably walk to a car wash that has similar sundries as a gas station.
The question is revealing that the model has a model of language but not of reality. It knows what words go together, but not real-world concepts.
For most of them, we’d worry that a human answerer using maximum effort to produce the same outcome was having a stroke.
also, naysayers apparently DO have a compelling reason.
As an example for a similar approach, Teapot AI has trained very small models https://teapotai.com/models to only answer questions where the answer can be found within the context window, and although not perfect, they do quite well at this compared to larger, more general models.
[1] https://github.com/Liquid4All/cookbook/tree/main/examples/lo...
demo link for anyone that wants to try this out https://playground.liquid.ai/chat?model=cmppnbgse000004l4bc8...
I recently realized that Qwen3.5:4B is way more capable than I thought a model that size could be.
Combine that with the work Liquid puts into RL and fine tuning, and you get models that perform extremely well on minimal hardware.
Combine that with your own fine tuning, and you get a specialized tool that is fast, private, and doesn’t require internet connection.
A good example of how it's helpful is that it will make certain things relatively frictionless. Like, I need to pay property taxes. I hate this stuff. I got the email reminder from my municipality and it made an entry in my TODOs which points to a page with instructions to pay the taxes, including my folio and access numbers for when I log in. That was taken from the email and a document which contains past property tax information. I have it all there, but it compiles relevant data into dedicated TODO pages.
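For anyone curious what such a TODO entry might look like as data, here is a rough sketch. It is not the harness described in this comment: the `Todo` shape, the `todo_from_email` helper, the `Folio:` / `Access number:` line format, and the example URL are all assumptions, and a regex stands in for the extraction the model would actually do.

```python
import re
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Todo:
    title: str
    link: str  # page with instructions for completing the task
    details: Dict[str, str] = field(default_factory=dict)


# Assumption: the reminder email carries "Folio: ..." and "Access number: ..."
# on their own lines. A real email would need model-based extraction instead.
FIELD_PATTERN = re.compile(
    r"^(Folio|Access number):[ \t]*(\S+)[ \t]*$",
    re.MULTILINE | re.IGNORECASE,
)


def todo_from_email(subject: str, body: str, instructions_url: str) -> Todo:
    """Builds a TODO entry that bundles the identifiers needed at login."""
    details = {
        match.group(1).lower(): match.group(2)
        for match in FIELD_PATTERN.finditer(body)
    }
    return Todo(title=f"Handle: {subject}", link=instructions_url, details=details)


if __name__ == "__main__":
    email_body = "Reminder: taxes are due.\nFolio: 12-345\nAccess number: A9Z\n"
    # The URL below is a made-up placeholder, not a real payment page.
    print(todo_from_email("Property tax reminder", email_body, "https://example.invalid/pay"))
```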
I'm so bad at doing all of this myself. I really don't enjoy it. Send me to buy a carrot at the store and I'll happily walk 30 minutes there and back to do it. It isn't the effort so to speak; it's how unrewarding, inefficient, and bureaucratic it all is. I'm allergic to it. Why isn't it baked into my income taxes? Why are we still doing this?
Sometimes it does a really bad job of making TODOs. Like my wife messaged me about what our dinner plan was, so Qwen went ahead and made a plan for chicken meatball soup based on messages from a week earlier. It totally fabricated the recipe. Yet, I don't know, it was still helpful to be reminded that I'm in charge of dinner.
It's probably best at scaffolding responses to emails I don't want to send. I will write it, but I appreciate basic information being fleshed out so I can write it without jumping around looking for files or numbers or whatever constantly.
I use it with a custom harness. It could be a lot better. Everything about it could be better. The model is remarkably good for its size and price, though.
Letting Sonnet 4.6 do it instead always yields much better results, much faster, but it's kind of like using a new phone vs a super old one. They can both get you there. The sound quality and camera might be worse, it doesn't look as fancy, but the new one is $1200 and the old one is free on marketplace if you're handy with a screwdriver and a fresh battery. Sounds great to me.
Worth noting: this was all vibe-coded using Opus 4.6 and 4.7. It's the only project I've built that is strictly vibe-coded. It's simultaneously exciting and disgusting. I'm not sure if I'll ever 'software engineer' it, or I'll just let it be slop. It works.
They keep promising great performance out of models whose key ingredient (parameters) they are diluting. Many seem to be in a competition saying they're getting smaller and higher performance at the same time. Then, the homeopathic models don't perform as well as real models when independently tested. Again, spot on.