Discussion (24 Comments)
> 4 out of 5 failures never reach Opus. A triager match costs around 25x less than a full investigation.
The title feels misleading. Why clickbait on that when you can just be genuine about the architecture?
Despite the original title, a lot of what we learned comes down to how Opus has evolved and its ability to reason, and also the fact that Haiku is quite capable if scoped properly; that's the whole point of the article.
I think you're misrepresenting the whole thing. The blog post boils down to introducing a specialized triage step which is then offloaded to a cheap model. The cost savings come from skipping the expensive model. It has absolutely nothing to do with which expensive model is used. You could write the same post while completely ignoring and omitting the expensive model.
Unless you're evaluating the agent/person doing the debug session, why would you not provide them with some relevant insight about the problem you have? Given that you're pretty sure about it, of course.
“Let a cheap agent decide if the expensive one is needed.”
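A minimal sketch of that gate, assuming the anthropic Python SDK; the model aliases, prompt wording, and SKIP/ESCALATE convention are placeholders, not the article's actual implementation:

```python
# Sketch of the cheap-triager pattern: a small model decides whether the
# expensive one is needed. Model IDs and prompts are illustrative only.
from anthropic import Anthropic

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment

TRIAGE_MODEL = "claude-3-5-haiku-latest"  # cheap, fast gatekeeper
EXPENSIVE_MODEL = "claude-opus-4-1"       # only called when escalated

def triage(failure: str, context: str) -> str:
    """Ask the cheap model to classify the failure and reframe it."""
    resp = client.messages.create(
        model=TRIAGE_MODEL,
        max_tokens=512,
        messages=[{
            "role": "user",
            "content": (
                "Decide if this failure needs a deep investigation.\n"
                "Reply 'SKIP: <reason>' if a known pattern explains it,\n"
                "or 'ESCALATE: <reframed query>' otherwise.\n\n"
                f"Context:\n{context}\n\nFailure:\n{failure}"
            ),
        }],
    )
    return resp.content[0].text

def handle(failure: str, context: str) -> str:
    verdict = triage(failure, context)
    if verdict.startswith("SKIP:"):
        return verdict  # most failures stop here, never reaching Opus
    # Only the reframed, context-enriched query reaches the expensive model.
    resp = client.messages.create(
        model=EXPENSIVE_MODEL,
        max_tokens=4096,
        messages=[{"role": "user",
                   "content": verdict.removeprefix("ESCALATE:").strip()}],
    )
    return resp.content[0].text
```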
We wrote another post that was on HN some time ago that goes into the details of SQL queries (linked at the top of this article). Sonnet is perfect for this.
Buffer user prompts, use conversation history and repo state as context, and run a local model or a cheap, fast cloud model like Haiku to determine the optimal way to address the user's ask; reframe the query with better context (the user reviews and approves if needed), and THEN let expensive models like Opus have a go at it.
If we are operating within the Anthropic ecosystem with Haiku and Opus, this sort of logic should ideally be doable within Claude Code as the harness. Currently skills cannot be tagged to different models. Ideally we should be able to say: for trivial tasks, the skill should always use Haiku even if invoked from a session with Opus xhigh.
You can set the model for a skill. You just set model: haiku at the top and it will use haiku! You can even set the effort level, look for “Frontmatter reference” in this doc article: https://code.claude.com/docs/en/skills
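For illustration, a skill file might look roughly like this; the skill name, description, and body are invented, and the exact frontmatter fields are the ones documented at the link above:

```markdown
---
name: format-check
description: Runs the repo formatter on changed files. Trivial task.
model: haiku
---

Run the formatter over the changed files and summarize any violations.
```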
The main driver for writing our own agent was to leave it out of the sandbox (the agent loop runs on our backend, we call the sandbox only when needed). We wrote another post about that (it's the latest post on the blog).
However, I am curious how you would implement the triager pattern using only Claude Code as the harness.
I'm planning to self-host qwen3.6 27b basically for this purpose.
The local models would also be queryable on-demand (which overrides the 24/7 tasks in terms of priority) as cheap inference. The idea is that in user-queried interactive tasks, the main Claude agent mostly receives summaries from other agents and makes decisions based on those, saving a ton of tokens compared to giving it access to the codebase. These small-model calls would preferentially route to my local model to save costs, but overflow to a cloud provider if demand is momentarily too high.
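A rough sketch of that routing, assuming both the local server (e.g. vLLM or Ollama) and the cloud provider expose an OpenAI-compatible chat completions endpoint; the URLs and model names are made up:

```python
# Local-first routing with cloud overflow: prefer the self-hosted model,
# fall back to a cloud provider when the local box is busy or down.
import requests

LOCAL_URL = "http://localhost:8000/v1/chat/completions"
CLOUD_URL = "https://api.example.com/v1/chat/completions"  # placeholder

def summarize(prompt: str, timeout_s: float = 5.0) -> str:
    payload = {
        "model": "local-small",  # placeholder model name
        "messages": [{"role": "user", "content": prompt}],
    }
    try:
        # Interactive queries preempt the 24/7 batch tasks on the local box.
        r = requests.post(LOCAL_URL, json=payload, timeout=timeout_s)
        r.raise_for_status()
    except (requests.ConnectionError, requests.Timeout, requests.HTTPError):
        # Local model saturated or unreachable: overflow to the cloud.
        payload["model"] = "cloud-small"  # placeholder
        r = requests.post(CLOUD_URL, json=payload, timeout=30)
        r.raise_for_status()
    return r.json()["choices"][0]["message"]["content"]
```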
My theory is that the AI frenzy has reached new levels of insanity, where it's literally just throw anything and everything at the model and burn tokens to let the AI figure everything out. Why bother paying the upfront cost of RAG when the models/agents are constantly evolving? Just slap in a markdown file telling it to check a folder and call it a day.
Like in the design world, people are making minor tweaks like changing spacing by typing prompts instead of just changing a number in an input field. We are legitimately approaching using LLMs instead of calculators, or memes like that endpoint that calls an LLM to generate the code for some business logic rather than coding the logic directly.
That's what Claude Code is doing now, and the same principles we applied for Mendral.
That said, you're right that some smaller models can outperform Haiku, and we're thinking about supporting OSS models at some point. But it does not change the core design principles IMO.
just seems wasteful all around. having an agent in the critical path when a regular expression (or similar) could do just seems odd. yeah haiku is cheap but re.match() is cheaper.
We started to add deterministic matching on the patterns the agent sees the most, so we don't have to go through the whole pipeline every time (for example, a flake on PostHog can occur 100+ times in a day; you don't need to reinvestigate each one). But for new errors, it's tricky.
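Something like the following, where the flake patterns, cached verdicts, and the `triage_with_model` fallback are all invented for illustration:

```python
# Deterministic matching in front of the triager: known flaky patterns
# short-circuit to a cached verdict; only new errors reach a model at all.
import re

KNOWN_FLAKES = [
    # (compiled pattern, cached verdict) -- examples only
    (re.compile(r"connection reset by peer.*posthog", re.I),
     "SKIP: known PostHog flake, retry the job"),
    (re.compile(r"ETIMEDOUT.*registry\.npmjs\.org", re.I),
     "SKIP: transient npm registry timeout"),
]

def triage(error_log: str) -> str:
    for pattern, verdict in KNOWN_FLAKES:
        if pattern.search(error_log):
            return verdict  # a regex hit is far cheaper than any model call
    # New, unrecognized error: fall back to the cheap model triager
    # (hypothetical helper, see the SDK sketch earlier in the thread).
    return triage_with_model(error_log)
```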