DE version is available. Content is displayed in original English for accuracy.
Advertisement
Advertisement
⚡ Community Insights
Discussion Sentiment
56% Positive
Analyzed from 1618 words in the discussion.
Trending Topics
#rtk#context#output#tool#tokens#token#information#more#frequency#agent

Discussion (42 Comments)Read Original on HackerNews
My criteria are: if it’s not in a harness it’s probably not that good (the best ideas float up to Codex/Claude imo) and any GitHub advertising some percent of token savings is not to be trusted.
It’s hard to avoid the snake oil and I hope people start thinking critically on this stuff.
Whether or not RTK actually does this has not been established. I would be glad to see some proper benchmarks done on the actual difference this tool makes (not some meaningless "up to 90%" type of language).
Tool use output represents a large amount of my output. I'll take 3.7M tokens saved on 3.9M tokens of input. Tokens saved are tokens saved.
> 3. Where Are the Accuracy Benchmarks?
As a user of RTK, it would be nice to see accuracy benchmarks. However, I've seen no evidence of the model missing anything critical as a result of the compression. As part of their design philosophy they are very strict about preserving correctness to the point that if a filter fails they fall back to raw output. For my most frequently used commands I've inspected the source, was happy with what I saw, they've earned my trust thus far.
> The day git, cargo, npm, or grep updates its terminal formatting by a few spaces or changes an error layout, RTK's regex and parsing filters will break. And returning to the silent failure trap, it won't throw an explicit error; it will fail quietly, feeding corrupted or partial text to your agent.
Again, any filter that fails simply falls back to the raw output. One of their core pillars is avoiding this exact scenario you described. RTK should never feed corrupted or partial text to an agent.
Your concerns are fair but I'd like to see your criticism backed up with evidence. Have you used RTK? Have you found evidence that they are failing to preserve correctness?
"native/built-in Read or cat tools, the data is not intercepted by RTK's shell hook"
But if it's making a dent in token usage (which I have not personally measured), then that's great.
I had to add some system prompt instructions to Pi to help it work (GPT 5.5 initially got confused when `git status` looked different than expected). The Claude Code extension appears to do a proper job of informing the agent about the unexpected shape of the output without any extra work on my part.
My take is that handling so many versions and so many different tools shouldn't be the work of any single repo. The responsibility should be either on coding agent to compress or best case scenario people who are responsible for cli tool
I've been trying it out for a couple days and it seems kinda OK or whatever. If that upsets you, then that's your problem.
I might dump it later on if it doesn't provide much if a benefit. I typically try out new things, then cull whatever doesn't work. This tool seems pretty neutral for now, at least.
https://github.com/toon-format/toon is another interesting one, and I feel like it takes on a much more achievable goal - reduce whitespace and verbosity of JSON, not overall context compression.
There's no gamification of savings here. Tool output can be meaty.
Is the author skeptical of the concept, or the implementation? Because only one of those is worth critiquing.
Concept is fine to me and I believe we should optimize, but a repo that will handle all tools sounds like Sisyphus rolling a rock up the hill.
not the question is which X tokens and which Y tokens? and since the output is non-deterministic how do you validate this?
LLMs aren't random and that enforces something that people are too dumb to realize that random-ness could be normally distributed but LLMs have no reason to be normally distributed or follow any sort of curve of understanding.
They are non-deterministic but with bias so their output might be just be worse with T' transformation for the class of problems A is solving but work great for B. or vice versa.
You can't reproducibly test LLMs and that allows all sorts of benchmarks to exist which can make any model look good or bad as much as we want. Enlightening stuff.
Not much different from sociological or psychological sciences where with enough bias in data you can prove anything.
If you need a piece of information that is buried somewhere, or a high-level summary/distillation of a larger body of info, then subagents may be the right tool for the job.
If you need all the gathered context for later use (i.e. distilled context is insufficient), then subagents probably are not the right tool for the job.
What do you mean by aggresive context management with subagents? Would you add a lopp that would trim the context?
Both of those tasks seem even more difficult
How human cognition tends to work by simultaneously utilizing and combining/separating multiple frequency scales of information. A simple way of thinking about is this: We tend to encode and retrieve both the gist of what is happening, and the verbatim details of what happened. The gist can be thought of as low frequency information, almost like bullet points, that contain the big overview goal, keypoints). The verbatim traces, are the high resolution memory that contains all the details. The gist helps encoding and recall by providing encoding and retrieval context cues. There are also levels in between those two, but I was keeping it simple. During human development, verbatim memory capacity increases first, but then hits a wall/plateau. Further performance increases begin to depend on the ability to utilize and gain from gist-like representations that can guide encoding and retrieval of verbatim details within contexts.
You don't need to keep everything in the context window. My untested, perhaps naive hypothesis is that what is needed is that sub-agents dealing with verbatim tasks (actually writing code), their context window should be managed by an agent above that is tuned to information at a lower frequency, and it by another above it on even lower frequency information. Lowest frequency information context windows feel up slowly. High-frequency information fills up fast. Use the low frequency information to retrieve the needed high frequency information.
I wish the author would have provided one.
and it stands for Rust Token Killer
So do you think rtk cli is ai slop? I had some suspicions looking at their repo and number of issues and their style. The prettier issue with running successfully while binary wasn't even installed was quite entertaining