Show HN: PMB – local memory for coding agents that shows if it is used

docheinestages•about 3 hours ago

We really need a "memory arena" to serve two important purposes:

1. List all the known agent memory projects (of which there are hundreds)
2. Objectively compare and score them both against each other and vanilla harnesses like Claude Code

Only then can I have the cognitive capacity to decide which one makes sense for me.

oleksiibond•about 1 hour ago

Agreed, and point number two is the tricky one. Compiling the list is easy; evaluating the entries is not. You need a consistent task set, a "clean slate" control (i.e., Claude Code without memory), and evaluation criteria that separate "uses fewer tokens" from "produces better results"; otherwise you end up with vendors grading their own work.

I'm currently building a repeatable test harness for PMB: a fixed task, run with and without memory, repeated N times, reporting tokens, turns, and pass/fail, plus a subjective quality score. I'd be happy to share the task set and evaluation criteria so they can be run against anyone else's memory server or a clean-slate control, not just mine.
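A minimal sketch of the with/without-memory comparison described above, assuming a hypothetical `run_task(use_memory, seed)` hook that launches the agent once on the fixed task and returns its metrics. This is not PMB's actual harness; `fake_run_task` just fabricates numbers so the aggregation code runs standalone.

```python
import random
import statistics
from dataclasses import dataclass
from typing import Callable


@dataclass
class RunResult:
    tokens: int
    turns: int
    passed: bool
    quality: float  # subjective 0-5 score assigned by a reviewer


def summarize(results: list[RunResult]) -> dict[str, float]:
    # Aggregate one arm (control or memory) across its N repeated runs.
    return {
        "runs": len(results),
        "mean_tokens": statistics.mean(r.tokens for r in results),
        "mean_turns": statistics.mean(r.turns for r in results),
        "pass_rate": sum(r.passed for r in results) / len(results),
        "mean_quality": statistics.mean(r.quality for r in results),
    }


def compare(run_task: Callable[[bool, int], RunResult], n: int) -> dict[str, dict[str, float]]:
    """Run the same fixed task n times without memory (control) and n times with it."""
    return {
        "control": summarize([run_task(False, seed) for seed in range(n)]),
        "memory": summarize([run_task(True, seed) for seed in range(n)]),
    }


def fake_run_task(use_memory: bool, seed: int) -> RunResult:
    # Hypothetical stand-in for launching the agent; swap in a real harness call.
    rng = random.Random(seed)
    tokens = rng.randint(8000, 12000) - (2000 if use_memory else 0)
    return RunResult(tokens=tokens, turns=rng.randint(5, 9),
                     passed=rng.random() < 0.7, quality=rng.uniform(2, 5))


report = compare(fake_run_task, n=10)
```

Keeping tokens and pass rate as separate columns is the point: a memory layer that only cuts tokens shows up as a lower `mean_tokens` with an unchanged `pass_rate`, rather than being scored as "better".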

cyanydeez•about 2 hours ago

Every time I see these memory agents, all I can think about is context bloat and poisoning. We know humans have a related problem with memory: to "remember" something significant, the human brain reconstructs the entire experience, which is why memories are so easy to influence.

That seems to be what most of these systems are doing: amplifying errors and hallucinations more than anything else.

oleksiibond•about 1 hour ago

It's a legitimate worry, but I'd split it into two separate questions, since bloat and poisoning have different solutions.

Bloat: PMB doesn't inject the whole store into context; it retrieves only a small top-k set of relevant snippets for each task, normally a few hundred tokens, rather than an ever-growing dump of the store.
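A minimal sketch of per-task top-k retrieval under a token budget. The thread doesn't say how PMB ranks snippets, so Jaccard word overlap stands in for the real relevance score (likely embeddings or BM25), and whitespace word count is a crude token estimate. `retrieve`, `k`, and `budget` are hypothetical names.

```python
import re


def tokenize(text: str) -> set[str]:
    # Lowercased alphanumeric words; a stand-in for a real tokenizer.
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def retrieve(task: str, store: list[str], k: int = 3, budget: int = 300) -> list[str]:
    """Return up to k relevant snippets whose combined size fits the token budget."""
    query = tokenize(task)
    scored = []
    for snippet in store:
        words = tokenize(snippet)
        # Jaccard overlap between task and snippet vocabularies.
        overlap = len(query & words) / len(query | words) if words else 0.0
        if overlap > 0:
            scored.append((overlap, snippet))
    scored.sort(key=lambda pair: pair[0], reverse=True)

    picked, used = [], 0
    for _, snippet in scored[:k]:
        cost = len(snippet.split())  # crude token estimate: whitespace-separated words
        if used + cost > budget:
            break  # stop at the first snippet that would overflow the budget
        picked.append(snippet)
        used += cost
    return picked


store = [
    "Run tests with pytest -q from the repo root",
    "The billing service uses Postgres 15",
    "Never commit .env files",
    "pytest fixtures live in tests/conftest.py",
]
top = retrieve("how do I run the pytest tests", store, k=2)
```

However large `store` grows, the injected context is capped by `k` and `budget`, which is the property the reply is describing.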

Poisoning is the more interesting one, and your reconstruction example is exactly why PMB has no LLM on its read side. Human memory, like any mechanism that has a model paraphrase information during recall, reconstructs the information anew each time, and that is what makes it susceptible to manipulation and hallucination.

What it can't fix is garbage in: a wrongly learned lesson will be faithfully recalled. The mitigations are that everything is stored verbatim, source-stamped (who, when, which session), de-duplicated, recency-decayed, and correctable, and all of it is shown on a dashboard, so a bad recall is detectable and auditable rather than silent reconstructive drift. Conflict/supersession detection is the next thing to build.
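A minimal sketch of a verbatim, source-stamped store with de-duplication and correction, assuming hypothetical `Memory`/`MemoryStore` names and fields; this is not PMB's schema. De-duplication here is a content hash of whitespace- and case-normalized text, and a correction marks the old record as superseded instead of rewriting it, so the history stays auditable.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Memory:
    text: str                            # stored verbatim; never paraphrased on read
    author: str                          # who wrote it (agent or human)
    session: str                         # which session produced it
    created: datetime                    # when it was written
    superseded_by: Optional[str] = None  # id of the correcting memory, if any


class MemoryStore:
    def __init__(self) -> None:
        self.items: dict[str, Memory] = {}

    @staticmethod
    def key(text: str) -> str:
        # Hash of normalized text: identical lessons collapse into one entry.
        normalized = " ".join(text.lower().split())
        return hashlib.sha256(normalized.encode()).hexdigest()[:12]

    def add(self, text: str, author: str, session: str) -> str:
        k = self.key(text)
        if k not in self.items:  # first writer's provenance is kept on duplicates
            self.items[k] = Memory(text, author, session, datetime.now(timezone.utc))
        return k

    def correct(self, old_id: str, new_text: str, author: str, session: str) -> str:
        new_id = self.add(new_text, author, session)
        if new_id != old_id:  # don't let a record supersede itself
            self.items[old_id].superseded_by = new_id
        return new_id

    def recall(self) -> list[Memory]:
        # Read path returns stored records as-is (no LLM rewriting), skipping corrected ones.
        return [m for m in self.items.values() if m.superseded_by is None]


store = MemoryStore()
a = store.add("Use pnpm, not npm", "agent", "s1")
b = store.add("use pnpm,  not npm", "agent", "s2")  # duplicate after normalization
fixed = store.correct(a, "Use pnpm 9; npm breaks the lockfile", "alice", "s3")
```

Because recall never regenerates text, a wrong lesson comes back exactly as written, with its author and session attached, which is what makes it traceable and fixable.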

oleksiibond•about 1 hour ago

Repo for anyone who wants to look under the hood: https://github.com/oleksiijko/pmb

aerzen•about 3 hours ago

This looks like something I'd need - AGENTS.md only gets you so far. Does anyone have experience with using memory like this?

My main concern is that it can overwhelm the context window with useless facts.

oleksiibond•about 1 hour ago

The right framing: AGENTS.md is static and always in context regardless of relevance; memory is the opposite. PMB doesn't dump the store into context; it extracts only a small top-k relevant chunk per task (generally a few hundred tokens), so storing more facts doesn't necessarily mean more context. "Useless facts" are precisely what it grades against: items that never serve a purpose are treated as dead and decay away.
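A minimal sketch of usage-based grading with exponential decay. The thread doesn't specify PMB's scoring, so the "was used" signal, the 14-day half-life, and the 0.25 dead threshold are all illustrative assumptions, and `Item`, `mark_used`, `decayed_score`, and `prune` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass
class Item:
    text: str
    score: float = 1.0
    last_used: float = 0.0  # time, in days, when the item last helped a task


def mark_used(item: Item, now: float, boost: float = 1.0) -> None:
    # Called when a retrieved item actually contributed to the task (hypothetical signal).
    item.score += boost
    item.last_used = now


def decayed_score(item: Item, now: float, half_life_days: float = 14.0) -> float:
    # Exponential decay: the score halves every half_life_days without use.
    age = now - item.last_used
    return item.score * math.pow(0.5, age / half_life_days)


def prune(items: list[Item], now: float, threshold: float = 0.25) -> tuple[list[Item], list[Item]]:
    # Split into live items and dead ones whose decayed score fell below the threshold.
    live = [i for i in items if decayed_score(i, now) >= threshold]
    dead = [i for i in items if decayed_score(i, now) < threshold]
    return live, dead


helpful = Item("Integration tests need docker compose up first")
trivia = Item("The repo was created on a Tuesday")
mark_used(helpful, now=28.0)
live, dead = prune([helpful, trivia], now=30.0)
```

Here the fact that keeps getting used stays live, while the one that was never used decays below the threshold after about a month and is treated as dead.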
