pranabsarkar · about 6 hours ago · 22 comments

Vector databases store memories. They don't manage them. After 10k memories, recall quality degrades because there's no consolidation, no forgetting, no conflict resolution. Your AI agent just gets noisier.

YantrikDB is a cognitive memory engine — embed it, run it as a server, or connect via MCP. It thinks about what it stores: consolidation collapses duplicate memories, contradiction detection flags incompatible facts, temporal decay with configurable half-life lets unimportant memories fade like human memory does.
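
To make the decay knob concrete, here is a rough sketch of the half-life math (my illustration of the idea; YantrikDB's exact formula may differ):

```rust
// Sketch of half-life decay: a memory's effective weight halves every
// `half_life_secs`, scaled by its importance. Illustrative only.
fn effective_weight(importance: f32, age_secs: f32, half_life_secs: f32) -> f32 {
    importance * 0.5f32.powf(age_secs / half_life_secs)
}

fn main() {
    // Importance 0.8 with a 7-day half-life, recalled after 14 days:
    let w = effective_weight(0.8, 14.0 * 86_400.0, 7.0 * 86_400.0);
    println!("{w:.2}"); // 0.20: two half-lives leave a quarter of the weight
}
```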

Single Rust binary. HTTP + binary wire protocol. 2-voter + 1-witness HA cluster via Docker Compose or Kubernetes. Chaos-tested failover, runtime deadlock detection (parking_lot), per-tenant quotas, Prometheus metrics. Ran a 42-task hardening sprint last week — 1178 core tests, cargo-fuzz targets, CRDT property tests, 5 ops runbooks.

Live on a 3-node Proxmox homelab cluster with multiple tenants. Alpha — primary user is me, looking for the second one.

Discussion (22 Comments)

pranabsarkar · about 6 hours ago
Author here. I built this because I was using ChromaDB for an AI agent's memory and recall quality went to garbage at ~5k memories. The agent kept recalling outdated facts, contradicting itself across sessions, and the context window was full of redundant near-duplicates.

I tried to write the consolidation/conflict-detection logic on top of ChromaDB. It didn't work — the operations need to be transactional with the vector index, and they need an HLC for ordering across nodes. So I built it as a database.

The cognitive operations (think, consolidate, detect_conflicts, derive_personality) are the actual differentiator. The clustered server is what made me confident enough to ship — I needed to know the data was safe before I'd put real work on it.

What I genuinely want to know: is this solving a problem you're hitting with your AI agent's memory, or did I build a really polished thing for my own narrow use case? Honest reactions help more than encouragement.

all2 · about 3 hours ago
I've bookmarked this. I'll let you know what I find over the next few weeks.

I'm in the middle of building an agent harness and I haven't had to deal with long-running memory issues yet, but I'll have to deal with them soon.

pranabsarkar · about 3 hours ago
Thanks, really appreciate it. I'm running the server as an MCP server with all my workspaces connected, and it has definitely changed my experience.
SkyPuncher · about 1 hour ago
How can you decide if something is a contradiction without having the context?

I'm incredibly interested in this as a product, but I think it makes too many assumptions about how to prune information. Sure, this looks amazing on extremely simple facts, but most information is not reducible to simple facts.

"CEO is Alice" and "CEO is Bob" may or may not actually be contradictions and you simply cannot tell without understanding the broader context. How does your system account for that context?

Example: Alice and Bob can both be CEO in any of these cases:

* The company has two CEOs. Rare and would likely be called "co-CEO"

* The company has sub-organizations with CEOs. Matt Garman is the CEO of AWS. Andy Jassy is the CEO of Amazon. Amazon has multiple people named "CEO".

* Alice and Bob are CEOs of different companies (perhaps, this is only implicit)

* Alice is the current CEO. Bob is the previous CEO. Both statements are temporally true.

This is what I run into every time I try to do conflict detection and resolution. Pruning things down to facts doesn't provide sufficient context to understand how/why a statement was made.

pranabsarkar · about 1 hour ago
You're right. Pruning to isolated facts loses the structure that disambiguates them. Three partial mechanisms the system has, none of which fully solve your point:

Graph edges carry scope. Alice ceo_of Acme and Andy ceo_of Amazon are two edges with different src/dst — the conflict scanner looks for (src, rel_type) pairs with ≥2 distinct dsts (sketch after this list), so Garman/Jassy don't false-flag if the edges are modeled. Gap: most agents just write raw sentences and never call relate().

Temporal decay handles "previous vs current" weakly. half_life × importance attenuates old memories. But that's fade, not logical supersession — the DB doesn't know time-of-validity, only time-of-writing.

Namespaces segregate scope when the agent uses them. Leans on the agent.
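
A minimal sketch of that edge scan (hypothetical types; the real scanner's storage and types differ):

```rust
use std::collections::HashMap;

// Hypothetical edge shape: src --rel_type--> dst.
struct Edge {
    src: String,
    rel_type: String,
    dst: String,
}

// Group edges by (src, rel_type); any group with >= 2 distinct dsts is a
// candidate conflict. "Acme --has_ceo--> Alice" vs "Acme --has_ceo--> Bob"
// group together; Garman/AWS and Jassy/Amazon have different srcs, so they
// never collide.
fn candidate_conflicts(edges: &[Edge]) -> Vec<((&str, &str), Vec<&str>)> {
    let mut groups: HashMap<(&str, &str), Vec<&str>> = HashMap::new();
    for e in edges {
        let dsts = groups
            .entry((e.src.as_str(), e.rel_type.as_str()))
            .or_default();
        if !dsts.contains(&e.dst.as_str()) {
            dsts.push(e.dst.as_str());
        }
    }
    groups.into_iter().filter(|(_, d)| d.len() >= 2).collect()
}
```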

Honest result from a bench I ran today (same HN thread): seeded 6 genuine contradictions in 59 memories, think() flagged 60. ~54 are noise-or-ambiguous exactly in the ways you listed. Filed as issue #3.

Design stance: contradictions are surfaced, not resolved. yantrikdb_conflicts returns a review queue; the agent has conversation context, the DB doesn't. "These two may be in tension" not "these are contradictory." That doesn't fix your point — it admits the DB can't make that call alone. Co-CEOs, subsidiaries, temporal supersession need typed-relations + time-of-validity schema work. That's v0.6, not v0.5.

endymi0n · about 3 hours ago
I've experimented quite a bit with mem0 (which is similar in design) for my OpenClaw and stopped using it very soon. My impression is that "facts" are an incredibly dull and far too rigid tool for any actual job at hand and for me were a step back instead of forward in daily use. In the end, the extracted "facts database" was a complete mess of largely incomplete, invalid, inefficient and unhelpful sentences that didn't help any of my conversations, and after the third injected wrong fact I went back to QMD and prose / summarization. Sometimes it's slightly worse at updating stuck facts, but I'll take a 1000% better big picture and usefulness over working with "facts".

The failure modes were multiple:

* Facts rarely exist in a vacuum but have lots of subtlety
* Inferring facts from conversation has a gazillion failure modes; especially irony and sarcasm lead to hilarious outcomes (joking about a sixpack with a fat buddy -> "XYZ is interested in achieving an athletic form"), but even things as simple as extracting a concrete date too often go wrong
* Facts are almost never as binary as they seem. "ABC has the flights booked for the Paris trip". Then I decided afterwards to continue to New York to visit a friend instead of going home, which completely stumped the agent.

pranabsarkar · about 2 hours ago
Fair criticism — and the failure modes you describe aren't mem0-specific, they hit any system that extracts atomic facts from conversation. I hit a couple of them today while benchmarking YantrikDB's own consolidation (see my reply to polotics): "Alice is CEO" got merged with "Sarah is CTO" on cosine similarity alone because the sentences share too much structural scaffolding. That's exactly the "facts in a vacuum" problem you're naming.

Two small clarifications:

remember(text, importance, domain) takes a free-form string — nothing forces atomic facts. A QMD-style prose block, a procedure, a dated plan, all work. The irony/sarcasm-inverts-the-fact failure mode lives in the agent's extraction layer, not the backend. So "write narrative into it, recall narrative out" is a legitimate usage pattern; the DB is agnostic.
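
As a concrete shape of that "write narrative in" pattern, a hedged HTTP sketch (the /v1/remember path and field names come from this thread; host, port, and exact payload shape are my assumptions, not documented API):

```rust
// Hypothetical client call writing a free-form prose block into
// /v1/remember. Illustrative only; the real request shape may differ.
use serde_json::json;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = reqwest::blocking::Client::new();
    let resp = client
        .post("http://localhost:8080/v1/remember")
        .json(&json!({
            "text": "User is a data scientist on observability; prefers \
                     narrative notes over atomic extracted facts.",
            "importance": 0.8,
            "domain": "user_profile"
        }))
        .send()?;
    println!("status: {}", resp.status());
    Ok(())
}
```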

YantrikDB's actual differentiator vs mem0 is temporal decay + consolidation + conflict detection, not smarter fact extraction. The "ABC has the Paris flight booked → actually I'm going to NYC" problem is meant to be addressed by decay (the old fact fades) and contradiction flagging (the new one triggers a conflict for the agent to resolve). But — honest read — my bench today showed conflict detection needs work to actually fire on raw text. Filed as issues #1 and #2, fixing now.

Broader point stands though: if the agent is producing brittle inferred facts upstream, no memory backend saves it. The DB can manage rot and contradiction. It can't fix bad inference. For what it's worth, I mostly use it for durable role context ("user is a data scientist on observability") rather than event lifecycle ("Paris flight booked") — the latter is what prose summarization is genuinely better at, and I think you're right that mem0-style auto-extraction applied to lifecycle events is a bad shape.

lemming · about 1 hour ago
So soon our agents will need spaced repetition flash cards for the things we want them to remember! I eagerly await Anki for Agents.
grant-ai · 28 minutes ago
I have experienced similar degradation with vectors at scale.
tcdent · about 3 hours ago
I appreciate the effort you put into mapping semantics so language constructs can be incorporated into this. You’re probably already seeing that the amount of terminology, how those terms interact with each other, and the way you need to model it have ballooned into a fairly complex system.

The fundamental breakthrough with LLMs is that they handle semantic mapping for you and can (albeit non-deterministically) interpret the meaning and relationships between concepts with a pretty high degree of accuracy, in context.

It just makes me wonder if you could dramatically simplify the schema and data modeling by incorporating more of these learnings.

I have a simple experiment along these lines that’s especially relevant given the advent of one-million-token context windows, although I don’t consider it a scientifically backed or production-ready concept, just an exploration: https://github.com/tcdent/wvf

pranabsarkar · about 3 hours ago
Thanks for the careful read — the "schema is ballooning" observation is real and I've felt it building this. You're pointing at a genuine design tension.

My counter, qualified: deterministic consolidation is cheap and reproducible in a way LLM-in-the-loop consolidation isn't, at least today. Every think() invocation is nearly free (cosine + entity matching + SQL). If I put an LLM in the loop, the cost is O(N²) LLM calls per consolidation pass — a 10k-memory database is roughly 5×10⁷ pairs, i.e. thousands of dollars of inference per tick even at a fraction of a cent per call. So for v1 I'm trading off "better merge decisions" against "actually runs every 5 minutes without burning a budget."

On 1M-context-windows: I think they push the "vector DB break point" out but don't remove it. Context stuffing still has recall-precision problems at scale (lost-in-the-middle, attention dilution on unrelated facts), and 1M tokens ≠ unbounded memory. At 10M memories no context window saves you.

wvf is interesting — just read through. The "append everything, let the model retrieve" approach is the complement of what I'm doing: you lean fully into LLM semantics, I try to do the lookup deterministically. Probably both are right for different workloads. Yours wins when you have unbounded compute + a small corpus; mine wins when you have bounded compute + a large corpus that needs grooming.

Starring wvf now. Curious if you're seeing meaningful quality differences between your approach and traditional retrieval at scale.

tcdent · about 2 hours ago
Appreciate the thoughtful reply.

Absolutely agree the deterministic, performance-oriented mindset is still essential for large workloads. Are you expecting that this supplements a traditional vector/semantic store, or that it supersedes it?

My focus has absolutely been on relatively small corpora, which is supported by forcing a subset of data to be included by design. There are intentionally no conventions for things like "we talked about how AI is transforming computing at 1AM"; instead it attempts to focus on "user believes AI is transforming computing", so hopefully there's less of the context poisoning that happens with current memory.

Haven't deployed WVF at any scale yet; just a casual experiment among many others.

polotics · about 4 hours ago
In this day and age, without serious evidence that the software presented has seen some real usage, or at least has a good reviewable regression test suite, sadly the assumption may be that this is a slopcoded brainwave. The ascii-diagram doesn't help. Also maybe explain the design more.
pranabsarkar · about 3 hours ago
Fair. "Does consolidation actually improve recall quality on a running system?" is exactly the benchmark I haven't published, and it's the one that would settle the question.

What I do have right now:

* 1178 core unit tests, including CRDT convergence property tests via proptest (for any sequence of ops, the final state is order-independent)
* Chaos test harness: Docker'd 3-node cluster with leader-kill / network-partition / kill-9 scenarios (tests/chaos/ in the repo)
* cargo-fuzz targets against the wire protocol and oplog deserializer
* Live usage: running on my 3-node homelab cluster with two real tenants (small — a TV-writing agent and another experiment) for the past few weeks. Caught a real production self-deadlock during this period (v0.5.8), which is what triggered the 42-task hardening sprint.

What I don't have and should: a recall-quality-over-time benchmark. Something like: seed 5,000 memories with known redundancy and contradictions, measure recall precision@10 before and after think(), and publish the curve. That's the evidence you're asking for, and you're right it's missing. I'll run that and post the numbers in a follow-up.
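
To make "order-independent" concrete, this is the general shape of such a property test, with a toy last-writer-wins register standing in for the real CRDTs (hypothetical types, not YantrikDB's):

```rust
use proptest::prelude::*;

// Toy LWW register: keeps the op with the greatest (timestamp, payload),
// so ties on the timestamp still resolve deterministically.
#[derive(Clone, Debug, Default, PartialEq)]
struct LwwRegister {
    value: Option<(u64, String)>, // (hlc_timestamp, payload)
}

impl LwwRegister {
    fn apply(&mut self, op: &(u64, String)) {
        if self.value.as_ref().map_or(true, |cur| op > cur) {
            self.value = Some(op.clone());
        }
    }
}

proptest! {
    #[test]
    fn converges_regardless_of_order(ops in any::<Vec<(u64, String)>>()) {
        let mut a = LwwRegister::default();
        for op in &ops { a.apply(op); }

        // Same ops, reversed order: both replicas must agree.
        let mut b = LwwRegister::default();
        for op in ops.iter().rev() { b.apply(op); }

        prop_assert_eq!(a, b);
    }
}
```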

Fair point on the ASCII diagram too — the website has proper rendering (yantrikdb.com) but the README should have an SVG.

Appreciate the pushback — this is more useful than encouragement.

6r17 · about 3 hours ago
I kind of agree with the comment here that a lot of what's happening around this space comes out of an idea without proof that the project has a meaningful result. A compacting-memory bench is not something difficult to pull off, but I'm also having difficulty understanding what the outcome would be on a running system.
Mithriil · about 3 hours ago
The half-life idea is interesting.

What's the loop behind consolidation? Random sampling and LLM to merge?

pranabsarkar · about 3 hours ago
No LLM in the loop. The consolidation pass is deterministic:

1. Pull the N most recent active memories (default 30) with embeddings
2. Pairwise cosine similarity, threshold 0.85
3. For each similar pair, check if they share extracted entities
4. Shared entities + similarity 0.85-0.98 → flag as potential contradiction (same topic, maybe different facts)
5. No shared entities + similarity > 0.85 → redundancy (mark for consolidation)
6. Second pass at a 0.65 threshold specifically for substitution-category pairs (e.g., "MySQL" vs "PostgreSQL" in otherwise-similar sentences) — these are usually real contradictions even at lower similarity

Consolidation then collapses the redundancy set into canonical memories with combined importance/certainty. No LLM call, no randomness. Reproducible, cheap, runs in a background tick every ~5 minutes.

The LLM could improve this (better merge decisions, better entity alignment) but the tradeoff is cost and non-determinism. v1 is deterministic on purpose.

Source: crates/yantrikdb-core/src/cognition/triggers.rs and consolidate.rs next to it.
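
A condensed sketch of that pairwise pass (toy types; thresholds taken from the steps above, the 0.65 substitution pass omitted; the real code in triggers.rs differs):

```rust
struct Memory {
    id: u64,
    embedding: Vec<f32>,
    entities: Vec<String>,
}

enum Flag {
    PotentialContradiction(u64, u64), // shared entities, sim in 0.85..=0.98
    Redundant(u64, u64),              // no shared entities, sim > 0.85
}

fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

// Deterministic scan over the N most recent memories: no LLM, no RNG.
fn scan(recent: &[Memory]) -> Vec<Flag> {
    let mut flags = Vec::new();
    for i in 0..recent.len() {
        for j in (i + 1)..recent.len() {
            let (a, b) = (&recent[i], &recent[j]);
            let sim = cosine(&a.embedding, &b.embedding);
            let shared = a.entities.iter().any(|e| b.entities.contains(e));
            if shared && (0.85..=0.98).contains(&sim) {
                flags.push(Flag::PotentialContradiction(a.id, b.id));
            } else if !shared && sim > 0.85 {
                flags.push(Flag::Redundant(a.id, b.id));
            }
        }
    }
    flags
}
```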

hazelnut · about 3 hours ago
Congrats, looking promising. How does it compare to supermemory.ai?
pranabsarkar · about 3 hours ago
Fair question. Supermemory is a hosted SaaS built around embedding + ranking. YantrikDB is self-hosted and adds three things Supermemory doesn't do as first-class operations:

* think() — consolidates similar memories into canonical ones (not just deduplication, actual collapse of redundant facts)
* Contradiction detection — when "CEO is Alice" and "CEO is Bob" both exist in memory, it flags the pair as a conflict the agent can resolve
* Temporal decay with configurable half-life — memories fade, so old unimportant stuff stops polluting recall

Supermemory does more on the cloud side (team sharing, permissions, integrations). YantrikDB does more on the "actively manage my agent's memory" side. Different optimization points — no dig at Supermemory.

aleksiy123 · about 1 hour ago
Anyone have experience with these and competitors? I'm curious if you saw a difference.

Also, any open-source local or self-hosted options?

altmanaltman · about 3 hours ago
Did you check if this leads to any actual benefits? If so, how did you benchmark it?
pranabsarkar · about 2 hours ago
Update — ran a real bench on the live cluster (59 memories: 8 canonical facts × 3-4 paraphrases + 6 seeded contradictions + 20 distractors). Numbers:

* duplicates per query (top-10): 0.9 → 0.0
* top-result correct: 75% → 87.5%
* 11 consolidations in 80ms
* conflicts detected: 0 of 6 seeded ← this one matters

Turns out conflict detection runs on graph edges, and /v1/remember doesn't auto-extract entities — so contradictions sit there invisibly until you explicitly call relate(). That's a UX gap, not a missing feature, but it breaks the "drop memories in, get contradictions out" mental model. Filed as issues #1 and #2.

Dataset + script + raw results: https://gist.github.com/spranab/49c618d3625dc131308227103af5....

Honest benches surface the kind of thing demos hide; thanks for pushing.
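
For reference, the shape of the explicit relate() call the bench was missing (purely illustrative; the endpoint path and payload fields are my assumptions based on the (src, rel_type, dst) description upthread, not documented API):

```rust
// Hypothetical: write the graph edges that conflict detection actually
// scans. Until both edges exist, the seeded "CEO is Alice" / "CEO is Bob"
// contradiction stays invisible to the scanner.
use serde_json::json;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = reqwest::blocking::Client::new();
    for dst in ["Alice", "Bob"] {
        client
            .post("http://localhost:8080/v1/relate")
            .json(&json!({ "src": "Acme", "rel_type": "has_ceo", "dst": dst }))
            .send()?;
    }
    Ok(())
}
```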