Discussion (31 Comments)
The live queries also caught my eye. Having traversals automatically re-execute when data changes sounds straightforward until you realize the underlying data is being merged from multiple peers concurrently. Getting that right without stale reads or phantom edges is genuinely hard.
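A minimal sketch of the live-query idea in plain TypeScript (all names here are hypothetical, not the library's actual API): queries are registered once and re-run after every mutation, such as a merge arriving from another peer.

```typescript
// Hypothetical sketch: a graph whose registered queries re-run on every
// mutation. A real engine would invalidate incrementally rather than
// recompute each query from scratch.
type Edge = { from: string; to: string };

class LiveGraph {
  private edges: Edge[] = [];
  private subscribers: Array<() => void> = [];

  addEdge(e: Edge): void {
    this.edges.push(e);
    // Naive strategy: re-run every live query after each mutation.
    this.subscribers.forEach((run) => run());
  }

  // Register a traversal; it runs once now and again on every change.
  live(
    query: (edges: Edge[]) => string[],
    onResult: (r: string[]) => void
  ): void {
    const run = () => onResult(query(this.edges));
    this.subscribers.push(run);
    run();
  }
}

const g = new LiveGraph();
const results: string[][] = [];
// "Who does alice point to?" -- kept up to date as edges arrive.
g.live(
  (edges) => edges.filter((e) => e.from === "alice").map((e) => e.to),
  (r) => results.push(r)
);
g.addEdge({ from: "alice", to: "bob" });   // query re-runs: ["bob"]
g.addEdge({ from: "carol", to: "dave" }); // re-runs again, still ["bob"]
```

The hard part the comment points at is exactly what this sketch sidesteps: deciding when a merged remote update is consistent enough to re-run against.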
I've been researching something similar in an adjacent space, but for source code, and built a tool called Weave (https://github.com/Ataraxy-Labs/weave) for entity-level merges in git. Instead of merging lines of text, it extracts functions, classes, and methods, builds a dependency graph between them, and merges at that level.
Seeing codemix makes me think there might be something interesting here. Right now our entity graph and our CRDT state are two separate things. The graph lives in sem-core (our analysis engine) and the CRDT lives in weave-crdt. If something like @codemix/graph could unify those, you'd have a single data structure where the entity dependency graph is the CRDT.
What's the advantage of using all these different things in one system? You can do all of this in Datalog. You get strong eventual consistency naturally. LLMs know how to write it. It's type-safe. JS implementations exist [0].
[0] https://github.com/tonsky/datascript
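For readers unfamiliar with the Datalog style, the core pattern-matching idea can be sketched in plain TypeScript (the data and variable convention below are illustrative; DataScript, linked above, provides real queries, indexes, and persistence):

```typescript
// Facts as entity-attribute-value triples.
type Triple = [entity: string, attribute: string, value: string];

const db: Triple[] = [
  ["alice", "follows", "bob"],
  ["bob", "follows", "carol"],
  ["alice", "name", "Alice"],
];

// Match one pattern against the db; "?x"-style strings are variables.
function match(
  pattern: Triple,
  bindings: Record<string, string>
): Record<string, string>[] {
  const out: Record<string, string>[] = [];
  for (const triple of db) {
    const b = { ...bindings };
    let ok = true;
    pattern.forEach((term, i) => {
      const bound = term.startsWith("?") ? b[term] : term;
      if (bound === undefined) b[term] = triple[i]; // bind free variable
      else if (bound !== triple[i]) ok = false;     // mismatch
    });
    if (ok) out.push(b);
  }
  return out;
}

// Conjunctive query: whom do the people alice follows, follow?
// i.e. [alice follows ?y] [?y follows ?z]
const step1 = match(["alice", "follows", "?y"], {});
const step2 = step1.flatMap((b) => match([b["?y"], "follows", "?z"], b));
// step2[0]["?z"] === "carol"
```

Joins fall out of threading bindings between clauses, which is why the query language stays so small.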
Zod/Valibot/ArkType/Standard Schema support, because you need a way to define your schema, and this gives you that at both runtime and compile time.
Y.js as a backing store, because I needed to support offline sync and branching/forking, and I already use Y.js for collaborative editing in my product, so I needed to be able to store the various CRDT types as properties within the graph. E.g. you can have a `description` property on your vertices or edges that is backed by a Y.Text or Y.XmlElement.
Cypher, because until the arrival of codemode it wasn't feasible to have LLMs write queries using the Gremlin-like API, and LLMs already know Cypher.
Most of all though, this was an experiment that ended up being useful.
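To illustrate the Cypher point above, here is the same one-hop traversal written both ways (the fluent chain uses hypothetical Gremlin-style method names, not necessarily this library's actual API):

```typescript
// Cypher: a declarative pattern LLMs have seen in vast quantities,
// so it is far more reliable for them to generate.
const cypher = `
  MATCH (u:User {name: 'alice'})-[:FOLLOWS]->(f:User)
  RETURN f.name
`;

// Hypothetical Gremlin-style equivalent, as a fluent chain:
//   g.V().hasLabel('User').has('name', 'alice')
//        .out('FOLLOWS').values('name')
```

The fluent form depends on a long chain of library-specific method names, which is exactly where models tend to hallucinate.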
Though TypeScript is pretty fast and the language is flexible, we all know how demanding graph databases are, and how hard they are to shard, etc. It seems like this could be a performance trap. Are there successful RDBMS or NoSQL databases out there written in TypeScript?
Also, why is everything about LLMs now? Can't we discuss technologies at face value anymore? It's getting kind of old to me personally.
How large is large, here? Tens of thousands of triples? Hundreds of thousands? Millions?
I'm working on a local-first browser extension for ActivityPub, and currently I am parsing the JSON-LD and storing the triples in specialized tables on pglite to be able to make fast queries on that data.
It would be amazing to ditch the whole thing and just deal with triples based on the expanded JSON-LD, but I wonder how the performance would be. After using the browser extension for a week, the store had accumulated ~90k JSON-LD documents, which would probably mean about 5 times as many triples. Storage-wise it's okay (~300MB), but I think a graph database would only be useful for managing "hot data", not a whole archive of user activity.
The query syntax looks nice by the way.
[0] https://tinkerpop.apache.org/
Fully agree with you, LLMs everywhere is getting tiresome and trite. Sure, one can generate the code, but can one talk about the code?
Can one move, easily, beyond "but why should I?" into "I should because ...", whenever it comes to some realm one has supposedly "conquered" with AI/ML/etc.?
Sure, we can all write great specs, sit back and watch the prompts rolling in off the horizon...
But seriously: the human reasoning of it is the most important part!
However, everyone is busy secretly building things with AI/ML that they don't want anyone to know about (except the operators of course, duh, lol, kthxbai...) because, after all, then anyone else could do it too. And in this secrecy, human literature is transforming, imho, in non-positive ways for the future.
Kids need to read books, and quick!
>old to me personally
I kind of regret seeing it in one of my favourite realms, retro-computing .. but .. what can we do, it is here and now, and the kids are using it even if some don't.
I very much concur with you on the desire to keep being able to discuss technologies at face value. We have to be reviewing the code; the point at which we don't review AI's output is where we become subjects of it.
This is, probably, like the very early days of porn, or commercial tobacco, or indeed industrialized combustion, where it took society a few big leaps and dark tumbles before the tech sort of stabilized. I'm not sure we'll get to the "Toyota" stage of AI, or whether we're just going to blast right past it into having AI control literally every device under the sun, whether we like it or not (and/or, have an impact on the technologically-obsolete curve, -/+'vely...)
Ain't no easily-accessible AI-hookery TypeScript in 8-bit land .. I will have my retro-computing without that filthy stinking AI/ML please, mmkay, thanks!!!
I wrote my own in-memory graph (I'd rather call it storage than a DB) some years ago in Go; even there I was wondering whether Go was actually the optimal technology for something like a database, especially due to garbage collection, stop-the-world pauses, etc. There are certain levels of optimization I will never be able to properly reach (let's ignore possible hacks). Looking at a solution in TypeScript, no matter how "nice" it looks, this just doesn't seem to be the correct tool/technology for the target.
And inb4: there are use cases for everything, but the same way I wouldn't write a website in C, I also wouldn't write a database in JavaScript/TypeScript.
I just would argue this is the wrong pick.
@llms: I'm not even getting into this, because if you don't want to read "LLM" you basically can't read 99% of the news nowadays. ¯\_(ツ)_/¯
edit: I'm a big fan of graph databases, so I'm happy about any public attention they get ^
How does Yjs handle schema migrations? If I add a property to a vertex type that existing peers have cached, does it conflict or drop the unknown field?
This looks neat, but if you want it to be used for AI purposes, you might want to show a schema more complicated than a Twitter network.
Imho, having a graph database that is really easy to use and to write new CLI applications on top of works much better. You don't need strong schema validation so long as you can gracefully ignore what your schema doesn't expect, by viewing queries as type/schema declarations.
https://github.com/magic-locker/faculties
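The "queries as type/schema declarations" idea above might look something like this sketch (field names and the spec shape are illustrative, not any particular library's API): project only the fields the query declares, and silently drop anything else instead of rejecting it.

```typescript
// A query declares the fields it wants and their expected types.
type FieldSpec = Record<string, "string" | "number">;

// Project a row through the spec; unknown or mistyped fields are
// ignored rather than treated as validation errors.
function project(row: Record<string, unknown>, spec: FieldSpec) {
  const out: Record<string, unknown> = {};
  for (const [field, kind] of Object.entries(spec)) {
    if (typeof row[field] === kind) out[field] = row[field];
    // Missing or mistyped fields are simply dropped, not errors.
  }
  return out;
}

// A row with extra baggage the current query doesn't care about:
const row = { name: "alice", age: 42, legacyBlob: { huge: true } };
const result = project(row, { name: "string", age: "number" });
// result: { name: "alice", age: 42 } -- legacyBlob silently ignored
```

The schema lives with the consumer of the data rather than the store, which is what makes evolving it cheap.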
I'm sure once that problem has been solved, you can use the built-in map/object of whatever language and it'll be good enough. Add save/load to disk via JSON and you have long-term persistence too. But since LLMs still aren't clever enough, I don't think the underlying implementation matters too much.
A: One of the main lessons of the RAG era of LLMs was that reranked multiretrieval is a great balance of test-time latency, compute, and quality, at the expense of maintaining a few costly index types. Graph ended up a nice little lift when put alongside text, vector, and relational indexing, by solving some n-hop use cases.
I'm unsure if the juice is worth the squeeze, but it does make some sense as infra. Making and using these flows isn't that conceptually complicated and most pieces have good, simple OSS around them.
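For concreteness, one common way to combine results from those index types in case A is reciprocal rank fusion (a standard technique, not necessarily what any particular stack uses; the document IDs and the `k` constant below are illustrative):

```typescript
// Reciprocal rank fusion: each retriever contributes 1/(k + rank + 1)
// per document; documents ranked well by several retrievers win.
function rrf(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((doc, rank) => {
      scores.set(doc, (scores.get(doc) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([doc]) => doc);
}

// Ranked hits from three different indexes over the same corpus:
const fused = rrf([
  ["d1", "d2", "d3"], // text (e.g. BM25) hits
  ["d2", "d1", "d4"], // vector hits
  ["d2", "d5"],       // graph n-hop hits
]);
// fused[0] === "d2": the only doc ranked by all three retrievers
```

The appeal is that each index stays simple; the fusion step is where the "multi" in multiretrieval pays off.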
B: There is another universe of richer KG extraction with even heavier indexing work. I'm less clear on the ROI here in typical benchmarks relative to case A. Imagine going full RDF, versus the simpler property-graph queries and ontologies here, and investing in heavy preprocessing during writes (entity resolution, etc.). I don't know how much these improve scores versus the regular multiretrieval above, or how easy they are to do at any reasonable scale.
In practice, a lot of KG work lives outside the DB and the agent, in a much fancier KG pipeline. So there is a missing layer, with less clear proof and a value burden.
--
Separately, we have been thinking about these internally. We have been building GFQL, OSS GPU Cypher queries on dataframes, etc., without needing a DB (reuse existing storage tiers by moving into an embedded compute tier), and powering our own LLM usage has been a primary internal use case for us. Our experiences have led us to prioritize case A as the next step for what the graph engine needs to support, and to view case B as something that should live outside of it in a separate library. This post does make me wonder whether case B should move closer into the engine to help streamline things for typical users, akin to how Solr/Lucene/etc. helped make Elastic into something useful early on for search.