
Discussion (3 Comments)

oliverio • about 2 hours ago
Very interesting, thanks for sharing. This has a lot of nods to Turbopuffer's architecture [0]. My impression is they've spent a lot of time optimizing at the hardware/firmware layer to achieve extremely fast query results.

Inarticulately - how ~close is OpenData Vector to Turbopuffer in terms of performance today and where are the major gaps + mountains to scale?

Really excited to keep an eye on the repos, great read!

[0] https://turbopuffer.com/blog/turbopuffer

rohanpdes • about 2 hours ago
Yep! Vector provides a lot of the same benefits, just as an OSS project. They were definitely a major inspiration. Vector's performance is similar to their published benchmarks.

The biggest gap is (unsurprisingly) for larger (e.g. 100s of M - 1B+) datasets. We talk about it in the post, but the main improvement there is adding quantization to reduce the overhead of loading large posting lists. There's also a bunch of storage and caching layer work to be done. That's on our roadmap along with some cool features like full-text search and better support for multi-tenancy.

apurvamehta • about 2 hours ago
Thanks! opendata contributor here.

We're heavily inspired by Turbopuffer. I'd say we are comparable to them when they launched in terms of perf and scale. But they've obviously invested heavily since then, so we're not going to match them on raw perf at scale right now. Our goal is to be a pretty competitive OSS offering over the long term though.

The next biggest lift for us to get much closer is quantization. If we squeeze more signal into fewer bits, we will improve performance end to end.
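The quantization lift described in these comments can be illustrated with a minimal per-dimension scalar quantization sketch (a hypothetical example, not OpenData Vector's or Turbopuffer's actual implementation): mapping float32 vector components to uint8 shrinks the bytes that must be loaded per posting list by 4x, at the cost of a small, bounded reconstruction error.

```python
import numpy as np

def scalar_quantize(vectors: np.ndarray):
    """Quantize float32 vectors to uint8 per dimension.

    Keeps the per-dimension offset (lo) and scale so approximate
    vectors can be reconstructed at query time.
    """
    lo = vectors.min(axis=0)
    hi = vectors.max(axis=0)
    scale = (hi - lo) / 255.0
    scale[scale == 0] = 1.0  # avoid divide-by-zero on constant dimensions
    codes = np.round((vectors - lo) / scale).astype(np.uint8)
    return codes, lo, scale

def dequantize(codes: np.ndarray, lo: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Reconstruct approximate float32 vectors from uint8 codes."""
    return codes.astype(np.float32) * scale + lo

rng = np.random.default_rng(0)
vecs = rng.normal(size=(1000, 128)).astype(np.float32)

codes, lo, scale = scalar_quantize(vecs)
approx = dequantize(codes, lo, scale)

print(vecs.nbytes // codes.nbytes)  # → 4 (4x fewer bytes to load)
print(np.abs(vecs - approx).max())  # small per-component error (≤ scale/2)
```

The 4x reduction comes directly from storing one byte instead of four per component; production systems typically go further (e.g. product quantization or binary codes) and re-rank top candidates with full-precision vectors to recover accuracy.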