Discussion (30 Comments), original on HackerNews
A few million rows should take at most, on the most awful networked storage available, maybe 10 seconds. I just built an index locally on 10,000,000 rows in 4 seconds. Moreover, though, there are vanishingly few cases where you wouldn't want to use CONCURRENTLY in prod - you shouldn't need to run a test to tell you that.
IMO branching can be a cool feature, but the use I keep seeing touted (indexes) doesn't seem like a good one for it. You should have a pretty good idea how an index is going to behave before you build it, just from understanding the RDBMS. There are also tools like hypopg [0], which are also available on cloud providers.
A better example would be showing testing a large schema change, like normalizing a JSON blob into proper columns or something, where you need to validate performance before committing to it.
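As a rough illustration of that kind of check, here is a minimal sketch of moving fields out of a JSON blob into real columns and then verifying that nothing was lost. It uses Python's built-in `sqlite3` as a stand-in so it runs without a server; the `events` table, the `payload` column, and the `user_id` and `action` fields are all hypothetical. On a real Postgres branch the same two steps would apply, with timing added around them.

```python
import json
import sqlite3


def build_sample_db():
    """Create an in-memory table with a JSON text column (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT NOT NULL)"
    )
    samples = [
        {"user_id": 1, "action": "login"},
        {"user_id": 2, "action": "logout"},
        {"user_id": 1},  # missing field should become NULL
    ]
    conn.executemany(
        "INSERT INTO events (payload) VALUES (?)",
        [(json.dumps(s),) for s in samples],
    )
    conn.commit()
    return conn


def normalize_events(conn):
    """Add typed columns and fill them from the JSON payload."""
    conn.execute("ALTER TABLE events ADD COLUMN user_id INTEGER")
    conn.execute("ALTER TABLE events ADD COLUMN action TEXT")
    rows = conn.execute("SELECT id, payload FROM events").fetchall()
    for row_id, payload in rows:
        doc = json.loads(payload)
        conn.execute(
            "UPDATE events SET user_id = ?, action = ? WHERE id = ?",
            (doc.get("user_id"), doc.get("action"), row_id),
        )
    conn.commit()


def count_mismatches(conn):
    """Count rows whose new columns disagree with the original JSON."""
    mismatches = 0
    cursor = conn.execute("SELECT payload, user_id, action FROM events")
    for payload, user_id, action in cursor:
        doc = json.loads(payload)
        if doc.get("user_id") != user_id or doc.get("action") != action:
            mismatches += 1
    return mismatches
```

Running `normalize_events` on a throwaway copy and requiring `count_mismatches` to return zero before touching the real database is the part a branch makes cheap.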
[0]: https://github.com/HypoPG/hypopg
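For readers who have not used HypoPG, a minimal sketch of the workflow the comment alludes to: register a hypothetical index, look at the plan with plain `EXPLAIN`, then build the real index with `CONCURRENTLY`. The helpers below only assemble SQL strings so they can be inspected or passed to any client; the helper names and the `orders`/`customer_id` example are assumptions for illustration, while `hypopg_create_index` and `hypopg_reset` are HypoPG's own functions.

```python
def hypothetical_index_plan(index_ddl, query):
    """Return SQL statements that try an index with HypoPG before building it."""
    escaped_ddl = index_ddl.replace("'", "''")  # escape for a SQL string literal
    bare_query = query.strip().rstrip(";")
    return [
        "CREATE EXTENSION IF NOT EXISTS hypopg;",
        f"SELECT * FROM hypopg_create_index('{escaped_ddl}');",
        # Plain EXPLAIN only: EXPLAIN ANALYZE executes the query and
        # cannot use an index that does not physically exist.
        f"EXPLAIN {bare_query};",
        "SELECT hypopg_reset();",  # drop all hypothetical indexes
    ]


def concurrent_index_ddl(index_name, table, column):
    """Return DDL that builds the real index without blocking writes.

    CREATE INDEX CONCURRENTLY cannot run inside a transaction block,
    so the client must execute it in autocommit mode.
    """
    return f"CREATE INDEX CONCURRENTLY {index_name} ON {table} ({column});"


if __name__ == "__main__":
    for stmt in hypothetical_index_plan(
        "CREATE INDEX ON orders (customer_id)",
        "SELECT * FROM orders WHERE customer_id = 42",
    ):
        print(stmt)
    print(concurrent_index_ddl("orders_customer_id_idx", "orders", "customer_id"))
```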
Looking at Xata’s technical deep dive, the site claims that we need an additional Postgres instance per replica and proposes a network file system to work around that. But I don’t really understand why that’s needed. Can someone explain to me my misunderstanding here?
Are you referring to `file_copy_method = clone` from Postgres 18? For example: https://boringsql.com/posts/instant-database-clones/
I think the key limitation is:
> The source database can't have any active connections during cloning. This is a PostgreSQL limitation, not a filesystem one.
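A minimal sketch of what that looks like in practice, assuming Postgres 18 with `file_copy_method = clone` already in effect and a filesystem that supports cloning. The helper only assembles SQL strings; its name and the `app`/`app_pr_123` database names are hypothetical. The first statement checks for the other sessions that the quoted limitation refers to, and the second creates the clone through the `FILE_COPY` strategy.

```python
def quote_ident(name):
    """Quote a SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value):
    """Quote a SQL string literal, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def branch_statements(source_db, branch_db):
    """Return SQL to check for active sessions, then clone source_db.

    Assumes file_copy_method = clone is already configured on the server.
    """
    return [
        # The clone fails if other sessions are connected to the source.
        "SELECT count(*) FROM pg_stat_activity "
        f"WHERE datname = {quote_literal(source_db)} "
        "AND pid <> pg_backend_pid();",
        # FILE_COPY is the strategy that file_copy_method applies to.
        f"CREATE DATABASE {quote_ident(branch_db)} "
        f"TEMPLATE {quote_ident(source_db)} STRATEGY = FILE_COPY;",
    ]


if __name__ == "__main__":
    for stmt in branch_statements("app", "app_pr_123"):
        print(stmt)
```

Both statements should be run from a session connected to a different database, such as `postgres`, so the checking session does not itself count as a connection to the source.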
What I'm saying there is that if you run Postgres on top of a local ZFS volume, the child branches' Postgres instances need to be on the same server. So you are limited in how many branches you can have. One or two are fine, but if you want a branch per PR, that will likely not work.
If you separate the compute from storage via the network, this problem goes away.
At the same time Postgres people don't seem comfortable with the idea in practice so I'm not sure if this is actually ok to do.
https://www.dolthub.com/
It was a lot of work and had poor performance with a lot of complications. I am not using it in my latest projects as a result.
Not disputing that Oracle might have had something like this built-in, but it sounds like something that I could have whipped up in a day or so as a custom solution. I actually proposed a similar system to create anonymized datasets for researchers when I worked at a national archive institute.
Yes, PlanetScale can branch too, but it takes longer and you pay individually for each branch.
I actually built my own immutable database which does support branching (see profile), so it seems like a huge miss that these ones don't. It's pretty much the main reason I would want an immutable database.
The linked article points out that Datomic doesn't support branching from the past. It absolutely does support branching, and I've built entire test suites that way.
From a cursory glance, I'd say Datomic does exactly what the original parent article is discussing. It works great and it's super convenient.
That said, I’m adding xitdb to the list of tech to try out. Thank you for building it!
Oh, and thanks for linking to my article :-)
[0]: https://github.com/replikativ/datahike
[1]: https://datahike.io/notes/the-git-model-for-databases/