Discussion (37 Comments)
As such, I have a question for you: contrary to your article, I've always been taught that random primary keys are better than sequential ones. The reason, I was told, was to avoid "hotspots". I guess it only really applies once sharding comes into play, and perhaps only if your primary key is also your sharding key, but I think that's a pretty common setup.
I'm not really sure how to formulate a concrete question here; I guess I'd like to hear your thoughts on the tradeoffs of sequential vs random keys in sharded setups. Is there a case where random keys are valid, or have I been taught nonsense?
If you're sharding based purely on sequential ID ranges, then yes, this is a problem. It's better practice to shard based on a hash of your ID, so sequential ID assignments turn into non-sequential shard keys, keeping things evenly distributed.
And since it's only used for speedy lookup we can even use a fast, cheap and non-secure hashing algorithm, so it's really a low-cost operation!
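A minimal sketch of what that looks like (the shard count, function name, and choice of CRC32 are my own assumptions for illustration):

    import zlib

    NUM_SHARDS = 8  # assumed shard count, purely for illustration

    def shard_for(record_id: int) -> int:
        # CRC32 is fast and non-cryptographic; we only need even
        # distribution for routing, not security.
        return zlib.crc32(record_id.to_bytes(8, "big")) % NUM_SHARDS

    # sequential IDs now spread across shards instead of filling one range:
    print([shard_for(i) for i in range(1, 11)])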
Thanks! This was really one of those aha-moments where I feel kinda stupid to not have thought of it myself!
It's not as good as just a sequential ID at keeping fragmentation and data movement down. However, it ultimately gives us the best write performance, because the user data most likely still ends up appending to an empty page. It allows more concurrent writes to the same table because they aren't all fighting over that end page.
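This sounds like a time-prefixed key along the lines of UUIDv7; a rough sketch of the idea, with the exact layout assumed (real UUIDv7 also sets version/variant bits, omitted here for brevity):

    import os, time

    def time_ordered_key() -> bytes:
        # A 48-bit millisecond timestamp prefix keeps new keys near the
        # end of the index, while random tail bytes spread concurrent
        # writers out so they aren't all contending on one page.
        ts_ms = int(time.time() * 1000)
        return ts_ms.to_bytes(6, "big") + os.urandom(10)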
UUIDv4 is madness.
One neat realization is that a database is really more about indexes than about the raw tables themselves (everything interesting works under this assumption), to the point that when implementing an engine you get the impression that everything starts with "CREATE INDEX" rather than "CREATE TABLE". This includes sequential scans: as visualized in your article, laying the data out sequentially is itself a form of index.
Now I have the dream of building an engine around this vision...
https://planetscale.com/learn/courses/mysql-for-developers
Also, for some reason there have been lots of HN articles incorrectly advising people to use uuid4 or v7 PKs with Postgres. Somehow this is the first time I've seen one say to just use serial.
the choice between random UUIDs, time-based UUIDs, and sequential integers has too many trade-offs and subtleties to call one of the options "incorrect" like you're doing here.
just as one example, any "just use serial everywhere" recommendation should mention the German tank problem [0] and its possible modern-day implications.
for example, if you're running an online shopping website, sequential order IDs mean that anyone who places two orders can infer how many orders your website is processing over time. business people usually don't like leaking that information to competitors. telling them the technical justification of "it saves 8 bytes per order" is unlikely to sway them.
0: https://en.wikipedia.org/wiki/German_tank_problem
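to make the leak concrete, a toy illustration with invented order numbers:

    # two made-up order IDs from purchases placed a month apart
    id_january, id_february = 100_441, 103_527
    print(id_february - id_january)  # ~3,086 orders/month, inferred for free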
The DB itself is "distributed" in that it runs outside the service's own memory in 99% of cases, and in complex systems the actual DB write may be buried under multiple layers of service indirection across multiple hosts. Trying to design that correctly while also dealing with the pre-write/post-write split on record ID is a nightmare.
But for both serial and db-generated sequential UUIDs you can still encounter transaction commit order surprises. I think software relying on sequential records should use some mechanism other than the ID/PK to determine ordering. I've personally encountered extremely subtle bugs related to transaction commit order and sequential-ID assumptions multiple times.
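One of those subtle bugs, sketched: sequence values are assigned at insert time, not commit time, so a poller that tracks the highest ID it has seen can permanently miss rows (the timeline and numbers here are invented):

    # Assumed timeline:
    #   t1: txn A inserts a row, is assigned id=100, stays open
    #   t2: txn B inserts a row, is assigned id=101, commits
    #   t3: poller runs "WHERE id > last_seen", sees 101, advances cursor
    #   t4: txn A finally commits id=100, now behind the cursor
    last_seen = 0
    visible = [101]                          # B committed first
    batch = [r for r in visible if r > last_seen]
    last_seen = max(batch)                   # cursor jumps to 101
    visible.append(100)                      # A commits later with a smaller id
    print([r for r in visible if r > last_seen])  # [] -- row 100 is skipped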
If it’s just being stored in the table, it doesn’t matter, but also if it doesn’t matter, just use v7.
Ideally you use IDENTITY with Postgres, but the end result is the same, yes.
With composite indices in InnoDB it's even more important to keep the tree streamlined and let it fan out according to data cardinality: https://news.ycombinator.com/item?id=34404641
https://github.com/sqlite/sqlite/blob/master/src/btree.c
I always thought this was too complicated to ever really understand how it worked, especially the lock policy, but now with LLMs (assisted by SQLite's very comprehensive comment policy) even a relative neophyte can start to understand how it all works together. Also, the intro to the file is worth reading today:
    * 2004 April 6
    *
    * The author disclaims copyright to this source code. In place of
    * a legal notice, here is a blessing:
    *
    *   May you do good and not evil.
    *   May you find forgiveness for yourself and forgive others.
    *   May you share freely, never taking more than you give.
    *
    *************************************
    * This file implements an external (disk-based) database using BTrees.
    * See the header comment on "btreeInt.h" for additional information.
    * Including a description of file format and an overview of operation.
    */
But you'd rarely need it. We mostly have write-intensive counters, so we write to Redis first, then aggregate and write to Postgres.
This greatly reduces the number of writes we need in Postgres.
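A rough sketch of that pattern as I read it (library choices, key names, and the counters schema are all my assumptions):

    import redis
    import psycopg2

    r = redis.Redis()

    def bump(name: str) -> None:
        # cheap in-memory increment; no Postgres write on the hot path
        r.incr(f"counter:{name}")

    def flush_to_postgres() -> None:
        # periodically fold many increments into one UPDATE per counter
        conn = psycopg2.connect("dbname=app")  # assumed DSN
        with conn, conn.cursor() as cur:
            for key in r.scan_iter("counter:*"):
                count = int(r.getset(key, 0))  # read and reset atomically
                if count:
                    cur.execute(
                        "UPDATE counters SET value = value + %s WHERE name = %s",
                        (count, key.decode().removeprefix("counter:")),
                    )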
If you want a comprehensive resource, I'd recommend reading either Designing Data-Intensive Applications (Kleppmann) or Database Internals (Petrov). Both have chapters on B-trees and LSMs.
Would love to know if anyone's built something using it outside of academic testing.