DE version is available. Content is displayed in original English for accuracy.
Advertisement
Advertisement
⚡ Community Insights
Discussion Sentiment
51% Positive
Analyzed from 2618 words in the discussion.
Trending Topics
#rowid#uuid#table#sqlite#key#need#integer#more#uuids#without

Discussion (105 Comments)Read Original on HackerNews
For a single database, bigints are smaller and faster, with less footguns.
UUIDs can be nice for an opaque public ID, however I'd still prefer something like a Sqid for space and usability.
But be careful!! Javascript WILL interpret your bigints as Number() and round them down because they are too big without telling you!!!
Famously seen by every snowflake user that has interacted with Javascript, quite an annoying problem.
A similar horror story from PHP, which I discovered by diagnosing a test failure. (Or maybe it was in production? Long ago, can't remember.)
I think the code in question was for some kind of web auth, comparing random 32-character hexadecimal strings. PHP has a "feature" where its == operator falls back to trying certain strings as numbers... and that includes a version with scientific notation. (12000 == "12000" == "12e3")
Such a collision through bad comparison may seem unlikely, but there are two islands of higher odds: 0*10^X is zero for any X, and X*10^0 is one for any X. Finally, leading zeros can be included. ("0e1234" == "00000e1" and "1234e0" == "9e0000")
The fix was simply going to stricter ===, but it definitely reinforced my dislike of "loose" languages.
Then it will always be a string and you will be free to change the format/type of the key in the future to UUID or whatever you like.
Node.js drivers will correctly read int64 as string or bigint, not number.
E.g. pg for PostgreSQL
Maybe there’s a buggy driver but I don’t know it.
You can of course, change the api such that it does {"id": "1324535222364012585"} instead and voila, it will no longer try parsing it as number. Or the many other workarounds people have recommended above (like appending a prefix, or using a different encoding), but why is it trying to parse a number thats too big and instead of throwing it just rounds down without telling you????!
In JS - BigInt is 64bit integer.
In anything else - BigInt is a arbitrarily large integer.
All sequences use step 16.
Type A has discriminant/offset 0, yielding IDs {0, 16, 32, 48, 64, …}.
Type B has discriminant/offset 1, mapping to IDs {1, 17, 33, 49, 65, …}.
All the way up to Type P with discriminant/offset 15 and IDs {15, 31, 47, 63, 79, …}.
This is also trivially invertible so that you can determine the type from the ID.
A more common approach is to make IDs opaque strings and put a type prefix—A0, B12, P34, that kind of thing. But this way you can keep it as a number, if you wish.
It does not actually make it impossible to query the wrong table it just tells you quickly when you’ve done so.
IME, most of the arguments for why UUIDs make things better are due to developer ignorance of RDBMS features (or B+tree performance).
Inappropriate aliasing of integer keys allows for silent errors in queries because it will actually return some result a lot of the time. A UUID is immune to this problem. The model recognizes its mistake a lot more reliably when previously non-empty tables start showing up empty after attempting a join.
I read this post more as an illustration of the *value* of UUIDv7 as primary key, over integer primary keys, in lieu of minimal loss of read/write performance, and marginally more data on disk bloat.
SQLite's automatic integer rowID primary key is a no-brainer, when the SQLite application is local-only, such as application storage format (mobile and desktop). Or is never intended to grow beyond a single server instance. Basically, where each SQLite file is private to a singular instance of the application.
However, if there is even an outside chance of needing to cooperate across application instances, e.g. the minimal limit case of a personal knowledge base that should seamlessly sync across a person's devices, as well as a hosted service, then a high-quality sequential random ID starts to make a lot more sense. (No-brainer arbitrary table merges / splits / remerges, de-duplication, etc.)
Random ID primary key is a bad idea period, whether it be the UU kind or the SQ kind, or any other kind. As far as my DB knowledge goes, this class of ID destroys all tree-algorithms, and we are stuck with the fact that there is no practically better way, than an appropriate tree-structure, to group and organise a meaningful amount of data, efficiently and effectively.
cf. https://sqlite.org/withoutrowid.html
> The WITHOUT ROWID syntax is an optimization. It provides no new capabilities. Anything that can be done using a WITHOUT ROWID table can also be done in exactly the same way, and exactly the same syntax, using an ordinary rowid table. The only advantage of a WITHOUT ROWID table is that it can sometimes use less disk space and/or perform a little faster than an ordinary rowid table.
As of now, I am doing the following in my (Bitemporal data system) experiment (When will it see the light of day? Nobody knows.).
All data are globally uniquely identified by a UUIDv7. However all tables have `rowid` integer primary key asc (which is just an alias for SQLite's autoincrement int id). The `rowid` is the basis for joins, and is the foreign key reference. This lets me offload some useful disambiguation work to the DB as well as have it enforce global (across data systems) record uniqueness guarantees, while retaining local (within process) query efficiency by retaining the ability to use integer rowids.
While the idealised insert performance in your bench is indeed mind-boggling, the DB Schema isn't doing anything CPU-intensive during inserts (checks, constraints, triggers etc.). My schema / query pattern yields comparatively meagre throughput, but I am happy with the ballpark it has landed in, given all the work I'm making SQLite do for me on each `assert!` and `redact!`.
cf. my dirty-but-useful-enough bench, with production-like record content:
A poor man's napkin-mathy, append-only SQLite write/read benchmark
https://gist.github.com/adityaathalye/3c8195dc70626b33c23867...
Summary:
Thanks to its oh so convenient automatic integer rowIDs, I believe one can amortise some of the other overheads of UUIDv7s for "in-between" queries, viz. indices, joins, ctes, virtual tables etc., with appropriate schema / query design.
UUID v7 so far seems like the best solution if you want UUID benefits and ordering.
Why would you force database to order rows on the drive according to random id?
I never wished, gee, why didn't I use integer key. But so many times I wished I used uuid because eventually your data rows are going to need to have identity that is not local to this specific database instance.
I disagree. I tried this once. Now you need a client access layer to touch the DB in any context. All your console tools no longer work well or at all. If they show up in URLs you need to deoptimize them for transport.
You give up a lot of convenience for this optimization. You should be absolutely sure your design requires it before using it.
Why would you store it as as str column and not the inbuilt type for this?
https://www.postgresql.org/docs/current/datatype-uuid.html
If you are using SQLite well I guess that doesn't work.
So, UUIDv4 as a PK on a clustered index can be perfectly feasible for cases where you want to avoid disclosing stuff and row insertion performance isn’t that important.
I get saving 8 bytes per row seems attractive, but the tradeoff is not explained.
The tradeoff is what the benchmark is hitting. Once the table is physically ordered by the key, a random v4 scatters every insert across the tree and you pay for the page splits. A plain rowid table keeps that churn in the secondary index, which is just the key plus a rowid, while the table itself stays append-ordered. So it only really pays off when the key is something you look up directly and is roughly sequential, which is why v7 comes back near baseline.
The Integer id is used for joins and looks ups and such but that's it. If I need to send anything to the frontend or outside of the app/DB then that's the UUID.
The Guid is purely for an external system to grab onto something that I can tie back to an actual row in the database but the external system does not need to know anything about the backend other than <guid>.
How much trouble does SQLite reysing rowid's actually cause?
Regular rowids are definitely the way to go if you can use them.
https://sqlite.org/inmemorydb.html
There's only one index so there's no real write amplification. The numbers will go down as you add more data and indexes.
But a Url62 as a url safe public id from the pk is simple and straightforward to use and comes with few risks of leak issues. Wish postgres had native base62 encoding for url62 now that it has uuidv7 native.
Contention and coordination are real killers, concurrent writes (that require coordination like postgres) often underdeliver.
Also INSERT speed instead of SELECT? Typically most time is spend in SELECT or UPDATE.