Discussion (33 Comments)
One of the technical questions was: "If you have a DB and a message queue, how do you get your update to alter both or neither (i.e. transactionally)?"
I thought about it for a couple of minutes, then came back with something like "I can't, and you can't either." Then I proposed the usual spiel about using a replicated-state-machine/write-ahead-log/event-sourcing (whatever it might be called at the time) and leaning into eventual consistency as the only practical solution.
He asked if I'd heard about the outbox pattern, so I let him describe it. Sure enough it sounded like this article. The secret to transacting across the database D and the message queue Q is to split D into two parts (the State and the Outbox), transact across those instead, and then just pretend that you have a transaction across D and Q.
FWIW, the article literally talks about the challenges with getting this to actually work and recommends removing it and just using the DB for everything.
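For anyone who hasn't seen it, here is a minimal sketch of the outbox pattern as described above. It uses SQLite as a stand-in for the database, and the table names (`accounts`, `outbox`), the `deposit` and `relay` functions, and the `publish` callback are all hypothetical, chosen only for illustration. The state change and the outbox row commit in one local transaction; a separate relay step pushes unsent rows to the queue afterwards.

```python
import json
import sqlite3


def open_db():
    """Create an in-memory database holding both the State and the Outbox."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER NOT NULL);
        CREATE TABLE outbox (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            payload TEXT NOT NULL,
            sent INTEGER NOT NULL DEFAULT 0
        );
        """
    )
    return conn


def deposit(conn, account_id, amount):
    """Update the state and record the outgoing message in ONE transaction."""
    with conn:  # commits on success, rolls back if anything raises
        conn.execute(
            "INSERT OR IGNORE INTO accounts (id, balance) VALUES (?, 0)",
            (account_id,),
        )
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE id = ?",
            (amount, account_id),
        )
        conn.execute(
            "INSERT INTO outbox (payload) VALUES (?)",
            (json.dumps({"account": account_id, "amount": amount}),),
        )


def relay(conn, publish):
    """Push unsent outbox rows to the queue, marking each one after it is sent.

    `publish` is a hypothetical callback standing in for the message queue
    client. If it raises, the row stays unsent and is retried on the next call.
    """
    rows = conn.execute(
        "SELECT id, payload FROM outbox WHERE sent = 0 ORDER BY id"
    ).fetchall()
    for row_id, payload in rows:
        publish(json.loads(payload))
        with conn:
            conn.execute("UPDATE outbox SET sent = 1 WHERE id = ?", (row_id,))
    return len(rows)
```

Note where the "pretend" part lives: if the process dies after `publish` returns but before the row is marked as sent, the message goes out again on the next relay run. That makes delivery at-least-once, so consumers have to tolerate duplicates.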
Step 2: propose a source of truth that everyone can listen to. Hearing the same facts in the same order should put everyone in the same state (eventual consistency).
Step 3 (you are here): try to do better than EC, by merging the external queue into one of the nodes, making it the master.
Step 4: Now there's no distance between the nodes, so no need to solve the distributed systems problem and you can retire the queue.
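Step 2 can be illustrated with a toy sketch: every node folds the same ordered log of facts into its local state, so any two nodes that have seen the same prefix of the log agree. The event shape (`kind`, `key`, `value`) and the function names `apply_event` and `replay` are assumptions made up for this example, not taken from the article.

```python
def apply_event(state, event):
    """Return a new state with one event applied (deterministic, no I/O)."""
    kind, key, value = event
    new_state = dict(state)
    if kind == "set":
        new_state[key] = value
    elif kind == "delete":
        new_state.pop(key, None)
    return new_state


def replay(log, upto=None):
    """Rebuild state by applying the first `upto` events of the log in order."""
    state = {}
    for event in log[:upto]:
        state = apply_event(state, event)
    return state


# The shared, ordered source of truth.
log = [
    ("set", "order-1", "created"),
    ("set", "order-1", "paid"),
    ("set", "order-2", "created"),
    ("delete", "order-2", None),
]

replica_a = replay(log)          # fully caught up
replica_b = replay(log, upto=2)  # lagging: has only heard the first two facts
```

A lagging replica holds a stale but valid earlier state, and it converges once it hears the remaining facts. That lag is the "eventual" in eventual consistency.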
However, the way Postgres keeps obsolete rows (deleted or modified) around until they're vacuumed can cause problems for high-throughput queues. So for those systems the complexity might be worth it. But I bet 90% of the time the choice to use a separate queue is premature optimization. And hopefully OrioleDB (an undo-based storage engine for Postgres) will avoid most of these pitfalls, reducing the need for separate queues even further.
In most services, I often swap out the message broker or the workflow engine, but the database almost always stays the same.
I'm not sure if I've understood this correctly.
Is it really a distributed system or just a bunch of services with a central database?
*: edit, maybe a better example here is a rail system with a single central dispatcher: it is centralized but may still be distributed
There are always tradeoffs of course, but building a truly decentralized system requires some really difficult compromises to correctness. The Two Generals' Problem is a great piece of reading on this topic: distribution always requires compromises in general, but fully removing an authority on truth gets quite tricky.
It is!
And the solution is to add an extra general on the left side. Let's call him Outus Boxus. The two generals on the left side can communicate in perfect lockstep. Then if you need the general on the right to find out about something, you can send a few workers to tell him or something...
More seriously though, you can have a DS for two reasons: tech or political.
Tech means scaling or reliability. So clients can be serviced by any of the nodes.
Political means different actors don't have a central authority. You can't stick two banks into one db.
This technique doesn't seem to address either aspect.
A similar pattern has spilled out of projects like Warpstream[2], which I suspect is using Postgres behind the scenes of their control plane.
[1]: https://ducklake.select
[2]: https://www.warpstream.com/
Neal Ford calls this a distributed monolith because a change to a database schema can break every single service at once, but there are very valid uses of this method.
There are decades of books on the footguns, as we used this even back in the client-server days.
One suggestion I have is to research where the first version of SOA failed, especially as these systems tend to erode into Enterprise Service Buses.
Products like Apache Airflow tend to have value not because of the persistence layer, but because they force workflows into DAGs, which is an enforceable structural constraint, while SQL, being declarative, can sometimes force you into trying to enforce governance through observing behavior.
The former is not subject to Rice’s theorem, while the latter is.
If you actively control for these it will greatly increase the lifetime of this system before (or if) you reach the point you have to replace the system.
There will always be a window for potential loss due to solar flares/whatever but the key in designing a system like this is to make sure you're aware of how the system can fail, accept that outcome and then work to, as much as possible, shrink the distance in cycles/logic between each persistence committal. Logic should be front-loaded to do as much prep work as possible before any irreversible actions happen and then those irreversible actions should be ordered to your preference and dispatched as quickly and cheaply as possible in a safe manner.
Here's another blog post about how a Postgres-backed task queue can run at scale: https://www.dbos.dev/blog/making-postgres-queues-scale
When workers query the DB for jobs, the rows get locked by the select, so there are no race conditions or duplicate job assignments.
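A rough sketch of that claim step, for illustration only. In Postgres the usual way to get this behavior is `SELECT ... FOR UPDATE SKIP LOCKED`, shown here as a query string with a hypothetical `jobs` table. The runnable part swaps in SQLite, which has no row locks, so it takes a database-wide write lock with `BEGIN IMMEDIATE` instead. The exclusion guarantee is the same (no two workers get the same job), but concurrent claims serialize rather than skipping past each other. All function and column names are assumptions.

```python
import sqlite3

# What the claim might look like in Postgres (not executed here; hypothetical
# schema). SKIP LOCKED makes concurrent workers skip rows already locked by
# another transaction instead of waiting on them.
POSTGRES_CLAIM_SQL = """
UPDATE jobs SET status = 'running', worker = %s
WHERE id = (
    SELECT id FROM jobs
    WHERE status = 'pending'
    ORDER BY id
    LIMIT 1
    FOR UPDATE SKIP LOCKED
)
RETURNING id, payload
"""


def open_queue(path=":memory:"):
    """Open the queue database; isolation_level=None lets us issue BEGIN ourselves."""
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        " id INTEGER PRIMARY KEY,"
        " payload TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'pending',"
        " worker TEXT)"
    )
    return conn


def enqueue(conn, payload):
    """Add a pending job and return its id."""
    return conn.execute(
        "INSERT INTO jobs (payload) VALUES (?)", (payload,)
    ).lastrowid


def claim_job(conn, worker):
    """Atomically pick the oldest pending job and mark it as running.

    SQLite stand-in: BEGIN IMMEDIATE takes the write lock up front, so the
    select and the update cannot interleave with another worker's claim.
    Returns (id, payload), or None if no job is pending.
    """
    conn.execute("BEGIN IMMEDIATE")
    try:
        row = conn.execute(
            "SELECT id, payload FROM jobs"
            " WHERE status = 'pending' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is not None:
            conn.execute(
                "UPDATE jobs SET status = 'running', worker = ? WHERE id = ?",
                (worker, row[0]),
            )
        conn.execute("COMMIT")
        return row
    except Exception:
        conn.execute("ROLLBACK")
        raise
```

Either way, the job stays marked as `running` if the worker crashes after claiming it, so a real queue also needs some timeout or heartbeat to hand abandoned jobs back.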
It seems this article is trending toward that view: If you can maintain transactional consistency along with application workflow state, then would this generalize to maintaining distributed application state in general?
The follow-up would be: Would this be preferable to Valkey/Redis?
Yes, in the sense of 'too good to be true'.
As to which technical solution would be optimal there are a bunch of factors to consider and I think preferences around features could lead you to a variety of options. Postgres is excellent as long as you're minimizing the amount of data piping directly through it or operating at a reasonable scale.
This sounds a lot like reinventing a message queue. Someone trying this in the future might learn painful lessons about ordering, commits, partitioning, dead-letter-queues, replayability, don't-call-me-I'll-call-you, and anything else a Kafka-like comes with out of the box.