Discussion (101 Comments)
It supports pipelines, batched pipelines, and basic runners, as well as idempotent keys (including batching them). It also lets you "partition" a queue into multiple sub-queues so that you can easily segregate your jobs within your application without a lot of setup on the outside. For example, you create a root queue talking to PostgreSQL and pass it around to subsystems that then each create their own sub-queue off that to enqueue entries into and their own workers that dequeue them.
It's only used internally right now, but I've been thinking about creating a separate package (with documentation) for others to use as well. Any feedback or pull requests would be appreciated!
[0] https://github.com/KeetaNetwork/anchor/blob/main/src/lib/que...
[1] https://github.com/KeetaNetwork/anchor/blob/main/src/lib/que...
https://lucumr.pocoo.org/2025/11/3/absurd-workflows/
https://github.com/earendil-works/absurd
https://earendil-works.github.io/absurd/
I've not used it, but it's worth comparing to other options
You might consider another name for it, that one is wholly ungoogle-able! Looks neat though
I have used Temporal in the past and it works really well. My only problem with it was the limits on request payload and event sizes, which caused us some inconvenience when building solutions. It also enforces good engineering practices, but sometimes you don't want to write special logic so that, if your CSV file is larger than 2 MB, you upload it to S3, pass a link, then download it in the workflow.
What is your experience with DBOS? How does it compare to Temporal in terms of operational complexity, feature parity and anything else?
I really like running a thin rest API in front of it inside your vpc or k8s cluster or whatever to help with event driven triggers so that they don't have to worry about Temporal auth and checking workflow status if there is any decision making around that. This helps keep your event as logic-free as possible.
Let me give a vague example: you have some sort of db trigger, and this trigger either acts directly or puts the event on a queue, your handler calls the thin rest api with the necessary event details, rest API can make the decision if this starts a workflow, signals an existing one, or ignores it (the pattern for this can vary based on the situation, but SignalWithStart is common for me or just dropping if the event is not worthy of starting a workflow and no workflow for that <ItemYouCareAbout> exists).
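The decision the thin REST API makes in that example can be sketched roughly as below. This is only an illustration: the `Event` fields, the `STARTING_KINDS` set and the `route_event` function are hypothetical names, and no Temporal SDK call is shown. With SignalWithStart, the START and SIGNAL branches would collapse into a single call.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    START = "start"
    SIGNAL = "signal"
    IGNORE = "ignore"


@dataclass(frozen=True)
class Event:
    item_id: str  # the <ItemYouCareAbout> the event refers to
    kind: str     # e.g. "created", "updated"


# Hypothetical: the event kinds considered worth starting a workflow for.
STARTING_KINDS = frozenset({"created"})


def route_event(event: Event, running_item_ids: set) -> Action:
    """Decide what the thin API does with an incoming event."""
    if event.item_id in running_item_ids:
        return Action.SIGNAL   # a workflow for this item already exists
    if event.kind in STARTING_KINDS:
        return Action.START    # no workflow yet, and the event warrants one
    return Action.IGNORE       # no workflow and nothing worth starting


running = {"item-1"}
for ev in (Event("item-1", "updated"), Event("item-2", "created"), Event("item-3", "updated")):
    print(ev.item_id, route_event(ev, running).value)
```

Keeping this decision in one place is what lets the event producers stay logic-free.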
Then the parent/child workflow ability is very valuable when you need to orchestrate different self-contained behaviors for a single object's lifecycle, with cancellability when an external factor changes the trajectory of an object.
Long, vague story short, I find it very powerful and easy to work with and has really helped move lifecycle logic out of APIs where things can easily become riddled with debt and precarious to manage. I agree with you that it helps follow more best-practices instead of just throwing logic some place that seems easy but becomes a hidden trap later.
Then I tried their Cloud offering and was appalled at their pricing. I burned through the $1,000 free credits before I even got something to production. Didn't want to bother with running a local Temporal, either.
Best solution is to just take inspiration from their architecture and then do it yourself in Postgres, IMO.
Very happy we made the switch.
Temporal is, in my opinion having run it in prod for over a year - poorly designed, slow and ridiculously heavy infra wise.
If you're doing anything non-trivial (say, 200+ events/workflow) and you need to run only a couple hundred of them concurrently all day, you're going to spend millions on infra, and it's still going to absolutely suck.
Try running their own benchmarks, the numbers are pathetic.
Their sales team is also absolutely appalling and desperate.
From a Developer standpoint, the SDK is quite nice though.
Don't get trapped into Nexus, and if the sales team calls you, make sure legal is in the room.
Ballparking: 200 events/workflow, 200 workflows per day, and assuming 1 event = 1 cloud action[1], that is 1.2M or so actions per month. The $100/month plan includes 1M actions each month, and even the pay-as-you-go pricing when you exceed that is $50 per 1M actions[2].
Temporal Cloud seems extremely cheap for your use case, even if I'm off by a factor of 10. Is there a catch? You still need infra to run your Temporal workers, and I assume there are storage and other costs, but I assume action usage is the majority of it.
1. Not sure exactly what constitutes an "Action". At a glance, seems like most events have a corresponding action(?) and a subset of those actions are actually billable(?)
2. https://docs.temporal.io/cloud/pricing#payg-action-pricing
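The ballpark above can be worked through as a quick sketch. The 30-day month and the pro-rated overage are assumptions; the one-event-one-action simplification and the prices are the ones quoted in the comment, not a checked price list.

```python
EVENTS_PER_WORKFLOW = 200
WORKFLOWS_PER_DAY = 200
DAYS_PER_MONTH = 30  # assumption

PLAN_PRICE_USD = 100            # quoted monthly plan price
INCLUDED_ACTIONS = 1_000_000    # actions included in that plan
OVERAGE_USD_PER_MILLION = 50    # quoted pay-as-you-go price


def monthly_actions(events_per_workflow, workflows_per_day, days=DAYS_PER_MONTH):
    """Actions per month, assuming one billable action per event."""
    return events_per_workflow * workflows_per_day * days


def monthly_cost_usd(actions):
    """Plan price plus overage, assuming overage is billed pro rata."""
    overage = max(0, actions - INCLUDED_ACTIONS)
    return PLAN_PRICE_USD + overage * OVERAGE_USD_PER_MILLION / 1_000_000


actions = monthly_actions(EVENTS_PER_WORKFLOW, WORKFLOWS_PER_DAY)
print(actions, monthly_cost_usd(actions))           # the estimate as stated
print(actions * 10, monthly_cost_usd(actions * 10))  # "off by a factor of 10"
```

This covers action pricing only; workers, storage and any other charges are outside the sketch.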
Temporal was a bad fit for us, and we regret it deeply.
https://github.com/temporalio/temporal/blob/e22e6304b3c4a409...
https://github.com/temporalio/temporal/blob/e22e6304b3c4a409...
Temporal does a crazy amount of database operations and all of these are behind that mutex.
Oh, and you can't change the shard count on existing clusters.
Great stuff.
Where are the “millions” on infra going? It’s a handful of services and a Postgres?
> Their sales team is also absolutely appalling and desperate.
You said “on-prem”. It’s open source; why are you dealing with their sales team?
> If you're doing anything non-trivial (say, 200+ events/workflow) and you need to run only a couple hundred of them concurrently all day…
If “millions” were required to obtain such tiny scale, I’d agree there’d be a massive problem. No one would use Temporal; it would be a complete waste of resource. If this were true.
Postgres doesn't scale at all for our workload, so you're into Cassandra.
For a medium-sized deployment, you're looking at 200+ vCPUs, and then let's say standard dev/uat/prod. So now you're at 600 CPUs. Now you need two geographic regions (dev can stay in one place), so now you're at 800. Want a failover cluster for prod? Have another 200 CPUs.
And 200 CPUs is a medium deployment, assuming something like 36 CPUs per Cassandra node, then say 4-8 per instance of matching, worker, history, frontend. Then all your other components around it: ingress controller, service mesh, etc.
There's a million a year easy, for a small deployment.
Our prod one is 4x this size.
We need a 12 node cassandra cluster for this, with 64cpu nodes. So no, it's not a couple of services and a postgres.
Sales team, as we are an enterprise, and they want to extract money from us.
https://github.com/agentspan-ai/agentspan, which is essentially an agentic SDK layer for Conductor, can convert any of your LangGraph, OpenAI, Vercel, or ADK agents, make them durable, and add orchestration with no code changes.
That said, my gamer-brain wants to call this "Save-scumming at scale." Which is to say, a lot of people already know that this approach works, but maybe they haven't made the connection to abstract CS stuff.
Another strategy that can be used to build robustness is to build your workflow out of idempotent operations. That can be useful for situations where the workflow state is too large to back up. Instead, you just run the job from the top and it's a bunch of no-ops until you start making progress again.
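A minimal sketch of that idea, with files in a temporary directory standing in for real side effects. The step names and the existence check are hypothetical; the point is only that each step first checks whether its effect is already present, so a rerun from the top skips finished work.

```python
import tempfile
from pathlib import Path


def ensure_file(path: Path, content: str, log: list) -> None:
    """Idempotent step: write the file only if it is not already there."""
    if path.exists():
        log.append(f"skip {path.name}")
        return
    path.write_text(content)
    log.append(f"write {path.name}")


def run_job(workdir: Path, log: list, fail_at=None) -> None:
    """Run every step from the top; completed steps turn into no-ops."""
    for i, name in enumerate(["extract.txt", "transform.txt", "load.txt"]):
        if fail_at is not None and i == fail_at:
            raise RuntimeError("simulated crash")
        ensure_file(workdir / name, f"output of {name}", log)


workdir = Path(tempfile.mkdtemp())
log = []
try:
    run_job(workdir, log, fail_at=2)  # crashes before the third step
except RuntimeError:
    pass
run_job(workdir, log)  # rerun from the top
print(log)
```

A real step would also need its effect to appear atomically (for example, write to a temporary name and rename), otherwise a crash mid-write could leave a partial result that the check mistakes for a finished one.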
Once you need retries, backoff, timeouts, cancellation, versioning, visibility, task routing, rate limits, leases, heartbeats, stuck-worker detection, replay/debugging semantics, workflow migration, fanout/fanin, long timers, audit trails, and operator tooling, the “just use a database” story becomes “build a poor copy of a workflow engine plus a bunch of workers” pretty quickly.
That may still be a good tradeoff for many applications, especially if Postgres is already the core operational dependency. But the comparison shouldn’t be “database vs overcomplicated orchestrator.” It’s more like “what complexity do you want to own, and what do you want to buy / offload to a professional system?”
I feel like this is the usual "just use postgres" garbage post that lacks any kind of nuance.
In fact you could replace that post with any other db and the statements keep being true, and naive.
Possibility one: There is one index on the table, and it is the created_at TS. This query has to scan 10,000 jobs/sec * 60 seconds * 60 minutes * 24 hours * 31 days * 1024 bytes / job = 25,543 GB.
A KV store would scan exactly that much.
Possibility two: The primary key is refined to (state, timestamp). Assume a 1% failure rate. Now, we "only" scan and return 255 GB. A key value store would scan exactly that much. (This is probably the right physical design).
Possibility three: The primary key is (timestamp), and there's a secondary index on state. I guess we do an index join, where one side of the join is 25,543 GB, and the other side is one unsorted bucket with 255GB * number of months the system has been in operation in it.
A KV store wouldn't let you express that.
Now, what other ad hoc queries are we supposed to efficiently support over a one month lookback? Also, what does PG do if you tell it to scan 25TB at the same time as it's inserting 10MB/sec at 10K TPS? How is vacuuming configured?
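The figures in the possibilities above can be reproduced with a few lines of arithmetic. Note that the quoted numbers come out in binary gigabytes (GiB, 2^30 bytes); the constants are the ones stated in the comment.

```python
JOBS_PER_SEC = 10_000
BYTES_PER_JOB = 1024
SECONDS_PER_MONTH = 60 * 60 * 24 * 31
FAILURE_RATE_PERCENT = 1
GIB = 1024 ** 3
MIB = 1024 ** 2

# One month of jobs, in bytes.
month_bytes = JOBS_PER_SEC * SECONDS_PER_MONTH * BYTES_PER_JOB
# The failed subset that a (state, timestamp) key would let you scan directly.
failed_bytes = month_bytes * FAILURE_RATE_PERCENT // 100

full_scan_gib = month_bytes / GIB
failed_scan_gib = failed_bytes / GIB
ingest_mib_per_sec = JOBS_PER_SEC * BYTES_PER_JOB / MIB

print(f"full month scan: {full_scan_gib:,.0f} GiB")
print(f"failed jobs only: {failed_scan_gib:,.0f} GiB")
print(f"insert rate: {ingest_mib_per_sec:.2f} MiB/s")
```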
I recently developed a distributed queue and it works really great - benchmarks great too, with no race conditions or conflicts. I used SKIP LOCKED so that workers can compete safely.
You can also have multiple workers across nodes avoid conflicts by using session-wide mutexes, i.e. pg advisory locks.
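A rough sketch of the SKIP LOCKED claim pattern mentioned above, written against a generic Python DB-API cursor. The `jobs` table and its `status`, `started_at` and `payload` columns are hypothetical, and this is not the commenter's actual schema or code.

```python
# Hypothetical schema: jobs(id, status, started_at, payload).
# The inner SELECT locks one pending row and skips rows that other workers
# have already locked, so concurrent workers never claim the same job.
CLAIM_JOB_SQL = """
UPDATE jobs
SET status = 'running', started_at = now()
WHERE id = (
    SELECT id
    FROM jobs
    WHERE status = 'pending'
    ORDER BY id
    LIMIT 1
    FOR UPDATE SKIP LOCKED
)
RETURNING id, payload
"""


def claim_next_job(cursor):
    """Claim one pending job and return (id, payload), or None if none is free."""
    cursor.execute(CLAIM_JOB_SQL)
    return cursor.fetchone()
```

The caller is expected to run this inside a transaction on a PostgreSQL connection and commit once the claim should become visible to other workers.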
Edit: Actually I checked this again and apparently the advice has now changed to the inverse.
Here's an example computing a Fibonacci sequence (very inefficiently, with lots of spawned sub-tasks and message passing) [2]
[1] https://github.com/estuary/flow/tree/master/crates/automatio... [2] https://github.com/estuary/flow/blob/master/crates/automatio...
I also recently started experimenting with https://github.com/earendil-works/absurd which is also Postgres and even simpler than DBOS. Their comparison is a great read:
https://earendil-works.github.io/absurd/comparison/
But for operational reasons I've started using sqlite for durable workflows instead. Porting the database concepts from either DBOS or absurd PG to SQLite is remarkably easy these days. A small polling loop instead of notify/listen feels fine for smaller workloads.
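As a rough illustration of how little is needed for the SQLite variant, here is a sketch of step checkpointing with the standard `sqlite3` module. The `steps` table and the `run_step` helper are hypothetical and do not reflect the DBOS or absurd schemas; the polling loop is left out.

```python
import json
import sqlite3


def open_store(path=":memory:"):
    """Open the database and make sure the checkpoint table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS steps ("
        " workflow_id TEXT, step_name TEXT, result TEXT,"
        " PRIMARY KEY (workflow_id, step_name))"
    )
    return conn


def run_step(conn, workflow_id, step_name, fn):
    """Return the saved result if the step already ran, else run it and save it."""
    row = conn.execute(
        "SELECT result FROM steps WHERE workflow_id = ? AND step_name = ?",
        (workflow_id, step_name),
    ).fetchone()
    if row is not None:
        return json.loads(row[0])
    result = fn()
    with conn:  # commit the checkpoint before moving on
        conn.execute(
            "INSERT INTO steps VALUES (?, ?, ?)",
            (workflow_id, step_name, json.dumps(result)),
        )
    return result


calls = []  # records which step bodies actually executed


def workflow(conn, workflow_id, crash=False):
    def fetch():
        calls.append("fetch")
        return 21

    value = run_step(conn, workflow_id, "fetch", fetch)
    if crash:
        raise RuntimeError("simulated crash")

    def double():
        calls.append("double")
        return value * 2

    return run_step(conn, workflow_id, "double", double)


conn = open_store()
try:
    workflow(conn, "wf-1", crash=True)
except RuntimeError:
    pass
result = workflow(conn, "wf-1")  # resumes: "fetch" is read back, not re-run
print(result, calls)
```

Results must be JSON-serializable in this sketch, and a step that crashes after its side effect but before the checkpoint commits will run again on resume.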
It also has VirtualObject which is a nice vendor-lock-in-free OSS alternative to CF's single threaded DurableObject.
Where DBOS absolutely shines is
1) Atomic messaging in the same db tx as your business logic via dbos.enqueue_workflow! This is often the most brittle part of any solution and doing it atomically and durably with same tx that ran your business logic drastically reduces lots of complexity.
2) Since DBOS stores workflow state in the db, it should be easy to build a dashboard for observability from Metabase/Looker (I wish Restate exposed its RocksDB instance so it could be hooked up to Metabase).
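Point 1 is the general "enqueue in the same transaction as the business write" pattern. A sketch of it with the standard `sqlite3` module as a stand-in database follows; the `orders` and `jobs` tables and the `place_order` function are hypothetical, and this is not the DBOS API.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, kind TEXT, order_id INTEGER)")


def place_order(conn, item, fail=False):
    """Insert the order and its follow-up job in one transaction."""
    with conn:  # commits on success, rolls back if an exception escapes
        cur = conn.execute("INSERT INTO orders (item) VALUES (?)", (item,))
        if fail:
            raise RuntimeError("simulated failure before enqueue")
        conn.execute(
            "INSERT INTO jobs (kind, order_id) VALUES (?, ?)",
            ("send_confirmation", cur.lastrowid),
        )


place_order(conn, "book")
try:
    place_order(conn, "lamp", fail=True)  # rolled back: no order, no job
except RuntimeError:
    pass

orders = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
jobs = conn.execute("SELECT COUNT(*) FROM jobs").fetchone()[0]
print(orders, jobs)
```

Because both rows commit or roll back together, there is never an order without its job or a job without its order, which is the failure mode that a separate message broker makes hard to rule out.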
The main benefit is centralizing all the data in one place so we don't need to worry about copying data between multiple systems. Once something becomes the bottleneck, you can migrate to a purpose-specific tool to scale out. To be honest, LISTEN/NOTIFY is in my opinion the most fragile part of PG, but it's fine as a start until you scale out.
I'm working at a scale where almost every day I have to ask people "are you use you need to treat that as relational data? It doesn't seem relational"
It's much, much worse in my experience to have to develop for the opposite -- working on a system that was designed for an imagined "infinite" scale that in reality is like 100GB and a few transactions a minute.
Is this intended to be "you sure you need..."?
Apropos bad naming, the PostgreSQL authors are not forgiven for calling all the databases on a single host a "cluster". I mean __really__.
I think if you grow enough to look for these extensions, it's usually better to bet on purpose-specific tooling. For example, I use DuckDB/Iceberg combination extensively for columnar data and connect DuckDB to PG when I need it.
In any case, there can be more to durable workflows than just saving the current step, and not all intermediate steps are serializable, so I don't see what Postgres magic there is that more mature solutions don't have.
Not sure where the NIH ends and where you're actually better off with a supported orchestration approach. I suppose if you expect your program to be around a while (or need advanced features), maybe think about using something a bit more battle tested?
A strong correctness guarantee is something that should not be undermined. It is even more important than availability.
The examples on the website are simple but heavily understate the importance of correctness. Anyone who implements similar pseudo-code directly will eventually suffer data correctness issues after crashes.
For that particular usage, the volume we process and business criticality make it a good choice for inventing here - but for other durable processes we just use off the shelf tools since the cost of maintenance would quickly outstrip the value.
Postgres is a great tool to use and far more powerful than most people give it credit for - but there's always the balance of in-house maintenance vs. paying rent for someone else's solution.
That said, as a predecessor to dbos in building durable workflows just using Postgres, I concur with the overall sentiment.
Postgres is not cheap to run in the cloud at scale. We went for the cheapest infra, which is basically the disk storage.
Either way, I'd bet a hosted Postgres with HA is cheaper than whatever PaaS you're thinking of.
Given the above, it would seem that durable workflow software is pushed forward by those who have a surplus of VC money to spend. As for the vendors, there is no shortage of people trying to sell you things that you don't need.
TypeScript: https://www.pgflow.dev
Elixir: https://github.com/agoodway/pgflow/blob/main/docs/COMPARISON...