Discussion (125 Comments), originally on HackerNews
I still don’t have enough experience to have a strong opinion on Rust async, but some things did stand out.
On the good side, it’s nice being able to have explicit runtimes. Instead of polluting the whole project to be async, you can do the opposite. Be sync first and use the runtime on IO “edges”. This was a great fit for a project that I’m working on and it seems like a pretty similar strategy to what Zig is doing with IO code. This largely solved the function coloring problem in this particular case. Strict separation of IO and CPU bound code was a requirement regardless of the async stuff, so using the explicit IO runtime was natural.
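A minimal sketch of the "sync first, runtime at the edges" idea, using only the standard library. The `block_on` helper, `fetch_len`, `checksum`, and `process` are hypothetical names made up for illustration; a real project would use the `block_on` of whatever runtime it already depends on.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Wakes the blocked thread by unparking it.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Hypothetical minimal runtime: drives one future to completion on the
/// calling thread. Real runtimes add I/O reactors, timers, and task queues.
pub fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut: Pin<Box<F>> = Box::pin(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            // Sleep until the waker unparks this thread, then poll again.
            Poll::Pending => thread::park(),
        }
    }
}

/// Stand-in for an async I/O operation (assumption: no real I/O happens here).
async fn fetch_len(input: &str) -> usize {
    input.len()
}

/// Synchronous, CPU-bound code stays free of `async`.
fn checksum(len: usize) -> usize {
    len * 31 + 7
}

/// A sync function that only touches the runtime at the I/O edge.
pub fn process(input: &str) -> usize {
    let len = block_on(fetch_len(input));
    checksum(len)
}
```

Callers of `process` never see a future, so the async "color" stops at the one line that calls `block_on`.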
On the bad side, it seems crazy to me how much the whole ecosystem depends on tokio. It’s almost like Java’s GC was optional, but in practice everyone just used the same third party GC runtime and pulling any library forced you to just use that runtime. This sort of central dependency is simply not healthy.
But maybe my fears are unfounded.
Traits in the stdlib for common functionality like "spawn" (a task) and things like async timers. Then executors could implement those traits and libraries could be generic over them.
We could have something similar for a global async executor which can be overridden. Or maybe you launch your own executor at startup and register it with std, but after that almost all async spawn calls go through std.
And std has a decent set of defaults if you just don't care, but want everything to work on every OS.
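As a rough sketch of what "libraries generic over an executor trait" could look like: the `Spawn` trait, `Task` alias, `spawn_workers`, and `InlineExecutor` below are all hypothetical. Nothing like this trait exists in std today, and a real design would also need timers, join handles, and non-`Send` tasks.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Wake, Waker};

/// A boxed, sendable task with no output.
pub type Task = Pin<Box<dyn Future<Output = ()> + Send + 'static>>;

/// Hypothetical executor-agnostic trait (not part of std).
pub trait Spawn {
    fn spawn(&self, task: Task);
}

/// Library code written once against the trait instead of a concrete runtime.
pub fn spawn_workers<S: Spawn>(executor: &S, log: Arc<Mutex<Vec<u32>>>) {
    for id in 0..3 {
        let log = Arc::clone(&log);
        executor.spawn(Box::pin(async move {
            log.lock().unwrap().push(id);
        }));
    }
}

struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Toy executor that runs each task inline, to completion, when spawned.
pub struct InlineExecutor;

impl Spawn for InlineExecutor {
    fn spawn(&self, mut task: Task) {
        let waker = Waker::from(Arc::new(NoopWaker));
        let mut cx = Context::from_waker(&waker);
        // Busy-polls; only suitable for tasks that never wait on outside events.
        while task.as_mut().poll(&mut cx).is_pending() {}
    }
}
```

A runtime would provide its own `Spawn` implementation, and `spawn_workers` would not need to change.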
There are issues in particular with core traits for IO or Stream being defined in third-party libraries like tokio, futures or its variants. I've seen many cases where libraries have to reexport such types, but they are pinned to the version they have, so you can end up with multiple versions of basic async types in the same codebase that have the same name and are incompatible.
I've felt before that compilers often don't put much effort into optimizing the "trivial" cases.
Overly dramatic title for the content, though. I would have clicked "Async Rust Optimizations the Compiler Still Misses" too, you know.
Yes, we can have async in traits and closures now. But those are updates to the type system, not to the async machinery itself. Wakers are a little bit easier to work with, but that's an update to std/core.
As I understand it, the people who landed async Rust were quite burnt out and got less active, and no one has picked up the torch. (Though there's 1 PR open from some Google folk that will optimize how captured variables are laid out in memory, which is really nice to have.) Since I and the people I work with are heavy async users, I think it's maybe up to me to do it or at least start it. Free as in puppy I guess.
So yeah, the title is a little baitey, but I do stand behind it.
Great to see people wanting to get involved with the project, though. That’s the beauty of open source: if it aggravates you, you can fix it.
In retrospect, I think everyone is satisfied with the adopted syntax.
The author seems to be obsessing about the overhead for trivial functions. He's bothered by overhead for states for "panicked" and "returned". That's not a big problem. Most useful async blocks are big enough that the overhead for the error cases disappears.
He may have a point about lack of inlining. But what tends to limit capacity for large numbers of activities is the state space required per activity.
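For readers unfamiliar with the "panicked" and "returned" states, here is a hand-written, conceptual stand-in for a trivial `async fn answer() -> u32 { 42 }`. It is only an illustration of the bookkeeping being discussed, not the compiler's actual output; the `Answer` type and `poll_once` helper are made up for this sketch.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Conceptual state machine for an async fn with no await points.
pub enum Answer {
    Unresumed, // created but never polled
    Returned,  // completed; polling again is a bug
    Panicked,  // the body panicked part-way through
}

impl Future for Answer {
    type Output = u32;

    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<u32> {
        match *self {
            Answer::Unresumed => {
                // Mark as poisoned while the body runs; in this sketch a
                // panic in the body would leave the state here.
                *self = Answer::Panicked;
                let value = 42; // the function body
                *self = Answer::Returned;
                Poll::Ready(value)
            }
            Answer::Returned => panic!("`async fn` resumed after completion"),
            Answer::Panicked => panic!("`async fn` resumed after panicking"),
        }
    }
}

struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Polls a future exactly once with a waker that does nothing.
pub fn poll_once<F: Future + Unpin>(fut: &mut F) -> Poll<F::Output> {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    Pin::new(fut).poll(&mut cx)
}
```

Even with no await points, the state tag and the two error arms exist, which is the overhead the comment above considers negligible for larger async blocks.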
Is it really though?
In my experience many Rust applications/libraries can be quite heavy on the indirection. One of the points from the article is that contrary to sync Rust, in async Rust each indirection has a runtime cost. Example from the article:
I would naively expect the above to be a 'free' indirection, paying only a compile-time cost for the compiler to inline the code. But after reading the article I understand this is not true, and it has a runtime cost as well. This may look like a case of over-optimization, but given how many times I've seen this pattern, I assume it builds up to a lot of unnecessary fluff in huge codebases. To be clear, in that case, the concern is not really about runtime speed (which is super fast), but rather about code bloat for compilation time and binary size.
Most useful async blocks are deeply nested, so the overhead compounds rapidly. Check the size of futures in a decently large Tokio codebase sometime.
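One way to check future sizes without any runtime is `std::mem::size_of_val` on an unpolled future. The functions below (`leaf`, `middle`, `outer`, `future_sizes`) are hypothetical examples; the exact numbers depend on the compiler version, so none are claimed here.

```rust
use std::mem::size_of_val;

/// Holds a 1 KiB buffer across an await point, so the buffer has to be
/// stored inside the future's state.
async fn leaf() -> u32 {
    let buf = [1u8; 1024];
    std::future::ready(()).await;
    buf.iter().map(|&b| b as u32).sum()
}

/// Thin wrapper: its future embeds the `leaf` future it awaits.
async fn middle() -> u32 {
    leaf().await
}

/// Another thin wrapper around `middle`.
async fn outer() -> u32 {
    middle().await
}

/// Sizes in bytes of the leaf, middle, and outer futures (never polled).
pub fn future_sizes() -> (usize, usize, usize) {
    (
        size_of_val(&leaf()),
        size_of_val(&middle()),
        size_of_val(&outer()),
    )
}
```

Printing the tuple for the real entry points of a codebase gives a quick picture of how much state each layer of wrapping carries.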
Depends somewhat on your expectations, I suppose. Compared to Python or Java, sure, but Rust of course strives to offer "zero-cost" high level concepts.
I think the critique is in the same realm of C++'s std::function. Convenience, sure, but far from zero-cost.
Not just too dramatic. Given that all the things they list are non-essential optimizations, and some fall under "micro-optimizations I wouldn't be sure Rust even wants", and given how far the current async is from its old MVP state, it's more like outright dishonest than overly dramatic.
It's the kind of clickbait that says the author cares neither about respecting the reader nor about honest communication, which for someone wanting to do open source contributions is kinda ... not so clever.
Though in general I agree Rust should have more HIR/MIR optimizations, at least in release mode. E.g. it's very common that an async function is not pub and is directly awaited in all places (or can otherwise be proven to only be called once). In that case neither `Returned` nor `Panicked` is needed, as it can't be called again after either. Similarly, `Unresumed` is not needed either, as you can directly call the code up to the first await (and with such a transform their points about "inlining" and "async fns without await still having a state machine" would also "just go away"TM, at least in some places). Similarly, the whole `.map_or(a,b)` family of functions is IMHO an anti-pattern: introducing more functions with unclear operator ordering and without the signaling `unwrap_`, for no benefit beyond minimally shortening a `.map(b).unwrap_or(a)` and some micro-optimization, is ... not productive in an already complicated language. Instead, guaranteed optimizations for the kind of patterns a `.map(b).unwrap_or(a)` inlines to would be much better.
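For reference, the two spellings compared in that last point compute the same value. The function names below are made up for illustration.

```rust
/// Length of an optional name, defaulting to 0, written with `map_or`.
/// Note that the default comes first and the closure second.
pub fn len_map_or(name: Option<&str>) -> usize {
    name.map_or(0, |s| s.len())
}

/// The same computation spelled as `map` followed by `unwrap_or`.
pub fn len_map_unwrap_or(name: Option<&str>) -> usize {
    name.map(|s| s.len()).unwrap_or(0)
}
```

The second form keeps the `unwrap_` signal the commenter wants; whether both compile to identical machine code is up to the optimizer and is not guaranteed by the language.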
So threads was the right programming model.
Now language runtimes prefer “green threads” for portability and performance but most languages don’t provide that properly. Instead we have awkward coloring of async/non-async and all these problems around scheduling, priority, and no-preemption. It’s a worse scheduling and process model than 1970.
Not really. I’ve observed that async code is often written in a way that doesn’t maximize how much concurrency can be expressed (e.g. instead of writing “here are N I/O operations, do them all concurrently” it’s “for operation X, await process(x)”). However, in a threaded world this concurrency problem gets worse because you have no way to optimize towards such concurrency - threads are inherently and inescapably too heavyweight to express concurrency in an efficient way.
This is not a new lesson - work stealing executors have long been known to offer significantly lower latency with more consistent P99 than traditional threads. This has been known since forever - in the early 00s this is why Apple developed GCD. Threads simply don’t give the kernel scheduler the richer information it needs about the workload, and kernel threads are an insanely heavy mechanism for achieving fine grained concurrency, even more so when the work is I/O or a mixed workload instead of pure compute that’s embarrassingly easy to parallelize.
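The difference between "for operation X, await process(x)" and "do all N concurrently" can be sketched with std only. Everything here is a toy made up for illustration: `run` is a busy-polling driver, `fake_io` just returns `Pending` once, and in real code the concurrent version would usually be a runtime's or the `futures` crate's join helper rather than a hand-rolled loop.

```rust
use std::cell::RefCell;
use std::future::{poll_fn, Future};
use std::pin::Pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

type Log = Rc<RefCell<Vec<String>>>;

struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Busy-polling driver, enough for futures that never wait on real I/O.
fn run<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

/// Returns `Pending` once, standing in for an I/O wait.
async fn fake_io() {
    let mut yielded = false;
    poll_fn(move |_cx| {
        if yielded {
            Poll::Ready(())
        } else {
            yielded = true;
            Poll::Pending
        }
    })
    .await
}

async fn process(id: u32, log: Log) {
    log.borrow_mut().push(format!("start {id}"));
    fake_io().await;
    log.borrow_mut().push(format!("end {id}"));
}

/// One at a time: each operation finishes before the next starts.
pub fn sequential(ids: &[u32]) -> Vec<String> {
    let log = Log::default();
    run(async {
        for &id in ids {
            process(id, log.clone()).await;
        }
    });
    log.take()
}

/// All at once: every operation is polled in turn until all are done.
pub fn concurrent(ids: &[u32]) -> Vec<String> {
    let log = Log::default();
    let mut pending: Vec<Pin<Box<dyn Future<Output = ()>>>> = ids
        .iter()
        .map(|&id| Box::pin(process(id, log.clone())) as Pin<Box<dyn Future<Output = ()>>>)
        .collect();
    run(poll_fn(|cx| {
        // Keep only the futures that are still pending after this poll.
        pending.retain_mut(|fut| fut.as_mut().poll(cx).is_pending());
        if pending.is_empty() {
            Poll::Ready(())
        } else {
            Poll::Pending
        }
    }));
    log.take()
}
```

With two ids, `sequential` logs start/end pairs back to back, while `concurrent` logs both starts before either end, so the waits overlap.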
Do all programs need this level of performance? No, probably not. But it is significantly more trivial to achieve a higher performance bar and in practice achieve a latency and throughput level that traditional approaches can’t match with the same level of effort.
You can tell async is directionally kind of correct in that io_uring is the kernel’s approach to high performance I/O and it looks nothing like traditional threading and syscalls and completion looks a lot closer to async concurrency (although granted exploiting it fully is much harder in an async world because async/await is an insufficient number of colors to express how async tasks interrelate)
But as you observed, async/await fails to express concurrency any better. It’s also a thread, it’s just a worse implementation.
Your premise is wrong. There are many counterexamples to this.
Sure, but once you involve the kernel and OS scheduler things get 3 to 4 orders of magnitude slower than what they should be.
The last time I was working on our coroutine/scheduling code creating and joining a thread that exited instantly was ~200us, and creating one of our green threads, scheduling it and waiting for it was ~400ns.
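The OS-thread half of that measurement is easy to reproduce with std; the helper name `avg_spawn_join` is made up, and the result will vary a lot by OS, hardware, and load, so no expected number is given.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Average wall-clock time to spawn and join an OS thread that exits at once.
pub fn avg_spawn_join(iterations: u32) -> Duration {
    let start = Instant::now();
    for _ in 0..iterations {
        thread::spawn(|| {}).join().expect("thread panicked");
    }
    start.elapsed() / iterations
}
```

Calling `avg_spawn_join(1_000)` in a release build and printing the result with `{:?}` gives a figure to compare against the ~200us quoted above.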
You don't need to wait 10 years for someone else to design yet another absurdly complex async framework, you can roll your own green threads/stackful coroutines in any systems language with 20 lines of ASM.
2. Unchecked array operations are a lot faster. Manual memory management is a lot faster. Shared memory is a lot faster.
Usually when you see someone reach for sharp and less expressive tools it’s justified by a hot code path. But here we jump immediately to the perf hack?
3. How many simultaneous async operations does your program have?
When it comes time to test your concurrent processing, to ensure you handle race conditions properly, that is much easier with callbacks because you can control their scheduling. Since each callback represents a discrete unit, you see which events can be reordered. This enables you to more easily consider all the different orderings.
Instead with threads it is easy to just ignore the orderings and not think about this complexity happening in a different thread and when it can influence the current thread. It isn't simpler, it is simplistic. Moreover, you cannot really change the scheduling and test the concurrent scenarios without introducing artificial barriers to stall the threads or stubbing the I/O so you can pass in a mock that you will then instrument with a callback to control the ordering...
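A small sketch of the testing argument for discrete callbacks: if each event is handled by one callback that runs to completion, a test can replay the same events in every order it cares about. The `Account`, `Event`, `handle`, and `run_in_order` names are hypothetical.

```rust
/// Shared state that two event handlers race to update.
#[derive(Debug, Default, PartialEq)]
pub struct Account {
    pub balance: i64,
    pub rejected: u32,
}

pub enum Event {
    Deposit(i64),
    Withdraw(i64),
}

/// One discrete callback: handles a single event to completion.
pub fn handle(account: &mut Account, event: &Event) {
    match *event {
        Event::Deposit(n) => account.balance += n,
        Event::Withdraw(n) if account.balance >= n => account.balance -= n,
        Event::Withdraw(_) => account.rejected += 1,
    }
}

/// Test helper: deliver the events in an order the test chooses.
pub fn run_in_order(events: &[Event], order: &[usize]) -> Account {
    let mut account = Account::default();
    for &i in order {
        handle(&mut account, &events[i]);
    }
    account
}
```

Running the deposit first leaves a zero balance with nothing rejected, while running the withdrawal first rejects it; both orderings are exercised deterministically, with no barriers or sleeps.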
The problem with callbacks is that the call stack when captured isn't the logical callstack unless you are in one of the few libraries/runtimes that put in the work to make the call stacks make sense. Otherwise you need good error definitions.
You can of course mix the paradigms and have the worst of both worlds.
There is one hill I'll die on, as far as programming languages go, which is that more people should study Céu's structured synchronous concurrency model. It specifically was designed to run on microcontrollers: it compiles down to a finite state machine with very little memory overhead (a few bytes per event).
It has some limitations in terms of how its "scheduler" scales when there are many trails activated by the same event, but breaking things up into multiple asynchronous modules would likely alleviate that problem.
I'm certain a language that would support the "Globally Asynchronous, Locally Synchronous" (GALS) paradigm could have its cake and eat it too. Meaning something that combines support for a green threading model of choice for async events, with structured local reactivity a la Céu.
F. Sant'Anna, the creator of Céu, actually has been chipping away at a new programming language called Atmos that does support the GALS paradigm. However, it's a research language that compiles to Lua 5.4. So it won't really compete with the low-level programming languages there.
[0] https://ceu-lang.org/
[1] https://github.com/atmos-lang/atmos
If your threads are "free" you can just run 400 copies of synchronous code, and blocking in one just frees the thread to work on another. Async within the same goroutine is still very much opt-in (you have to manually create a goroutine that writes to a channel that you then receive on); it just isn't needed where "spawn a thread for each connection" costs you barely a few kB per connection.
except when a RAM fetch is so expensive a load is basically an async call - and it's a single machine code instruction at the same time
Every explanation of the feature starts with managing callback hell.
Threads offer concurrent execution, async (futures) offer concurrent waiting. Loosely speaking, threads make sense for CPU bound problems, while async makes sense for IO bound problems.
For problems that aren't overly concerned with performance/memory, yes. You should probably reach for threads as a default, unless you know a priori that your problem is not in this common bucket.
Unfortunately there is quite a lot of bookkeeping overhead in the kernel for threads, and context switches are fairly expensive, so in a number of high performance scenarios we may not be able to afford kernel threading
But what you said about kernel implementation is true. But are we really saying that the primary motivation for async/await is performance? How many programmers would give that answer? How many programs are actually hitting that bottleneck?
Doesn’t that buck the trend of every other language development in the past 20 years, emphasizing correctness and expressiveness over raw performance?
Of course - what else would it be? The whole async trend started because moving away from each http request spawning (or being bound to) an OS thread gave quite extreme improvements in requests/second metrics, didn't it?
The original motivation for not using OS threads was indeed performance. Async/await is mostly syntax sugar to fix some of the ergonomic problems of writing continuation-based code (Rust more or less skipped the intermediate "callback hell" with futures that Javascript/Python et al suffered through).
It's all nuanced and what to choose requires careful evaluation.
Most stacks are tiny and have bounded growth. Really large stacks usually happen with deep recursion, but it's not a very common pattern in non-functional languages (and functional languages have tail call optimization). OS threads allocate megabytes upfront to accommodate the worst case, which is not that common. And a tiny stack is very fast to copy. The larger the stack becomes, the less likely it is to grow further.
>cannot have pointers to stack objects
In Go, pointers that escape from a function force heap allocation, because it's unsafe to refer to the contents of a destroyed stack frame later on in principle. And if we only have pointers that never escape, it's relatively trivial to relocate such pointers during stack copying: just detect that a pointer is within the address range of the stack being relocated and recalculate it based on the new stack's base address.
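The range check and rebase described above amount to a few lines of address arithmetic. This is a simplified sketch over plain integers: the `relocate` function is made up, and it ignores how a real runtime finds which stack slots hold pointers in the first place.

```rust
/// Rebase `ptr` if it points into the old stack; leave other pointers alone.
///
/// `old_base` is the lowest address of the old stack, `old_len` its size in
/// bytes, and `new_base` the lowest address of the new stack.
pub fn relocate(ptr: usize, old_base: usize, old_len: usize, new_base: usize) -> usize {
    if ptr >= old_base && ptr - old_base < old_len {
        // Same offset from the base, but in the new stack.
        new_base + (ptr - old_base)
    } else {
        ptr
    }
}
```

Pointers outside the old range, such as pointers to heap objects, pass through unchanged.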
Yes, you're not getting Rust performance (though a good part of that is their own compiler vs using all the LLVM goodness), but performance is good enough and the benefits for developers are great. Having goroutines be so cheap means you don't even need to do anything explicitly async to get what you want.
Now the languages that don't offer a choice are another matter.
I also want to address something that I've seen in several sub-threads here: Rust's specific async implementation. The key limitation, compared to the likes of Go and JS, is that Rust attempts to implement async as a zero-cost abstraction, which is a much harder problem than what Go and JS do. Saying some variant of "Rust should just do the same thing as Go" is missing the point.
For now the best option to write code that wants to live in both worlds is sans-io. Thomas Eizinger at Firezone has written a good article about this pattern[1]. Not only does it nicely solve the sync/async issue, but it also makes testing easier and opens the door to techniques like DST[2].
I have my own writing on the topic[3], which highlights that the problem is wider than just async vs sync due to different executors.
0: https://github.com/rust-lang/effects-initiative
1: https://www.firezone.dev/blog/sans-io
2: https://notes.eatonphil.com/2024-08-20-deterministic-simulat...
3: https://hugotunius.se/2024/03/08/on-async-rust.html
Broadly I think there are three approaches:
1. For frequent and small CPU heavy tasks, just run them on the IO threads. As long as you don't leave too long between `.await` points (~10ms) it seems to work okay.
2. Run your sans-io code on a dedicated CPU thread and do IO from an async runtime. This introduces overhead that needs to be weighed against the amount of CPU work.
3. Have the sans-io code output something like `Output::DoHeavyCompute { .. }` and later feed the result back as `Input::HeavyComputeResult { .. }`, in the middle run the work on a thread pool.
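Approach 3 can be sketched as a pure state machine plus caller-side glue. Only the `Output::DoHeavyCompute` and `Input::HeavyComputeResult` variant names come from the comment above; the fields, the other variants, `Machine`, `heavy_compute`, and `drive` are assumptions for illustration, and a single spawned thread stands in for the thread pool.

```rust
/// Events fed into the state machine by the caller.
pub enum Input {
    Request { data: Vec<u8> },
    HeavyComputeResult { digest: u64 },
}

/// Actions the state machine asks the caller to perform.
#[derive(Debug, PartialEq)]
pub enum Output {
    DoHeavyCompute { data: Vec<u8> },
    Reply { digest: u64 },
}

/// Pure protocol logic: no I/O, no threads, no async.
#[derive(Default)]
pub struct Machine {
    in_flight: u32,
}

impl Machine {
    pub fn handle(&mut self, input: Input) -> Output {
        match input {
            Input::Request { data } => {
                self.in_flight += 1;
                Output::DoHeavyCompute { data }
            }
            Input::HeavyComputeResult { digest } => {
                self.in_flight = self.in_flight.saturating_sub(1);
                Output::Reply { digest }
            }
        }
    }

    pub fn in_flight(&self) -> u32 {
        self.in_flight
    }
}

/// Stand-in for CPU-heavy work (hypothetical).
pub fn heavy_compute(data: &[u8]) -> u64 {
    data.iter().map(|&b| u64::from(b)).sum()
}

/// Caller-side glue: runs the requested work off-thread and feeds the result
/// back in. A real caller would hand the work to a thread pool instead.
pub fn drive(machine: &mut Machine, request: Vec<u8>) -> Output {
    match machine.handle(Input::Request { data: request }) {
        Output::DoHeavyCompute { data } => {
            let digest = std::thread::spawn(move || heavy_compute(&data))
                .join()
                .expect("worker panicked");
            machine.handle(Input::HeavyComputeResult { digest })
        }
        other => other,
    }
}
```

Because `Machine::handle` is plain synchronous code, the same logic can be driven from an async runtime, a blocking loop, or a unit test that feeds inputs by hand.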
I somehow miss noticing that in C++ and I have no idea how it is working in other domains.
My only gripe is that a lot of it is feeling a bit kick-starter-y, with each of the goals needing specific funding. Is that the best model we've found so far?
There seems to be some consensus even within the C++ ISO committee that the evolution process of that language is somewhat broken, mostly due to its size and the way it is organized.
> My only gripe is that a lot of it is feeling a bit kick-starter-y, with each of the goals needing specific funding. Is that the best model we've found so far?
Sadly, this seems to be the way things go once a technology catches on, commercially. Can't blame large donors for sponsoring only the parts they are interested in. Fortunately, considerable funding of TweedeGolf comes from (Dutch) government, I think.
You can 'sell' new features. They cost money to create, but they solve real problems. Those problems also cost money and if that's more than the cost of creating the feature, companies are willing to put in money (generally).
Maintenance is harder. But there are now some maintainer funds! Like the one from RustNL: https://rustnl.org/maintainers/ These are broader ongoing work and backed by many orgs chipping in a little bit.
Idk if it's the best model, but at least it seems to kinda work
I never really liked the viral nature of async in Rust when it was introduced.
I wish Rust the best of luck, and with more people like this Rust could have a brighter future.
In my programming language I wrote a custom pass for inlining async function calls within other async functions. It generally works and removes some boilerplate, but it increases the resulting binary size a lot.
Technically Rust can do the same.
The examples in the blog seem too simple to draw any conclusions from.
So yes, it does really matter. Keep in mind that optimizations stack. We're preventing LLVM from doing its thing. So if we make the futures themselves smaller, LLVM will be able to optimize more. So small changes really compound.
The risk they took was very calculated. Unfortunately they’re bad at math and chose the wrong trade-offs.
Ah well. Shit happens.
They chose the exact same tradeoffs as C++'s async/await (and the same overall model as Python/NodeJS), so I'm not sure what that says about programming as a whole.
Not to mention Tokio (most popular runtime for Rust) is multi-threaded by default. So you have to deal with multithreading bugs as well as normal async ones. That is not the case with most async languages. For example both Python and NodeJS use a single thread to execute async code.
Python still has pluggable event loops - this is sort of mandatory to interact with weird things like GUI toolkits, and Python's standard event loop was standardised pretty late in the game. Early on there was even an ecosystem split between Twisted and competing event loop implementations.
> For example both Python and NodeJS use a single thread to execute async code
I'd argue this is more a historical artefact of how the languages functioned before futures were introduced, rather than an inherent limitation.
You could've deduced that from the fact that someone who puts this amount of energy in a detailed article about intricacies of an area of "foo", quite certainly does not "hate on foo".
The article is fine besides the bait title.
I don't know enough about the domain to be objectively helpful, so it's all wishy-washy feelings on my part. I keep reaching for orchestrating things with threads in Rust where most people would probably reach for async these days. The only language where I've felt fine embracing the blessed async system is Haskell and its green threads (which I understand come with their own host of problems).