ES version is available. Content is displayed in original English for accuracy.
Advertisement
Advertisement
⚡ Community Insights
Discussion Sentiment
67% Positive
Analyzed from 4613 words in the discussion.
Trending Topics
#rust#code#more#performance#performant#compiler#compile#language#level#checks

Discussion (128 Comments)Read Original on HackerNews
I think most of this is attributable to the ergonomics of compile-time expressiveness. C++ can effortlessly do things that require mountains of ugly boilerplate and macros in C or Rust. In principle they can express the same things but no one wants to write or deal with that ugly boilerplate so the equivalency is never realized in real code bases.
Zig is interesting because it slots in as a C-like language with a competent and expressive compile-time story. I don’t use Zig but I recognize its game.
If anything, Rust has the potential to be more performant than C due to its aliasing rules (C has `restrict` but it's rarely used, standard C++ does not have even that). The current perf stats show it does make Rust code faster but just a little bit, although we don't utilize the full optimization potential currently (LLVM does not do many possible optimizations here, and `noalias` is weaker than Rust's aliasing rules). It can also affect autovectorization, and if it does the effect could be dramatic.
The poor applicability of auto-vectorization is another area where C++ is strong. You can transparently codegen e.g. AVX512 from intrinsics directly in C++ in contexts that would be opaque to auto-vectorization and difficult to generalize in C. This allows you to get some degree of “auto-vectorization” where the compiler can’t see it because it works at the wrong level of abstraction.
With sufficiently heroic efforts you can write C that matches the performance of C++. I’m not arguing that. Virtually no one writes C to that standard, including myself when I was writing high-performance C because the effort was too high, so it is a bit of a strawman.
It is the difference between theory and practice. All code bases have a finite budget. C++ can do a lot more optimization in the same budget as C.
But this is not a valid argument, as all languages are Turing complete, and most modern languages can do low level stuff at optimum speeds. As an extreme example, in Java, you could just allocate a large chunk of memory and run an allocator inside of it and sidestep the GC entirely.
With a programming language the question is thus not what can you do with it and how fast can it run with infinite effort, but what are the ergonomics, and what performance will you get in practice.
With LTO you get many of the same advantages as C++ template code, there's nothing magic about C++ template optimizations, it's all about whether the compiler can see all function bodies in a call hierarchy.
> The only candidate is using virtualization and void* pointers instead of monomorphized generics which some C code does for the lack of better options, but that's not a problem in Rust as well.
But in fact, if speed is a concern to you, even in C you will use "templated" sorting (via macros or code generation).
Restrict could make things go different but I've never heard someone say otherwise.
Note that we are talking about differences that are tiny here. They can be measured if you are careful but they are almost guaranteed to not be something anyone would notice if they were not measuring
At the compiler level, no. But as you write projects, you will for instance run into things you can do with templates which are infeasible to attempt with macros.
One example might be qsort() - a C compiler _could_ catch cases where it could create an intrinsic qsort based on the data type and function pointer being passed. However, in C++ you have the facilities to create a type safe, genericized sort that will be inlined based on the data structure.
Eg: delete_scene(void *arg) vs delete_scene<T>(T *arg)
In Twitter a user explained me that it is common in embedded space.
You do not need the OOP, RTTI, exceptions.
Like C with most use cases of preprocessor replaced by generic programming.
Please provide proof for this outrageous statement.
Compile time reflection will make this even easier.
Modern Fortran has a lot to offer for scientific and numeric computation - easier to learn than C++, and easier to optimize in many cases. Scales from small systems to supercomputers, and there is even CUDA Fortran.
Nobody uses "performant" to refer to any of those. It usually means either high throughput, or some aggregate of high throughput + low latency + low memory usage.
This type of code tends to be hard to maintain though.
AFAIK you can get there in Rust but it's a little more cumbersome. You have to implement a lot of operators, and for that type of code you might actually benefit from #[inline(always)] which is discouraged in normal Rust.
Depends on which C++ version one needs to support, in C++20 and later, it is relatively maintainable with concepts, constexpre, and reflection.
I don't think this holds. Rust has the same facilities which C++ has. Rust's metaprogramming capabilities are now on par with C++ (they weren't always). Rust has a similar generics implementation which allows it to do what C++ does in terms of method dispatch and generation. And now Rust has pretty much the same compile time constant generation capabilities that C++ has.
I don't think there's a part of C++ which isn't in Rust at this point. The only thing potentially missing is the experience and investment using those features.
Is that really true, though? I haven't really written any Rust code, so I have no idea, but I don't think Rust has static reflection. Also, aren't const generics much more limited? I've also heard there is no template specialization and no "if constexpr". Or what about dynamic allocations in constexpr functions?
Before C++ in fact through procedural macros. You can do everything you can do with C++ static reflection.
Now, it could be better. Proc macros require you to pull in secondary packages for parsing the token stream. But all the sorts of operations you can do via static reflection you can do via proc macros. That's how the most popular rust serialization package like serde works. It's also how some more popular database libs work like sqlx.
> Also, aren't const generics much more limited? I've also heard there is no template specialization and no "if constexpr".
Both have been added and expanded. AFAIK they are now roughly on par with what C++ const expressions can do. What they can't do, proc macros can do.
> Or what about dynamic allocations in constexpr functions?
IDK if that's possible in rust. Const expr capabilities of rust have been rapidly expanding though in the last year.
Several features on C23 were done thanks to his work.
Also compile time execution is much more rich in C++ than Rust, regardling language features and standard library that can be used at compile time.
Naturally none of the languages is standing still, and they will both improve on that regard.
There were also some bugs (hence disabled optimization passes) and missed opportunities from the lack of aliasing Rust precipitates - again, not sure where those sit - and GCC will have to play catch up here (unless there are other languages that exercise this part of the backend).
In practice, some of the cases about specialization that was made possible with C++ constructs is also achieved by modern C compilers.
This really needs more realworld evidence to back up the claim. In the end the important optimizations happen down in the Clang optimizer passes on the LLVM IR, and those optimizations are the same across C, C++, Rust (or Zig for that matter) - assuming of course that the optimizer can see all function bodies, which in C can be achieved via LTO or alternatively via 'unity builds'.
If the output of one of those languages differs so much (on an LLVM-based compiler) that there are noticeable performance differences I would start investigating whether there's a compile/link setting missing somewhere instead.
https://www.godbolt.org/z/n3Y54Yhqr
This is basically the gist of C++ 'zero cost abstraction', but C-style (the bulk of what enables C++ zero-cost-abstraction doesn't happen up in the language, but down in the optimization passes).
To me programming Rust feels so limiting due to lack of good compile time meta programming with types. That’s the key.
The entire concept of the "performance of a PLang" in terms of the run-time of programs written "mostly in it" is rather seriously under-specified, TBH. This is (or should be) uncontentious in spite of the slew of articles with titles like the one for this thread.
It shares some of the same drawbacks as C++, though. The language is extremely powerful, so while it is easy to write performant code, it is also easy for non experts to write very suboptimal code.
So not generally fast, no.
But not at the same time
While you can write high performance C++ my experience is that many people will reach for shared_ptr and their like while Rust will force them into proper structure/ownership as Arc and their like have a lot higher friction.
It's true that you can express many things in C++ -- the problem is that the language deliberately doesn't distinguish whether the things you've expressed are nonsense, so you might well have written total nonsense and you only find out when, much later, diagnosing a real world event you discover oh, this is nonsense, why did this even compile? Well sorry, it was "more performant" to allow nonsense.
Compounding on this, Rust is also unstable underneath, since there is no public, stable contract for carrying high-level semantics from HIR into MIR. Because these high-level invariants are lost during compilation, the compiler cannot easily use them to prove and eliminate low-level safety checks. But even if the frontend was perfect, Rust relies on LLVM's language-neutral SCEV, which operates purely on low-level math and cannot reason about high-level language semantics.
Ultimately, a lot of things would need to change for Rust to pay no performance for safety features.
Not sure if I'm just out of the loop, but I'm having a hard time following this line of reasoning. Why is a public and/or stable contract needed to carry high-level semantics from HIR to MIR? Neither seems necessary to me; from what I understand HIR and MIR are rustc-internal so public contracts shouldn't matter, and the lack of stability means the Rust devs aren't precluded by backwards compatibility from modifying the IRs to add the ability to carry such invariants.
I dug up a proposal from 2021 around bounds check hoisting in MIR, and from the discussion, details are pretty thorny [0]. It's narrower than general proofs but the frictions are very similar. The easiest example that shows HIR -> MIR difficulties is the part around `for i in 0..32 { a[i] = 1; }`. At the source level the range fact is super obvious, but after the for-loop/iterator lowering the MIR optimiser has to recover that `i` comes from exactly that range before it can turn 32 checks into the one hot-path check. Then it also would have to check for panic strategy to maintain the correct behaviour after optimisation.
[0] https://github.com/rust-lang/rust/issues/92327
a[0..32].iter_mut().for_each(|el| *el = 1)
and have per-iteration bounds checks elided in Rust today.
There are techniques to minimize the perf loss, though (safely), and of course you can use unsafe code. If you do it smartly, in the vast majority of cases bound checks do not matter (in fact, even in C++ there is a push for a hardened standard library that does bound checks, and e.g. Google uses that).
Rust will never include full proofs, but it might include ranged integers which can minimize bound checks even more.
Actually nm, I forgot those are disabled in release mode. Good decision I guess.
But no, "memory safety" includes most of the things discussed on the slides, and those number are for bounds checking only.
Link : https://godbolt.org/z/qT4TsKPxz
Need to try an example where the size isn't known until run time.
Even if the panic message would not include the index, LLVM was unable to do that if the previous iterations had side effects (for example if `tab` is not a local variable).
What the compiler is allowed to do is to shorten the loop by one and unconditionally panic after the loop, but this falls under the purview of the LLVM optimizer.
That does create a problem for early panics, panicking when panic becomes inevitable but has not happened yet. This deserves more thought.
Templates in C++ benefit from being part of the core language, -- stick a `template` above your `class`, and you're in metaprogramming land. Stick a template specialization, and you've done a niche optimization. You didn't need a separate crate or a whole macro DSL. Variadic templates are also really really nice for monomorphizing N-ary generic functions. The duck typing of templates makes
This is precisely where I struggle with Rust the most -- monomorphization is limited within generics, so you end up going to the `proc_macro` hell, which involves a separate crate, a separate Cargo.toml, etc.
Zig seems like it would fit the bill -- and doing micro-optimizations within zig is surprisingly easy. The language's comptime facilities allow for really good niche optimizations -- however, the language also has some strange decisions. The allocator interface is notoriously a vtable, so a lot of the DOD optimizations that andrewrk has spoken numerously of (and to be clear -- I did learn a lot about DOD from his talks back when I was a wee engineer), raise one of my eyebrows.
C seems like it should be fast, but implementing any data structure, any generic algorithm in C is impossible. Either you're copy-pasting, or you're making macro DSLs. None of which is great.
---
To further talk about the C++ situation -- the monomorphic allocator interface was always awesome. Compared to Zig's vtables and Rust's nothing (up until a couple days ago), having a way to pass custom allocators with types was awesome. The new std::pmr::* interfaces and containers are also really exciting -- monomorphization, as beautiful as it is, does cost a lot -- refactoring it is not easy, compilation times are a mess. Sometimes the right tool is a vtable interface, and, C++ gives you those facilities.
And this is C++'s no1 problem when it comes to performance too -- it's a leviathan -- it'll give you the tools to write REALLY fast code, but it will also give you inheritance -- forget about your caches then.
When I was working at Tesla, there were some pretty gnarly vtable jumps in firmware (of all places), and I suspect part of that could've been alleviated if people knew more about CRTP.
So, here's where I land -- C++ really will give you the tools it can to let you write the fastest code possible. But it will also give you the tools to make your code really slow. Committee language means everyone in the committee needs to be happy.
Rust, on the other hand, is really designed to promote safe-but-very-fast practices -- had the firmware that I discussed used Rust, my guess is that we would've gravitated towards generics and monomorphization, rather than the heavy dynamic inheritance. C++, when it comes to performance, as it does to all other things, is a barreled shotgun. Rust's design almost always promotes the best available pattern and that's why I rarely reach out for C++.
There are also some optimization tricks related to how you split your code among crates, since a unit of compilation is mostly one crate. Putting your FFI code in a separate crate (-sys crates are the norm) and splitting some of your code in libraries that can be compiled in parallel are the common examples
Use the lld linker instead of the default one, see https://kerkour.com/rust-production-checklist#use-the-lld-li...
I just don't understand how people find this sort of thing normal. If you implement a feature, and then you want to see it in action, the feedback loop for that is insanely slow. It's incredibly jarring coming from Clojure where you have a live development experience.
I've used Go before, and while you still have to wait for compiling, at least the compiler is actually fast.
I get the problem Rust aims to solve, but the ergonomics are just not there in my opinion.
At some point though a lot of work will be able to start advancing at once, so long as people exist to do the work.
e.g. https://rust-lang.github.io/rust-project-goals/2026/parallel...
It is a matter of tooling, eventually they will get there I guess.
However as language nerd, I still think there is a possible way out of it.
[0] https://github.com/rui314/mold
1. you get parallelism across all your dependencies, but
2. your (final) crate is serial.
Splitting your crate can therefore get you parallelism back in step 2, which can be a (compilation) perf win. You can also get a caching benefit when there aren't changes, but even absent that you can get wins.
It can come at the cost of runtime performance though, as rust won't (by default) do things like inline across translation units iirc. You can get back some of this perf back via various tricks (for example link-time optimization).
Looks like the rust performance book has some writing on these tradeoffs. I haven't read these articles, but the table of contents on the left seems reasonable enough.
https://nnethercote.github.io/perf-book/build-configuration....
Again I haven't read that (so I'm assuming it says the following), but you can get "best of both worlds" by configuring debug builds to not do LTO (= faster compilation speed, slower runtime), and release builds to do LTO (= faster runtime, slower compilation speed). There are also variants of LTO that make various tradeoffs you could look into.
That article also links to the `cargo-wizard` subcommand of cargo
https://github.com/Kobzol/cargo-wizard
it won't auto-split crates for you (of course), but it does seem to give you some default configurations of `cargo`, one of which configures for faster compilation speed. could be an easy way to mess around with things.