ES version is available. Content is displayed in original English for accuracy.
Advertisement
Advertisement
⚡ Community Insights
Discussion Sentiment
71% Positive
Analyzed from 4720 words in the discussion.
Trending Topics
#rocm#amd#nvidia#cuda#vulkan#gpu#support#more#don#years

Discussion (135 Comments)Read Original on HackerNews
It has been a bit of a nightmare and had to package like 30+ deps and their heavily customized LLVM, but got the runtime to build this morning finally.
Things are looking bright for high security workloads on AMD hardware due to them working fully in the open however much of a mess it may be.
It truly is a nightmare to build the whole thing. I got past the custom LLVM fork and a dozen other packages, but eventually decided it had been too much of a time sink.
I’m using llama.cpp with its vulkan support and it’s good enough for my uses. Vulkan so already there and just works. It’s probably on your host too, since so many other things rely on it anyway.
That said, I’d be curious to look at your build recipes. Maybe it can help power through the last bits of the Alpine port.
Agreed on the others though. Why's it even installing ncurses, surely that's just expected to be on the system?
You don't trust Nvidia because the drivers are closed source ?
I think Nvidia's pledged to work on the open source drivers to bring them closer to the proprietary ones.
I'm hopping Intel can catch up , at 32GB of VRAM for around 1000$ it's very accessible
It's like running Windows in a VM and calling it an open source Windows system. The bootstrapping code is all open, but the code that's actually being executed is hidden away.
Intel has the same problem AMD has: everything is written for CUDA or other brand-specific APIs. Everything needs wrappers and workarounds to run before you can even start to compare performance.
https://developer.nvidia.com/blog/nvidia-transitions-fully-t...
For some workloads, the Arc Pro B70 actually does reasonably well when cached.
With some reasonable bring-up, it also seems to be more usable versus the 32gb R9700.
...I have a feeling you might not be at liberty to answer, but... Wat? The hell kind of "I must apparently resist Reflections on Trusting Trust" kind of workloads are you working on?
And what do you mean "binaries only built using a single compiler"? Like, how would that even work? Compile the .o's with compiler specific suffixes then do a tortured linker invo to mix different .o's into a combined library/ELF? Are we talking like mixing two different C compilers? Same compiler, two different bootstraps? Regular/cross-mix?
I'm sorry if I'm pushing for too much detail, but as someone whose actually bootstrapped compilers/user spaces from source, your usecase intrigues me just by the phrasing.
Beyond the fact they're competing with the most valuable companies in the world for talent while being less than a decade past "Bet the company"-level financial distress?
I would love AMD to be competitive. The entire industry would be better off if NVIDIA was less dominant. But AMD did this to themselves. One hundred percent.
I'm not talking gaming CUDA, but CV and data science workloads seem to scale well on Arc and work well on Edge on Core Ultra 2/3.
(1) - Supporting only Server grade hardware and ignoring laptop/consumer grade GPU/APU for ROCm was a terrible strategical mistake.
A lot of developers experiments first and foremost on their personal laptop first and scale on expensive, professional grade hardware later. In addition: some developers simply do not have the money to buy server grade hardware.
By locking ROCm only to server grade GPUs, you restrict the potential list of contributors to your OSS ROCm ecosystem to few large AI users and few HPC centers... Meaning virtually nobody.
A much more sensible strategy would be to provide degraded performance for ROCm on top of consummer GPUs, and this is exactly what Nvidia with CUDA is doing.
This is changing but you need to send a clear message there. EVERY new released device should be properly supported by ROCm.
- (2) Supporting only the two last generations of architecture is not what customers want to see.
https://rocm.docs.amd.com/projects/install-on-linux/en/docs-...
People with existing GPU codebase invests significant amount of effort to support ROCm.
Saying them two years later: "Sorry you are out of update now!" when the ecosystem is still unstable is unacceptable.
CUDA excels to backward compatibility. The fact you ignore it entirely plays against you.
(3) - Focusing exclusively on Triton and making HIP a second class citizen is non-sensical.
AI might get all the buzz and the money right now, we go it.
It might look sensible on the surface to focus on Python-base, AI focused, tools like Triton and supporting them is definitively necessary.
But there is a tremendous amount of code that is relying on C++ and C to run over GPU (HPC, simulation, scientific, imaging, ....) and that will remain there for the multiple decades to come.
Ignoring that is loosing, again, custumers to CUDA.
It is currently pretty ironic to see such a move like that considering that AMD GPUs currently tend to be highly competitive over FP64, meaning good for these kind of applications. You are throwing away one of your own competitive advantage...
(4) - Last but not least: Please focus a bit on the packaging of your software solution.
There has been complained on this for the last 5 years and not much changed.
Working with distributions packagers and integrating with them does not cost much... This would currently give you a competitive advantage over Nvidia..
NVidia is acknowledging Python adoption, with cuTile and MLIR support for Python, allowing the same flexibility as C++, using Python directly even for kernels.
They seem to be supportive of having similar capabilities for Julia as well.
The IDE and graphical debuggers integration, the libraries ecosystem, which now are also having Python variants.
As someone that only follows GPGPU on the side, due to my interests in graphics programming, it is hard to understand how AMD and Intel keep failing to understand what CUDA, the whole ecosystem, is actually about.
Like, just take the schedule of a random GTC conference, how much of it can I reproduce on oneAPI or ROCm as of today.
I am critical of AMD for not fully supporting all GPUs based on RNDA1 and RDNA2. While backwards compatibility is always better than less for the consumer, the RX 580 was a lightly-updated RX 480, which came out in 2016. Yes, ROCm technically came out in 2016 as well, but I don't mind acknowledging that it is a different beast to support the GCN architecture than the RDNA/CDNA generations that followed (Vega feels like it is off on an island of its own, and I don't even know what to say about it).
As cool as it would be to repurpose my RX 580, I am not at all surprised that GCN GPUs are not supported for new library versions in 2026.
I would be MUCH more annoyed if I had any RDNA1 GPU, or one of the poorly-supported RDNA2 GPUs.
It's not ideal. CUDA for comparison still supports Turing (two years older than RDNA 2) and if you drop down one version to CUDA 12 it has some support for Maxwell (~2014).
Vulkan backend of Ollama works fine for me, but it took one year or two for them to officially support it.
A few years ago I thought I had used the ROCm drivers/libraries with hashcat on a RX580
Now it’s obsolete ?
Personally I opted in to being NVIDIA-vendor-locked a couple of years ago because I just couldn't stand the insanely bonkers and pointless complexity of APIs like Vulkan. I used OpenGL before which supported all vendors, but because newer features weren't added to OpenGL I eventually had to make the switch.
I tried both Vulkan and CUDA, and after not getting shit done in Vulkan for a week I tried CUDA, and got the same stuff done in less than a day that I could not do in a whole week in Vulkan. At that moment I thought, screw it, I'm going to go NV-only now.
Additionally Vulkan has proven to be yet another extension mess (to the point now are actions try to steer it back on track), Khronos is like the C++ of API design, while expecting vendors to come up with the tools.
However, as great as CUDA, Metal and DirectX are to play around with, we might be stuck with Khronos APIs, if geopolitcs keep going as bad or worse, as they have been thus far.
"Anush's success is due to opting out of internal bureaucracy than anything else. most Claude use at AMD goes through internal infrastructure that can take hundreds of seconds per response due to throttling. Anush got us an exemption to use Anthropic directly. he is also exempt from normal policies on open source and so I can directly contribute to projects to add AMD support. He's an effective leader and has turned ROCm into a internal startup based in California. Definitely worth joining the team even if you've heard bad things about AMD as a whole."
This kind of bullshit is why I don't want to join AMD, even if this particular team is temporarily exempt from it.
It's crazy that this is a big deal.
I understand the need for some kind of governance around this but for it to require a special exemption just shows how far the AMD culture needs to shift.
The advantages for ROCm would be integration into existing codebases/engineer skillsets (e.g. porting an existing C++ implementation of something to the GPU with a few attributes and API calls rather than rewriting the core kernel in something like GLSL and all the management vulkan implies).
I don't think this is true. ROCm is a huge advantage for Nvidia but as far as I can tell it is more a set of R&D libraries than anything else, so all the Hot New Stuff keeps being Nvidia first and only (to start with) as the library ecosystem for the hotness doesn't exist yet. Then eventually new libraries are created that are CUDA independent and AMD turns out to make pretty good graphics cards.
I wouldn't be surprised of ROCm withered on the vine and AMD still does fine.
Meanwhile nvidia just dropped CUDA/driver support for 1xxx series cards from their most recent drivers this year.
For me ROCm's mayfly lifetime is a dealbreaker.
Seems like they're making some effort in that direction at least. If you have specific concerns, maybe try hitting up Anush Elangovan on Twitter?
Hmmm
https://rocm.docs.amd.com/en/latest/compatibility/compatibil...
I suspect, given AMDs relative openness vs. nvidia, even consumer-level stuff released today will end up with a longer useful life than current nvidia stuff.
I could be wrong, of course. I've taken the gamble...the last nvidia GPU I bought was a 3070 several years ago. Everything recent has been AMD. It's half the price for nearly competitive performance and VRAM. If that bet turns out wrong, I'll just upgrade a little sooner and still probably end up ahead. But, I think/hope openness will win.
Also, nvidia graphics drivers on Linux are a pain in the ass that I didn't want to keep dealing with. I decided it wasn't worth the hassle, even if they're better on some metrics. I've been able to run everything I've tried on an AMD Strix Halo and an old Radeon Pro V620 (not great, but cheap, compared to other 32GB GPUs and still supported by current ROCm).
It is Nvidia that has the track record of closed drivers and insisting on doing all software dev without community improvements to expected results.
The defacto GPU compute platform? With the best featureset?
Also pretty hard to beat a Strix Halo right now in TPS for the money and power consumption.
Even that aside there exist plenty like me that demand high freedom and transparency and will pay double for it if we have to.
LLM’s run great on it, it’s happily running gemma4 31b at the moment and I’m quite impressed. For the amount of VRAM you get it’s hard to beat, apart from the Intel cards maybe. But the driver support doesn’t seem to be that great there either.
Had some trouble with running comfyui, but it’s not my main use case, so I did not spent a lot of time figuring that out yet
May I ask, what kind of tok/s you are getting with the r9700? I assume you got it fully in vram?
Edit: I misread the "2x r9700" as "2 rx9700" which differs from the topic of this comment (about RNDA4 consumer SKUs). I'll keep my comment up, but anyone looking to get Radeon PRO cards can (should?) disregard.
I really wish AMD and Intel boards get replaced by competent people. They could do it in very short time. Both have integrated GPUs with main memory. AMD and Intel have (or at least used to have) serious know-how in data buses and interconnects, respectively. But I don't see any of that happening.
ROCm? It can't even support decent Attention. It lacks a lot of features and NVIDIA is adding more each year. Soon they will reach escape velocity and nobody will catch them for a decade. smh
All those people who bought them for openclaw just bought them because it was the trendy thing to do. No one of those people is running local models on there.
It's pretty insane how overpriced NVIDIA hardware is.
And a 5090 has a little over 2x the memory bandwidth - ~820GB/s vs ~1790GB/s. And significantly higher peak FLOPS on the 5090 too.
Sure, if the goal is to get the "Cheapest single-device system with 256GB ram" it looks pretty good, but there's lots of other axes it falls down on. Great if you know you don't care about them, but not "Better In Every Way". Arguably, better in only a single way - but that single way may well be the one you need.
And the current 5090 price might be a transient peak - only three months ago they were closer to $2500 - significantly less than half the $6000 base-spec 256GB Mac Studio. While the Mac Studio has been constant.
Running games on my loaded M4 Max is worse than on my 3090 despite the over-four-year generational gap.
Like, Pacific Drive will reach maybe 30fps at less than 1080p whereas the 3090 will run it better even in 4K.
That could just be CrossOver's issue with Unreal Engine games, but "just play different games" is not a solution I like.
Intel? Agreed. But AMD is making money hand over fist with enterprise AI stuff.
Right now, any effort that AMD or NVIDIA expend on the consumer sector is a waste of money that they could be spending making 10x more at the enterprise level on AI.
AI won't be working in nVidia's advantage this time.
It's a huge step though.
Of course everybody knows what's really going on here. It's not an open discussion, however.
From all the existing examples, it really looks the most interesting.
I.e. what I'm surprised about is lack of backing for it from someone like AMD. It doesn't have to immediately replace ROCm, but AMD would benefit from it advancing and replacing the likes of CUDA.
We've started a company around Rust on the GPU btw (https://www.vectorware.com/), both CUDA and Vulkan (and ROCm eventually I guess?).
Note that most platform developers in the GPU space are C++ folks (lots of LLVM!) and there isn't as much demand from customers for Rust on the GPU vs something like Python or Typescript. So Rust naturally gets less attention and is lower on the list...for now.
> Note: This project is still heavily in development and is at an early stage.
> Compiling and running simple shaders works, and a significant portion of the core library also compiles.
> However, many things aren't implemented yet. That means that while being technically usable, this project is not yet production-ready.
Also projects like rust gpu are built on top of projects like cuda and ROCm they aren’t alternatives they are abstractions overtop
What I meant more is the language of writing GPU programs themselves, not necessarily the machinery right below it. Vulkan is good to advance for that.
I.e. CUDA and ROCm focus on C++ dialect as GPU language. Rust GPU does that with Rust and also relies on Vulkan without tying it to any specific GPU type.
They just don't care enough to compete.