Discussion (6 Comments)
Even for those who don’t care about LLM use, this is just a great article on optimizing Swift performance, a topic that sadly doesn’t have a lot of written material.
I’m curious whether the AMX instructions are truly secret. In theory you could use an M4 or later and get them via SME, I think, but I’m just guessing, as I’ve never tried intrinsics from Swift myself.
I have no idea what this means. AMX was replaced by SME on M4. It's a new unit, not just an "abstract intrinsic" (which would make zero sense).
What I’m saying is that instead of using the secret AMX instructions, just use SME, assuming they have the hardware available to them.
AMX isn’t truly gone AFAIK, at least according to the folks who have been looking at it. It’s just deprecated, and it seems like the architecture treats AMX and SME somewhat like aliases, preventing concurrent use of both within a process.
This is so true. And also why people should not take basic GPU benchmarks so seriously. Getting peak performance out of a GPU is much more complex than it is with a CPU.
And it is one of the reasons why Nvidia still has a software moat compared to other GPU companies. CUDA has so many small kernels tuned for getting peak performance for your dataset.
https://siboehm.com/articles/22/CUDA-MMM
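
To make the point about tuned kernels a bit more concrete, here is a minimal sketch of loop tiling, one of the basic ideas behind optimized matrix multiplication. It is plain Python for readability, not CUDA or Swift, and the function names and the `tile` parameter are made up for illustration. It only shows that the tiled loop order computes the same result as the naive one. The actual speedup comes from keeping each tile in fast memory (cache or GPU shared memory), and the best tile size depends on the hardware and the matrix shapes, which is why libraries tend to ship many variants.

```python
def matmul_naive(a, b):
    """Textbook triple loop: C[i][j] = sum over p of A[i][p] * B[p][j]."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            for p in range(k):
                c[i][j] += a[i][p] * b[p][j]
    return c


def matmul_tiled(a, b, tile=2):
    """Same arithmetic, but iterated tile by tile.

    `tile` is a hypothetical tuning knob: on real hardware it would be
    chosen so that the working set of one tile fits in fast memory.
    """
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, tile):          # tile row of C
        for j0 in range(0, m, tile):      # tile column of C
            for p0 in range(0, k, tile):  # slice of the shared dimension
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        acc = c[i][j]
                        for p in range(p0, min(p0 + tile, k)):
                            acc += a[i][p] * b[p][j]
                        c[i][j] = acc
    return c
```

Calling `matmul_tiled(a, b, tile=t)` with different values of `t` should return the same matrix as `matmul_naive(a, b)`; only the memory access pattern changes.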