Discussion (12 Comments)
Some of the latest generations of Intel server CPUs with P-cores already have the AMX matrix extension, which can be used to implement fast AI inference.
AMD has not implemented AMX yet, and probably never will, because the new "AI Compute Extension" (ACE), which has been defined by Intel and AMD together, is an alternative to and extension of AMX (ACE inherits some parts of AMX, but not all). It appears that Intel AMX will meet the same fate as Apple's original undocumented AMX extension, which was replaced by the SME extension defined together with Arm; likewise, Intel AMX will be replaced by ACE, defined together with AMD.
Matrix extensions are more efficient for AI inference than vector extensions, because they reduce the ratio between memory accesses and computation operations.
However, I would like to have not only a matrix extension for AI, but also a matrix extension for all numeric formats up to FP64, like in Arm/Apple SME or in the NVIDIA and AMD "datacenter" GPUs.
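To make the point above about the ratio of memory accesses to computation concrete, here is a toy counting model in Python. It is only a sketch under simplifying assumptions (every operand is loaded once per step, accumulators stay in registers, no caching or register blocking) and does not describe how AMX, ACE or SME actually schedule work. The function names are made up for illustration.

```python
def vector_fma_step(width):
    """One broadcast-FMA step of a naive vector kernel.

    Assumed model: load 1 scalar from A and `width` elements from B,
    then do `width` multiply-accumulates into a vector accumulator.
    """
    loads = 1 + width
    macs = width
    return loads, macs


def outer_product_step(tile):
    """One outer-product step of a matrix (tile) kernel.

    Assumed model: load `tile` elements from A and `tile` from B,
    then do tile * tile multiply-accumulates into a tile accumulator.
    """
    loads = 2 * tile
    macs = tile * tile
    return loads, macs


def loads_per_mac(loads, macs):
    """Memory accesses per multiply-accumulate (lower is better)."""
    return loads / macs


if __name__ == "__main__":
    for n in (4, 16, 64):
        v = loads_per_mac(*vector_fma_step(n))
        m = loads_per_mac(*outer_product_step(n))
        print(f"n={n}: vector {v:.4f} loads/MAC, tile {m:.4f} loads/MAC")
```

In this model the vector step stays near one load per multiply-accumulate regardless of width, while the tile step needs 2/n loads per multiply-accumulate, so the ratio shrinks as the tile grows.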
I can already immediately think of a use case for vunpackb in some of the stuff I'm working on, where we'd like to efficiently unpack weights from the high half of a vector.
Separately, adding all signed–unsigned variants of the VNNI dot product instructions is a welcome (albeit niche) change. There was an annoying divergence here between major ISAs: x86 added vpdpbusd, which computes a dot product between u8 and i8 elements, while ARM added vdotq, which computes a dot product either between u8 and u8 elements, or between i8 and i8 elements. So for broad compatibility, you generally had to restrict one of your inputs to [0,127]. This difference shows in the design of (for example) WASM relaxed SIMD, where the result of wasm.dot.i8x16.i7x16.add.signed is implementation-defined if you exceed the [0,127] range. ARM later added mixed-sign variants, and now x86 completes the set.
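A scalar sketch of why the [0,127] restriction works: a byte in that range has the same value whether it is read as u8 or i8, so the mixed-sign and same-sign dot products agree. The Python below models a single 4-byte lane only; the helper names are hypothetical, and 32-bit accumulator wraparound or saturation is deliberately left out.

```python
def as_i8(byte):
    """Reinterpret a raw byte (0..255) as a signed 8-bit value."""
    return byte - 256 if byte >= 128 else byte


def dot4_u8_i8(a, b, acc=0):
    """Mixed-sign lane: `a` bytes read as u8, `b` bytes read as i8."""
    return acc + sum(x * as_i8(y) for x, y in zip(a, b))


def dot4_i8_i8(a, b, acc=0):
    """Same-sign lane: both inputs read as i8."""
    return acc + sum(as_i8(x) * as_i8(y) for x, y in zip(a, b))


if __name__ == "__main__":
    b = [0xFF, 2, 3, 0x80]          # as i8: -1, 2, 3, -128

    safe = [1, 127, 0, 5]           # every byte within [0, 127]
    print(dot4_u8_i8(safe, b), dot4_i8_i8(safe, b))      # the two agree

    unsafe = [200, 0, 0, 0]         # 200 as u8, but -56 as i8
    one = [1, 0, 0, 0]
    print(dot4_u8_i8(unsafe, one), dot4_i8_i8(unsafe, one))  # they differ
```

Once a byte leaves [0,127], the two interpretations diverge, which is the case the relaxed WASM instruction leaves implementation-defined.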
CPUs with ACE will in most cases replace CPUs that did not support AMX, so all the registers specified by ACE but not by AVX10 (a.k.a. AVX-512) are new.
AVX-512 has been available for several years in all the Zen 4 and Zen 5 CPUs, and it will be available in the Zen 6 CPUs that should be launched early next year (the launches of both the new Intel and AMD CPUs have been delayed to next year because of the memory price problem, which discourages computer upgrades).
A large fraction of the Intel server CPUs support AVX-512.
All the CPUs that will be launched by Intel from now on will support AVX-512. The CPUs launched by Intel in the first months of this year, i.e. Panther Lake, Wildcat Lake and Clearwater Forest, are the last Intel CPUs without AVX-512 support.
The only market segment where Intel still outsells AMD is in laptop CPUs, especially in corporate laptops, so indeed for now AVX-512 is still not supported in most new laptop and mini-PC CPUs.
Please define new. Also, I think AMD uses very similar cores in server and client. So, disabling AVX-512 may be an Intel thing (my guess is that it's so they can easily move threads between E & P cores).
It's pretty surprising that multiple CPU vendors have run into issues like this (some more than once, fucking Samsung), when it's pretty much the first thing that anyone on the toolchain side of things asks when they hear about heterogeneous cores on a CPU.
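The toolchain question mentioned above can be sketched in a few lines of Python: if software picks a kernel from the features of whichever core it happens to start on, it breaks as soon as the scheduler moves the thread to a core lacking those features. The sample text below is hypothetical and only mimics the layout of Linux /proc/cpuinfo; the function names are made up for illustration, and a real dispatcher would use CPUID or a library rather than text parsing.

```python
# Hypothetical two-core machine whose cores report different ISA flags.
SAMPLE_CPUINFO = (
    "processor\t: 0\n"
    "flags\t\t: fpu sse2 avx2 avx_vnni avx512f avx512_vnni\n"
    "\n"
    "processor\t: 1\n"
    "flags\t\t: fpu sse2 avx2 avx_vnni\n"
)


def per_core_flags(cpuinfo_text):
    """Return one frozenset of feature flags per processor block."""
    cores = []
    for block in cpuinfo_text.strip().split("\n\n"):
        for line in block.splitlines():
            key, _, value = line.partition(":")
            if key.strip() == "flags":
                cores.append(frozenset(value.split()))
    return cores


def is_heterogeneous(cores):
    """True if at least two cores report different flag sets."""
    return len(set(cores)) > 1


def common_flags(cores):
    """Flags that every core supports (safe to use after migration)."""
    return frozenset.intersection(*cores) if cores else frozenset()


def pick_kernel(flags):
    """Choose the widest dot-product kernel the given flags allow."""
    if "avx512_vnni" in flags:
        return "avx512_vnni"
    if "avx_vnni" in flags:
        return "avx_vnni"
    return "scalar"


if __name__ == "__main__":
    cores = per_core_flags(SAMPLE_CPUINFO)
    print("heterogeneous:", is_heterogeneous(cores))
    print("picked on core 0 only:", pick_kernel(cores[0]))
    print("picked on common subset:", pick_kernel(common_flags(cores)))
```

Dispatching on the common subset is the conservative choice; exposing the same feature set on every core, as guessed in the comment above about disabling AVX-512, avoids the problem entirely.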