Discussion (6 Comments)
Is the LPDDR5 soldered or can you upgrade it? E: Looks like it's soldered, I wonder what the IMC(?) is actually good for
https://github.com/brucehoult/k3_ai
Or my longer top level comment.
Unlike a GPU or NPU, you can just run all your normal RISC-V Linux programs on the AI cores. Bash, gcc, emacs, nodejs ... whatever you want. It's an extra 40% of scalar processing power, for free.
The A100 cores, all by themselves, give more normal processing power e.g. `gcc` than any previous RISC-V SBC except the $2500 64 core Milk-V Pioneer.
NVMe reads were faster! (Some interesting potential wins there, assuming you can get data from NVMe onto the core without going through main memory, a feature available since Sandy Bridge-EP (2012), in the form of Data Direct IO aka DDIO). I crack jokes about "PCIe speed ahead", but that's seemingly real here (at huge cost to latency, which CXL promises to remedy).
There is a non-zero chance the main cores cannot saturate what the memory controller can do, and that the AI cores have some reserved bandwidth to themselves. I doubt it's going to double the memory bandwidth.
One absolute ecosystem gem from this article that I didn't know before: the fact that Orange Pi 6 uses CrosEC, the embedded controller for Chromebooks (RIP I guess?). I wonder if this is the newer Zephyr-based version (awesome, also underlies Framework's new embedded controllers) or the older legacy version of CrosEC. It's not spoken of flatteringly in this implementation, but the borrowing of firmware from a place I didn't expect is super notable to me! And there's good upstream kernel support, so it makes sense! https://chromium.googlesource.com/chromiumos/platform/ec/+/H...
One architectural nit I need to dig into that's interesting: the AI cores appear to have shared AI units. This reminds me a lot of AMD Bulldozer (2011), which had semi-independent CPUs but a shared FPU. It was an interesting chip (still haven't disposed of my old FX-8320 server), but not well loved.
Really appreciate the dive into the matrix cores. That's going to take more time for me to look at, but: thanks. I notice the architecture diagram says all cores have AI instructions, not just the A100's. Presumably it's the same instruction set/features?
The memory bandwidth situation here feels so off. We've lived in a world where it's a battle for cores, where how many cores one could ship made chip empires rise and fall. Today, the memory bandwidth wars are on, and supplies are scarce. This looks like a fascinating board with amazing capabilities, but wow, that lack of memory bandwidth here is most surprising.
I'm running Qwen3-Coder-30B-A3B-Instruct-Q5_K_M.gguf on mine ... picked kind of at random from a web page as I'm a complete n00b at running local LLMs, have never used anything other than ChatGPT or (mostly) Grok.
I get 6-7 tok/s, which is slow for throwing around program code, but fine for general knowledge queries. It's a reasonable speed to read along as it outputs. Simple queries start to give output in about 2 seconds.
e.g. I dunno ..
Output started in about 2 seconds.
Again, output starts in about two seconds.
This is offline, no internet, and uses 14W while running all 8 A100 "AI" cores at max.
Is this useful? I mean, for something, right?
I asked it to review https://github.com/brucehoult/trv which is a total of 320 lines of code (I used `/read` on a tar file containing the two code files). It thought for 22 minutes before output started and then spent 8 minutes outputting comments at just over 6.5 tok/s.
Nothing there to scare Claude, but 30 minutes total is still faster than asking a colleague for a code review, and probably more comprehensive too. And it did it on about 0.25 cents of electricity.
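As a rough sanity check on those numbers, here is a minimal Python sketch of the arithmetic. It assumes the 14W figure quoted above also holds for the whole 30-minute review, and the electricity price of $0.30/kWh is purely an assumption for illustration (it is not stated in the comment). The function names are hypothetical.

```python
def energy_wh(power_w, minutes):
    """Energy in watt-hours for a constant power draw over some minutes."""
    return power_w * minutes / 60


def cost_cents(wh, dollars_per_kwh):
    """Electricity cost in cents for a given energy in watt-hours."""
    return wh / 1000 * dollars_per_kwh * 100


def tokens_generated(tok_per_s, minutes):
    """Approximate number of tokens emitted at a steady rate."""
    return tok_per_s * minutes * 60


# Figures from the comment: 14 W, 22 min thinking + 8 min output, ~6.5 tok/s.
review_wh = energy_wh(14, 22 + 8)
# ASSUMPTION: $0.30/kWh is an illustrative price, not a figure from the thread.
review_cents = cost_cents(review_wh, 0.30)
review_tokens = tokens_generated(6.5, 8)

print(f"{review_wh:.1f} Wh, about {review_cents:.2f} cents, ~{review_tokens:.0f} output tokens")
```

Under those assumptions the review uses 7 Wh, which is in the same ballpark as the "about 0.25 cents" quoted above; a different tariff scales the cost linearly.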
> Turns out getting a thread onto the A100 cores requires a two-step handshake:
>
> write the thread’s TID to /proc/set_ai_thread (a kernel interface that unlocks scheduling on cores 8–15 for that specific thread)
> then call sched_setaffinity to pin it.
If you want to just run arbitrary Linux programs on the A100 cores, I wrote a small assembly language launcher which does the above PID writing and then EXECs the thing you really want.
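For readers who would rather not read assembly, the same two-step handshake can be sketched in Python. This is not the launcher from the repository, just an illustration of the steps quoted above. The path `/proc/set_ai_thread` and the core range 8 to 15 come from the quote; writing the TID as decimal text is an assumption, and the function names `enable_ai_cores` and `launch_on_ai_cores` are hypothetical.

```python
import os
import threading

# Path and core range as quoted in the thread; they only exist on a K3 kernel.
AI_THREAD_PROC = "/proc/set_ai_thread"
AI_CORES = frozenset(range(8, 16))  # cores 8-15


def enable_ai_cores(proc_path=AI_THREAD_PROC, cores=AI_CORES, set_affinity=None):
    """Unlock the calling thread for the A100 cores, then pin it there."""
    if set_affinity is None:
        set_affinity = os.sched_setaffinity  # Linux-only call
    tid = threading.get_native_id()
    # Step 1: tell the kernel which thread may be scheduled on the AI cores.
    # ASSUMPTION: the interface accepts the TID as decimal text.
    with open(proc_path, "w") as f:
        f.write(str(tid))
    # Step 2: pin the calling thread (pid 0 means "the caller") to those cores.
    set_affinity(0, cores)
    return tid


def launch_on_ai_cores(argv):
    """Do the handshake, then replace this process with the target program."""
    enable_ai_cores()
    os.execvp(argv[0], argv)
```

Calling `launch_on_ai_cores(["make", "-j8"])` from the main thread of a small script would, under these assumptions, behave like the launcher described above: the exec'd program keeps the same TID and affinity mask.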
https://github.com/brucehoult/k3_ai
As normal CPUs the eight 2-wide in-order A100 cores (like an A53 or A55 or Pentium or PPC603) add about 40% normal scalar processing power to the eight X100 cores.
That's better than Hyperthreading and well worth using for some additional processing power. Just kick off a background build, or CI or something there while you do something else on the X100 cores. If you ignore the special "AI" matrix processing extension they are just perfectly normal RISC-V RVA23 cores as far as user code is concerned — and in fact significantly faster than the previous generation K1 chip.
A Linux kernel build on just the A100 "AI" cores is faster than on any previous RISC-V SBC under $1000, including the HiFive Premier P550 or Milk-V Megrez. It's several times faster than the VisionFive 2 or Milk-V Jupiter / BPI-F3.
The K3 is also faster than using QEMU/Docker on my 24 core i9-13900 laptop, while using 25W instead of 200W.
Note that the fastest time uses one distccd on the X100 cores and another distccd on the A100 cores. This adds a lot of overhead in preprocessing and communication over the network (loopback, but still), but it still gives a pretty nice boost. Running independent tasks on each set of cores is more efficient, though. Or teaching `gmake` or `ninja` to distribute to two pools of cores using my `ai` launcher would be even better ...
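To make the "two pools" idea concrete, here is a small Python sketch of how a build tool might divide a batch of independent jobs between the X100 and A100 cores. It is only an illustration: the 1.0 : 0.4 weighting is taken from the "about 40%" figure above, the function name `split_jobs` is hypothetical, and a real scheduler would more likely hand out jobs dynamically as cores become free rather than split them up front.

```python
def split_jobs(jobs, x100_weight=1.0, a100_weight=0.4):
    """Split jobs between the two core pools in proportion to their throughput.

    Returns (x100_jobs, a100_jobs). The default weights assume the A100 pool
    delivers about 40% of the X100 pool's throughput, as claimed in the thread.
    """
    total = x100_weight + a100_weight
    n_a100 = round(len(jobs) * a100_weight / total)
    return jobs[n_a100:], jobs[:n_a100]


# Hypothetical batch of 14 compile jobs.
jobs = [f"file{i}.c" for i in range(14)]
x100_jobs, a100_jobs = split_jobs(jobs)
print(len(x100_jobs), "jobs for the X100 pool,", len(a100_jobs), "for the A100 pool")
```

With these weights roughly 29% of the work goes to the A100 pool, so both pools should finish at about the same time if the jobs are of similar size.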
People have made the NPU on that thing do LLMs, and sounds like around the same level (max 3Bish params, 5-6 tok/s last time I tried).
In terms of raw CPU performance, sounds slower?
But maybe has more cores?
Ouch the memory bandwidth sounds really bad.