Task Failed Successfully: Saturating NIC and Disk Bandwidth

serious_angel•about 1 hour ago

> Yes, it is true. NOT A SINGLE LINE!!

Considering that you "do not write a single line" and that the article, likely slop, is missing the actual script used for the benchmarking, it's impossible to know what benchmarking was actually done or what requirements it was based on.

> But digging into those principles has always been one of my small obsessions as a programmer.

No, it hasn't been, I believe. It doesn't show. What shows is that the article was written by your agent. The only part you did was name the flamegraph files, which is where the typo in the title is.

Why do you ask someone else to invest their precious, finite human lifetime into this generative output surrounded by lies, fuss, and indications of an infantile attitude towards technology, possibly highlighting the fact that you still have no idea how wonderful and miraculous the technologies are...

You are not a developer, or if you are, please do consider your future dependency on LLM vendors, your lack of experience and in-depth knowledge, anxiety, accountability, and self-confidence.

I hope you'll reconsider the time you waste on sloppery slops instead of actually reading about the technologies and discussing subjects with accountable professionals to learn from and discover together... yet indeed... currently, you chose a lone life of generative output built on robbed articles, like yours now defaced in the datasets of trained LLMs sold to you by vendors for money... yet the actual genius people who are in the datasets of models are now unknown... The dear sorrow you could not care less about...

Regardless, you do you, and I wish you safety, stability, and peace...

MasterScrat•10 minutes ago

This sounds like a strong statement with little backing. The author does infra at DeepSeek if his LinkedIn is to be trusted, and is the author of Foyer.

jeffbee•about 1 hour ago

It seems that you could have reached this conclusion faster by elaborating on your use of the profiler. Don't assume that cycles are spent on instructions. Look at your IPC and drill down into what CPU-bound means for your workload. In your case I think a standard top down analysis would have made the virtual memory management cost jump right out.
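A rough illustration of the IPC check suggested here: IPC is instructions retired divided by CPU cycles, and a low value on a core that looks fully busy hints that cycles are going to stalls (memory, TLB walks) rather than to executing instructions. This is only a sketch; the counter values are made-up placeholders, not measurements from the article, and real values would come from a tool such as `perf stat`.

```python
def ipc(instructions: int, cycles: int) -> float:
    """Instructions retired per CPU cycle."""
    if cycles <= 0:
        raise ValueError("cycles must be positive")
    return instructions / cycles


# Hypothetical counter readings for illustration only (NOT from the article).
sample_instructions = 4_000_000_000
sample_cycles = 10_000_000_000

sample_ipc = ipc(sample_instructions, sample_cycles)
print(f"IPC = {sample_ipc:.2f}")
```

What counts as "low" depends on the microarchitecture and the workload, so the number is a prompt to drill down further, not a verdict on its own.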

MrCroxx•4 days ago

Author here. This post is a write-up of a performance-debugging rabbit hole I hit while trying to saturate NICs with NVMe reads using io_uring and RDMA.

The short version: READ_FIXED fixed the obvious per-I/O GUP overhead in a small demo, but the larger deployment still got stuck at roughly half of line rate. After ruling out io-wq backlog, request splitting, fd lookup, and CRC arithmetic, the actual wall turned out to be dTLB misses from scanning 1,028 KiB buffers backed by 4 KiB pages. Moving the read arena to hugepages brought the system close to NIC saturation.
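For a sense of scale, here is a back-of-the-envelope sketch of how many pages, and therefore how many distinct address translations, one such buffer spans. It assumes a page-aligned buffer and a 2 MiB hugepage size; the post above only says "hugepages", so the 2 MiB figure is an assumption, and `pages_spanned` is a hypothetical helper name.

```python
KIB = 1024


def pages_spanned(buffer_bytes: int, page_bytes: int) -> int:
    """Number of pages covered by a page-aligned buffer (ceiling division)."""
    return -(-buffer_bytes // page_bytes)


buffer_bytes = 1028 * KIB          # buffer size quoted in the post
small_page = 4 * KIB               # base page size quoted in the post
huge_page = 2048 * KIB             # ASSUMPTION: 2 MiB hugepages

pages_4k = pages_spanned(buffer_bytes, small_page)
pages_huge = pages_spanned(buffer_bytes, huge_page)

print(f"4 KiB pages per buffer: {pages_4k}")
print(f"2 MiB pages per buffer: {pages_huge}")
```

Under these assumptions a single scan touches 257 separate 4 KiB pages versus one 2 MiB page, which is consistent with the dTLB-miss explanation. An unaligned buffer could straddle one extra page in either case.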

The funny part is that an AI agent suggested hugepages early and got the optimization right, but its explanation was wrong. This post is mostly about reconstructing the evidence for why it worked.

I’d be very interested in feedback from people who have used AI to debug performance issues in a complex system.
