Discussion (11 Comments)

riyaneel•3 days ago
I am the author of this library. The goal was to reach RAM-speed communication between independent processes (C++, Rust, Python, Go, Java, Node.js) without any serialization overhead or kernel involvement on the hot path.

I managed to hit a p50 round-trip time of 56.5 ns (for 32-byte payloads) and a throughput of ~13.2M RTT/sec on a standard CPU (i7-12650H).

Here are the primary architectural choices that make this possible:

- Strict SPSC & No CAS: I went with a strict Single-Producer Single-Consumer topology. There are no compare-and-swap loops on the hot path. acquire_tx and acquire_rx are essentially just a load, a mask, and a branch using memory_order_acquire / release.

- Hardware Sympathy: Every control structure (message headers, atomic indices) is padded to 128-byte boundaries. False sharing between the producer and consumer cache lines is structurally impossible.

- Zero-Copy: The hot path is entirely in a memfd shared memory segment after an initial Unix Domain Socket handshake (SCM_RIGHTS).

- Hybrid Wait Strategy: The consumer spins for a bounded threshold using cpu_relax(), then falls back to a sleep via SYS_futex (Linux) or __ulock_wait (macOS) to prevent CPU starvation.

The core is C++23, and it exposes a C ABI to bind the other languages.
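For readers who want to picture the SPSC and padding points above, here is a minimal sketch, not Tachyon's actual code. The names `SpscRing`, `try_push`, `try_pop`, and `kPad` are hypothetical, the payload is reduced to a single `uint64_t`, and the ring lives in ordinary process memory rather than a memfd segment. It only illustrates the general idea: one acquire load, a mask, a branch, and one release store per side, with the two indices kept 128 bytes apart.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical names throughout; this is an illustration, not the library's API.
constexpr std::size_t kPad = 128;  // padding size quoted in the post

template <std::size_t Capacity>
class SpscRing {
    static_assert(Capacity != 0 && (Capacity & (Capacity - 1)) == 0,
                  "Capacity must be a power of two so indices can be masked");

public:
    // Call from the single producer only. Returns false when the ring is full.
    bool try_push(std::uint64_t value) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        if (head - tail == Capacity) return false;         // full
        slots_[head & (Capacity - 1)] = value;             // mask, no modulo
        head_.store(head + 1, std::memory_order_release);  // publish the slot
        return true;
    }

    // Call from the single consumer only. Returns false when the ring is empty.
    bool try_pop(std::uint64_t& out) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t head = head_.load(std::memory_order_acquire);
        if (head == tail) return false;                    // empty
        out = slots_[tail & (Capacity - 1)];
        tail_.store(tail + 1, std::memory_order_release);  // hand the slot back
        return true;
    }

private:
    // Each index sits on its own 128-byte boundary, so the producer's writes
    // to head_ never share a cache line with the consumer's writes to tail_.
    alignas(kPad) std::atomic<std::size_t> head_{0};  // written by producer
    alignas(kPad) std::atomic<std::size_t> tail_{0};  // written by consumer
    alignas(kPad) std::uint64_t slots_[Capacity]{};
};
```

Because each index has exactly one writer, plain loads and stores are enough and no compare-and-swap is needed. In this sketch a full ring simply makes `try_push` return false; how the real library handles back-pressure is not stated in the post.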

I am sharing this here for anyone building high-throughput polyglot architectures and dealing with cross-language ingestion bottlenecks.

zekrioca•about 3 hours ago
Why report p50 and not p95?
BobbyTables2•about 2 hours ago
Would be interesting to see performance comparisons between this and the alternatives considered like eventfd.

Sure, the “hot path” is probably very fast for all, but what about the slow path?

riyaneel•about 1 hour ago
eventfd always pays a syscall on both sides (~200-400 ns) regardless of load. Tachyon's slow path only kicks in under genuine starvation: the consumer spins first, then calls FUTEX_WAIT, and the producer skips FUTEX_WAKE entirely if the consumer is still spinning. At sustainable rates the slow path never activates.
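A rough sketch of that spin-then-sleep pattern follows, under loud assumptions: `HybridSignal`, `publish`, `wait_past`, and the `parked_` flag are hypothetical names, and a `std::mutex` plus `std::condition_variable` stands in for the raw SYS_futex / __ulock_wait calls so the example stays portable. It is not how Tachyon implements it; it only shows how a producer can avoid the wake-up call while the consumer is still spinning.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>

// Portable stand-in for cpu_relax(); a no-op on architectures not listed.
inline void cpu_relax() {
#if defined(__x86_64__) || defined(__i386__)
    __builtin_ia32_pause();
#elif defined(__aarch64__)
    asm volatile("yield");
#endif
}

// Hypothetical type: a sequence counter with a hybrid wait.
class HybridSignal {
public:
    // Producer side. Returns true only if a wake-up had to be issued.
    bool publish(std::uint64_t seq) {
        seq_.store(seq, std::memory_order_seq_cst);
        if (!parked_.load(std::memory_order_seq_cst)) {
            return false;  // consumer is still spinning: skip the wake-up
        }
        std::lock_guard<std::mutex> lock(mu_);  // stand-in for FUTEX_WAKE
        cv_.notify_one();
        return true;
    }

    // Consumer side. Blocks until the counter differs from last_seen.
    std::uint64_t wait_past(std::uint64_t last_seen, int spin_limit = 4096) {
        // Fast path: bounded spin with no kernel involvement.
        for (int i = 0; i < spin_limit; ++i) {
            const std::uint64_t s = seq_.load(std::memory_order_acquire);
            if (s != last_seen) return s;
            cpu_relax();
        }
        // Slow path: announce that we are parking, re-check, then sleep.
        std::unique_lock<std::mutex> lock(mu_);
        parked_.store(true, std::memory_order_seq_cst);
        cv_.wait(lock, [&] {
            return seq_.load(std::memory_order_seq_cst) != last_seen;
        });
        parked_.store(false, std::memory_order_relaxed);
        return seq_.load(std::memory_order_acquire);
    }

private:
    std::atomic<std::uint64_t> seq_{0};
    std::atomic<bool> parked_{false};
    std::mutex mu_;
    std::condition_variable cv_;
};
```

The sequentially consistent store/load pairs on `seq_` and `parked_` are what prevent a lost wake-up in this sketch: either the consumer's re-check sees the new value, or the producer sees `parked_` set and notifies. A futex-based version would need an equivalent re-check before sleeping.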
JSR_FDED•about 3 hours ago
What would need to change when the hardware changes?
Fire-Dragon-DoL•about 2 hours ago
Wow, congrats!
riyaneel•about 1 hour ago
Thanks!
Fire-Dragon-DoL•43 minutes ago
I will be discussing this at work on Monday and will let you know what they think.

I wouldn't be surprised if somebody develops a cross-language framework with this.

riyaneel•29 minutes ago
Would love to hear the feedback.