Decoupling Compute and Memory for Async GPUs
yyiyingzhang about 6 hours ago 2 comments
Cool open-source project that introduces a new programming model for decoupling compute and memory on NVIDIA GPUs that support asynchronous memory operations (e.g., Hopper). It reports a 12% performance improvement over the state of the art (SOTA) and 67% less kernel code.
Paper: "VDCores: Resource Decoupled Programming and Execution for Asynchronous GPU" arXiv:2605.03190
