ManyaGhobadi | about 5 hours ago | 8 comments | Read Article on systalyze.com

The standard GPU utilization metric reported by nvidia-smi, nvtop, Weights & Biases, Amazon CloudWatch, Google Cloud Monitoring, and Azure Monitor is highly misleading. It reports the fraction of time that any kernel is running on the GPU, which means a GPU can report 100% utilization even if only a small portion of its compute capacity is actually being used. In practice, we've seen workloads with ~1–10% real compute throughput while dashboards show 100%.
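To make the gap concrete, here is a minimal Python sketch of the two definitions. The `Kernel` type, the kernel timeline, and the peak-throughput figure are all hypothetical values chosen for illustration; they are not measurements and do not reflect how nvidia-smi is implemented internally.

```python
from dataclasses import dataclass


@dataclass
class Kernel:
    start_s: float  # launch time within the sampling window (seconds)
    end_s: float    # completion time (seconds)
    flops: float    # floating-point operations the kernel performed


def time_based_utilization(kernels, window_s):
    """Fraction of the window during which at least one kernel was running."""
    busy = 0.0
    cursor = 0.0  # end of the busy time counted so far
    for k in sorted(kernels, key=lambda k: k.start_s):
        start = max(k.start_s, cursor)  # skip time already counted (overlaps)
        end = min(k.end_s, window_s)    # clip to the sampling window
        if end > start:
            busy += end - start
            cursor = end
    return busy / window_s


def compute_utilization(kernels, window_s, peak_flops_per_s):
    """Achieved FLOP/s as a fraction of the hardware's peak FLOP/s."""
    return sum(k.flops for k in kernels) / (window_s * peak_flops_per_s)


# Hypothetical device and workload: two back-to-back kernels that keep the
# GPU "busy" for the whole window but do very little arithmetic.
PEAK_FLOPS_PER_S = 100e12  # assumed peak, not a real datasheet number
WINDOW_S = 1.0
kernels = [
    Kernel(start_s=0.0, end_s=0.5, flops=2e12),
    Kernel(start_s=0.5, end_s=1.0, flops=3e12),
]

print(f"time-based: {time_based_utilization(kernels, WINDOW_S):.0%}")
print(f"compute:    {compute_utilization(kernels, WINDOW_S, PEAK_FLOPS_PER_S):.0%}")
```

With these made-up numbers the time-based metric reads 100% while the device delivers 5% of its assumed peak, which is the kind of mismatch described above.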

This becomes a problem when teams rely on that metric for capacity planning or optimization decisions, because it can make underutilized systems look saturated.

We're releasing an open-source (Apache 2.0) tool, Utilyze, to measure GPU utilization differently. It samples hardware performance counters and reports compute and memory throughput relative to the hardware's theoretical limits. It also estimates an attainable utilization ceiling for a given workload.
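As a rough illustration of what "throughput relative to theoretical limits" and an "attainable ceiling" can mean, here is a Python sketch. It is not Utilyze's implementation: the function names and all numbers are hypothetical, and the ceiling is computed with a simple roofline model as a stand-in for whatever estimate the tool actually uses.

```python
def throughput_fractions(flops, bytes_moved, elapsed_s,
                         peak_flops_per_s, peak_bytes_per_s):
    """Achieved compute and memory throughput as fractions of hardware peak."""
    compute_frac = flops / elapsed_s / peak_flops_per_s
    memory_frac = bytes_moved / elapsed_s / peak_bytes_per_s
    return compute_frac, memory_frac


def roofline_compute_ceiling(flops, bytes_moved,
                             peak_flops_per_s, peak_bytes_per_s):
    """Best-case fraction of peak compute for this workload's arithmetic
    intensity, under a simple roofline model (an assumption, see above)."""
    intensity = flops / bytes_moved  # FLOPs per byte of memory traffic
    attainable = min(peak_flops_per_s, intensity * peak_bytes_per_s)
    return attainable / peak_flops_per_s


# Hypothetical device limits and counter readings over a 1-second sample.
PEAK_FLOPS = 100e12  # assumed peak FLOP/s
PEAK_BW = 2e12       # assumed peak memory bandwidth, bytes/s
FLOPS_DONE = 4e12
BYTES_MOVED = 1e12
ELAPSED_S = 1.0

compute_frac, memory_frac = throughput_fractions(
    FLOPS_DONE, BYTES_MOVED, ELAPSED_S, PEAK_FLOPS, PEAK_BW)
ceiling = roofline_compute_ceiling(FLOPS_DONE, BYTES_MOVED, PEAK_FLOPS, PEAK_BW)

print(f"compute: {compute_frac:.0%} of peak")
print(f"memory:  {memory_frac:.0%} of peak")
print(f"attainable compute ceiling: {ceiling:.0%} of peak")
```

In this made-up case the workload is memory-bound: at 4 FLOPs per byte it could reach at most 8% of peak compute, so the useful comparison is 4% achieved against an 8% ceiling rather than against 100%.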

GitHub link: https://github.com/systalyze/utilyze

We'd love to hear your thoughts!

Discussion (8 Comments) | Read Original on HackerNews

uberduper about 1 hour ago
There are a few dimensions you can look at for GPU load. Probably the easiest indirect metric to watch for GPU load is power usage.

But if you really care about this, you should actually profile your application. Nsight Systems makes this pretty simple to do. Dunno how many actually care about having a TUI.

jhgg about 2 hours ago
We just track power utilization.

xtimecrystal about 2 hours ago
One small suggestion: add more GPU stats to your tool.

At the moment (v0.1.3) it is more helpful for compute visualization, but not keeping track of memory usage/processes/temperature/fan speed/etc. prevents this from becoming a full-on drop-in replacement for `nvidia-smi` for me.

latchkey about 1 hour ago
You mention rocm-smi in your blog post, but you don't actually support AMD GPUs?

nawi about 2 hours ago
Hi, many thanks. Can this run on NVIDIA Jetson and Orin, or is it just for server GPUs?