Show HN: Llama.cpp Tutorial 2026: Run GGUF Models Locally on CPU and GPU
9 points by aanju-kushwaha about 15 hours ago | 1 comment
Complete llama.cpp tutorial for 2026: install, compile with CUDA/Metal, run GGUF models, tune the inference flags, use the API server, set up speculative decoding, and benchmark your hardware.
https://vucense.com/dev-corner/llama-cpp-tutorial-run-gguf-m...
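
As a rough sketch of the workflow the tutorial covers, assuming a recent llama.cpp checkout: model.gguf, draft.gguf, the prompts, and the layer count below are placeholders, and the CMake option names (GGML_CUDA, GGML_HIP) have changed across releases, so check the repo docs for your version.

# Build with CUDA offload (Metal is the default backend on Apple Silicon).
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j

# Run a GGUF model, offloading all layers to the GPU via -ngl.
./build/bin/llama-cli -m model.gguf -p "Hello" -ngl 99

# Serve an OpenAI-compatible HTTP API on port 8080.
./build/bin/llama-server -m model.gguf --port 8080

# Speculative decoding with a small draft model.
./build/bin/llama-speculative -m model.gguf -md draft.gguf -p "Hello"

# Benchmark prompt processing and generation speed on this hardware.
./build/bin/llama-bench -m model.gguf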

Discussion (1 comment)
Running on an 8-core, 12 GB RAM VM with an AMD RX 5500 XT (8 GB) passed through. ROCm is built, and llama.cpp is built with the correct flags.
What am I missing?
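
One hedged guess in reply, not a confirmed fix: the RX 5500 XT is gfx1012 (RDNA 1), which is not on ROCm's officially supported list. Two things worth checking are whether the HIP build actually targeted that architecture and whether layers are being offloaded at all; the HSA_OVERRIDE_GFX_VERSION trick is a common community workaround, but it is unofficial and results vary by ROCm release.

# Target the card's architecture explicitly when building the HIP backend
# (option names vary by llama.cpp version; older trees use LLAMA_HIPBLAS).
cmake -B build -DGGML_HIP=ON -DAMDGPU_TARGETS=gfx1012
cmake --build build --config Release -j

# Without -ngl, inference stays on the CPU even with a working ROCm build.
./build/bin/llama-cli -m model.gguf -p "test" -ngl 99

# Unofficial workaround for unsupported RDNA 1 cards: report a supported
# architecture to the ROCm runtime. Known to be unstable on some setups.
HSA_OVERRIDE_GFX_VERSION=10.3.0 ./build/bin/llama-cli -m model.gguf -p "test" -ngl 99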