A few words on DS4

ccaust1c • about 2 hours ago • 28 comments • Article on antirez.com

Discussion (28 Comments) • Original on Hacker News

minimaxir•about 1 hour ago

A relevant recent tweet from antirez: https://x.com/antirez/status/2054854124848415211

> Gentle reminder on how, in the recent DS4 fiesta, not just me but every other contributor found GPT 5.5 able to help immensely and Opus completely useless.

I've noticed the same for lower level squeezing-as-much-performance-as-possible code work.

sbinnee•about 1 hour ago

It is a big deal for sure to have a competitive local agentic model. I've replaced Gemini 3 Flash Preview with DeepSeek V4 Flash for all of my personal use cases: chat, language learning, and even hobby coding. For coding, I couldn't get decent results before, no matter which of the latest SOTA models I used. It's not close to Opus or Codex models. It's a flash model and makes mistakes here and there (I just saw `from opentele while import trace`, new Python syntax!)

But I found its tool calling is more reliable than that of other OSS models I tried. I assume that is attributable to interleaved thinking. Its reasoning effort adjusts automatically to the query. I enjoy reading these reasoning traces from open models because you can't see them from proprietary models.

I would love to try DS4, but I don't have a machine for it, so I will just stick to OpenRouter. I hope I can run a competitive OSS model on a 32GB machine in 3 years.

0xbadcafebee•about 1 hour ago

I don't see an explanation of why they would make a model-specific inference engine vs just using llama.cpp. There are already lots of people working on the llama.cpp integration. This is a lot of effort spent on a single model which is likely to become obsolete when a different model comes out that does better. In some discussions, people are now making PRs against both the llama.cpp branches and ds4... so it's taking a rare commodity (people investing development time in this model) and fragmenting it.

zozbot234•about 1 hour ago

The author has mentioned many times that the llama.cpp maintainers don't want code that's predominantly written by AI with no human revision. If anyone wants to try and get the support upstreamed into that project, they're quite free to do that: the code is MIT licensed.

flakiness•about 1 hour ago

I believe the assumption is: the code is cheap. The collaboration (e.g. upstreaming) is expensive.

Is it true? We'll see, in a few years.

simonw•about 2 hours ago

I got this running on a 128GB M5 the other day - pretty painless, model runs in about 80GB of RAM and it seemed to be very capable at writing code and tool execution.

perfmode•about 2 hours ago

How’s the token throughput / response time?

simonw•about 2 hours ago

Healthy!

  prefill: 30.91 t/s, generation: 29.58 t/s

From https://gist.github.com/simonw/31127f9025845c4c9b10c3e0d8612...

embedding-shape•about 1 hour ago

Comparison with an RTX Pro 6000, with DeepSeek-V4-Flash-IQ2XXS-w2Q2K-AProjQ8-SExpQ8-OutQ8-chat-v2-imatrix.gguf:

prefill: 121.76 t/s, generation: 47.85 t/s

Main target seems to be Apple's Metal, so makes sense. Might be fun to see how fast one could make it go though :) The model seems really good too, even though it's in IQ2.

xienze•about 1 hour ago

I don't want to be a jerk but 31t/s prefill is basically unusable in an agentic situation. A mere 10k in context and you're sitting there for 5+ minutes before the first token is generated.
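The arithmetic behind that estimate can be sketched as follows. This is a rough back-of-the-envelope calculation, not a benchmark: it assumes the prefill rate stays constant over the whole prompt and that nothing is served from a prompt cache. The function name `time_to_first_token_seconds` is made up for illustration; the two rates are the ones quoted earlier in this thread.

```python
def time_to_first_token_seconds(prompt_tokens: int, prefill_tps: float) -> float:
    """Estimate the wait before the first generated token.

    Assumes a constant prefill rate and no prompt caching, so the whole
    prompt has to be processed before generation starts.
    """
    if prefill_tps <= 0:
        raise ValueError("prefill_tps must be positive")
    return prompt_tokens / prefill_tps


# Prefill rates quoted in the thread (tokens per second).
quoted_rates = {
    "128GB M5": 30.91,
    "RTX Pro 6000": 121.76,
}

for label, tps in quoted_rates.items():
    secs = time_to_first_token_seconds(10_000, tps)
    print(f"{label}: {secs:.0f} s ({secs / 60:.1f} min) for a 10k-token prompt")
```

Under these assumptions, a 10k-token prompt takes roughly 5.4 minutes at 30.91 t/s and roughly 1.4 minutes at 121.76 t/s. Real agentic sessions often reuse most of the context between turns, so the practical wait depends heavily on whether the server caches the processed prompt.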

bjconlan•about 2 hours ago

This is great! I feel the same way about the deepseek v4 architecture for commodity hardware.

Also have enjoyed playing with https://huggingface.co/HuggingFaceTB/nanowhale-100m-base (but early days for me understanding this space)

kamranjon•about 1 hour ago

Very cool! I had no idea that HF was doing this - I really love their small model experiments.

kamranjon•about 1 hour ago

Just want to mention that I've been pulling down and using DwarfStar locally and it's incredible. I actually have it running on my personal MacBook (M4 Max, 128GB of RAM), and I run the server to share it through Tailscale with my work laptop, where I just have pi running.

The long context reasoning is something I haven't even seen in frontier models - I was running at 124k tokens earlier and it was still just buzzing along with no issues or fatigue.

I am amazed at how well it works. I'm using it right now for some pretty complex frontend work, and for me it is much, much faster than, for example, running a dense 27B or 31B model like Qwen or Gemma (the benefits of MoE). But the long-context capabilities are what have been absolutely flooring me.

Super excited about this project and hope Antirez can keep himself from burning out. I've been following the repo pretty closely and there are a ton of PRs flooding in, and it seems like he's had to do a lot of filtering out of slop code.

le-mark•about 1 hour ago

Is DS4 DwarfStar 4 or DeepSeek 4?

kamranjon•about 1 hour ago

Just updated! Sorry I meant Dwarf Star - it's the only way I've actually managed to run DeepSeek flash on my local hardware

wolttam•about 1 hour ago

DwarfStar 4 is DeepSeek 4 (check the repo)

codedokode•about 1 hour ago

I thought DeepSeek was closed-weights and proprietary? I wonder how it compares against Western open-weight models. The Hugging Face page contains the comparison only with proprietary models for some reason.

itishappy•about 1 hour ago

DeepSeek has always been open-weight, and the DeepSeek HuggingFace page does not contain any comparisons. Where did you form these opinions?

zozbot234•about 1 hour ago

Nemotron would be a comparable Western open model AIUI.

brcmthrowaway•30 minutes ago

This guy is falling deep into Yegge-tier psychosis.

linkregister•1 minute ago

Empirically, DS4 is hosting the DeepSeek v4 Flash model with good performance on home hardware. I'm curious how you came to this conclusion.