DeepSeek v4
626
ZH version is available. Content is displayed in original English for accuracy.
ZH version is available. Content is displayed in original English for accuracy.
Discussion Sentiment
Analyzed from 7094 words in the discussion.
Trending Topics
Discussion (309 Comments)Read Original on HackerNews
Also, note that there's zero CUDA dependency. It runs entirely on Huawei chips. In other words, Chinese ecosystem has delivered a complete AI stack. Like it or not, that's a big news. But what's there not to like when monopolies break down?
China is not perfect but a bit of competition is healthy and needed
https://api-docs.deepseek.com/guides/thinking_mode
No BS, just a concise description of exactly what I need to write my own agent.
Western Models are optimizing to be used as an interchangeable product. Chinese models are being optimizing to be built upon.
Why? It sounds like the stupidest idea ever. Interchangeability = no lock-in = no moot.
Pretty cool, I think they're the first to guarantee determinism with the fixed seed or at the temperature 0. Google came close but never guaranteed it AFAIK. DeepSeek show their roots - it may not strictly be a SotA model, but there's a ton of low-level optimizations nobody else pays attention to.
I’d like somebody to explain to me how the endless comments of "bleeding edge labs are subsidizing the inference at an insane rate" make sense in light of a humongous model like v4 pro being $4 per 1M. I’d bet even the subscriptions are profitable, much less the API prices.
edit: $1.74/M input $3.48/M output on OpenRouter
Aka: everyone who uses Nvidia isn't selling at cost, because Nvidia is so expensive.
But seriously, it just stems from the fact some people want AI to go away. If you set your conclusion first, you can very easily derive any premise. AI must go away -> AI must be a bad business -> AI must be losing money.
One answer - Chinese Communist Party. They are being subsidized by the state.
Back in Nov 2025, Opus 4.5 (80.9%) was the first proprietary model to do so.
Edit: it seems "open source" was edited out of the parent comment.
no one is ever going to release their training data because it contains every copyrighted work in existence. everyone, even the hecking-wholesome safety-first Anthropic, is using copyrighted data without permission to train their models. there you go.
It is very much a valuable thing already, no need to taint it with wrong promise.
Though I disagree about being used if it was indeed open source: I might not do it inside my home lab today, but at least Qwen and DeepSeek would use and build on what eg. Facebook was doing with Llama, and they might be pushing the open weights model frontier forward faster.
1. Training data is the source. 2. Training is compilation/compression. 3. Weights are the compiled source akin to optimized assembly.
However it's an imperfect analogy on so many levels. Nitpick away.
This “no harm to me” meme about a foreign totalitarian government (with plenty of incentive to run influence ops on foreigners) hoovering your data is just so mind-bogglingly naive.
Relatively speaking, DeepSeek is less untrustworthy than Grok.
When I try ChatGPT on current events from the White House it interprets them as strange hypotheticals rather than news, which is probably more a problem with DC than with GPT, but whatever.
Even for minor stuff like beeing addicted to drugs.
Looks pretty totalitarian to me.
That would be a great argument if the American models weren’t so heavily censored.
The Chinese model might dodge a question if I ask it about 1-2 specific Chinese cultural issues but then it also doesn’t moralize me at every turn because I asked it to use a piece of security software.
yes, this is exactly what I'm saying.
This is why I’ve been urging everyone I know to move away from American based services and providers. It’s slow but honest work.
My country’s per capita income is $2500 a year. We can’t pay perpetual rent to OAI/Anthropic
This sounds whole lot like potatoh potahto. I think the former argument is very much the correct one: China can undercut everyone and win, even at a loss. Happened with solar panels, steel, evs, sea food - it's a well tested strategy and it works really well despite the many flavors it comes in.
That being said a job well done for the wrong reasons is still a job well done so we should very much welcome these contributions, and maybe it's good to upset western big tech a bit so it's remains competitive.
Just this week they published a serious foundational library for LLMs https://github.com/deepseek-ai/TileKernels
Others worth mentioning:
https://github.com/deepseek-ai/DeepGEMM a competitive foundational library
https://github.com/deepseek-ai/Engram
https://github.com/deepseek-ai/DeepSeek-V3
https://github.com/deepseek-ai/DeepSeek-R1
https://github.com/deepseek-ai/DeepSeek-OCR-2
They have 33 repos and counting: https://github.com/orgs/deepseek-ai/repositories?type=all
And DeepSeek often has very cool new approaches to AI copied by the rest. Many others copied their tech. And some of those have 10x or 100x the GPU training budget and that's their moat to stay competitive.
The models from Chinese Big Tech and some of the small ones are open weights only. (and allegedly benchmaxxed) (see https://xcancel.com/N8Programs/status/2044408755790508113). Not the same.
So you can’t see what facts are pruned out, what biases were applied, etc. Even more importantly, you can’t make a slightly improved version.
This model is as open source as a windows XP installation ISO.
https://openrouter.ai/deepseek/deepseek-v4-flash
`https://openrouter.ai/api/messages with model=deepseek/deepseek-v4-pro, OR returns an error because their Anthropic-compat translator doesn't cover V4 yet. The Claude CLI dutifully surfaces that error as "model...does not exist"
In other words, things are hotting up in the GPU war!
At this point I would just pick the one who's "ethics" and user experience you prefer. The difference in performance between these releases has had no impact on the meaningful work one can do with them, unless perhaps they are on the fringes in some domain.
Personally I am trying out the open models cloud hosted, since I am not interested in being rug pulled by the big two providers. They have come a long way, and for all the work I actually trust to an LLM they seem to be sufficient.
New model comes out, has some nice benchmarks, but the subjective experience of actually using it stays the same. Nothing's really blown my mind since.
Feels like the field has stagnated to a point where only the enthusiasts care.
Since then it's just been a cycle of the old model being progressively lobotomised and a "new" one coming out that if you're lucky might be as good as the OG Opus 4.5 for a couple of weeks.
Subjective but as far as I can tell no progress in almost a year, which is a lifetime in 2022-25 LLM timelines
And we got new base models, wonderful, truly wonderful
Model was released and it's amazing. Frontier level (better than Opus 4.6) at a fraction of the cost.
As a non-Opus user, I'll continue to use the cheapest fastest models that get my job done, which (for me anyway) is still MiniMax M2.5. I occasionally try a newer, more expensive model, and I get the same results. I have a feeling we might all be getting swindled by the whole AI industry with benchmarks that just make it look like everything's improving.
The tricky part is that the "number of tokens to good result" does absolutely vary, and you need a decent harness to make it work without too much manual intervention, so figuring out which model is most cost-effective for which tasks is becoming increasingly hard, but several are cost-effective enough.
Substantially worse at following instructions and overoptimized for maximizing token usage
Codex is just so much better, or the genera GPT models.
https://github.blog/news-insights/company-news/changes-to-gi...
I do some stuff with gemini flash and Aider, but mostly because I want to avoid locking myself into a walled garden of models, UIs and company
If you're feeling frisky, Zed has a decent agent harness and a very good editor.
Opencode was getting there, but it seems the founders lost interest. Pi could be it, but its very focused on OpenClaw. Even Codex cli doesnt have all of it.
which harness works well with Deepseek v4 ?
So while I agree mixed model is the way to go, opus is still my workhorse.
This is free... as in you can download it, run it on your systems and finetune it to be the way you want it to be.
In theory, sure, but as other have pointed out you need to spend half a million on GPUs just to get enough VRAM to fit a single instance of the model. And you’d better make sure your use case makes full 24/7 use of all that rapidly-depreciating hardware you just spent all your money on, otherwise your actual cost per token will be much higher than you think.
In practice you will get better value from just buying tokens from a third party whose business is hosting open weight models as efficiently as possible and who make full use of their hardware. Even with the small margin they charge on top you will still come out ahead.
It's about 2 months behind GPT 5.5 and Opus 4.7.
As long as it is cheap to run for the hosting providers and it is frontier level, it is a very competitive model and impressive against the others. I give it 2 years maximum for consumer hardware to run models that are 500B - 800B quantized on their machines.
It should be obvious now why Anthropic really doesn't want you to run local models on your machine.
Doesn't mean Deepseek v4 isn't great, just benchmarks alone aren't enough to tell.
If its coding abilities are better than Claude Code with Opus 4.6 then I will definitely be switching to this model.
It's still a "preview" version atm.
Claude4.6 was almost 10pp better at at answering questions from long contexts ("corpuses" in CorpusQA and "multiround conversations" in MRCR), while DSv4 was a staggering 14pp better at one math challenge (IMOAnswerBench) and 12pp better at basic Q&A (SimpleQA-Verified).
There we go again :) It seems we have a release each day claiming that. What's weird is that even deepseek doesn't claim it's better than opus w/ thinking. No idea why you'd say that but anyway.
Dsv3 was a good model. Not benchmaxxed at all, it was pretty stable where it was. Did well on tasks that were ood for benchmarks, even if it was behind SotA.
This seems to be similar. Behind SotA, but not by much, and at a much lower price. The big one is being served (by ds themselves now, more providers will come and we'll see the median price) at 1.74$ in / 3.48$ out / 0.14$ cache. Really cheap for what it offers.
The small one is at 0.14$ in / 0.28$ out / 0.028$ cache, which is pretty much "too cheap to matter". This will be what people can run realistically "at home", and should be a contender for things like haiku/gemini-flash, if it can deliver at those levels.
LMAO
I have no idea why you'd think that, but this is straight from their announcement here (https://mp.weixin.qq.com/s/8bxXqS2R8Fx5-1TLDBiEDg):
> According to evaluation feedback, its user experience is better than Sonnet 4.5, and its delivery quality is close to Opus 4.6's non-thinking mode, but there is still a certain gap compared to Opus 4.6's thinking mode.
This is the model creators saying it, not me.
That's literally what the I Ching calls "good fortune."
Competition, when no single dragon monopolizes the sky, brings fortune for all.
The website now has a link to the announcement on Twitter here https://x.com/deepseek_ai/status/2047516922263285776
Copying text of that below
DeepSeek-V4 Preview is officially live & open-sourced! Welcome to the era of cost-effective 1M context length.
DeepSeek-V4-Pro: 1.6T total / 49B active params. Performance rivaling the world's top closed-source models.
DeepSeek-V4-Flash: 284B total / 13B active params. Your fast, efficient, and economical choice.
Try it now at http://chat.deepseek.com via Expert Mode / Instant Mode. API is updated & available today!
Tech Report: https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro/blob/main...
Open Weights: https://huggingface.co/collections/deepseek-ai/deepseek-v4
But in this case, it's more likely just to be a tooling issue.
input: $0.14/$0.28 (whereas gemini $0.5/$3)
Does anyone know why output prices have such a big gap?
https://simonwillison.net/2026/Apr/24/deepseek-v4/
Both generated using OpenRouter.
For comparison, here's what I got from DeepSeek 3.2 back in December: https://simonwillison.net/2025/Dec/1/deepseek-v32/
And DeepSeek 3.1 in August: https://simonwillison.net/2025/Aug/22/deepseek-31/
And DeepSeek v3-0324 in March last year: https://simonwillison.net/2025/Mar/24/deepseek/
As in have the model consider its generated SVG, and gradually refine it, using its knowledge of the relative positions and proportions of the shapes generated, and have it spin for a while, and hopefully the end result will be better than just oneshotting it.
Or maybe going even one step further - most modern models have tool use and image recognition capabilities - what if you have it generate an SVG (or parts/layers of it, as per the model's discretion) and feed it back to itself via image recognition, and then improve on the result.
I think it'd be interesting to see, as for a lot of models, their oneshot capability in coding is not necessarily corellated with their in-harness ability, the latter which really matters.
I should try it again with the more recent models.
Let me tell you how much the Pro one sucks... It looks like failed Pedersen[1]. The rear wheel intersects with the bottom bracket, so it wouldn't even roll. Or rather, this bike couldn't exist.
The flash one looks surprisingly correct with some wild fork offset and the slackest of seat tubes. It's got some lowrider[2] aspirations with the small wheels, but with longer, Rivendellish[3], chainstays. The seat post has different angle than the seat tube, so good luck lowering that.
[1] https://en.wikipedia.org/wiki/Pedersen_bicycle
[2] https://en.wikipedia.org/wiki/Lowrider_bicycle
[3] https://www.rivbike.com/
I wonder which model will try some more common spoke lacing patterns. Right now there seems to be a preference for radial lacing, which is not super common (but simple to draw). The Flash and Pro one uses 16 spoke rims, which actually exist[1] but are not super common.
The Pro model fails badly at the spokes. Heck, the spokes sit on the outside of the drive side of the rim and tire. Have a nice ride riding on the spokes (instead of the tire) welded to the side of your rim.
Both bikes have the drive side on the left, which is very very uncommon. That can't exist in the training data.
[1] https://cicli-berlinetta.com/product/campagnolo-shamal-16-sp...
1) LLM is not AGI. Because surely if AGI it would imply that pro would do better than flash?
2) and because of the above, Pelican example is most likely already being benchmaxxed.
How much does the drawing change if you ask it again?
at the top of the linked pages.
For context, for an agent we're working on, we're using 5-mini, which is $2/1m tokens. This is $0.30/1m tokens. And it's Opus 4.6 level - this can't be real.
I am uncomfortable about sending user data which may contain PII to their servers in China so I won't be using this as appealing as it sounds. I need this to come to a US-hosted environment at an equivalent price.
Hosting this on my own + renting GPUs is much more expensive than DeepSeek's quoted price, so not an option.
As a European I feel deeply uncomfortable about sending data to US companies where I know for sure that the government has access to it.
I also feel uncomfortable sending it to China.
If you'd asked me ten years ago which one made me more uncomfortable. China.
But now I'm not so sure, in fact I'm starting to lean towards the US as being the major risk.
Gemini-3.1-Pro at 91.0
Opus-4.6 at 89.1
GPT-5.4, Kimi2.6, and DS-V4-Pro tied at 87.5
Pretty impressive
If AI was so good at coding, why can’t it actually make a usable Gemini/AI Studio app?
dang, probably the two should be merged and that be the link
"Due to constraints in high-end compute capacity, the current service capacity for Pro is very limited. After the 950 supernodes are launched at scale in the second half of this year, the price of Pro is expected to be reduced significantly."
So it's going to be even cheaper
A mac with 256 GB memory would run it but be very slow, and so would be a 256GB ram + cheapo GPU desktop, unless you leave it running overnight.
The big model? Forget it, not this decade. You can theoretically load from SSD but waiting for the reply will be a religious experience.
Realistically the biggest models you can run on local-as-in-worth-buying-as-a-person hardware are between 120B and 200B, depending on how far you’re willing to go on quantization. Even this is fairly expensive, and that’s before RAM went to the moon.
The flash version here is 284B A13B, so it might perform OK with a fairly small amount of VRAM for the active params and all regular ram for the other params, but I’d have to see benchmarks. If it turns out that works alright, an eBay server plus a 3090 might be the bang-for-buck champ for about $2.5K (assuming you’re starting from zero).
Was expecting that the release would be this month [1], since everyone forgot about it and not reading the papers they were releasing and 7 days later here we have it.
One of the key points of this model to look at is the optimization that DeepSeek made with the residual design of the neural network architecture of the LLM, which is manifold-constrained hyper-connections (mHC) which is from this paper [2], which makes this possible to efficiently train it, especially with its hybrid attention mechanism designed for this.
There was not that much discussion around it some months ago here [3] about it but again this is a recommended read of the paper.
I wouldn't trust the benchmarks directly, but would wait for others to try it for themselves to see if it matches the performance of frontier models.
Either way, this is why Anthropic wants to ban open weight models and I cannot wait for the quantized versions to release momentarily.
[0] https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro/blob/main...
[1] https://news.ycombinator.com/item?id=47793880
[2] https://arxiv.org/abs/2512.24880
[3] https://news.ycombinator.com/item?id=46452172
Do you have a source?
Keep an eye on https://huggingface.co/unsloth/models
Update ten minutes later: https://huggingface.co/unsloth/DeepSeek-V4-Pro just appeared but doesn't have files in yet, so they are clearly awake and pushing updates.
I have never tried one yet but I am considering trying that for a medium sized model.
As I understand it if DeepSeek v4 Pro is a 1.6T, 49B active that means you'd need just 49B in memory, so ~100GB at 16 bit or ~50GB at 8bit quantized.
v4 Flash is 284B, 13B active so might even fit in <32GB.
My Mac can fit almost 70B (Q3_K_M) in memory at once, so I really need to try this out soon at maybe Q5-ish.
V4 is natively mixed FP4 and FP8, so significantly less than that. 50 GB max unquantized.
Streaming weights from RAM to GPU for decode makes no sense at all because batching requires multiple parallel streams.
Streaming weights from SSD _never_ makes sense because the delta between SSD and RAM is too large. There is no situation where you would not be able to fit a model in RAM and also have useful speeds from SSD.
Note: these were just two that I starred when I saw them posted here. I have not looked seriously at it at the moment,
https://github.com/danveloper/flash-moe
https://github.com/t8/hypura
https://news.ycombinator.com/item?id=47885014
https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro
OMG ITS HAPPENING
But if it does, then in the following week we'll see DeepSeek4 floods every AI-related online space. Thousands of posts swearing how it's better than the latest models OpenAI/Anthropic/Google have but only costs pennies.
Then a few weeks later it'll be forgotten by most.
If one finds it difficult to set up OpenCode to use whatever providers they want, I won't call them 'dev'.
The only real friction (if the model is actually as good as SOTA) is to convince your employer to pay for it. But again if it really provides the same value at a fraction of the cost, it'll eventually cease to be an issue.