DeepSeek v4

585

iimpact_sy about 4 hours ago 281 commentsRead Article on api-docs.deepseek.com

FR version is available. Content is displayed in original English for accuracy.

https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro/blob/main...

⚡ Community Insights

Discussion Sentiment

91% Positive

Analyzed from 2796 words in the discussion.

Discussion (281 Comments)Read Original on HackerNews

throwa356262•about 1 hour ago

Seriously, why can't huge companies like OpenAI and Google produce documentation that is half this good??

https://api-docs.deepseek.com/guides/thinking_mode

No BS, just a concise description of exactly what I need to write my own agent.

kubb•1 minute ago

Western orgs have been captured by Silicon Valley style patrimonialism, and aren’t based on merit anymore.

lykr0n•32 minutes ago

It's because they're optimizing for a different problem.

Western Models are optimizing to be used as an interchangeable product. Chinese models are being optimizing to be built upon.

raincole•29 minutes ago

> Western Models are optimizing to be used as an interchangeable product

Why? It sounds like the stupidest idea ever. Interchangeability = no lock-in = no moot.

peepee1982•17 minutes ago

If you want other people to know whether you're being genuine or sarcastic, you'll have to put a bit more effort into your comments. Your comment just adds noise.

vitorgrs•32 minutes ago

Meanwhile, they don't actually say which model you are running on Deepseek Chat website.

Alifatisk•43 minutes ago

You might enjoy Z.ais api docs aswell

jari_mustonen•33 minutes ago

Open Source as it gets in this space, top notch developer documentation, and prices insanely low, while delivering frontier model capabilities. So basically, this is from hackers to hackers. Loving it!

Also, note that there's zero CUDA dependency. It runs entirely on Huawei chips. In other words, Chinese ecosystem has delivered a complete AI stack. Like it or not, that's a big news. But what's there not to like when monopolies break down?

ifwinterco•12 minutes ago

As a Brit I'm here for it to be honest, I'm tired of America with everything that's going on.

China is not perfect but a bit of competition is healthy and needed

sudo_cowsay•5 minutes ago

I sometimes wonder if there are any security risks with using Chinese LLMs. Is there?

slekker•28 minutes ago

But remember to not ask about Taiwan!

spiderfarmer•5 minutes ago

Just ask it for a summary of the USA’s role in Iran, Gaza, Lebanon and its recent threats against Panama, Cuba and Greenland! It might be able to keep track.

orbital-decay•32 minutes ago

>we implement end-to-end, bitwise batch-invariant, and deterministic kernels with minimal performance overhead

Pretty cool, I think they're the first to guarantee determinism with the fixed seed or at the temperature 0. Google came close but never guaranteed it AFAIK. DeepSeek show their roots - it may not strictly be a SotA model, but there's a ton of low-level optimizations nobody else pays attention to.

primaprashant•41 minutes ago

While SWE-bench Verified is not a perfect benchmark for coding, AFAIK, this is the first open-weights model that has crossed the threshold of 80% score on this by scoring 80.6%.

Back in Nov 2025, Opus 4.5 (80.9%) was the first proprietary model to do so.

revolvingthrow•about 1 hour ago

> pricing "Pro" $3.48 / 1M output tokens vs $4.40

I’d like somebody to explain to me how the endless comments of "bleeding edge labs are subsidizing the inference at an insane rate" make sense in light of a humongous model like v4 pro being $4 per 1M. I’d bet even the subscriptions are profitable, much less the API prices.

edit: $1.74/M input $3.48/M output on OpenRouter

menzoic•3 minutes ago

API prices may be profitable. Subscriptions may still be subsidized for power users. Free tiers almost certainly are. And frontier labs may be subsidizing overall business growth, training, product features, and peak capacity, even if a normal metered API call is profitable on marginal inference.

schneehertz•about 1 hour ago

This price is high even because of the current shortage of inference cards available to DeepSeek; they claimed in their press release that once the Ascend 950 computing cards are launched in the second half of the year, the price of the Pro version will drop significantly

Bombthecat•5 minutes ago

In six month deepseek won't be sota anymore und usage will be wayyyy down.

amunozo•19 minutes ago

I was thinking the same. How can it be than other providers can offer third-party open source models with roughly the similar quality like this, Kimi K2.6 or GLM 5.1 for 10 times less the price? How can it be that GPT 5.5 is suddenly twice the price as GPT 5.4 while being faster? I don't believe that it's a bigger, more expensive model to run, it's just they're starting to raise up the prices because they can and their product is good (which is honest as long as they're transparent with it). Honestly the movement about subscription costing the company 20 times more than we're paying is just a PR movement to justify the price hike.

peepee1982•7 minutes ago

I'm pretty sure OpenAI and Anthropic are overpricing their token billed API usage mainly as an incentive to commit to get their subscriptions instead.

m00x•about 1 hour ago

They are profitable to opex costs, but not capex costs with the current depreciation schedules, though those are now edging higher than expected.

vitorgrs•26 minutes ago

And they actually say the prices will be "significantly" lower in second semester when Huawei 650 chips comes in.

jimmydoe•23 minutes ago

They’ve also announced Pro price will further drop 2H26 once they have more HUAWEI chips.

mirzap•about 1 hour ago

My thoughts exactly. I also believe that subscription services are profitable, and the talk about subsidies is just a way to extract higher profit margins from the API prices businesses pay.

Bombthecat•3 minutes ago

Google stated a while back, that with tpus they are able to sell at cost / with profit.

Aka: everyone who uses Nvidia isn't selling at cost, because Nvidia is so expensive.

dminik•26 minutes ago

I mean, not one "bleeding edge" lab has stated they are profitable. They don't publish financials aside from revenue. And in Anthropic's case, they fuck with pricing every week. Clearly something is wrong here.

raincole•about 1 hour ago

Insert always has been meme.

But seriously, it just stems from the fact some people want AI to go away. If you set your conclusion first, you can very easily derive any premise. AI must go away -> AI must be a bad business -> AI must be losing money.

zarzavat•42 minutes ago

Before the AI bubble that will burst any time now, there was the AI winter that would magically arrive before the models got good enough to rival humans.

masafej536•about 1 hour ago

Point taken but there isnt any western providers there yet. Power is cheaper in china.

NitpickLawyer•about 1 hour ago

As this is a new arch with tons of optimisations, it'll take some time for inference engines to support it properly, and we'll see more 3rd party providers offer it. Once that settles we'll have a median price for an optimised 1.6T model, and can "guesstimate" from there what the big labs can reasonably serve for the same price. But yeah, it's been said for a while that big labs are ok on API costs. The only unknown is if subscriptions were profitable or not. They've all been reducing the limits lately it seems.

3uler•about 1 hour ago

These models are open and there are tons of western providers offering it at comparable rates.

sekai•36 minutes ago

> I’d like somebody to explain to me how the endless comments of "bleeding edge labs are subsidizing the inference at an insane rate" make sense in light of a humongous model like v4 pro being $4 per 1M. I’d bet even the subscriptions are profitable, much less the API prices.

One answer - Chinese Communist Party. They are being subsidized by the state.

fblp•about 3 hours ago

There's something heartwarming about the developer docs being released before the flashy press release.

necovek•about 3 hours ago

Where's the training data and training scripts since you are calling this open source?

onchainintel•about 3 hours ago

Insert obligatory "this is the way" Mando scene. Indeed!

yanis_t•about 3 hours ago

Already on Openrouter. Pro version is $1.74/m/input, $3.48/m/output, while flash $0.14/m/input, 0.28/m/output.

sidcool•about 3 hours ago

Truly open source coming from China. This is heartwarming. I know if the potential ulterior motives.

I_am_tiberius•about 2 hours ago

Open weight!

mchusma•about 2 hours ago

For comparison on openrouter DeepSeek v4 Flash is slightly cheaper than Gemma 4 31b, more expensive than Gemma 4 26b, but it does support prompt caching, which means for some applications it will be the cheapest. Excited to see how it compares with Gemma 4.

MillionOClock•24 minutes ago

I wonder why there aren't more open weights model with support for prompt caching on OpenRouter.

aquir•26 minutes ago

It is great! I asked the question what I always ask of new models ("what would Ian M Banks think about the current state of AI") and it gave me a brilliant answer! Funny enough the answer contained multiple criticisms of his own creators ("Chinese state entities", "Social Credit System").

gbnwl•about 3 hours ago

I’m deeply interested and invested in the field but I could really use a support group for people burnt out from trying to keep up with everything. I feel like we’ve already long since passed the point where we need AI to help us keep up with advancements in AI.

wordpad•about 3 hours ago

The players barely ever change. People don't have problems following sports, you shouldn't struggle so much with this once you accept top spot changes.

seanobannon•about 3 hours ago

Weights available here: https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro

nthypes•about 3 hours ago

https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro/blob/main...

Model was released and it's amazing. Frontier level (better than Opus 4.6) at a fraction of the cost.

0xbadcafebee•about 2 hours ago

[delayed]

NitpickLawyer•about 3 hours ago

> (better than Opus 4.6)

There we go again :) It seems we have a release each day claiming that. What's weird is that even deepseek doesn't claim it's better than opus w/ thinking. No idea why you'd say that but anyway.

Dsv3 was a good model. Not benchmaxxed at all, it was pretty stable where it was. Did well on tasks that were ood for benchmarks, even if it was behind SotA.

This seems to be similar. Behind SotA, but not by much, and at a much lower price. The big one is being served (by ds themselves now, more providers will come and we'll see the median price) at 1.74$ in / 3.48$ out / 0.14$ cache. Really cheap for what it offers.

The small one is at 0.14$ in / 0.28$ out / 0.028$ cache, which is pretty much "too cheap to matter". This will be what people can run realistically "at home", and should be a contender for things like haiku/gemini-flash, if it can deliver at those levels.

onchainintel•about 3 hours ago

How does it compare to Opus 4.7? I've been immersed in 4.7 all week participating in the Anthropic Opus 4.7 hackathon and it's pretty impressive even if it's ravenous from a token perspective compared to 4.6

greenknight•about 3 hours ago

The thing is, it doesnt need to beat 4.7. it just needs to do somewhat well against it.

This is free... as in you can download it, run it on your systems and finetune it to be the way you want it to be.

p1esk•about 3 hours ago

Do you think a lot of people have “systems” to run a 1.6T model?

onchainintel•about 3 hours ago

Completely agree, not suggesting it needs ot just genuinely curious. Love that it can be run locally though. Open source LLMs punching back pretty hard against proprietary ones in the cloud lately in terms of performance.

kelseyfrog•about 3 hours ago

What's the hardware cost to running it?

johnmaguire•about 3 hours ago

... if you have 800 GB of VRAM free.

rvz•about 3 hours ago

It is more than good enough and has effectively caught up with Opus 4.6 and GPT 5.4 according to the benchmarks.

It's about 2 months behind GPT 5.5 and Opus 4.7.

As long as it is cheap to run for the hosting providers and it is frontier level, it is a very competitive model and impressive against the others. I give it 2 years maximum for consumer hardware to run models that are 500B - 800B quantized on their machines.

It should be obvious now why Anthropic really doesn't want you to run local models on your machine.

colordrops•about 3 hours ago

What's going to change in 2 years that would allow users to run 500B-800B parameter models on consumer hardware?

doctoboggan•about 3 hours ago

Is it honestly better than Opus 4.6 or just benchmaxxed? Have you done any coding with an agent harness using it?

If its coding abilities are better than Claude Code with Opus 4.6 then I will definitely be switching to this model.

madagang•about 3 hours ago

Their Chinese announcement says that, based on internal employee testing, it is not as good as Opus 4.6 Thinking, but is slightly better than Opus 4.6 without Thinking enabled.

mchusma•about 3 hours ago

I appreciate this, makes me trust it more than benchmarks.

sergiotapia•about 3 hours ago

The dragon awakes yet again!

rapind•about 3 hours ago

Pop?

coderssh•28 minutes ago

Feels like the real story here is cost/performance tradeoff rather than raw capability. Benchmarks keep moving incrementally, but efficiency gains like this actually change who can afford to build on top.

Imanari•about 1 hour ago

Just tested it via openrounter in the Pi Coding agent and it regularly fails to use the read and write tool correctly, very disappointing. Anyone know a fix besides prompting "always use the provided tools instead of writing your own call"

abstracthinking•about 1 hour ago

They have just released it, give it some time, they probably haven't pretested it with Pi

Imanari•about 1 hour ago

How can they fix it after the release? They would have to retrain/finetune it further, no?

zargon•41 minutes ago

It's only in preview right now. And anyway, yes, models regularly get updated training.

But in this case, it's more likely just to be a tooling issue.

zkmon•about 1 hour ago

They released 1.6 T pro base model on huggingface. First time I'm seeing a "T" model here.

bandrami•about 1 hour ago

I don't mind that High Flyer completely ripped off Anthropic to do this so much as I mind that they very obviously waited long enough for the GAB to add several dozen xz-level easter eggs to it.

CJefferson•about 2 hours ago

What's the current best framework to have a 'claude code' like experience with Deepseek (or in general, an open-source model), if I wanted to play?

deaux•about 1 hour ago

https://pi.dev/

Alifatisk•36 minutes ago

You can use CC with other models, you aren’t forced to use Claude model.

TranquilMarmot•about 2 hours ago

https://opencode.ai/

whoopdeepoo•about 2 hours ago

You can use deepseek with Claude code

esperent•36 minutes ago

You can, but does it work well? I assume CC has all kinds of Claude specific prompts in it, wouldn't you be better with a harness designed to be model agnostic like pi.dev or OpenCode?

0x142857•about 2 hours ago

claude-code-cli/opencode/codex

zargon•about 3 hours ago

The Flash version is 284B A13B in mixed FP8 / FP4 and the full native precision weights total approximately 178 GB. KV cache is said to take 10% as much space as V3. This looks very accessible for people running "large" local models. It's a nice follow up to the Gemma 4 and Qwen3.5 small local models.

sbinnee•about 3 hours ago

Price is appealing to me. I have been using gemini 3 flash mainly for chat. I may give it a try.

input: $0.14/$0.28 (whereas gemini $0.5/$3)

Does anyone know why output prices have such a big gap?

simonw•about 3 hours ago

I like the pelican I got out of deepseek-v4-flash more than the one I got from deepseek-v4-pro.

Flash: https://gist.github.com/simonw/4a7a9e75b666a58a0cf81495acddf...

Pro: https://gist.github.com/simonw/9e8dfed68933ab752c9cf27a03250...

Both generated using OpenRouter.

JSR_FDED•about 2 hours ago

No way. The Pro pelican is fatter, has a customized front fork, and the sun is shining!

rohanm93•about 2 hours ago

This is shockingly cheap for a near frontier model. This is insane.

For context, for an agent we're working on, we're using 5-mini, which is $2/1m tokens. This is $0.30/1m tokens. And it's Opus 4.6 level - this can't be real.

I am uncomfortable about sending user data which may contain PII to their servers in China so I won't be using this as appealing as it sounds. I need this to come to a US-hosted environment at an equivalent price.

Hosting this on my own + renting GPUs is much more expensive than DeepSeek's quoted price, so not an option.

esperent•33 minutes ago

> I am uncomfortable about sending user data which may contain PII to their servers in China

As a European I feel deeply uncomfortable about sending data to US companies where I know for sure that the government has access to it.

I also feel uncomfortable sending it to China.

If you'd asked me ten years ago which one made me more uncomfortable. China.

But now I'm not so sure, in fact I'm starting to lean towards the US as being the major risk.

fractalf•about 1 hour ago

Right now Im much more worried about sending data to the US and A.. At least theres a less chanse it will be missused against -me-

jessepcc•about 3 hours ago

At this point 'frontier model release' is a monthly cadence, Kimi 2.6 Claude 4.6 GPT 5.5 — the interesting question is which evals will still be meaningful in 6 months.

xnx•about 1 hour ago

Such different time now than early 2025 when people thought Deepaeek was going to kill the market for Nvidia.

Ifkaluva•29 minutes ago

They might still kill the market for NVIDIA, if future releases prioritize Huawei chips

storus•about 2 hours ago

Oh well, I should have bought 2x 512GB RAM MacStudios, not just one :(

Aliabid94•about 3 hours ago

MMLU-Pro:

Gemini-3.1-Pro at 91.0

Opus-4.6 at 89.1

GPT-5.4, Kimi2.6, and DS-V4-Pro tied at 87.5

Pretty impressive

apexalpha•about 1 hour ago

This FLash model might be affordable for OpenClaw. I run it on my mac 48gb ram now but it's slowish.

gardnr•about 1 hour ago

865 GB: I am going to need a bigger GPU.

jdeng•about 3 hours ago

Excited that the long awaited v4 is finally out. But feel sad that it's not multimodal native.

clark1013•about 2 hours ago

Looking forward to DeepSeek Coding Plan

m_abdelfattah•about 1 hour ago

I came here to say the same :) !

gigatexal•24 minutes ago

Has anyone used it? How does it compare to gpt 5.5 or opus 4.7?

sibellavia•about 2 hours ago

A few hours after GPT5.5 is wild. Can’t wait to try it.

jfxia•30 minutes ago

Is V4 still not a multi-modal model?

vitorgrs•20 minutes ago

Not yet... Which is a shame.

taosx•about 3 hours ago

MErge? https://news.ycombinator.com/item?id=47885014

luyu_wu•about 3 hours ago

For those who didn't check the page yet, it just links to the API docs being updated with the upcoming models, not the actual model release.

talim•about 3 hours ago

Weights are on Huggingface FWIW. https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro/tree/main

cmrdporcupine•about 3 hours ago

My submission here https://news.ycombinator.com/item?id=47885014 done at the same time was to the weights.

dang, probably the two should be merged and that be the link

culi•about 3 hours ago

there's no pinging. Someone's gotta email dang

WhereIsTheTruth•about 1 hour ago

Interesting note:

"Due to constraints in high-end compute capacity, the current service capacity for Pro is very limited. After the 950 supernodes are launched at scale in the second half of this year, the price of Pro is expected to be reduced significantly."

So it's going to be even cheaper

augment_me•37 minutes ago

Amaze amaze amaze

tcbrah•about 1 hour ago

giving meta a run for its money, esp when it was supposed to be the poster child for OSS models. deepseek is really overshadowing them rn

coolThingsFirst•31 minutes ago

I got an API key without credit card details I didn’t know they had a free plan.

tariky•about 2 hours ago

Anyone tried with make web UI with it? How good is it? For me opus is only worth because of it.

KaoruAoiShiho•about 3 hours ago

SOTA MRCR (or would've been a few hours earlier... beaten by 5.5), I've long thought of this as the most important non-agentic benchmark, so this is especially impressive. Beats Opus 4.7 here

reenorap•about 3 hours ago

Which version fits in a Mac Studio M3 Ultra 512 GB?

simonw•about 3 hours ago

The Flash one should - it's 160GB on Hugging Face: https://huggingface.co/deepseek-ai/DeepSeek-V4-Flash/tree/ma...

aliljet•about 3 hours ago

How can you reasonably try to get near frontier (even at all tps) on hardware you own? Maybe under 5k in cost?

mariopt•about 2 hours ago

Does deepseek has any coding plan?

jeffzys8•about 2 hours ago

luew•about 2 hours ago

We will be hosting it soon at getlilac.com!

swrrt•about 3 hours ago

Any visualised benchmark/scoreboard for comparison between latest models? DeepSeek v4 and GPT-5.5 seems to be ground breaking.

rvz•about 3 hours ago

The paper is here: [0]

Was expecting that the release would be this month [1], since everyone forgot about it and not reading the papers they were releasing and 7 days later here we have it.

One of the key points of this model to look at is the optimization that DeepSeek made with the residual design of the neural network architecture of the LLM, which is manifold-constrained hyper-connections (mHC) which is from this paper [2], which makes this possible to efficiently train it, especially with its hybrid attention mechanism designed for this.

There was not that much discussion around it some months ago here [3] about it but again this is a recommended read of the paper.

I wouldn't trust the benchmarks directly, but would wait for others to try it for themselves to see if it matches the performance of frontier models.

Either way, this is why Anthropic wants to ban open weight models and I cannot wait for the quantized versions to release momentarily.

[0] https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro/blob/main...

[1] https://news.ycombinator.com/item?id=47793880

[2] https://arxiv.org/abs/2512.24880

[3] https://news.ycombinator.com/item?id=46452172

jeswin•about 3 hours ago

> this is why Anthropic wants to ban open weight models

Do you have a source?

namegulf•about 3 hours ago

Is there a Quantized version of this?

sergiotapia•about 1 hour ago

Using it with opencode sometimes it generates commands like:

    bash({"command":"gh pr create --title "Improve Calendar module docs and clean up idiomatic Elixir" --body "$(cat <<'EOF'
    Problem
    The Calendar modu...

like generating output, but not actually running the bash command so not creating the PR ultimately. I wonder if it's a model thing, or an opencode thing.

punkpeye•about 1 hour ago

Incredible model quality to price ratio

ls612•about 3 hours ago

How long does it usually take for folks to make smaller distills of these models? I really want to see how this will do when brought down to a size that will run on a Macbook.

simonw•about 2 hours ago

Unsloth often turn them around within a few hours, they might have gone to bed already though!

Keep an eye on https://huggingface.co/unsloth/models

inventor7777•about 3 hours ago

Weren't there some frameworks recently released to allow Macs to stream weights from fast SSDs and thus fit way more parameters than what would normally fit in RAM?

I have never tried one yet but I am considering trying that for a medium sized model.

the_sleaze_•about 3 hours ago

Do you have the links for those? Very interested

frozenseven•about 3 hours ago

Better link:

https://news.ycombinator.com/item?id=47885014

https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro

nickandbro•about 3 hours ago

Very impressive throughput performance

hongbo_zhang•about 3 hours ago

congrats

dhruv3006•about 2 hours ago

Ah now !

slopinthebag•about 2 hours ago

OMG

OMG ITS HAPPENING

shafiemoji•about 3 hours ago

I hope the update is an improvement. Losing 3.2 would be a real loss, it's excellent.

raincole•about 3 hours ago

History doesn't always repeat itself.

But if it does, then in the following week we'll see DeepSeek4 floods every AI-related online space. Thousands of posts swearing how it's better than the latest models OpenAI/Anthropic/Google have but only costs pennies.

Then a few weeks later it'll be forgotten by most.

sbysb•about 3 hours ago

It's difficult because even if the underlying model is very good, not having a pre-built harness like Claude Code makes it very un-sticky for most devs. Even at equal quality, the friction (or at least perceived friction) is higher than the mainstream models.

raincole•about 3 hours ago

OpenCode? Pi?

If one finds it difficult to set up OpenCode to use whatever providers they want, I won't call them 'dev'.

The only real friction (if the model is actually as good as SOTA) is to convince your employer to pay for it. But again if it really provides the same value at a fraction of the cost, it'll eventually cease to be an issue.

cmrdporcupine•about 3 hours ago

They have instructions right on their page on how to use claude code with it.