DeepSeek v4
2041
DE version is available. Content is displayed in original English for accuracy.
Discussion (1555 comments). Read the original on HackerNews.
I have a collection of novel probability and statistics problems at the masters and PhD level with varying degrees of feasibility. My test suite involves running these problems through the model first (often with about 2-6 papers for context) and then requesting a rigorous proof as a followup. Since the problems are pretty tough, there is no quantitative measure of performance here; I'm just judging based on how useful the output is toward outlining a solution that would hopefully become publishable.
Just prior to this model, Gemini led the pack, with GPT-5 as a close second. No other model came anywhere near these two (no, not even Claude). Gemini would sometimes have incredible insight for some of the harder problems (insightful guesses on relevant procedures are often most useful in research), but both of them tend to struggle with outlining a concrete proof in a single followup prompt. This DeepSeek V4 Pro with max thinking does remarkably well here. I'm not seeing the same level of insights in the first response as Gemini (closer to GPT-5), but it often gets much better in the followup, and the proofs can be _very_ impressive; nearly complete in several cases.
Given that both Gemini and DeepSeek also seem to lead on token performance, I'm guessing that might play a role in their capacity for these types of problems. It's probably more a matter of just how far they can get in a sensible computational budget.
Despite what the benchmarks seem to show, this feels like a huge step up for open-weight models. Bravo to the DeepSeek team!
https://huggingface.co/deepseek-ai/DeepSeek-Math-V2 https://huggingface.co/deepseek-ai/DeepSeek-Prover-V2-671B
This is when it happened for anyone interested: https://binaryverseai.com/deepseek-math-v2-benchmarks-review...
My largest models
Same with GPT-5: Latest 5.5, prior 5.4, or actually the original 5 (.0)?
You can't talk about model performance without specifying the exact model.
Summary: Opus 4.6 forms the baseline all three are trying to beat. DeepSeek V4-Pro roughly matches it across the board, Kimi K2.6 edges it on agentic/coding benchmarks, and Opus 4.7 surpasses it on nearly everything except web search.
DeepSeek V4-Pro Max shines in competitive coding benchmarks. However, it trails both Opus models on software engineering. Kimi K2.6 is remarkably competitive as an open-weight model. Its main weakness is in pure reasoning (GPQA, HMMT) where it trails Opus.
Speculation: The DeepSeek team wanted to come out with a model that surpassed proprietary ones. However, OpenAI dropped 5.4 and 5.5 and Anthropic released Opus 4.6 and 4.7. So they chose to just release V4 and iterate on it.
Basis for speculation? (i) The original reported timeline for the model was February. (ii) Their Hugging Face model card starts with "We present a preview version of DeepSeek-V4 series". (iii) V4 isn't multimodal yet (unlike the others) and their technical report states "We are also working on incorporating multimodal capabilities to our models."
But if you prompt it well - give it the reasoning behind why you're asking it to do something - it pulls far ahead.
I will however say that Claude's work and design are really great, up until I blow its limit.
Just ran a couple of them through GPT 5.5, but this is a single attempt, so take any of this with a grain of salt. I'm on the Plus tier with memory off so each chat should have no memory of any other attempt (same goes for other models too).
It seems to be getting more of the impressive insights that Gemini got and doing so much faster, but I'm having a really hard time getting it to spit out a proper lengthy proof in a single prompt, as it loves its "summaries". For the random matrix theory problems, it also doesn't seem to adhere to the notation used in the documents I give it, which is a bit weird. My general impression at the moment is that it is probably on par with Gemini for the important stuff, and both are a bit better than DeepSeek.
I can't stress how much better these three models are than everything else though (at least in my type of math problems). Claude can't get anything nontrivial on any of the problems within ten (!!) minutes of thinking, so I have to shut it off before I run into usage limits. I have colleagues who love using Claude for tiny lemmas and things, so your mileage may vary, but it seems pretty bad at the hard stuff. Kimi and GLM are so vague as to be useless.
Have them do multiplication or other complicated arithmetic. You say that isn't difficult. Then why do they burn 200k tokens in 20 minutes without converging? I did a deep exploration to help myself understand here [0].
[0] https://adamsohn.com/reliably-incorrect/
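The failure mode explored in [0] is easy to probe yourself: exact integer arithmetic is free for a computer, while a model predicting digits token by token has to get every position right. A minimal check harness (the specific numbers below are arbitrary examples, not from the linked post):

```python
# Exact check of a model's long-multiplication claim; Python ints are
# arbitrary precision, so the comparison itself is trivially exact.
def check_product(a: int, b: int, claimed: int) -> bool:
    """True iff the claimed product is exactly a * b."""
    return a * b == claimed

# A 20-digit example of the kind that burns tokens without converging.
a = 73_912_640_158_227_390_481
b = 55_004_912_336_178_205_663
print(check_product(a, b, a * b))      # True by construction
print(check_product(a, b, a * b + 1))  # False: off by one
```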
- One problem on using quantum mechanics and C*-algebra techniques for non-Markovian stochastic processes. The interchange between the physics and probability languages often trips the models up, so pretty much everything tends to fail here.
- Three problems in random matrix theory and free probability; these require strong combinatorial skills and a good understanding of novel definitions, requiring multiple papers for context.
- One problem in saddle-point approximation; I've just recently put together a manuscript for this one with a masters student, so it isn't trivial either, but does not require as much insight.
- One problem pertaining to bounds on integral probability metrics for time-series modelling.
I'd be very curious to know how any LLMs fare. I completely understand if you don't want to continue the discussion because of anonymity reasons.
https://api-docs.deepseek.com/guides/thinking_mode
No BS, just a concise description of exactly what I need to write my own agent.
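For flavor, a request using the documented thinking mode might look like the sketch below. The endpoint follows the usual OpenAI-compatible shape, but treat the model name and the exact `thinking` field as assumptions rather than the guide's verbatim schema:

```python
import json

# Hypothetical payload for DeepSeek's chat endpoint with thinking enabled.
# Field names are assumptions loosely based on the linked thinking_mode guide.
payload = {
    "model": "deepseek-chat",
    "messages": [{"role": "user", "content": "Outline a proof sketch."}],
    "thinking": {"type": "enabled"},  # assumption: shape per the guide
}
body = json.dumps(payload)
# POST `body` to https://api.deepseek.com/chat/completions
# with an "Authorization: Bearer <key>" header.
print(body)
```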
Western models are being optimized to be used as an interchangeable product. Chinese models are being optimized to be built upon.
But so much investment in their platforms, not just their APIs?
Now that you’re winning, others start cloning your API to siphon your users.
Now that you’re losing, you start cloning the current winner, who is probably a clone of your clone.
Highly competitive markets tend to normalize, because lock-in is a cost you can’t charge and remain competitive. The customer holds power here, not the supplier.
That's also why everyone is trying to build into the less competitive spaces, where they could potentially build a moat: tooling, certs, specialized training data, etc.
They are developing their moats with the platform tooling around it right now though. Look at Anthropic with Routines and OpenAI with Agents. Drop that capability in to a business with loose controls and suddenly you have a very sticky product with high switching costs. Meanwhile if you stick with purely the ‘chat’ use cases, even Cowork and scheduled tasks, you maintain portability.
Example: the second sentence on the first page says “softwares” but “software” is a mass noun that cannot be pluralized.
Example: the third page about tokens has some zipped code to “calculate the token usage for your intput/output” and obviously “intput” should be “input” but misspelled.
As a company that produces LLMs, they could have even used their own LLM to edit their documentation to fix grammar issues, and yet they did not.
Maybe I’m just extra sensitive to grammar and spelling issues but this kind of lack of attention to detail is a huge subconscious turnoff. I had to fight my urge to close the tab.
I read OpenAI or Anthropic's documentation nowadays and it's just so full of useless junk and self-congratulation that makes it a miserable experience to go through. It's a real shame because OpenAI used to write stellar documentation and publish really lucid papers just few years ago.
> they could have even used their own LLM to edit their documentation to fix grammar issues
In my experience companies who do this rarely stop at using LLMs to fix grammar issues. It becomes full on LLM speak quite fast, especially if there isn’t a native English speaker in the room who can discern what’s good and bad writing.
I constantly see and hear this mistake from actual humans too.
It's fairly ironic that your own comment contains run-on sentences, speculative claims and phrasing peculiarities like "could have even" instead of "could even have". Perhaps you are less sensitive to this than you think!
It's strange that you criticise "could have even" when it is a phrasing clearly being used for emphasis. "Could even have" makes no clearer sense in context.
No irony detected.
Pretty cool. I think they're the first to guarantee determinism with a fixed seed or at temperature 0. Google came close but never guaranteed it, AFAIK. DeepSeek show their roots: it may not strictly be a SotA model, but there's a ton of low-level optimization nobody else pays attention to.
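Worth noting why such a guarantee is hard: floating-point addition is not associative, so batching and kernel scheduling can change results even at temperature 0. A two-line demonstration:

```python
# Floating-point addition is not associative, which is why run-to-run
# determinism requires pinning reduction order in the serving kernels,
# not just setting temperature to 0 or fixing the sampler seed.
lhs = (1e20 + -1e20) + 1.0  # huge terms cancel first, the 1.0 survives
rhs = 1e20 + (-1e20 + 1.0)  # the 1.0 is absorbed by the huge term and lost
print(lhs, rhs)  # 1.0 0.0
```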
"Limited by the capacity of high-end computational resources, the current throughput of the Pro model remains constrained. We expect its pricing to decrease significantly once the Ascend 950 has been deployed into production."
https://api-docs.deepseek.com/zh-cn/news/news260424#api-%E8%...
https://api-docs.deepseek.com/zh-cn/news/news260424#api-%E8%...
This is the first figure of the section that the above links point to (https://api-docs.deepseek.com/zh-cn/img/v4-spec.png).
And I can read Chinese.
Early takeaways from this release: DeepSeek V4 Flash is the model to pay attention to here. It's cheap, effective, and REALLY fast.
The Pro model is slow, not much better in coding reasoning so far when it works, and honestly too unreliable and rate limited to be of much use, currently. Hopefully that improves as new providers host the model. Flash is working fine, and is currently performing competitively with recent releases, but only on agentic workflows. Check back in 24 hours for full combined scoring with tool use and long context for both models.
Many of the frontier Chinese AI labs have released near-frontier models that are just a little bit behind Opus 4.6 in terms of speed, tool use ability, or long context handling. Open weights are winning the AI race, led by China. Crazy couple weeks of releases.
Mimo V2.5 Pro by Xiaomi (not open weights) is actually the best performer of the latest string of Chinese releases in our combined, comprehensive benchmarks, despite getting less attention. Kimi K2.6 is the most interesting open weights release, still. DeepSeek is not the leader in the space anymore.
An interesting pattern with the latest string of Chinese releases is the much better agentic boost (the models are not as smart out of the box, but their ability to iterate in a loop with tools makes up most of the difference). DeepSeek V4 Flash exemplifies this -- not a smart model on the first try, but it makes up for it over the course of a session.
Others are purely subjective, like LMArena, which really only measures the personality and style preferences of the masses at this point, because frontier LLM technical answers are too hard for the average person to judge.
Then there are some interesting one-off benchmarks, but they lack enough rigor, breadth, and samples to draw larger conclusions from.
So we designed our benchmark with 3 goals: objective measurements (individual submissions not dependent on a human or LLM judge), no known correct answer (so simulations can scale to much higher levels of intelligence), and enough variety over important aspects of intelligence. We do this by running multiple models in cooperative/competitive environments with very complex action spaces and objective scoring, where model performance is relative and affected by the actions of other participants.
And yeah, there are some interesting results when you have a more objective benchmark. It should raise eyebrows when every single sub-release of every company's model is better across the board than its predecessor -- that isn't reality.
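Relative scoring of this kind can be as simple as an Elo-style update after each head-to-head round. A minimal sketch (my illustration, not necessarily the benchmark's actual formula):

```python
# Elo-style relative scoring: each model's rating moves against the
# field, so no ground-truth answer key or judge is needed per round.
def elo_update(r_a: float, r_b: float, score_a: float, k: float = 32.0):
    """score_a: 1.0 if A wins the round, 0.5 for a draw, 0.0 for a loss."""
    expected_a = 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))
    delta = k * (score_a - expected_a)
    return r_a + delta, r_b - delta

# Two equally rated models; A wins one round.
print(elo_update(1500.0, 1500.0, 1.0))  # (1516.0, 1484.0)
```

Because the update is zero-sum, inflation across the field is visible immediately, which is exactly the property that makes "everything improves across the board" claims testable.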
but the fact that you cite your brief as your main argument is funny - you don't even have any inherently subjective numbers to justify what you believe, you only have "I don't believe".
If pure speed is most important for your use case, GPT-5.3 Chat is the fastest model we've tested and it's still reasonably smart. Not meant for agentic tool usage / long context, though.
So it might be more useful for business applications or non-engineering usage where you don't need exceptional intelligence, but it's useful to get fast, cheap responses.
I’d like somebody to explain to me how the endless comments of "bleeding edge labs are subsidizing the inference at an insane rate" make sense in light of a humongous model like v4 pro being $4 per 1M. I’d bet even the subscriptions are profitable, much less the API prices.
edit: $1.74/M input $3.48/M output on OpenRouter
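At those OpenRouter rates, even a token-heavy agentic day stays in the tens of dollars; the traffic split below is an assumed illustration, not measured usage:

```python
# Back-of-envelope cost for one heavy agentic-coding day at the quoted
# OpenRouter prices ($1.74 per 1M input tokens, $3.48 per 1M output).
in_tokens, out_tokens = 8_000_000, 1_000_000  # assumed daily traffic
cost = in_tokens / 1e6 * 1.74 + out_tokens / 1e6 * 3.48
print(round(cost, 2))  # 17.4
```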
In 2023, the depreciation schedule for H100s was 2 years, but they are still oversubscribed and generating significant income.
CoreWeave has upped its depreciation for GPUs to 6 years(!) now, which seems more realistic.
https://www.silicondata.com/blog/h100-rental-price-over-time
Aka: everyone who uses Nvidia isn't selling at cost, because Nvidia is so expensive.
We therefore cannot just look at inference costs directly, training is part of the pitch. Without the promises of continuous improvement and chasing the elusive AGI, money for investments for inference evaporates.
In China you need to appease state goals. In the US you need to appease investor goals.
China will keep funding them regardless of their income, because the goal is (ostensibly) a state AGI/ASI. In the US, the goal is an ROI which may or may not come with AGI/ASI.
They are different economies with different goals. We can look at past Chinese national projects and see that they are fine with burning $50 to get [social goal] that's worth $5.
But seriously, it just stems from the fact some people want AI to go away. If you set your conclusion first, you can very easily derive any premise. AI must go away -> AI must be a bad business -> AI must be losing money.
There are still major unanswered questions here. For instance, all of the incremental data capacity build out is going to businesses that have totally unknown LT unit economics and that today are burning obscene amounts of cash.
At some point (from the very beginning till ~2025Q4), Claude Code's usage limit was so generous that you could get roughly $10~20 (API-price-equivalent) worth of usage out of a $20/mo Pro plan each day (2 * 5h windows) - and for good reason, because LLM agentic coding is extremely token-heavy; people simply wouldn't return to Claude Code a second time if the provided usage wasn't generous or if every prompt cost them $1. And then Codex started trying to poach Claude Code users by offering even greater limits and constantly resetting everyone's limit in recent months. The API price would have to be 30x operating cost to make this not a subsidy. That would be an extraordinary claim.
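Taking the top of that quoted range, the arithmetic behind the subsidy claim works out like this:

```python
# ~$20/day of API-equivalent usage delivered on a $20/month plan.
daily_value, monthly_price = 20.0, 20.0
delivered = daily_value * 30       # API-equivalent value per month
ratio = delivered / monthly_price  # margin API prices would need for this
                                   # to not be a subsidy
print(delivered, ratio)  # 600.0 30.0
```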
eg:
Token prices are significantly subsidized and anyone that does any serious work with AI can tell you this.
https://news.ycombinator.com/item?id=47684887
(the claims don't make any sense, but they are widely held)
I think I understand the major reasons for this meme, but I find it really worrying; there were lots of incorrect ‘it’s a bubble’ conversations here in 2012-2015, but I don’t think they had the pervasive nature and “obvious” conclusion that a whole generation of engineering talent should just, you know, leave.
Meanwhile I am hearing rational economic modeling from the companies selling inference; Jensen, (a polished promoter, I grant you) says it really well — token value is increasing radically, in that new models -> better quality, and therefore revenues and utilization are increasing, and therefore contrary to the popular financial and techbro modeling of 2023, things like A100s still cost quite a lot whether hourly or to purchase. (!) Basically the economic value is so strong that it has actually radically extended the life of hardware.
I just hate to imagine like half of the world’s (or US’s) engineering talent quitting, spending ten years afraid, or wrongly convinced of some ‘inevitable’ market outcome. Feels like it will be bad for people’s personal lives, and bad for progress simultaneously.
I'm still playing with the new Qwen3.6 35B and impressed, now DeepSeek v4 drops; with both base and instruction-tuned weights? There goes my weekend :P
One answer - Chinese Communist Party. They are being subsidized by the state.
Also, note that there's zero CUDA dependency. It runs entirely on Huawei chips. In other words, the Chinese ecosystem has delivered a complete AI stack. Like it or not, that's big news. But what's there not to like when monopolies break down?
That is a huge claim to make with no evidence.
I researched what you said, and I have found no statement to that effect in their paper[0], on huggingface[1], twitter[2], WeChat[3], or in their news release[4].
They only mention as a footnote in only the Chinese version of their news release that they plan to reduce inference costs with the Ascend 950 supernode when it releases[5]. The only mention of Huawei in their paper is that they validated a technique to lower interconnect bandwidth on Ascend NPUs and Nvidia GPUs[6].
[0] https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro/blob/main...
[1] https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro
[2] https://xcancel.com/deepseek_ai/status/2047516922263285776
[3] https://mp.weixin.qq.com/s/8bxXqS2R8Fx5-1TLDBiEDg
[4] https://api-docs.deepseek.com/news/news260424
[5] https://api-docs.deepseek.com/zh-cn/img/v4-price.png
[6] Page 16
And while I'm here, I want to note that I feel there's a big misunderstanding of what is and isn't demonstrated by DeepSeek. So far as I can tell, the major (and important!) innovation is reproducing near-frontier capabilities at a fraction of the cost. But it may be that iterating forward at the frontier is the costly thing, a cost borne by Western companies, and that nuance seems to get lost with DeepSeek. Which is not to say that non-Western companies aren't sometimes capable of jumping into the lead (Kimi has been super impressive), but if GPT/Claude/etc "only" lead at the frontier with more expensive models, that's still a moat.
https://finance.yahoo.com/sectors/technology/articles/deepse...
Only mention of Huawei in that article (as of now).
DeepSeek’s next AI model delayed by attempt to use Chinese chips
https://www.ft.com/content/eb984646-6320-4bfe-a78d-a1da2274b...
“Due to constraints in high-end compute capacity, the current service capacity for Pro is very limited. After the 950 supernodes are launched at scale in the second half of this year, the price of Pro is expected to be reduced significantly.”
https://x.com/jukan05/status/2047516566149816627
That HN is quick to upvote an unsubstantiated comment (the grandparent one, because it aligns with the anti-US bias?) and downvote a fact-finding one doesn't bode too well for the community as a whole. I have seen how political ideology colors everything in my home country (Malaysia), and the decline of the country is palpable; I don't expect to find such a thing here. We are supposed to be dispassionate and rational, right?
Render to Jesus what's due to him; ditto for Caesar.
It's also more or less the same move that they've been using pretty much since the WTO entry: take on foreign manufacturing, copy the products, sell knockoffs as their own, build new products on top of that knowledge.
Jensen came across as incredibly defensive and intentionally close-minded, shows that even billionaires suffer from "a man can't understand something if his paycheck depends on him not understanding it."
Your assertion is silly: did Tesla selling electric cars into China stop them from developing their own industry? They were going to develop their domestic industry regardless.
We simply don't know the counterfactual, if they had unlimited access to Nvidia chips, how far ahead would their models be?
That's alright. It delays them at least.
China is not perfect but a bit of competition is healthy and needed
We need to accept that being too close to America is harming us and start funding projects to protect our assets, e.g. talent leaking out to American entities.
https://scrupulouspessimism.substack.com/p/america-means-the...
China’s governments actions are on a completely different level - for example:
“””
Since 2014, the government of the People's Republic of China has committed a series of ongoing human rights abuses against Uyghurs and other Turkic Muslim minorities in Xinjiang which has often been characterized as persecution or as genocide.
“”” https://en.wikipedia.org/wiki/Persecution_of_Uyghurs_in_Chin...
https://www.amnesty.org/en/location/asia-and-the-pacific/eas...
Yes Trump is clearly trying Totalitarianism in America, but it is orders of magnitude different from what is happening in China.
That should be at least comparable to (if not worse than) what China is doing.
China is repressing the Uyghurs and threatening Taiwan. I don't agree with these actions, but is it really "orders of magnitude" worse than the destruction the US facilitates in the Middle East?
With Trump they are now openly hostile to European democracies, and ICE is doing its best at repression within the US.
The next decade is going to look very different with America Alone.
With all that goes on, it has changed. Recently I sat on a plane near some Americans discussing their holidays here, and I noticed I felt contempt. Sitting there with insane privilege as their government torches the world.
Individuals remain individuals, and one really ought not to be prejudiced. However, the lack of resistance I see in the "land of the free" as their "democratic" institutions collapse just makes me believe they never cared at all. In France, cars are torched if the pension age is raised. In America, the rise of fascism apparently doesn't matter to them.
https://en.wikipedia.org/wiki/List_of_countries_by_GDP_%28no...
> now poorer than every state in America
You've confused the mean with the median. GDP Per Capita is not a measure of how well-off the people in a country are.
American states have a lot more income inequality than the UK does, which (due to positive "non-parametric skewness", I think) pulls their GDP Per Capita upwards.
Yeah, me too. All that pesky saving the world stuff that we do on the regular is so exhausting sometimes.
None of those have brought me a feeling of being part of saving someone.
Yes, even compared to this low price point.
As before, the headline news with DeepSeek isn't in the benchmarks, but that they're competitive there while being gut-churningly cheap for the Western AI industry.
Already do on EVs.
We are "only" allowed Claude and MS Copilot for security reasons and cost reasons.
"open source" keeps being redefined by people with wealth and power to restrict our computing rights.
eventually it's just gonna be "proprietary Microsoft code that runs on Microsoft servers, but you can see a portion of the results"
The report only talks about validating the "fine-grained EP scheme" on Huawei hardware.
However, there are so many factors involved beyond your control that it would not be a viable option compared to other possible security attacks.
It's like suggesting BYD has a high likelihood of making their cars into weapons or something. It's not in the company or their countries interest to do that.
Sure it could happen but I bet it would only happen in a targeted way. Why risk all credibility right now and engage in cyber warfare?
Meaning TikTok in the US is complete garbage for kids, almost like a virus, whereas in China it's more educational.
If I had to place a hidden target it'd probably be around RNGs or publicly exposed services..
I don't mean that flippantly. These things are dumped in the wild, used on common (largely) open source execution chains. If you find a software exploit, it's going to affect your population too.
Wet exploits are a bit harder to track. I'd assume there are plenty of biases based on training material but who knows if these models have a MKUltra training programme integrated into them?
Spearphishing.
Building reliance and exploiting it, through state subsidies, dumping, and market manipulation.
Handicapping provision to the west for competitive advantage.
Tech ceos are going around talking about how they will rule over employees and they will be unable to work in the future except for intelligence tokens. What if China commoditizes that without spending nearly as much resources? Kind of makes the trillions of dollars invested in the US a literal joke.
Even on my phone via Edge Gallery, the DeepSeek-to-Qwen 1.5B distill was able to answer it. It messes up facts a little, but that's certainly because it's a small model, not because of censorship.
I'm really unsure how it could get less censored than this. The API is obviously much more censored because they operate from China, but that has nothing to do with the model itself.
Of course there are risks.
Really nice to see the Chinese are competing this strongly with the rest of the world. Competition is always nice for the end-consumer.
Then you can run it using some inference backend, e.g. llama.cpp, on any hardware supported by it.
However, this is a big model so even if you quantize it you need a lot of memory to be able to run it.
The alternative is to run it much more slowly, by storing the weights on an SSD. Some results on optimizing inference to work like this have already been published, and I expect this will become more common in the future.
There are cases where running a better model slowly is still preferable to running a model that gives poor results quickly, especially when you do not use it conversationally but to do some work with agents.
So does this mean I can run this on AMD? And on a consumer 9000 series card?
* https://github.com/ggml-org/llama.cpp/releases
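Once you have a quantized GGUF, the invocation is a one-liner. The model filename below is hypothetical; you'd pick whichever quant fits your memory:

```shell
# Run a (hypothetical) quantized GGUF with llama.cpp's CLI.
# -n caps the number of generated tokens; --temp sets sampling temperature.
./llama-cli -m DeepSeek-V4-Flash-Q4_K_M.gguf \
    -p "Explain mixture-of-experts routing in two sentences." \
    -n 256 --temp 0.6
```

llama.cpp builds for CUDA, ROCm (AMD), Metal, and plain CPU backends, which is what makes the "any supported hardware" claim above work in practice.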
This version of AI is mostly taking a public paper from 2017, investing in GPUs, and feeding it as much data as possible. So with a few computer scientists, no respect for intellectual property, and tons of money to burn, you have all the ingredients to create this technology.
Sam Altman and friends did it, as did the Chinese. The difference is that the Americans have been hyping it up to the extreme with all these dramatic scenarios about what would happen if someone else got its hands on it.
The Chinese made it public, among other things, to show how fragile this is as a business and as a large part of the US stock market.
I love the implication that this paper just dropped out of thin air and not decades of private AI research funded by a US company.
>The Chinese made it public, among other things to show how fragile this is as a business
The Chinese distill US models, that's why they keep trailing close but never exceeding. It's easy to make things public when you didn't take on any of the cost of developing the technology. Stealing US IP and selling cheap copies has been China's MO for decades now.
Where did you read this? From what I read in the paper, it appears to explicitly state that they used NVIDIA GPUs and their MegaMOE code, which is written in CUDA.
And in any case what does open source actually mean for an llm? It's not like you can look inside it to see what it's doing.
You can download it from the link given here at the top and you can run it on your own hardware, with whichever open-source harness you prefer, without having to worry about token cost or about subscription limits or about any future degradation in performance that you cannot control.
The recent history has demonstrated that such risks are very significant.
Being open weights is important for anyone who wants to use an LLM. Being open source is important only for a subset of those, who have the will, the knowledge and the means to train a model from its training data.
Having access to the training data used by a model would be very nice, but the reality is that for a normal LLM user it is very beneficial to use an open-weights model with an open-source harness, but it would be much harder to exploit the advantage of having access to all the information about how the LLM has been created.
AllenAi is the fullest open ai I know of
Understandable.
I asked DS itself and it denied this. It says: 'Nvidia chips are absolutely used for DeepSeek V4. The reality is a pragmatic "both-and" strategy, not an "either-or."'
And based on the DS V4 technical report (https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro/blob/main...), it is mentioned that:
It mentions that Nvidia is still used. It doesn't even mention that Huawei chips are used in production, only in testing and validation. (In all honesty, I relied on DS to give me the above, so I haven't vetted the information in full.)
Bro, seriously?
Edit: it seems "open source" was edited out of the parent comment.
no one is ever going to release their training data because it contains every copyrighted work in existence. everyone, even the hecking-wholesome safety-first Anthropic, is using copyrighted data without permission to train their models. there you go.
It is very much a valuable thing already; no need to taint it with a false promise.
Though I disagree about it going unused if it were indeed open source: I might not do it inside my home lab today, but at least Qwen and DeepSeek would use and build on what e.g. Facebook was doing with Llama, and they might push the open-weights model frontier forward faster.
The training scripts are in Megatron and vLLM.
1. Training data is the source. 2. Training is compilation/compression. 3. Weights are the compiled source akin to optimized assembly.
However it's an imperfect analogy on so many levels. Nitpick away.
For reference, the Huawei Ascend 950 that this thing runs on is supposed to be roughly comparable to Nvidia's H100 from 2022. In other words, things are hotting up in the GPU war!
Nvidia's forward PE ratio is only 20 for 2026. That's much lower than companies like Walmart and Costco. It's also growing nearly 100% YoY and has a $1 trillion backlog.
I think Nvidia is cheap.
One set of models runs on 8GB VRAM / 16GB RAM and another runs on 24GB VRAM / 64GB RAM. They are very useful for easy and moderately complex code, respectively.
The latest open, small models are incredibly useful even at smaller sizes when configured properly (quant size, sampling params, careful use of context etc).
That's a very strange comment. Why would anyone run a dense model on a low-end computer? An 8B model is only going to make sense if you have a dGPU. And a Qwen3.6 or Gemma4 MoE isn't going to be "beaten the hell out of" for most tasks, especially if you use tools.
Finally, over the lifetime of your computer, your ChatGPT subscription is going to cost more than the reference computer itself! So the real question should be whether you're better off with a $1000 computer and a ChatGPT subscription or with a $2000 computer (assuming a conservative lifetime of 4 years for the computer).
My Strix Halo desktop (which I paid ~1700€ before OpenAI derailed the RAM market) paired with Qwen3.5 is a close replacement for a $200/month subscription, so the cost/benefit ratio is strongly in favor of the local model in my use case.
The complexity of following model releases and installing things needed for self-hosting is a valid argument against local models, but it's absolutely not the same thing as saying that local models are too bad to use (which is complete BS).
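With round numbers, the four-year comparison sketched above works out roughly like this (all figures are illustrative assumptions):

```python
# Four-year total cost: cheap machine + subscription vs. bigger machine
# running local models. All prices are illustrative assumptions.
years = 4
sub_cheap = 1000 + 20 * 12 * years  # $1000 PC + $20/mo plan
local_big = 2000                    # $2000 PC, local models only
sub_heavy = 200 * 12 * years        # the $200/mo plan in the Strix Halo comparison
print(sub_cheap, local_big, sub_heavy)  # 1960 2000 9600
```

At $20/mo the two options are roughly a wash; against a $200/mo plan the local machine wins by a wide margin, which matches the cost/benefit claim above.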
Biggest risk I see is Nvidia having delays / bad luck with R&D / meh generations for long enough to depress their growth projections; and then everything gets revalued.
It's like ricing your Linux distro, sure it's fun to spend that time but don't make the mistake of thinking it's productive, it's just another form of procrastination (or perhaps a hobby to put it more charitably).
At this point I would just pick the one who's "ethics" and user experience you prefer. The difference in performance between these releases has had no impact on the meaningful work one can do with them, unless perhaps they are on the fringes in some domain.
Personally I am trying out the open models cloud hosted, since I am not interested in being rug pulled by the big two providers. They have come a long way, and for all the work I actually trust to an LLM they seem to be sufficient.
New model comes out, has some nice benchmarks, but the subjective experience of actually using it stays the same. Nothing's really blown my mind since.
Feels like the field has stagnated to a point where only the enthusiasts care.
Since then it's just been a cycle of the old model being progressively lobotomised and a "new" one coming out that if you're lucky might be as good as the OG Opus 4.5 for a couple of weeks.
Subjective but as far as I can tell no progress in almost a year, which is a lifetime in 2022-25 LLM timelines
Can't argue with subjective experience, but if there were some tasks that you thought LLMs can't do two years ago, maybe try again today. You might be surprised.
DeepSeek-V4-Flash: https://huggingface.co/deepseek-ai/DeepSeek-V4-Flash
DeepSeek-V4-Pro: https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro
Back in Nov 2025, Opus 4.5 (80.9%) was the first proprietary model to do so.
So it os hard to tell how much of a model gain is due to skill, and how much - overfitting.
And we got new base models, wonderful, truly wonderful
https://openrouter.ai/deepseek/deepseek-v4-flash
`https://openrouter.ai/api/messages with model=deepseek/deepseek-v4-pro, OR returns an error because their Anthropic-compat translator doesn't cover V4 yet. The Claude CLI dutifully surfaces that error as "model...does not exist"
This “no harm to me” meme about a foreign totalitarian government (with plenty of incentive to run influence ops on foreigners) hoovering your data is just so mind-bogglingly naive.
Relatively speaking, DeepSeek is less untrustworthy than Grok.
When I try ChatGPT on current events from the White House it interprets them as strange hypotheticals rather than news, which is probably more a problem with DC than with GPT, but whatever.
That would be a great argument if the American models weren’t so heavily censored.
The Chinese model might dodge a question if I ask it about 1-2 specific Chinese cultural issues but then it also doesn’t moralize me at every turn because I asked it to use a piece of security software.
Even for minor stuff like beeing addicted to drugs.
Looks pretty totalitarian to me.
yes, this is exactly what I'm saying.
This is why I’ve been urging everyone I know to move away from American based services and providers. It’s slow but honest work.
China is a nation built for peace, while western nations are built for war.
But for folks on the opposite side of the world, the threats are more like "they're selling us electric cars and solar panels too cheaply" and the hypothetical "these super cheap CCTV cameras could be used for remote spying"
- Sam Altman & Worldcoin collecting everyone's eyeball scan - Discord attempting to roll out worldwide age & id verification - LinkedIn collecting data on your web browser extensions - WhatsApp collecting browser data via a local server running on device
Its sad to see how you have regulated yourselves into a position where Mistral is your only claim.
My country’s per capita income is $2500 a year. We can’t pay perpetual rent to OAI/Anthropic
This sounds whole lot like potatoh potahto. I think the former argument is very much the correct one: China can undercut everyone and win, even at a loss. Happened with solar panels, steel, evs, sea food - it's a well tested strategy and it works really well despite the many flavors it comes in.
That being said a job well done for the wrong reasons is still a job well done so we should very much welcome these contributions, and maybe it's good to upset western big tech a bit so it's remains competitive.
Just this week they published a serious foundational library for LLMs https://github.com/deepseek-ai/TileKernels
Others worth mentioning:
https://github.com/deepseek-ai/DeepGEMM a competitive foundational library
https://github.com/deepseek-ai/Engram
https://github.com/deepseek-ai/DeepSeek-V3
https://github.com/deepseek-ai/DeepSeek-R1
https://github.com/deepseek-ai/DeepSeek-OCR-2
They have 33 repos and counting: https://github.com/orgs/deepseek-ai/repositories?type=all
And DeepSeek often has very cool new approaches to AI copied by the rest. Many others copied their tech. And some of those have 10x or 100x the GPU training budget and that's their moat to stay competitive.
The models from Chinese Big Tech and some of the small ones are open weights only. (and allegedly benchmaxxed) (see https://xcancel.com/N8Programs/status/2044408755790508113). Not the same.
So you can’t see what facts are pruned out, what biases were applied, etc. Even more importantly, you can’t make a slightly improved version.
This model is as open source as a windows XP installation ISO.
And you think the US tech giants don't have any ulterior motives?!
I just want to remind you that this is happening at the same time as Anthropic A/B tests removal of Code from Pro Plan, and as OpenAI releases gpt-5.5 2x more expensive than gpt-5.4...
That’s a big if. It’s my experience that models that perform very well on benchmarks do not necessarily perform well in real life.
I’ve mostly started ignoring the benchmarks and run my own evals.
Well, yeah... Like Opus 4.5, 4.6, 4.7. Top of the benchmarks and yet it's a pile of crap at the moment and has been for months.
>Can the same be said about DeepSeek or any other open-source model provider performing distillation?
Open source models that distill from SoTA reminds me of the story of Robin Hood -- robbing the rich and giving it to the poor. So to answer your question: yes, but it's better than the alternative where only a select few companies have SoTA models.
Oh, so people might be forced to give back the AI earnings? Should I be worried about the last year's capital gains on my portfolio?
Altman and Amodei are so mad about muhh model when they steal our data and pollute the Internet with slop.
https://x.com/teortaxesTex/status/2026130112685416881
For OSS model, I have z.ai yearly subscription during the promo. But it's a lot more expensive now. The model is good imo, and just need to find the right providers. There are a lot of alternatives now. Like I saw some good reviews regarding ollama cloud.
But more broadly: openrouter solves the problem of making a broad range of models available with a single payment endpoint, so you can just switch around as much as you like.
I have tasks that used to take ~3-5min with Sonnet 4.6. With OpenRouter Kimi, the same task takes 10+ min. It's also just obviously slower in opencode sessions. The results are good, and I love the lower cost, but the speed can be frustrating.
If you're trying to make a buck while unemployed, sure get a subscription. Otherwise learn how to work again without AI, just focus on the interesting stuff.
Another way to keep the ability to try out new models is to buy a reseller subscription like Cursor’s.
I'm on Max x5 plan and any of the 'good' models like Kimi 2.6, GLM, DeepSeek would have cost 3-5x in per-token billing for what I used on my Claude plan the last three months
So unless my Claude fudged the maths to make itself look better, seems like I'm getting a good deal
input: $0.14/$0.28 (whereas gemini $0.5/$3)
Does anyone know why output prices have such a big gap?
Model was released and it's amazing. Frontier level (better than Opus 4.6) at a fraction of the cost.
As a non-Opus user, I'll continue to use the cheapest fastest models that get my job done, which (for me anyway) is still MiniMax M2.5. I occasionally try a newer, more expensive model, and I get the same results. I have a feeling we might all be getting swindled by the whole AI industry with benchmarks that just make it look like everything's improving.
The tricky part is that the "number of tokens to good result" does absolutely vary, and you need a decent harness to make it work without too much manual intervention, so figuring out which model is most cost-effective for which tasks is becoming increasingly hard, but several are cost-effective enough.
Substantially worse at following instructions and overoptimized for maximizing token usage
Codex is just so much better, or the genera GPT models.
https://github.blog/news-insights/company-news/changes-to-gi...
I do some stuff with gemini flash and Aider, but mostly because I want to avoid locking myself into a walled garden of models, UIs and company
If you're feeling frisky, Zed has a decent agent harness and a very good editor.
So while I agree mixed model is the way to go, opus is still my workhorse.
In contrast ChatGPT 5.3 and also Opus has a 90% rate at least on this same project. (Embedded)
All other tests were the same. What are you doing with these models?
Opencode was getting there, but it seems the founders lost interest. Pi could be it, but its very focused on OpenClaw. Even Codex cli doesnt have all of it.
which harness works well with Deepseek v4 ?
This is free... as in you can download it, run it on your systems and finetune it to be the way you want it to be.
In theory, sure, but as other have pointed out you need to spend half a million on GPUs just to get enough VRAM to fit a single instance of the model. And you’d better make sure your use case makes full 24/7 use of all that rapidly-depreciating hardware you just spent all your money on, otherwise your actual cost per token will be much higher than you think.
In practice you will get better value from just buying tokens from a third party whose business is hosting open weight models as efficiently as possible and who make full use of their hardware. Even with the small margin they charge on top you will still come out ahead.
It's about 2 months behind GPT 5.5 and Opus 4.7.
As long as it is cheap to run for the hosting providers and it is frontier level, it is a very competitive model and impressive against the others. I give it 2 years maximum for consumer hardware to run models that are 500B - 800B quantized on their machines.
It should be obvious now why Anthropic really doesn't want you to run local models on your machine.
Doesn't mean Deepseek v4 isn't great, just benchmarks alone aren't enough to tell.
> In our internal evaluation, DeepSeek-V4-Pro-Max outperforms Claude Sonnet 4.5 and approaches the level of Opus 4.5.
If its coding abilities are better than Claude Code with Opus 4.6 then I will definitely be switching to this model.
It's still a "preview" version atm.
There we go again :) It seems we have a release each day claiming that. What's weird is that even deepseek doesn't claim it's better than opus w/ thinking. No idea why you'd say that but anyway.
Dsv3 was a good model. Not benchmaxxed at all, it was pretty stable where it was. Did well on tasks that were ood for benchmarks, even if it was behind SotA.
This seems to be similar. Behind SotA, but not by much, and at a much lower price. The big one is being served (by ds themselves now, more providers will come and we'll see the median price) at 1.74$ in / 3.48$ out / 0.14$ cache. Really cheap for what it offers.
The small one is at 0.14$ in / 0.28$ out / 0.028$ cache, which is pretty much "too cheap to matter". This will be what people can run realistically "at home", and should be a contender for things like haiku/gemini-flash, if it can deliver at those levels.
LMAO
I have no idea why you'd think that, but this is straight from their announcement here (https://mp.weixin.qq.com/s/8bxXqS2R8Fx5-1TLDBiEDg):
> According to evaluation feedback, its user experience is better than Sonnet 4.5, and its delivery quality is close to Opus 4.6's non-thinking mode, but there is still a certain gap compared to Opus 4.6's thinking mode.
This is the model creators saying it, not me.
Claude4.6 was almost 10pp better at at answering questions from long contexts ("corpuses" in CorpusQA and "multiround conversations" in MRCR), while DSv4 was a staggering 14pp better at one math challenge (IMOAnswerBench) and 12pp better at basic Q&A (SimpleQA-Verified).
That's literally what the I Ching calls "good fortune."
Competition, when no single dragon monopolizes the sky, brings fortune for all.
The US-China contest aside - it is in the application layer llms will show their value. There the field, with llm commoditization and no clear monopolies, is wide open.
There was a point in time where it looked like llms would the domain of a single well guarded monopoly - that would have been a very dark world. Luckily we are not there now and there is plenty of grounds for optimism.
And China may have changed in some ways but there have been no signals it would not repeat that event if it thought circumstances warranted.
These are not equivalent.
We conduct amoral behavior with terrorist regimes for dollars.
TikTok and Hasan has really turned the West against itself.
Liberal democracies have moral high ground over authoritarian dictatorships (at least along that one dimension)
The US is backsliding tragically (and stupidly) and may lose that moral high ground, but the rest of the western democracies will still have it
The elected government of the US has the moral highground of over the regime that killed the KMT in it's weakened state after the KMT defeated Japan, went on a rampage against the educated classes, mowed down its own people with machineguns and tanks when they demanded a say in their own governments, and kidnaps people advocating for democracy to this day, including Jack Ma.
> despite starting a new war... on behalf of Israel every six months.
The war started when Hamas, funded by Iran, went on a murder and rape rampage against Israeli civilians.
Thinks America is starting wars on behalf of Israel.
LMAO
Fully agree. From a US perspective, that sucks. For everyone else it's pretty great.
At this point the world's opinions of China are better than those of the US in some polls. One country invests and helps build infrastructure on a massive scale globally, the other alienates allies, causes countless conflicts, and openly threatens to end civilizations.
Indeed, even if one isn't partial to China, there's reasons to be glad that an increasingly hostile US has powerful competition.
> This is about who will dominate the world of tomorrow.
For this you'd need a technological moat. So far the forerunners have burned a lot of money with no moat in sight. Right now Europe is happy just contributing on research and doing the bare-minimum to maintain the know-how. Building a frontier model would be lobbing money into the incinerator for something that will be outdated tomorrow. European investors are too careful for that - and in this case seem to be right.
This is how I see it. The US has openly threatened multiple times to annex my country, and has repeatedly threatened every western nation. Letting the US have a monopoly on... well.. anything, is really bad for the world. The more countries that have their own production for various critical things like computer chips, medicine, etc, the better it is for the world at it distributes power.
People in the US don't seem to understand that with the current administration the US is seen as a potentially very hostile nation. While I don't think China is a friend to Canada or the west, at least it provides alternatives when the US tries to use it's monopolies against us. And vice versa too.
>Building a frontier model would be lobbing money into the incinerator for something that will be outdated tomorrow. European investors are too careful for that - and in this case seem to be right.
Strong disagree here. Mistral does great work, in the long term being a few months or even a year behind is a non-issue. Also Cohere just merged with Aleph Alpha to continue producing foundational models. It's extremely important that the middle powers continue to do this.
I am not washing away the authoritarianism, but take a look at other economic super powers directionality. Or that of tech ceos as well. At least Chinese tech companies aren't going around praising wwii Germany, writing manifestos, and bombing children at school or fisherman on whims. It is difficult not to see more countries regardless of leadership putting their hat in the ring as a net positive. Especially if it increases sustainability and lowers the price, which this very clearly does. It's even open source...
China's policies and government aren't morally defensible and I do fear that they will become more aggressive in spreading their influence and policies onto other countries, but from an economic standpoint what they're doing is super effective. While the previous world power (the US) is stuck in infighting and going through cycles of fixing/undoing the previous administration's damages, instead of planning ahead.
Yet, it's the democratic regime which is causing all the chaos around the world and disrespecting the leadership of other jurisdictions, just to keep pushing the petrol dollar going up.
Do we ever think there's any subtle difference between authoritarian and democratic? Where democracy ultimately makes the world a better place?
And in the hardware side, RISC-V is gaining a lot of traction in China. So the dependency on a single supplier is lower with the Chinese tech stack than with most western options.
Alternative being the current reality and world being dominated by US. Let's ask people in Middle East/Asia/South America about how they feel about that. In this current day and age, how is this statement even relevant?
I personally love the bit "us initiated tech war" lol. thats right, they started making AI its their fault! bad imperialist US !
yeah, v5 will do better
It’s this sort of example (and not properly supporting Ukraine, and not agreeing how to collectively deal with migrants, and not agreeing how to coordinate defence, and myriad other examples) that highlights what a pointless mess the EU is. It’s not a unified block - it’s 27 self-interested entities squabbling and playing petty power games, while totally failing to plan for the future with vision.
The EU could/should have ensured that a European equivalent to OpenAI or Anthropic could thrive, and had competitive frontier models already; instead, they’re years and countless billions behind.
the people and industry arent what matter there
I think they are leaders in the democratization of LLMs. Almost everyone has a computer right now that can run a useful variant of a Mistral model. I hope they keep their focus because what they are aiming for likely has the biggest impact on the average person and would be the best case scenario for the technology in general.
Their main selling point is: They are neither US-American nor Chinese. That's a real moat in today's world. I think at the moment they feel quite comfortable.
I don't know what the problem is. Are we europeans to stupid? Do we just not have enough money / VC money? Are we not proud enough?
:(
I feel uneasy over China dominance as much as the US.
I trust US more still as Europe has a post WW2 relationship. I notice many comments being pro China but they seem to be from the third world (one mentioned a very low salary) I feel the opening of the internet was a mistake.
China is a totilitarian dictatorship. This is a fact.
Look into Mistral AI too :)
For context, I am Swedish.
Yes this is a new account, please focus on the content.
I think their stance often comes from a strong anti-Western bias, and sometimes from feelings of resentment.
Dont get me wrong, Sweden is a cool country, but still my point stands.
Trust whoever you want, I just don't have the patience (or money) for American models.
Yeah, I also really hate when poor people think they're allowed to talk.
The idea that China is worse than America is laughable. LMK when China invades 5 countries in a span of 20 years unimpeded by anyone else in the world and maybe I'll be scared.
Until then it's quite clear how consumers benefit from actual competition and it's not because of the US.
Also you saying you trust the US when they just threatened to invade Greenland (a threat so credible that Denmark was planning a full scale resistance against US troops).
Sorry but the curtains are truly coming down and the US will become one of the most hated nations in the world while 100s of millions will needlessly starve and die because of the actions of Americans that simply don't give a fuck.
FWIW, I'm not just talking about Trump either. Democratic politicians are just as much to blame, they champion corporatism and imperialism as much as Republicans and the only issues D leadership seems to have is that the "right process" wasn't being followed.
I say this as someone who is a literal democratic operative within the party.
It also seems like clashes with India, every southeast asian country with internationally recognized territory rights in the South China sea, the forcible takeover of Hong Kong, arming and economically supporting Russia, Pakistan and Iran are bad, and the increasing probability of a hot war to take over Taiwan should count as bad, perhaps the most urgently dangerous threat to global peace in the 21st century.
The United States track record post WW2 is a complicated combination of monstrously immoral Kissenger and Bush style overthrows of democracies and genuinely valuable maintenance of a post WW2 democratic order focused on things like free speech and human rights. I stay with full sincerity that in the decade plus that I've been here on hn seeing whataboutism as a strategy for defending China, I'm yet to encounter anything that feels like a sincere engagement with United States role in the world as a combination of positives and negatives, it's always flatly one-sided messaging that feels like it's aimed at a favorable audience that already agree rather than like it's sincerely attempting to persuade.
This war could have been handled much differently and better, but acting like America attacked Iran for no reason is laughable. It is in fact America’s inexplicable reticence to kill Iranian civilians that is the reason this is going on for this long. America could have ended this in a few days if it had stopped worrying about being criticised by the rest of the world that hates it anyway.
https://www.nytimes.com/2026/04/07/us/politics/trump-iran-wa...
https://i.kym-cdn.com/photos/images/original/002/352/212/95b...
China and Russia trade in yuan and rubles. India and Russia do oil deals in rupees. China and Brazil trade in yuan. The US hasn't bombed any of them.
Also, feeling the opening of the internet as a mistake show the degree of your ignorance, people from third world countries also have the right to speak as much as you do, your opinion is not more valid than anyone else's.
For context, I am Italian-Brazilian, so I pretty much have been exposed to both sides (western and non-western, even though we can argue that Brazil is more west aligned).
They sanctioned the hell out of Huawei and now Huawei is bigger than ever
America is just not able to digest the idea that another country can be as good, if not better, at innovation
China's fall in the 19th century came at them for the same reason. How could these European savages be stronger, thus better than us? Our intelligence service must be out of their mind.
Sovereign and non-sovereign nations have completely different decision matrices for dealing with external threats
It costs 100-1000x less manpower, money, and time to hug the heels of innovators than to actually pioneer. Say what you will about America but they absolutely lead technological innovation and it's not even remotely close.
China had literally 60M people die in a famine when JFK was president and Elvis was the biggest thing. The country was basically farmland and basic industries 40 years ago
Why would you even compare their capabilities today vs a country that has been a sovereign nation for 250 years?
You look at trajectories, not the present
Walmart is a horrible company owned by horrible people and yet it’s cheap so it dominates.
If the quality really is in the Opus 4.6 range (considering how bad 4.7 is), then it’s a pretty big deal.
Deepseek is a mid model. not SOTA.
It’s a burned ccp money at this point . They will not be able to serve it until H2 2026 . Even at this point if you look at opus 4.7 and gpt 5.5 this model is just mediocre.
By the time they can serve it nobody will care at all.
Also it's tech they can be sure we can't cut them out of or tariff and money flowing from Chinese companies to other Chinese companies which we appreciate the benefits of when the shoe is on the other foot.
For me as a consumer, competition is good - that means companies have less leverage over me, which is beneficial even if I decided to never use a Chinese model ever.
If/when they overtake the US, all things aside, they deserve it. There is no world where the US overtakes China but there’s a world where China overtakes the US. Best outcome for the US atm is parity.
Just remarkable the things they’ve accomplished in the time they’ve accomplished them.
1. There will be no moat where one company "owns" AI. China will see to that. It's simply too much in their national interest for that not to happen;
2. This is incredibly bad news for OpenAI who have raised so much money with so (comparabley( little revenue that the only way they can get a return on that is to "win" and be that company that "owns" AI; and
3. China's chipmaking will catch up with Taiwan within the next decade (with commercial EUV at scale within 5 years). I liken this to American hubris over the development of the atomic bomb where in 1945 many American leaders and military thought the USSR would either never get the atomic bomb or it would take 20+ years. It took 4. And they USSR's first hydrogen bomb was detonated a year after the US's.
Whereas the USSR did this with espionage. times have changed. Now all China has to do is throw a few million dollars at hiring the right people froM ASML and elsewhere. China has the track record of delivering on long term projects. Closing the lithography gap will be no different.
its naive to think they would have stayed on a 'western' stack.
Most of the time 'losing' isn't making a bad choice its being put in a situation where you have no good choices.
I’ve talked to the folks over at Unitree multiple times and they say “yeah we’ll be hiring overseas soon” and then they never do and they only have five openings in China
You just aren't going to this too much in the US or any countries fully aligned with the US for fear of competition. It doesn't benefit anyone really. It's not like I get richer when Ford says more vehicles or Meta makes more teenagers suicidal, so why should we care? It'll hurt the country in the long run too.
Was expecting that the release would be this month [1], since everyone forgot about it and not reading the papers they were releasing and 7 days later here we have it.
One of the key points of this model to look at is the optimization that DeepSeek made with the residual design of the neural network architecture of the LLM, which is manifold-constrained hyper-connections (mHC) which is from this paper [2], which makes this possible to efficiently train it, especially with its hybrid attention mechanism designed for this.
There was not that much discussion around it some months ago here [3] about it but again this is a recommended read of the paper.
I wouldn't trust the benchmarks directly, but would wait for others to try it for themselves to see if it matches the performance of frontier models.
Either way, this is why Anthropic wants to ban open weight models and I cannot wait for the quantized versions to release momentarily.
[0] https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro/blob/main...
[1] https://news.ycombinator.com/item?id=47793880
[2] https://arxiv.org/abs/2512.24880
[3] https://news.ycombinator.com/item?id=46452172
Do you have a source?
It's hard not to see Anthropic's messaging of "this tech that we're pushing on you is going to take your job and maybe kill you" as being about anything other than regulatory capture, with the goal of the government shutting down competitors.
I think OpenAI and Anthropic are both really in a tough spot - spending so much on what is becoming a commodity product for which neither seems positioned to be low cost producer. Maybe a bit like the UK-France channel tunnel project where the product itself is a success but a bloodbath for those who invested to build it.
If I considered myself a 10X programmer, now I am 100X. Love DeepSeek.
[1] https://github.com/kuyawa/mecha-ai
For context, for an agent we're working on, we're using 5-mini, which is $2/1m tokens. This is $0.30/1m tokens. And it's Opus 4.6 level - this can't be real.
I am uncomfortable about sending user data which may contain PII to their servers in China so I won't be using this as appealing as it sounds. I need this to come to a US-hosted environment at an equivalent price.
Hosting this on my own + renting GPUs is much more expensive than DeepSeek's quoted price, so not an option.
As a European I feel deeply uncomfortable about sending data to US companies where I know for sure that the government has access to it.
I also feel uncomfortable sending it to China.
If you'd asked me ten years ago which one made me more uncomfortable. China.
But now I'm not so sure, in fact I'm starting to lean towards the US as being the major risk.
It's doesn't seem all that out there compared to the other Chinese model price/performance? Kimi2.6 is cheaper even than this, and is pretty close in performance
With DS tech though the worry is generally more capacity. Haven't seen issues with v4 but in the past their combination of quality and pricing means they get overloaded.
This is a pretty interesting thing they've built in my opinion, and not something I'd expect to be buried in the model paper like this. Does anyone have any details about it? Google doesn't seem to find anything of note, and I'd love to dive a bit deeper into DSec.
In my tests too[0], it doesn't reach top 10. One issue, which they also mentioned in their post, is that they can't really serve well the model at the moment, so V4-Pro is heavily rate-limited and gives a lot of timeout errors when I try to test it. This shouldn't be an issue though, considering the model is open-source, but it makes it hard to accurately test at the moment.
[0]: https://aibenchy.com/compare/deepseek-deepseek-v4-flash-high...
I would say I wouldn't notice this wasn't Opus 4.6. What I asked was looking at a feature implemented recently, and how it could be improved. Consumed 3.3 million tokens and create a much better flow.
It had a bug when I started the implementation though related to the API, which I suppose it is something they didn't catch when making their API compatible with CC.
I expect once the API issues are fixed, for v4-pro to be around the same level as GLM-5.
(I am confused by the results your website is presenting)
I see that you tried to justify this lower in the thread, but no… it completely invalidates your benchmark. You are not testing the model. You are conflating one specific model host and model performance, and then claiming you are benchmarking the model. All major models are hosted by multiple different services.
In the real world, clients will just retry if there is a server error, and that will not impact response quality at all, and the workflow the model is being used in will not fail. If a workflow is so poorly coded that it doesn’t even have retry logic, then that workflow is doomed no matter which host you use. But again, reliability of the host is separate from the model.
You can make your benchmark valid by having separate leaderboards for model quality and host reliability. I’m not saying to throw the whole thing away. But the current claim is not valid.
And you’re also making an unsourced claim that everyone else has already determined this model sucks? Nah. The first result from Artificial Analysis shows good things: https://x.com/ArtificialAnlys/status/2047547434809880611
But I am still waiting to see the results from the full suite of AA benchmarks.
They have Gemini 2.5 Flash ahead of Opus 4.6: https://aibenchy.com/compare/anthropic-claude-opus-4-6-mediu...
Absolutely worthless benchmark but every release has a comment linking to this nonsense.
Why does it matter if the model/architecture/weights are open source or not, given it's their proprietary inference hardware they're currently having issues with? Proprietary or not, the same issue would still be there on their platform.
If the conclusion is: "DeepSeek v4 is this good, if you use it from DeepSeek" (which is how most people would use it anyway), then it makes sense to count API errors as failures.
But if the conclusion must be "The DeepSeek v4 model is this good when self-hosted and run under ideal conditions", then the model should be tested locally, skipping all invalid calls.
I am still debating what I should do in this case: if a leaderboard shows a model as #1, but people who try it through the official provider see it fail half the time, that's not a good leaderboard either.
I am considering adding a "reliability" column: retry API errors until the test completes, BUT track how many retries were needed and compute a separate reliability score. But here comes a different problem: reliability varies over time and across providers, so that's tougher to test.
If you want to measure their API, do so, but don't place it under the same category as testing the model itself, as they're two different metrics.
https://simonwillison.net/2026/Apr/24/deepseek-v4/
Both generated using OpenRouter.
For comparison, here's what I got from DeepSeek 3.2 back in December: https://simonwillison.net/2025/Dec/1/deepseek-v32/
And DeepSeek 3.1 in August: https://simonwillison.net/2025/Aug/22/deepseek-31/
And DeepSeek v3-0324 in March last year: https://simonwillison.net/2025/Mar/24/deepseek/
As in have the model consider its generated SVG, and gradually refine it, using its knowledge of the relative positions and proportions of the shapes generated, and have it spin for a while, and hopefully the end result will be better than just oneshotting it.
Or maybe going even one step further - most modern models have tool use and image recognition capabilities - what if you have it generate an SVG (or parts/layers of it, as per the model's discretion) and feed it back to itself via image recognition, and then improve on the result.
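The refinement loop described above is easy to sketch. Here `generate` is a hypothetical stand-in for the model call (prompt plus previous SVG in, revised SVG out); in a real harness it would be a chat-completion call, possibly with the rendered image attached for the vision round-trip:

```python
def refine_svg(generate, prompt, rounds=4):
    """Iteratively ask a model to critique and improve its own SVG.

    generate(prompt, previous_svg) -> new SVG string; a stand-in for
    whatever model API the harness actually uses.
    """
    svg = generate(prompt, None)  # initial one-shot attempt
    for _ in range(rounds):
        svg = generate(
            prompt + "\nCritique the SVG below for proportions and "
            "relative positions of shapes, then output an improved version.",
            svg,
        )
    return svg
```

Whether the result actually beats one-shotting is an open question, but the harness itself is this small.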
I think it'd be interesting to see, as for a lot of models, their oneshot capability in coding is not necessarily correlated with their in-harness ability, and it's the latter that really matters.
I should try it again with the more recent models.
Could you please try with Opus 4.7? I think there's a chance of it doing well, considering the design/vision focus.
Let me tell you how much the Pro one sucks... It looks like a failed Pedersen[1]. The rear wheel intersects with the bottom bracket, so it wouldn't even roll. Or rather, this bike couldn't exist.
The flash one looks surprisingly correct, with some wild fork offset and the slackest of seat tubes. It's got some lowrider[2] aspirations with the small wheels, but with longer, Rivendellish[3], chainstays. The seat post sits at a different angle than the seat tube, so good luck lowering that.
[1] https://en.wikipedia.org/wiki/Pedersen_bicycle
[2] https://en.wikipedia.org/wiki/Lowrider_bicycle
[3] https://www.rivbike.com/
I wonder which model will try some more common spoke lacing patterns. Right now there seems to be a preference for radial lacing, which is not super common (but simple to draw). The Flash and Pro ones use 16-spoke rims, which actually exist[1] but are not super common.
The Pro model fails badly at the spokes. Heck, the spokes sit on the outside of the drive side of the rim and tire. Have a nice ride riding on the spokes (instead of the tire) welded to the side of your rim.
Both bikes have the drive side on the left, which is very very uncommon. That can't exist in the training data.
[1] https://cicli-berlinetta.com/product/campagnolo-shamal-16-sp...
1) An LLM is not AGI, because surely AGI would imply that Pro does better than Flash?
2) And because of the above, the pelican example is most likely already being benchmaxxed.
at the top of the linked pages.
How much does the drawing change if you ask it again?
Have you noticed deepseek-v4-pro performing worse than deepseek-v4-flash? It performed even worse than qwen3.5-27b. I found it surprising, and I'm wondering if there is a bug in my software, because I had to implement sending the `reasoning_content` back, otherwise the API failed with a BadRequestError.
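For anyone hitting the same BadRequestError: per the comment above, the v4 API appears to want the assistant's `reasoning_content` field passed back on subsequent turns. Treat the exact field handling as an assumption (earlier deepseek-reasoner docs required the opposite). A minimal helper that preserves it when rebuilding the message history:

```python
def append_assistant_turn(messages, response_message):
    """Append an assistant turn, keeping reasoning_content if present.

    response_message is the dict-like message from an OpenAI-compatible
    chat completion response.
    """
    turn = {
        "role": "assistant",
        "content": response_message.get("content", ""),
    }
    # Field name taken from the comment above; assumed required by the v4 API.
    if response_message.get("reasoning_content"):
        turn["reasoning_content"] = response_message["reasoning_content"]
    messages.append(turn)
    return messages
```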
It's five times bigger in both total and active parameters!
The website now has a link to the announcement on Twitter here https://x.com/deepseek_ai/status/2047516922263285776
Copying text of that below
DeepSeek-V4 Preview is officially live & open-sourced! Welcome to the era of cost-effective 1M context length.
DeepSeek-V4-Pro: 1.6T total / 49B active params. Performance rivaling the world's top closed-source models.
DeepSeek-V4-Flash: 284B total / 13B active params. Your fast, efficient, and economical choice.
Try it now at http://chat.deepseek.com via Expert Mode / Instant Mode. API is updated & available today!
Tech Report: https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro/blob/main...
Open Weights: https://huggingface.co/collections/deepseek-ai/deepseek-v4
https://xcancel.com/deepseek_ai/status/2047516922263285776
"Due to constraints in high-end compute capacity, the current service capacity for Pro is very limited. After the 950 supernodes are launched at scale in the second half of this year, the price of Pro is expected to be reduced significantly."
So it's going to be even cheaper
Gemini-3.1-Pro at 91.0
Opus-4.6 at 89.1
GPT-5.4, Kimi2.6, and DS-V4-Pro tied at 87.5
Pretty impressive
If AI was so good at coding, why can’t it actually make a usable Gemini/AI Studio app?
In my experience, Gemini is the most insightful model for hard problems (particularly math problems that I work on).
https://api-docs.deepseek.com/guides/coding_agents#integrate...
But in this case, it's more likely just to be a tooling issue.
Stuff that was prohibitive six months ago is now up for grabs. We keep working at the infra level now, switching models whenever we run out of credits or want a different result. The question is how we build context and architecture and ensure the agent is effective and efficient. Wouldn't it be good if we simply used less energy to make these AI calls?
[0]: https://aibenchy.com/compare/deepseek-deepseek-v4-flash-high...
Where previously I was wary of under-providing intelligence, I'm now more excited about the idea of being able to give these pretty large, intelligent models to my application. For sub-agents, we can fine-tune them and reasonably expect them to perform as well as Opus on a specific subtask, of which my applications have many.
In other words, we can run a general-purpose intelligent model, Sonnet or Opus, orchestrating a fleet of, let's say, 30 to 50 of these fine-tuned sub-agents. By doing that, I can get very low pricing versus what it would have cost if I used Opus or Sonnet for everything.
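The routing layer of that architecture can be sketched very simply. Model names and task types here are purely illustrative, not a real API:

```python
# Hypothetical mapping from subtask type to a cheap fine-tuned model;
# anything unrecognized falls back to the expensive generalist.
SUB_AGENTS = {
    "extract_fields": "ft-flash-extractor",
    "classify_ticket": "ft-flash-classifier",
    "summarize_thread": "ft-flash-summarizer",
}
FALLBACK = "opus"

def route(task_type):
    """Pick the fine-tuned sub-agent for a task, else the orchestrator model."""
    return SUB_AGENTS.get(task_type, FALLBACK)
```

The hard part, as the replies below note, is not the dispatch table but making the fine-tuned models actually good enough at their subtasks.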
I've heard so many people saying this for the last year, and even tried doing it myself too, and never seen a successful application of it, nor succeeded myself either with SOTA models that are smart but slow or local models that are dumb but fast (even with beefy hardware).
What makes you believe this is possible in the first place? Every "swarm of agents" implementation I've seen only been able to produce lowest quality of code, most of the time vastly bloated, but surely you must have seen something working in practice that you could share with the rest of us?
Kimi 2.6 went hard and left me with a buggy mess. GLM 5.1 hedged and made a 25 line change (but it was an improvement). DS V4 went hard, fixed its issues along the way, and left me with a significantly nicer codebase! (...that I will now be spending some time testing before releasing to the project)
[0]: lmcli (simple, Go, nice UX, MIT licensed, works well with DS V4) https://codeberg.org/mlow/lmcli
dang, probably the two should be merged and that be the link
Are there comparisons between Pro non-thinking and Flash thinking? I don't really get the use case for Flash thinking or Pro non-thinking.
Not gonna happen
Which strikes me as odd - I would have assumed someone had an edge of at least 10% extra GPUs.
Keep an eye on https://huggingface.co/unsloth/models
Update ten minutes later: https://huggingface.co/unsloth/DeepSeek-V4-Pro just appeared but doesn't have files in yet, so they are clearly awake and pushing updates.
I have never tried one yet but I am considering trying that for a medium sized model.
As I understand it, if DeepSeek v4 Pro is 1.6T total with 49B active, only ~49B params are read per token, so ~100GB at 16-bit or ~50GB at 8-bit for the active weights.
v4 Flash is 284B, 13B active so might even fit in <32GB.
V4 is natively mixed FP4 and FP8, so significantly less than that. 50 GB max unquantized.
My Mac can fit almost 70B (Q3_K_M) in memory at once, so I really need to try this out soon at maybe Q5-ish.
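The arithmetic in the comments above is just params × bytes-per-param. A quick sketch (precision in bits, ignoring KV cache and runtime overhead):

```python
def weight_gb(params_billion, bits):
    """Approximate weight memory in GB: params * (bits / 8) bytes."""
    return params_billion * 1e9 * (bits / 8) / 1e9

# 49B active params: 98 GB at 16-bit, 49 GB at 8-bit.
# Note: for an MoE model the full parameter set (1.6T here) still has to
# be stored somewhere; only the per-token bandwidth scales with active params.
```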
Streaming weights from RAM to GPU for decode makes no sense at all because batching requires multiple parallel streams.
Streaming weights from SSD _never_ makes sense because the delta between SSD and RAM is too large. There is no situation where you would not be able to fit a model in RAM and also have useful speeds from SSD.
Note: these were just two that I starred when I saw them posted here. I have not looked at them seriously yet.
https://github.com/danveloper/flash-moe
https://github.com/t8/hypura
"Not seduced by praise, not terrified by slander; following the Way in one's conduct, and rectifying oneself with dignity." (不诱于誉,不恐于诽,率道而行,端然正己)
(It is mainly used to express the way a Confucian gentleman conducts himself in the world. It reminds me of an interview I once watched with an American politician, who said that, at its core, China is still governed through a Confucian meritocratic elite system. It seems some things have never really changed.
In some respects, Liang Wenfeng can be compared to Linux. The political parallel here is that the advantages of rational authoritarianism are often overlooked because of the constraints imposed by modern democratic systems. )
Strix halo has 256 GB/s bandwidth for $2500. The Flash model has 13 GB activations.
256 / 13 = 19.7 tokens per second
Except you cannot fit it into the 128 GB maximum RAM that Strix Halo supports. So move on.
Another option is Threadripper. That's 8 memory channels. Using older DDR4-3200 you get roughly 200 GB/s. For $2000.
200 / 13 = 15.4 tokens per second
But, a chunk of per-token weights is actually always the same and not MoE, so you would offload that to a GPU and get a decent speedup. Say 25 tokens per second total.
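The estimates above are simply memory bandwidth divided by bytes read per token, since decode has to stream all active weights once per token. As a sketch:

```python
def decode_tps(bandwidth_gb_s, active_gb):
    """Rough decode tokens/sec: each generated token reads all active
    weights once, so throughput = memory bandwidth / active-weight size."""
    return bandwidth_gb_s / active_gb

# Strix Halo: 256 GB/s over ~13 GB of active weights -> ~19.7 t/s
# Threadripper, 8ch DDR4-3200 (~200 GB/s) -> ~15.4 t/s
```

This is an upper bound; real throughput is lower once compute, cache misses, and KV-cache reads enter the picture.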
Then likely some expensive Mac. No idea.
Eventually you arrive at a mining rig chassis with a beefy board and multiple GPUs. That has the benefit of pipelining. You run part of the model on one GPU and move on, so another batch can start on the first one. Low (say 30-100) tps individually, but a lot more in parallel. Best get it with other people.
A mac with 256 GB memory would run it but be very slow, and so would be a 256GB ram + cheapo GPU desktop, unless you leave it running overnight.
The big model? Forget it, not this decade. You can theoretically load from SSD but waiting for the reply will be a religious experience.
Realistically the biggest models you can run on local-as-in-worth-buying-as-a-person hardware are between 120B and 200B, depending on how far you’re willing to go on quantization. Even this is fairly expensive, and that’s before RAM went to the moon.
The flash version here is 284B A13B, so it might perform OK with a fairly small amount of VRAM for the active params and all regular ram for the other params, but I’d have to see benchmarks. If it turns out that works alright, an eBay server plus a 3090 might be the bang-for-buck champ for about $2.5K (assuming you’re starting from zero).
https://news.ycombinator.com/item?id=47885014
https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro
Codex shows ~258k for me and Claude Code often shows ~200k, so I’m curious how DeepSeek is exposing such a large window.
The 1M window might be usable, but it will probably underperform against a smaller window of course.
DeepInfra, as far as I'm aware, doesn't log your prompts and doesn't retain them in most cases, except for "debugging purposes". As per their privacy policy[1]: "We understand that the inputs you provide to our API and the outputs it generates may contain your Personal Information. We will not store, sell, or train using this data unless we have your explicit consent. We might sometimes store, for a limited period of time, the inputs and outputs to API calls for debugging purposes."
They're not EU-based, though, and I'm not sure how "private" their inference actually is. The throughput is also not the best; sometimes it can be really slow (although right now both DeepSeek-V4 models seem to be doing fine). However, they have good pricing, probably one of the best on the market.
I'm not affiliated with them in any way, but when I want to test something too big for my local hardware (I'm not a power user of LLMs, chatbots, or agents, not at all; I do it just out of curiosity), DeepInfra is usually my go-to provider.
[1] https://deepinfra.com/privacy
DeepSeek also tends to follow prompts more closely IME, plus the thinking is shown, so I think it's able to register as a 'tool' more easily for the non-tech-inclined for whom that appeals.
Yes, you're absolutely right, and no, Jordan Keller does not work for Moonshot. He is the original author of the algorithm, so credit goes to him.
There's a lot of legwork to go from prototyping to proper development though. The reason I said what I did is because Moonshot has the first research publication on it that I'm aware of. Could definitely have used better language though, my apologies to Jordan!
is this not cool tech, available for use?
I look forward to seeing what gets made on top of DeepSeek 4, more than what it means for US politics.
Especially with how open DeepSeek is with its advancements, I'm excited to see how they get applied to SOTA Western models.
I hope that DeepSeek wins the AI race or at least gets ahead to the point where it becomes infeasible for bans and regulations against it. It's ridiculous that American legislators are advocating for less regulations for DeepSeek except for their own racist ideas about which AI should be approved or not.
> We present a preview version of DeepSeek-V4 series, including two strong Mixture-of-Experts (MoE) language models — DeepSeek-V4-Pro with 1.6T parameters (49B activated) and DeepSeek-V4-Flash with 284B parameters (13B activated) — both supporting a context length of one million tokens. DeepSeek-V4 series incorporate several key upgrades in architecture and optimization: (1) a hybrid attention architecture that combines Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA) to improve long-context efficiency; (2) Manifold-Constrained Hyper-Connections (mHC) that enhance conventional residual connections; (3) and the Muon optimizer for faster convergence and greater training stability. We pre-train both models on more than 32T diverse and high-quality tokens, followed by a comprehensive post-training pipeline that unlocks and further enhances their capabilities. DeepSeek-V4-Pro-Max, the maximum reasoning effort mode of DeepSeek-V4-Pro, redefines the state-of-the-art for open models, outperforming its predecessors in core tasks. Meanwhile, DeepSeek-V4 series are highly efficient in long-context scenarios. In the one-million-token context setting, DeepSeek-V4-Pro requires only 27% of single-token inference FLOPs and 10% of KV cache compared with DeepSeek-V3.2. This enables us to routinely support one-million-token contexts, thereby making long-horizon tasks and further test-time scaling more feasible. The model checkpoints are available at https://huggingface.co/collections/deepseek-ai/deepseek-v4.
1: https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro/blob/main...
It's trendy to say the US govt is now authoritarian, but that's just pure naïve groupthink.
But if it does, then in the following week we'll see DeepSeek 4 flood every AI-related online space: thousands of posts swearing it's better than the latest models from OpenAI/Anthropic/Google but costs only pennies.
Then a few weeks later it'll be forgotten by most.
If one finds it difficult to set up OpenCode to use whatever providers they want, I won't call them 'dev'.
The only real friction (if the model is actually as good as SOTA) is to convince your employer to pay for it. But again if it really provides the same value at a fraction of the cost, it'll eventually cease to be an issue.