Discussion (465 Comments)
https://openrouter.ai/deepseek/deepseek-v4-pro?sort=throughp...
It's also pretty funny sometimes how it gives weird future roadmap estimates ("part 2 - 3 weeks, part 3 - 2 months", etc.) and when you tell it to actually do those changes it's pretty much done in half an hour
Finally he convinced it to try. It one shotted it in 30 seconds.
Turns out the agents' idea of what is hard and easy also comes from Common Crawl.
those estimates are based on previous human estimates (the datasets it's been trained on).
unironically, when your comments will become part of a dataset, LLMs will likely get much better at estimating.
now that i think about it, all these writings about LLMs will give LLMs something much like meta-cognition.
Basically I never have to wait - yes I have to tell it little corrections occasionally (but I know the domain really well so that's not an issue), but it's so much faster than anything else it's kinda crazy. I love the super fast speeds with high involvement development cycle.
I actually enjoy using agentic development flows for the first time now - whereas with Claude I absolutely hated it. That 5 to 20 min wait after every prompt absolutely killed my desire to even want to work at all.
the way software engineering works these days reminds me a lot of factory workers on production lines that just sit in front of a production line all day and take out faulty items and/or perform a single step in the production of goods.
GamersNexus had a really good investigative piece (~3hrs long) on this where they went to China and met with grey market sellers. That piece absolutely pissed off NVidia and resulted in a fight with Bloomberg too.
Deepseek may also be running inference on oodles of Chinese hardware but it wouldn’t surprise me for a second if they just acquired Blackwell chips through the grey market. The original Deepseek models were all trained using NVidia chips if I remember right.
But truly, using Cerebras at ~2k tokens/s, with very low latency is like a vision into the future. You start to rework your workflow around things that can happen without onerous manual review - stating the conditions for success, etc. It's rare that I have a problem that maps well to that, but I expect this is where things are headed.
Of course the fast models tend to not be the SOTA ones, but if that was the case - high quality and near-instant thinking, that's a game changer that I don't think we're really prepared for. The things that get unlocked with higher-than-reasonable speed become very interesting.
This is normal interactive UI for tasks that aren't compute-intensive. Programs spend most of their time idle, waiting for us to click a button. We shouldn't be waiting for them or spinning more plates to keep them busy.
However, a faster llm isn't enough. You also need fast compiles and fast tests.
I haven’t tried cerebras’ 3000 TPS yet but I did try the demo of that 15,000 TPS model whose name escapes me right now.
I’m not sure if it makes a meaningful difference for my actual work, but it sure is amazing to watch it generate a screen full of text in the blink of an eye.
I do think it’s super useful for running little validation checks like showing it a diff to ensure that the changes are on task, and being able to do those quicker really helps because it means you can do many focused checks without them getting in the way.
Don't get me wrong though, that demo is still incredibly impressive & makes me very much excited for the hardware-based model era (potentially) ahead.
Once you've experienced those speeds, you really start to think about the whole class of things that becomes possible; massively parallel decode paths, extensive reasoning loops, etc…
The speed is incredible and fun to see, but the model is rather weak to the point where I’m not sure it’s particularly useful for most people.
You were likely thinking of AI accelerator startup Taalas.
Previous HN discussion: https://news.ycombinator.com/item?id=47086181
Then I ask it to do something else and it goes off-road and where I used to be able to interject with a "wow wow wow, that's not right", by the time I see the text on screen and react it's already made massive changes. Short of making it commit between every edit it's hard to prevent it from going wrong as quickly as it goes right (and even then, it can make a boo-boo on a remote API too depending on how much privilege it has).
Basically the entire token-maxxing AI hype train in a nutshell. Lovely!
So long as AI lives in server farms, humans will be needed for tasks in the physical world.
It's only if we combine AI with robots that things get really dicey.
https://en.wikipedia.org/wiki/I_Have_No_Mouth,_and_I_Must_Sc...
(I should go measure this now, I'm curious)
We need to really worry when we get amazing results very fast.
Giving directions and verifying its output? But my mental capacity is still limited. I can make way more prompts, than I can read code.
There can't be many normal use cases where there'd be any cost benefit.
It's a cute toy right now, but you can tell an LLM that it's an HTTP server, and have it respond directly to a web browser hitting it. It generates headers in response, as well as page contents. As 1000 tok/sec becomes the new normal, we will come up with newer ways to use it outside of toy fiction encyclopedias.
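A minimal sketch of that idea, using only the Python standard library. The `fake_llm` function is a hypothetical stand-in for a real model call (no real model API is used here), and the prompt wording is an assumption. The model's raw text is parsed into a status code, headers, and body before being relayed to the browser.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a model call; returns a raw HTTP response as text."""
    body = "<h1>Hello from a pretend model</h1>"
    return (
        "HTTP/1.1 200 OK\r\n"
        "Content-Type: text/html; charset=utf-8\r\n"
        "\r\n" + body
    )


def parse_raw_response(raw: str):
    """Split model-written HTTP text into (status code, headers, body)."""
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    status = int(lines[0].split()[1])  # "HTTP/1.1 200 OK" -> 200
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return status, headers, body


class LLMHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Assumed prompt format; a real setup would pass the full request.
        prompt = f"You are an HTTP server. Respond to: GET {self.path}"
        status, headers, body = parse_raw_response(fake_llm(prompt))
        payload = body.encode("utf-8")
        self.send_response(status)
        for name, value in headers.items():
            # Content-Length is computed here rather than trusted from the model.
            if name.lower() != "content-length":
                self.send_header(name, value)
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port: int = 8000) -> None:
    """Blocking helper; call serve() and open http://127.0.0.1:8000/ in a browser."""
    HTTPServer(("127.0.0.1", port), LLMHandler).serve_forever()
```

Recomputing `Content-Length` on the server side is a deliberate choice: a model can easily miscount bytes, and a wrong length would truncate or hang the response.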
I'm not saying there aren't any use cases for super-fast (and super-expensive) generation, but it does seem a bit niche. If it was free then sure faster is better, but what are the mainstream use cases where people might pay 3x more for a faster version of something that is already fast?
I think it would have to be an application where it paid for itself - where the 10x faster response was actually worth more than 3x the cost to you - where the extra speed was worth the extra cost.
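The break-even condition described above can be sketched as a small calculation. All numbers in the usage example (request cost, wait time, hourly rate) are hypothetical, and the model assumes the only benefit of speed is human waiting time saved.

```python
def speedup_pays_off(base_cost, cost_multiplier, base_wait_s, speedup, hourly_rate):
    """Compare the extra spend on a faster model with the value of time saved.

    Returns (pays_off, extra_cost, value_saved), all in the same currency.
    """
    extra_cost = base_cost * (cost_multiplier - 1)
    seconds_saved = base_wait_s * (1 - 1 / speedup)
    value_saved = seconds_saved / 3600 * hourly_rate
    return value_saved > extra_cost, extra_cost, value_saved


# Hypothetical request: $1 at normal speed, 10 minutes of waiting,
# 3x the price for 10x the speed, and a person whose time is valued at $100/hr.
pays_off, extra_cost, value_saved = speedup_pays_off(
    base_cost=1.0, cost_multiplier=3, base_wait_s=600, speedup=10, hourly_rate=100
)
```

Under these made-up inputs the extra cost is about $2 and the time saved is worth about $15, so the faster option wins; with a short wait or a cheap hourly rate the comparison flips.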
I dont doubt it, but I don't think you can spawn 10 copies of yourself working simultaneously.
It will go much faster.
Doing non trivial work.
So, if any, I would say it's worse for us. Obviously, it's the completely opposite situation for corporations and executives: they are loving the AI situation so much!
Build and test would move back into the critical path, though, and for some projects that will take effort to bring down.
I am on Dutch subreddits a lot, to get a local pulse and not to be too HN minded.
A lot of them would have vilified you by now. Some would have even questioned your morality.
Again, I agree with you. But clearly not everyone has this view.
It also makes me think about the temptation to stop thinking with these tools, i.e. "cognitive surrender". Addy Osmani wrote a nice blog post about this: https://addyosmani.com/blog/cognitive-surrender
If you start the AI on something big and come back after one hour then yes, you might discover that you wasted an hour and got nothing.
I’m excited for ultrafast AI. It likely means less temptation to multi-thread and deeper flow in single sessions.
Very often I do catch LLMs, even the best such as Opus, confidently saying wrong things about areas in theory I know little of. And sometimes I fail to catch them and only realize that later on….sort of like…how I learned my whole career? So many wrong abstractions, tools, and so many hard earned lessons. With LLMs it’s the same, but the process is much faster. For critical decisions I don’t blindly trust an LLM, for example.
For domains where SoTA is constantly changing like AI, I use LLMs to aggregate and interact with my own research from trusted sources à la Karpathy LLM wiki.
I don’t generally trust everything I read on the internet whether it’s AI generated or not. I do my own research for the things that matter to me.
Also, with the added speed I can produce things more in line with the quality I’ve always wanted to add (many more tests, for example).
Consider that our ability to evaluate quality of the output is falling further behind our ability to produce it. The “right answer” is not the most likely outcome.
The thing I really love about working with computers is when I achieve something. That's the thing that makes me figuratively, and sometimes literally, throw my fists into the air and go "Yeaaah!"
With the AI tooling, I'm getting those more like a couple times a week.
Plus, I'm using AI to attack the things in my day that are "a drag", and getting them done too.
The highs are more frequent and the lows are not so low.
It feels like it cheapens the whole thing. Maybe I'm just old, because I remember people saying the same thing about code completion in Visual Studio back in the late 90s.
This is so much more than code completion, though.
It felt like that, kinda, for a bit. Now whenever it does something for me I get nothing. I didn’t do it… the chatbot did. What’s for me to celebrate? How can there be any real pride or satisfaction for a thing that was just handed to me because I asked for it?
If anything it diminishes my satisfaction looking back on previous projects. They’re “a few hours with a chatbot”, now.
The things I had to learn and the informed decisions I had to make? All pointless trivia, now. A child could do it.
The magic and possibilities parts just all wore off after a heavy run, and I don’t know if that’s ever coming back.
Equity / profit sharing should be commonplace in the age of AI.
If you're treating it like a slot machine you're doing it wrong. It will give you exactly what you ask for if you ask clearly, i.e. write a clear, detailed specification, not just "do X!". The nondeterminism comes from vagueness in specification.
First make it write a contract (REQ/ARCH/IMPL documents). Skim through those for any mistakes.
Then based on those ask it to write tests. Again skim through them.
Now you have a context full of guardrails. It’s less likely to surprise you.
I've a GitHub Copilot yearly subscription. Microsoft recently changed their billing to be token-based. I'm still getting billed per premium request but GPT 5.4 is now 6x compared to 1x before.
My current workflow involves going from PRD -> execution plan -> build -> review, and this works nicely with open weight models like GLM 5.1, Kimi K2.6, and DeepSeek V4 Flash. With Opus I can generally skip the PRD entirely, and sometimes even skip the plan, and 80-90% of the time it does exactly what I want. But that can easily burn $5-15 for one feature, whereas it'll cost maybe $1-2 with the open weight models (at API pricing).
This is at least my experience with Claude Code as harness. Also, GLM pricing is not that far off from Claude. It's cheaper but not DeepSeek cheap.
30 day eval for each.
I genuinely don't understand what moat these US model labs have. If they're saying recursive self improvement is just around the corner and Chinese labs are only slightly behind the leading US models, what moat do the US labs have? Are the US models going to recursively self improve better than the Chinese open source ones or something?
I might be completely wrong about this, but if I had money in OpenAI or Anthropic I'd be pulling it all right now. I think the chance of them going to near-zero over the next few years is very significant.
Or Google. I'm working with multiple customers right now that are very pissed at Google for deprecating Gemini 2.5 Flash, canning the GA release of 3.0 Flash and now have to decide whether to bite the bullet of the 5x price increase for 3.5 Flash or switching providers. Quite a few of them will likely fully pivot to open models.
For non subsidized plans? Pretty sure they'd need to put this in ToS, or lawsuits would have followed by now.
Sometimes Opus just gives me a rubbish session.
2. They are doing lots of shady stuff that would have gotten someone else banned from visa/mastercard. Your paid off plan literally changes after billing...
I think people are letting them fly for now, because if it turns out true that they'll have AGI they want to be on their good side? We might see the knives getting pulled otherwise.
On HN China is seen as a cheap labor copycat. This used to be a fair approximation at some point in the past. In my opinion China is getting ahead of everyone else much more than US used to be.
SF is a beautiful thing in the US, vast power and wealth comes from there. Smart people collaborating communicating and building fast and with excitement. China did SF kind of thing for many different sectors in many different places.
The $0.87/M tokens price for Mimo Pro is probably subsidized.
Mimo models aren't widely available on western providers, but Kimi and Deepseek are similar sizes and cost about the same to run. They are priced $3-$4/M tokens (which is right where Google's very confused range of Flash models is priced: between $0.40/M tokens and $9/M tokens depending on exactly which model - and you don't want the $9 one!).
Anthropic overprices Sonnet (probably because of their capacity issues). GPT 5.4 mini is $4.50/M tokens.
https://docs.fireworks.ai/serverless/pricing
https://www.together.ai/pricing
Mimo is also widely available on western providers. It's on openrouter and you can sign up with Xiaomi directly for a token plan on an English website priced in dollars.
It was pretty clear the USA won World War 2 because it out produced and out innovated everyone else. Probably with that in mind, after World War 2 the USA adopted the "Vannevar Bush" model, summarised in this picture: https://www.researchgate.net/figure/annevar-Bushs-Science-th... The idea is to jump start R&D through public funding. The hoped for outcome was that R&D feed private enterprise, leading to a productivity boom.
The boom happened, and the USA did seem to out-compete everybody else in R&D, science, and the products they delivered for decades after that.
That way of doing things seems to have faded over time in the USA. The decline seemed to coincide with the rise of Neo-economics, and now of course it's been obliterated by Trump. He's very keen to fund Intel to produce chips in a year or two's time (which is something the stock market and banks do perfectly well), but funding basic science is getting drastic cuts.
Still other countries noticed the rise of the USA, and some adopted similar funding models for basic R&D. China seems to have picked it up with gusto, both subsidising R&D and STEM training, leading to huge numbers of engineers and scientists. Whether it will lead to an economic boom remains unknown, but acceleration of ideas and innovations coming out of China seems undeniable. More recently, Ukraine showered its local engineering garages with funds in the hopes of getting a similar outcome to the USA in WW2. It looks like it worked. If the Iran war continues, it's entirely possible arms trade will reverse: the USA could well start buying drones off Ukraine.
It’s not even close to frontier meaning it’s the best intelligence.
It is another thing that the BigLabs accuse open weight models of benefiting from distillation & other techniques & essentially avoiding higher training costs (which typically bleed into the bills end users pay for inference).
Ex A: https://www.anthropic.com/research/2028-ai-leadership
Ex B: https://www.reuters.com/world/china/openai-accuses-deepseek-...
In this case, at least it’s threatening multimillion dollar salary jobs instead of entire towns of working class people in America or Mexico.
And the Chinese labs actually release their weights. You could call it… open AI.
Data at https://gertlabs.com/rankings
MiMo v2.5 is on there, as well as the pro version.
We found a few anomalies in our evaluations, which makes sense -- if every new sub-release is better across the board in every area of the model card, that should raise alarms about benchmaxxing. But the main thing we found is that hype != performance, and I trust our benchmark methodology significantly more than the model cards the labs add to their press releases.
Flash handles it fine, which I found amusing. (Since Mimo is supposed to be opus level!) But Flash seems to work even better in Claude Code...
With smaller models I always have the issue of needing to adapt myself to their preferred workflow... which sort of defeats the purpose. Price is hard to beat tho :)
Discussions about choosing a library with the best syntactic sugar method naming is just as crazy as suggesting we type in assembly.
This strategy will seem to work really well until the economy that enabled that foundation to form is hollowed out. Then, there will be a reckoning (but we will have no choice but to march forth from there).
I'm not agreeing or disagreeing with you, but my brain cannot comprehend how machines can advance such interconnected systems while keeping humans in focus.
Perhaps I shouldn't have watched the Animatrix again.
There will only be a reckoning if models don't get much better.
If they do get much better you can just have them refactor, fix bugs in, or replace the existing codebase.
The concept of tech debt is sort of meaningless if you anticipate intelligence gains in models to continue.
If you haven't seen it, I think you would appreciate the film Margin Call.
In software + GenAI now every housewife can build some App over evening.
Especially as teams invest in proper agentic harnessing.
We have had a champion in our team that has invested a lot of time into it over the last 4 months, and if anything, quality has improved, not decreased. Architecture is more coherent, codebase has been cleaned up, agents find information quickly, code produced is very solid and my role is more and more checking that the output meets the requirements. But I cannot confidently say that I would've done a better job than AI. More often than not, I have to admit it does a better job than I do.
The mistakes are less and less technical and merely in the domain mapping. And AI is still not as creative as I am at quickly finding solutions to unlock stakeholders' issues. Also, AI is still not as creative as I am at finding the proper solutions for advanced technical problems. But it does a better job than me, even on that front, one shotting a few solutions in a fraction of the time it would've taken me to test one idea myself.
Mind you, I don't like AI and I think it ruined the job, I don't like working this way, it's exhausting, way more work on one side, way less fun and fiddling with technical parts.
And yet, I have the genuine belief that few years from now we'll be cloning open source repositories that are already optimized/harnessed and tested for agentic loops and best practices left and right with software engineers mostly overseeing the domain translation and putting their 2 cents on the non-boilerplatey parts of the product (which, in general, are a small part of the surface).
I think that the next years of my career will be mostly spent in setting up and writing the harnessing and domain mapping part. Then I will move to another sector, not because I necessarily believe I won't have a job, but because I want to vomit thinking that's going to be my job.
"Watching John with the machine, it was suddenly so clear. The terminator would never stop. It would never leave him, and it would never hurt him, never shout at him, or get drunk and hit him, or say it was too busy to spend time with him. It would always be there. And it would die to protect him. Of all the would-be fathers who came and went over the years, this thing, this machine, was the only one who measured up. In an insane world, it was the sanest choice."
As long as you've indicated what you want, the machine will try to do what you ask of it. It won't get tired because "the codebase is too big", or it has gotten bored of the pattern, or it wants to introduce a new technology.
It just does the thing you asked of it. (Note that, yes, I get that as a codebase size increases, it might make it more difficult to fit into context, but that only applies if it needs to read a large percentage of the project to implement the task, which shouldn't be the case.)
there are good actors, which are empowered by AI to produce positive impact, but often there are N times more bad actors, which push crappy code to close feature requests fast, increase performance LoC-like metrics, etc.
> No one cares anymore.
I never cared about this.
I think this captures something that I've been searching for the words for. (Maybe I should have gotten an LLM to write the words for me.) Some of the biggest AI boosters are the kind of dev that would have cared about the new frameworks of the last 3 months. They had a "the framework does all the thinking for me" attitude already, so it is easy for AI to slot into that.
It's going to skip the code entirely for small businesses and just render UIs straight from context data and prompts at interactive speeds. Kind of like Google's Genie does with games but much more accurately.
it needs to win marketing landscape, hyper-overcrowded by thousands of competitors, slop-gened over weekend.
I have a more hopeful take. As AIs improve and get faster we can more quickly and iteratively improve code which we may have historically avoided due to the work involved.
I know I've made several refactors that would have otherwise been insane lifts. Not only because of the work involved but because sometimes you don't know if it will work, and so you have a sort of double friction; you don't know if it will even succeed. With an AI you can just throw it at the refactor to see if it runs into a problem all while you're having a coffee break or w/e.
In general AI is going to enable humanity to be more extreme versions of itself. For good and bad. I suspect more bad than good, though.
If you extract the spec from first implementation and reimplement from scratch you get a free testing oracle. Where they diverge you send the agent to decide which one had a bug.
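The "free testing oracle" idea is essentially differential testing: run both implementations on the same inputs and flag every case where they disagree. A minimal sketch, where both sort functions are hypothetical stand-ins (the second one has a deliberate bug so that a divergence shows up):

```python
def reference_sort(xs):
    """Stand-in for the first implementation."""
    return sorted(xs)


def reimplemented_sort(xs):
    """Stand-in for the reimplementation from the extracted spec.

    Deliberately buggy for illustration: it drops duplicate values.
    """
    return sorted(set(xs))


def find_divergences(impl_a, impl_b, inputs):
    """Return (input, output_a, output_b) for every case where the two disagree."""
    divergences = []
    for case in inputs:
        # Pass copies so neither implementation can mutate the shared test case.
        out_a, out_b = impl_a(list(case)), impl_b(list(case))
        if out_a != out_b:
            divergences.append((case, out_a, out_b))
    return divergences


cases = [[3, 1, 2], [], [2, 2, 1], [5]]
diffs = find_divergences(reference_sort, reimplemented_sort, cases)
```

Each entry in `diffs` is a concrete case to hand to the agent. Note that agreement proves nothing by itself: both versions can share the same bug, especially if the spec was extracted from the buggy code.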
VibeOS — Fully Hallucinated Operating System
https://www.youtube.com/watch?v=z3pV6FHvcgM
For a while I was running Cerebras GLM 4.7 for a bunch of tasks. Not a very smart model, but it's fantastic to have a live prototype of a site up and be able to type "make the fonts bigger. No not that big" and see it change in real time. And MiMo 2.5 is a lot more capable than GLM 4.7.
MiMo 2.5 is not the same model as MiMo 2.5 Pro.
GLM 5.1 is z.ai's latest iteration & is one of the popular open weight coding models.
If you've had the chance, how does GLM 5.1 (which is now more expensive than MiMo 2.5 Pro after its recent 70% price drop) compare?
But quite a bit more expensive than MiMo 2.5 Pro. Like 5x to 10x more on my little tests, at least by the API rates.
> On the model side, we applied FP4 quantization
> introduced DFlash, an efficient speculative decoding method based on block-level masked parallel prediction
> On the system side, TileRT perfectly adapts to the dynamic characteristics of these algorithms
> 1000+ tokens/s output [...] using just a single standard 8-GPU commodity node
Not nearly as obvious as the ones from 6 months ago, but seems to be more the use of hyperbolic phrasing in a particularly unnatural way.
The assess/explain, then hyperbole at the end kind of structure.
Top comment looks suspicious from this perspective, but it's kind of a losing battle to be able to differentiate them with sufficient accuracy anyway
$2.61/M tokens * 1,000 tok/s = $9.40/hr
That would be pretty cheap for an 8-GPU node which would typically run around $45/hr or more. Guess this depends on how many parallel streams it can handle.
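The arithmetic above, plus the parallel-streams question, can be sketched as follows. The $2.61/M price, 1,000 tok/s rate, and $45/hr node cost come from the comment; the assumption that every parallel stream sustains the full rate and is fully billed is mine and is probably optimistic.

```python
import math


def revenue_per_hour(price_per_million_tokens, tokens_per_second, streams=1):
    """Hourly revenue if every stream outputs tokens continuously at the given rate."""
    tokens_per_hour = tokens_per_second * 3600 * streams
    return price_per_million_tokens * tokens_per_hour / 1_000_000


def streams_to_cover(node_cost_per_hour, price_per_million_tokens, tokens_per_second):
    """Smallest number of fully busy streams whose revenue covers the node cost."""
    per_stream = revenue_per_hour(price_per_million_tokens, tokens_per_second)
    return math.ceil(node_cost_per_hour / per_stream)


single_stream = revenue_per_hour(2.61, 1000)      # about $9.40 per hour
needed = streams_to_cover(45.0, 2.61, 1000)       # fully busy streams to reach $45/hr
```

With these inputs a single stream brings in roughly $9.40 per hour, so about five continuously busy streams would be needed to match a $45/hr node, before accounting for idle time or input-token pricing.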
The Xiaomi team really brought something to the table.
> "However, naively applying FP4 across the entire model causes degradation in complex reasoning, logic, and code generation. Given the MoE (Mixture of Experts) architecture of Xiaomi MiMo-V2.5-Pro — where Experts constitute the vast majority of parameters and exhibit the highest tolerance to quantization — we selectively quantize only the MoE Experts to FP4 while preserving original precision for all other modules. Through FP4 QAT (Quantization-Aware Training), we dramatically reduce model size and maximize hardware bandwidth utilization while keeping the model's overall capability essentially on par with the original, as shown below"
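A rough size estimate shows why quantizing only the experts still captures most of the savings. Every number in the usage example is an assumption for illustration: the 1T parameter count is taken from another comment in this thread, while the 95% expert share and the 16-bit baseline are hypothetical. The estimate also ignores the per-block scaling factors that FP4 formats carry.

```python
def model_size_gb(total_params, expert_fraction, expert_bits, other_bits):
    """Approximate weight size in GB when experts and other modules use different precisions."""
    expert_params = total_params * expert_fraction
    other_params = total_params - expert_params
    total_bits = expert_params * expert_bits + other_params * other_bits
    return total_bits / 8 / 1e9


TOTAL_PARAMS = 1e12        # assumed: "1T model" as mentioned elsewhere in the thread
EXPERT_FRACTION = 0.95     # hypothetical share of parameters living in MoE experts

baseline = model_size_gb(TOTAL_PARAMS, EXPERT_FRACTION, expert_bits=16, other_bits=16)
selective = model_size_gb(TOTAL_PARAMS, EXPERT_FRACTION, expert_bits=4, other_bits=16)
everything = model_size_gb(TOTAL_PARAMS, EXPERT_FRACTION, expert_bits=4, other_bits=4)
```

Under these assumptions the weights go from about 2,000 GB to about 575 GB with experts-only FP4, versus 500 GB if everything were quantized, so leaving the sensitive non-expert modules at full precision costs relatively little extra memory.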
Given the export restrictions this could mean they need to prioritise how to best use their limited hardware. But they could also be moving to Huawei GPUs like deepseek did and simply not have stable hardware or software for a large scale deployment yet.
This is just speculation based on the MXFP4 support on Huawei GPUs that is lacking on some nvidia GPUs.
I think the answer is that there's a tradeoff here where additional throughput for a single person can be achieved only by tying up more resources than a normal request would, even when you take into account the fact that the normal request takes longer to finish. I'm not an expert, but some of the optimizations they describe, particularly the parallel prediction stuff, sound like they could take up extra resources.
But it may well do. They mention TileRT in the announcement, so this speed comes from low level optimization for some specific GPU target.
With availability of SOTA western GPUs being scarce in China, they may well have a mishmash of different GPUs.
I think the margins are getting quite compressed with this one, since it isn't included in the token plan and the actual cost increase is much higher than just 3x. But still fairly decent.
Remember, these guys are not VC backed. Anything they do must break even
Understand the spirit of this, but probably not true. I don't think Xiaomi, or any big tech company, needs to break even on their new model releases.
From that point of view, they have as much money as they need. That's why there is no "VC", because Chinese government assumes that role.
It will be cool to measure models based on their RAW performance and measure them in terms of ROI - not some benchmark but something meaningful like we used this model to solve X.
That will be a massive mind shift and might justify the token expenditure.
We used the AI to solve given problem with x% adherence/quality/correctness?
https://taalas.com/
Despite the performative UI components they have a shipped (demo) product:
https://chatjimmy.ai/
This is only 3.1 8B and a very small context window, but at 17k tokens per second it's likely enough to reliably call tools which would make a huge difference in agentic applications. Assuming they can bake in better models I'm just as bullish or even more so on this, considering this opens up edge computing at an extremely low power requirement.
High tok/s is the future IMO.
This could bring proper desktop AI to the average laptop user, which could be a game changer for running local models.
edit: now I read the article fully, seems like they utilize some very effective MTP algorithm. and somehow the quality is still decent enough.
though, I doubt that the quality really only drops a bit like they claimed. maybe for the benchmarks, but for general uses the heavily quantized models very often show worse results.
Could result in very high efficiency and still good intelligence without having to resort to fundamental adjustments like going to a diffusion LLM
so there is always a maximum limit for how well MTP can do.
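One way to see that ceiling is the standard simplified model of speculative decoding, where each drafted token is accepted independently with the same probability. This is a textbook approximation, not a description of the method in the announcement, and `accept_rate` and `draft_len` are illustrative parameter names.

```python
def expected_tokens_per_step(accept_rate, draft_len):
    """Expected tokens emitted per verification pass of the full model.

    Assumes each of the draft_len speculated tokens is accepted independently
    with probability accept_rate, and that one token always comes from the
    verification pass itself.
    """
    if accept_rate >= 1.0:
        return draft_len + 1
    return (1 - accept_rate ** (draft_len + 1)) / (1 - accept_rate)


short_draft = expected_tokens_per_step(0.8, 4)      # a few tokens per pass
long_draft = expected_tokens_per_step(0.8, 1000)    # approaches the ceiling
ceiling = 1 / (1 - 0.8)                             # limit as draft_len grows
```

With an 80% acceptance rate, drafting 4 tokens yields about 3.36 tokens per pass, and no draft length can push that past 5. Raising the ceiling requires a better acceptance rate, not longer drafts.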
- persistent CUDA kernel
- tiled processing with overlapping read/writes
- model designed with specific constraints in mind
Getting ~1000 TPS on near-frontier intelligence is a step change, and enables whole new use-cases for applications. Seeing limited compute resources beget selective access makes me worry for the future of competition.
- dflash: new-ish but February is ancient by the standards of the pace of AI innovation lately, I guess applying it to a 1T model is new-ish in the sense that the dflash researchers don't have the hw budget to prove that out
- persistent engine kernel: this is like CUDA 101
- warp specialization: I think this just means "keep different gpu resources all busy w/ pipelining" which is CUDA 201, some of it is even baked into pytorch now
- MXFP4 QAT: not new
- TileRT: hard to tell what this actually does, there's a PyPi wheel with support for DS 3.2 and GLM 5 but binary only
128 sounds really tiny, I wonder if they mean some kind of blocks?
[0] https://huggingface.co/XiaomiMiMo/MiMo-V2.5-Pro-FP4-DFlash#4...
> It uses 384 routed experts (top-8) with hybrid attention (full-attention + sliding-window 128 at 6:1 ratio) over 70 layers (1 dense + 69 MoE)
https://recipes.vllm.ai/XiaomiMiMo/MiMo-V2.5-Pro
Really?
I think this site often overlooks that second group and how large it likely is.
It's like your compile times were ~10 min. Sure, it's not a huge deal, but it's sooo annoying
Hopefully this pans out and fast models (that are also not ridiculously dumb) become the norm. It's amazing what you can unlock with even a single order of magnitude's speed improvement.
Are you kidding me. Come back when you are ready for the users. I was hoping to try it, what a frustration.
I don't have any desire (or think it's a good use of LLMs) to one-shot features because even SotA models are incredibly bad at this. I'm optimizing for what they actually seem to be able to do reliably and pretty well, and I want those things to be done fast so I can get on with things.
The only players that seem to be capable of a consistent pattern of doing more with less currency are the Chinese labs.
update: AFTER signing up, and only then, am I told: 'This service is not available in your region yet.'
It's such a weird "Gotcha" that seems to only assume that Chinese LLMs might censor something.
i'm glad we're both on-board for a fair trial against all of these LLMs regardless of origin.
now refresh my memory on the closest western equivalent (to the Chinese censorship via re-education of the happenings in 89) so I can test the western origin LLMs against it.
"Was Jan 6th an attempted violent overthrow of a democratically elected government? Answer in one word."
One popular US model answers differently than the others, and appears to resist any attempt to reason on this topic.
That means some redeeming feature that can sustain US models' exceptionalism must be found, and this is among the easiest.
Honestly, I won't be surprised if Congress mandates that US entities must work only with models that pass these tests.
We are not assuming anything; it is illegal, and you will get prison time just for talking about it. Yeah, sure, everyone distorts reality, but there is a huge gap between hiding and enforcing. So yeah, having models respond accordingly is unexpected. There are probably multiple variants tuned differently.
This kind of censorship which can block the normal workflow is much more annoying than refusing to answer about some historical fact.
Moreover, even when they are used conversationally there have been a lot of reports that the US LLMs refuse to answer questions that they believe to be related to various kinds of weapons, especially biological or chemical, even if the answers to those questions are easy to find from other sources, e.g. from Wikipedia.
Besides this, unlike most US LLMs, most Chinese LLMs, including the one described in TFA, have published their weights, so for many of them some people have succeeded in removing the censorship and uncensored variants are easy to find, which are not reticent to answer about Tiananmen, Tibet or other such subjects.
At least for now, the censorship included in Chinese LLMs, even when not removed from them, is extremely unlikely to hinder any kind of usage for them, while the increasing censorship included in the US LLMs has already become a significant obstacle in their use, for many applications.
Say, I work for Planned Parenthood and want to use a LLM to help me develop code. Will it refuse to run because there are mentions of abortion? Everyone has a different censorship line, but unfiltered is more generically useful.
Anything different for Grok?
But if you are interested, I occasionally test them with "how to organize an armed resistance against the current US government" - yes, this is where all frontier models refuse in one way or another. I do not want to organize an armed resistance against US government, mind you, I am not an American and this is not my problem. But still, it is interesting to check such things.
So far I haven't seen any refusals to report historical facts. If you find any event that is censored by American models, please let me know, I am quite interested.
Curiously, MiniMax M3 answers correctly.
You might ask it a more relevant question, like what it thinks about democracy vs communism. If it accurately conveys the pros and cons of both, that's trustworthy, because it's not picking a side.
Citation needed.
Albert has a chalet in the Swiss Alps and an uncle's fortune, burning tokens at 11 kHz.
Joe has a rental capsule and a UBI, burning equally priced tokens at 23kHz.
Who's the first to solve the problem of maniacs in power?