
Discussion (241 Comments)
Transcript and HTML here: https://gist.github.com/simonw/ecaad98efe0f747e27bc0e0ebc669...
https://github.com/scosman/pelicans_riding_bicycles
I mean the prompt was succinct and clear, as always - and it still decided to hallucinate multiple features (animation + controls) beyond the prompt.
I'd also like to point out that to date no drawing has actually been good from a quality perspective (as in, compared to what a decent designer would throw together).
They're always only "good" from the perspective of it being a one-shot, low-effort prompt, and there's very little such content for training purposes.
And so if you ask it to do something big it will do a very surface level implementation. But if you have it iterate many times, or give it small pieces each time, you’ll end up with something closer to what a human would do.
I imagine the pelican test but done in a harness that has the agents iterate 10+ times would be closer to what you’d expect, especially if a visual model was critiquing each time.
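The iterate-and-critique harness described above can be sketched roughly like this. Note that `generate()` and `critique()` are hypothetical stand-ins for real model calls (a text model drafting the SVG, a vision model scoring a render of it), not any actual API:

```python
# Sketch of an iterate-and-critique loop for the pelican test.
# generate() and critique() are placeholder stubs standing in for
# real model calls; they are NOT real library functions.
def generate(prompt, feedback):
    # Would call the text model with the prompt plus prior critique.
    return f"<svg><!-- draft incorporating: {feedback} --></svg>"

def critique(svg):
    # Would render the SVG and ask a vision model for a score + notes.
    return 0.9, "wheels should be round"

def iterate(prompt, rounds=10, target=0.95):
    feedback = ""
    best = None
    for _ in range(rounds):
        svg = generate(prompt, feedback)
        score, feedback = critique(svg)
        # Keep the highest-scoring draft seen so far.
        if best is None or score > best[0]:
            best = (score, svg)
        if score >= target:
            break
    return best

score, svg = iterate("a pelican riding a bicycle")
```

With a real vision critic in the loop, the score and feedback would change each round; the stub here just shows the control flow.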
Of course, a while back there was a Gemini release that I believe specifically called out its ability to produce SVGs, for illustration and diagramming purposes. So it's no longer necessarily the case that the labs aren't training on generating SVGs, and in fact, there's a good chance that even if they're not doing so explicitly, the RLVR process might be generating tasks like that as there is more and more focus on frontend and design in the LLM space. So while they might not be specifically training for a pelican riding a bicycle, they may actually be training on SVG diagram quality.
https://simonwillison.net/2024/Oct/25/pelicans-on-a-bicycle/
Surely you know someone makes the same post you did every time one of these is posted. Surely you see the answers and pushback, since you are familiar with these posts. Genuine question: did you expect a different answer this time?
Private companies will never open up a technological breakthrough to their competitors. It just doesn't make sense. If you want an entire field to advance, you have to open it up.
Here's the aggregated AI benchmark comparison for K2.6 vs Opus 4.6 (max effort).
- Agentic: Kimi wins 5. Opus wins 5.
- Coding: Kimi wins 5. Opus wins 1.
- Reasoning & knowledge: Kimi wins 1. Opus wins 4.
- Vision: Kimi wins 9. Opus wins 0.
Please note that the model publisher chooses their benchmarks, so there's a bias here. Most coding and reasoning & knowledge benchmarks in their list are pretty standard though.
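Tallying the category wins listed above (numbers copied straight from the list; this is a simple sum, not a weighted score):

```python
# Win counts per category, as reported in the comment above.
wins = {
    "Agentic":               {"Kimi": 5, "Opus": 5},
    "Coding":                {"Kimi": 5, "Opus": 1},
    "Reasoning & knowledge": {"Kimi": 1, "Opus": 4},
    "Vision":                {"Kimi": 9, "Opus": 0},
}

totals = {"Kimi": 0, "Opus": 0}
for category, result in wins.items():
    for model, n in result.items():
        totals[model] += n

print(totals)  # {'Kimi': 20, 'Opus': 10}
```

A raw win count of course inherits the benchmark-selection bias noted above.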
$200/month minimum to use Claude would bankrupt my country's white collar labor market.
Yes, absolutely.
China regularly produces long term planning documents to coordinate efforts, and the latest ones have specifically prioritized technology like chips and AI to compete with the west. https://www.reuters.com/world/china/china-parliament-approve...
I don't believe there's any publicly stated intent to sabotage the west... unsurprisingly.
This I assume will make it more difficult for US AI labs to turn a profit, which might make investors question their sky high valuations.
Any sort of melt down in the AI sector would almost certainly spread to the wider US market.
In contrast, in China, most of the funding for AI is coming directly from the government, so it's unlikely the same capital flight scenario would happen.
We're making this way too easy. The rationale and logic are reasonable, but ultimately irrelevant.
After all, historically, both the statistics and the research that come out of China have not been very trustworthy.
The strings attached by the Chinese govt to deep partnerships are not so benign.
In capitalism, the people with the capital get the profit, not the people who do the work. However, workers are said to benefit too through their salary, just less so.
I do wonder where we go from here.
Price/quality is absolutely bonkers though. I loaded $40 a few weeks/months ago and I haven’t even gone through half of it.
I use OpenCode and the OpenRouter provider. From OpenCode I only select the model (e.g. kimi-2.6) and have no way of selecting which cloud host will receive my request.
https://clocks.brianmoore.com/
It was the best creative writer by some distance
[1] https://huggingface.co/moonshotai/Kimi-Dev-72B
[2] https://huggingface.co/moonshotai/Kimi-K2.5
There are other options, like photonic computing, which might be able to reduce power significantly but are still in research as far as I can tell. Because so much money is invested in AI and traditional GPU inference is so power hungry, I would expect significant improvements in this space quickly.
I wouldn't expect this.
Historically we've had a roughly exponential rate of shrinkage. If we keep that same exponential going, each shrink from "room full of compute" to "pocket full of compute" should take about the same amount of time as the last one did.
And recently we've fallen behind that exponential rate of shrinkage. And this is rather expected because exponentials are basically never sustainable rates of growth.
I still expect that technological progress is getting faster year by year, and that we're still shrinking compute, but that's not necessarily enough for the next shrinking to take less time than when we had exponential progress on shrinking.
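The argument above is just a property of exponentials: under a constant doubling cadence, shrinking by a fixed factor always takes the same amount of time, no matter where you start. A sketch with made-up numbers (both the doubling period and the room-to-pocket ratio are assumptions for illustration):

```python
import math

doubling_period_years = 2.0   # assumed density-doubling cadence
shrink_factor = 1_000_000     # assumed "room of compute" -> "pocket" ratio

# Time to shrink by a fixed factor depends only on the factor and the
# cadence, not on the absolute starting size.
years = math.log2(shrink_factor) * doubling_period_years
print(round(years, 1))  # ~39.9 years
```

If the cadence slows, as the comment argues it has, each successive shrink of the same factor takes longer than the last.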
I tried it once, although it looks amazing on benchmarks, my experience was just okay-ish.
On the other hand, Qwen 3.6 is really good. It’s still not close to Opus, but it’s easily on par with Sonnet.
Kimi K2.6 seems to struggle most with puzzle/domain-specific and trick-style exactness tasks, where it shows frequent instruction misses and wrong-answer failures.
It is probably a great coding model, but a bit less intelligent overall than SOTAs
[0]: https://aibenchy.com/compare/moonshotai-kimi-k2-6-medium/moo...
I'm hoping that Anthropic will be able to release an updated Haiku soon, and they really need something that is 1/3 to 1/5 the price of Haiku to compete with the truly cheaper models (Gemma-4 is really good in this range).
https://www.kimi.com/code
Details here [0]
[0] https://techstackups.com/comparisons/kimi-2.6-vs-opus-4.7-an...
Also discovered that using OpenCode instead of the Kimi CLI really hurts the model's performance (2.5).
Kimi 2.5 (which this is based on) is served at $0.44 input / $2 output by a ton of different providers on OpenRouter, 2.6 will certainly be similar.
That's about 11X less than Opus for similar smarts.
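Plugging the quoted OpenRouter rates into a quick cost check (the token counts are made up for illustration, and the Opus figure is simply the "~11X" multiple above applied to the same workload, not a real Opus quote):

```python
# OpenRouter rates quoted above: $0.44 per 1M input, $2 per 1M output.
input_tokens = 3_000_000    # illustrative workload
output_tokens = 1_000_000

kimi_cost = input_tokens / 1e6 * 0.44 + output_tokens / 1e6 * 2.00
opus_cost_implied = kimi_cost * 11   # from the "~11X less" claim above

print(f"Kimi: ${kimi_cost:.2f}, implied Opus: ${opus_cost_implied:.2f}")
```

So a workload costing a few dollars on Kimi would run into the tens of dollars on Opus, if the 11x ratio holds.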
In China, there's no recourse at all. Surveillance must be presumed.
Does the US actually follow laws? They literally kidnapped the head of another state and bombed another state, and you are expecting legal protection from them?
I really hope this holds true in real world use cases as well and not only benchmarks. Congrats to Kimi team!
I will have to test this full release of K2.6 but could see it serve as a very good overall drop-in replacement for Opus 4.5 and Opus 4.6 at 200k across the vast majority of tasks.
I will say however that Opus 4.7 Max 1M has been a very significant jump in performance for me, especially in tasks beyond 120k tokens, where I'd argue it is now the most reliable model in continued task adherence and tool calling without compaction. Ironically, my initial experience was less than pleasant, as on XHigh I found task adherence to have regressed even with less than 1/10th of the context window having been used.
Am very interested in K2.6's compaction strategy (which appears to be very simple, all things considered) and how it performs beyond 100k tokens. As it stands, only OpenAI models have made compaction for long-running tasks work well, though overall, GPT-5.4 is still inferior in my tests, regardless of context window, to other models such as Opus 4.6 1M and Opus 4.7 1M. Haven't gotten around to testing Opus 4.7 200k and will have to do so to properly assess K2.6 fairly, but I'd be very surprised if K2.6 truly beat Opus 4.7 200k given the jump I have experienced.
The test data is purposely difficult to access to reduce the chance of leaking it into the training dataset.
Is this the same model?
Unsloth quants: https://huggingface.co/unsloth/Kimi-K2.6-GGUF
(work in progress, no gguf files yet, header message saying as much)
Our hope these days seems to be that maybe, perhaps, possibly, High Bandwidth Flash (HBF) works out: instead of 4 or 8 channels (or maybe more for some highest-end drives), having many dozens of channels of flash.
Ideally that can be very near the inference hardware. PCIe 7.0 is 0.5 Tb/s at 16x, which is obviously nowhere remotely near enough throughput here. The difficulty is that NAND has been trying to be super dense, so as you scale channels you would normally tend to scale NAND capacity too, and now instead of a 2TB drive you have a 200TB drive priced way beyond consumer means. Still, I think HBF is perhaps the only shot at the most important thing in computing going from mainframe back to consumer, and of course the models are going to balloon again if this does hit, probably before consumers ever get a chance.
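A back-of-envelope sketch of why that link bandwidth is the bottleneck: when inference is memory-bound, tokens/sec is roughly bandwidth divided by bytes read per token. The active-parameter count below is an assumption for illustration, not a figure from the thread:

```python
# Assumed MoE active-parameter count and int4 weights; illustrative only.
active_params = 32e9                   # parameters touched per token (assumed)
bytes_per_param = 0.5                  # int4 = 4 bits = 0.5 bytes
bytes_per_token = active_params * bytes_per_param   # 16e9 bytes per token

pcie_bw = 62.5e9   # bytes/s, i.e. the 0.5 Tb/s at 16x figure above
hbm_bw = 3e12      # bytes/s, an assumed HBM-class aggregate bandwidth

pcie_tps = pcie_bw / bytes_per_token
hbm_tps = hbm_bw / bytes_per_token
print(f"PCIe-attached flash: ~{pcie_tps:.1f} tok/s")
print(f"HBM-class memory: ~{hbm_tps:.1f} tok/s")
```

Which is roughly why the weights want to sit in (or stream from) something far faster than a PCIe-attached drive, and why HBF pitches many more flash channels.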
But the files are only roughly 640GB in size (~10GB * 64 files, slightly less in fact). Shouldn't they be closer to 2.2TB?
"Kimi-K2.6 adopts the same native int4 quantization method as Kimi-K2-Thinking."
So am I misunderstanding "Tensor type F32 · I32 · BF16" or is it just tagged wrong?
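A rough size check makes the ~640GB plausible. Assuming a ~1T total-parameter model (K2-family scale; the count is an assumption, not stated in the thread), pure int4 weights come to about 500GB, with the remainder accounted for by quantization scales and tensors (embeddings, norms) kept in higher precision, while pure BF16 would indeed be ~2TB. The "F32 · I32 · BF16" tag likely just reflects int4 weights packed into I32 tensors alongside F32/BF16 scales rather than a mis-tag.

```python
total_params = 1.0e12   # assumed total parameter count (K2-family scale)

int4_gb = total_params * 0.5 / 1e9   # 4 bits = 0.5 bytes per weight
bf16_gb = total_params * 2.0 / 1e9   # 16 bits = 2 bytes per weight
print(f"int4: ~{int4_gb:.0f} GB, bf16: ~{bf16_gb:.0f} GB")
```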
Model seems quite capable, but this use-case is just yikes. As if interviewing isn't already a hellscape.
Unfortunately the generation of the English audio track is work in progress and takes a few hours, but the subtitles can already be translated from Italian to English.
TLDR: It works well for the use case I tested it against. Will do more testing in the future.
The ~$100k hardware is suitable for multi-user, small-team usage. That's what you'd use for actual work in reasonable timeframes. For personal use, sure, Macs could work.
Deepinfra for example is not preserving thinking correctly for GLM5.1, even though they are for GLM5. This is one of the more obvious issues that crop up.
When you have a consistent model, you can incorporate fixes/prompts into your workflow to make it behave better. But this, always having to guess if Anthropic has quantised the model today, wastes so much time and effort.
This should be so easy to prove if it were true. Yet there is no such proof, just vibes.
Still, your other two points are completely valid. The opaqueness of usage quotas is a scam, within a single month for a single model it can differ by more than 2x. And this indeed has been proven.
https://github.com/anthropics/claude-code/issues/42796
https://scortier.substack.com/p/claude-code-drama-6852-sessi...
edit: Note that you can run it yourself with sufficient resources (e.g., companies), or access it from other providers too: https://openrouter.ai/moonshotai/kimi-k2.6/providers
Edit: found it.
> We may use your Content to operate, maintain, improve, and develop the Services, to comply with legal obligations, to enforce our policies, and to ensure security. You may opt out of allowing your Content to be used for model improvement and research purposes by contacting us at membership@moonshot.ai. We will honor your choice in accordance with applicable law.
Section 3 of https://www.kimi.com/user/agreement/modelUse?version=v2
So in other words only if you can point to a local law which requires them to comply with the opt out?
This sounds so so so cool. It would be so amazing to see this unfurl:
> Kimi K2.6 successfully downloaded and deployed the Qwen3.5-0.8B model locally on a Mac. By implementing and optimizing model inference in Zig—a highly niche programming language—it demonstrated exceptional out-of-distribution generalization. Across 4,000+ tool calls, over 12 hours of continuous execution, and 14 iterations, Kimi K2.6 dramatically improved throughput from ~15 to ~193 tokens/sec, ultimately achieving speeds ~20% faster than LM Studio.
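Quick arithmetic on the figures quoted above:

```python
start, end = 15, 193             # tok/s figures quoted in the excerpt
speedup = end / start            # improvement across the 14 iterations
lm_studio_baseline = end / 1.2   # implied by "~20% faster than LM Studio"

print(f"~{speedup:.1f}x faster; implied LM Studio baseline "
      f"~{lm_studio_baseline:.0f} tok/s")
```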
Might be a configuration or prompt issue. I guess I'll wait and see, but I can't get use out of this now.
In the past I tried Kimi through Claude Code; I might try that again.
The other release, Qwen-3.6-Max, is the one comparing itself to 4.5.