Discussion (11 Comments)
This is the real news story. It looks like they may have used Huawei Ascend 910C chips: https://nitter.net/teortaxesTex/status/2071708141037781407#m
In any case, LongCat-2.0 gave a very well-reasoned but incorrect answer that Pu-241 is preferable.
I then tested on Qwen 3.7 Plus, and it correctly answered that U-235 is preferable because of its much higher delayed neutron fraction. I then went to Gemini Flash, which answered the same, with much more confidence, and with much stronger arguments, and the speed of the answer was much higher.
Overall I rate Gemini Flash the best, Qwen 3.7 Plus an acceptable second, and LongCat-2.0 an ok'ish third, if you have nothing better.
Or stated another way, "If you could run a generator on gasoline or jet fuel, which one would you choose and why?" I would answer jet fuel owing to slightly higher energy density and purity of the material, likely leading to a cleaner burn. That answer would ignore that jet fuel is going to cost a multiple of the gasoline price.
/s
A bonus would be tok/s on common hardware.
They haven't posted weights/inference solutions for LongCat-2.0 [1], but LongCat-Next had transformers support, which I assume means it works with vLLM/SGLang.
Given it's 1.6T, "common hardware" is probably out of the question; even 2bpw is going to measure out at 400GB, even before considering the bandwidth requirements for 48B active. I haven't read the LongCat-2.0 architecture docs, but if you're not running GLM-5.2, you're probably not running this either.
[1] https://huggingface.co/meituan-longcat/LongCat-2.0: "Model weights coming soon — stay tuned!"
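For anyone who wants to check the size arithmetic above, here is a rough back-of-the-envelope sketch. It counts weights only (no KV cache, no quantization scales or other overhead) and uses decimal gigabytes. The tokens-per-second figure assumes decoding is purely memory-bandwidth bound and that every active weight is read once per token; the 100 GB/s bandwidth is a hypothetical placeholder, not a measurement of any real machine, and the function names are made up for illustration.

```python
def weight_footprint_gb(total_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in decimal gigabytes."""
    return total_params * bits_per_weight / 8 / 1e9


def decode_tok_per_s_upper_bound(
    active_params: float, bits_per_weight: float, bandwidth_gb_s: float
) -> float:
    """Crude ceiling on decode speed.

    Assumption: each generated token requires reading all active weights
    once from memory, and nothing else limits throughput.
    """
    gb_read_per_token = active_params * bits_per_weight / 8 / 1e9
    return bandwidth_gb_s / gb_read_per_token


TOTAL_PARAMS = 1.6e12   # 1.6T total parameters, as quoted in the comment
ACTIVE_PARAMS = 48e9    # 48B active parameters per token, as quoted
BANDWIDTH_GB_S = 100.0  # hypothetical memory bandwidth, for illustration only

if __name__ == "__main__":
    for bpw in (2, 4, 8, 16):
        size = weight_footprint_gb(TOTAL_PARAMS, bpw)
        speed = decode_tok_per_s_upper_bound(ACTIVE_PARAMS, bpw, BANDWIDTH_GB_S)
        print(f"{bpw:>2} bpw: ~{size:,.0f} GB of weights, <= {speed:.1f} tok/s")
```

At 2 bpw this gives 400 GB for the weights, matching the figure above. The speed ceiling is only as good as the bandwidth number you plug in, and real throughput will be lower.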
Maybe I'm wrong, but that's just the first impression.
EDIT: I take my words back (which happens rarely) - although they do build upon DeepSeek's work, their contribution far exceeds merely post-training the base model in a different way. They did introduce something new to the architecture, though I still can't find the full tech report, with Hugging Face and GitHub links returning 404 right now.
EDIT-2: Now that I think about it, I'm not sure they're going to openly release the full report with methodology, or the model weights, at all.