The mysterious Hy3 LLM is topping OpenRouter Model Rankings by a large margin

138

ffreediver 2 days ago 111 commentsRead Article on minimaxir.com

ZH version is available. Content is displayed in original English for accuracy.

⚡ Community Insights

Discussion Sentiment

68% Positive

Analyzed from 2064 words in the discussion.

Discussion (111 Comments)Read Original on HackerNews

simonw•1 day ago

First model I've tried that gave me back HTML with a "Change Pelican Color" button: https://static.simonwillison.net/static/2026/hy3-preview-pel...

(Transcript: https://gist.github.com/simonw/c2a0d8ecd3056a2681319eae8fc3f...)

cwmoore•1 day ago

But…and I’m sure I’m not alone here…that is a snowman, and what it is on is not a bicycle.

What do we think we are doing with this life?

fragmede•1 day ago

Haha does it get bonus points for the extra button, or does it fail because html != SVG?

dodslaser•1 day ago

Any bonus points for the color sre immediately subtracted because the "animate wheels" button leaves the wheels stationary and makes the sun rotate.

MostlyStable•1 day ago

I wonder if it is actually animating the wheels as well, but just managed to match up the spin rate to the gap size.

cicko•1 day ago

That depends on the perspective. If you're on the Sun, the wheels rotate around you.

Garlef•1 day ago

Judging from the dotted trajectory lines, it even "thought" about giving the bike a wobble.

(But maybe that's just my interpretation based on something else going wrong in the animation)

fragmede•1 day ago

Hy3 is a Scandinavian model, and is leaking that out via Norse mythology about Sol being a wheel!

postepowanieadm•1 day ago

ROTFL

preek•1 day ago

It actually rendered an SVG inline in the HTML page. I just tested the SVG and it renders itself just fine, including colors. So, tbh, I'd say the task has been properly achieved.

embedding-shape•1 day ago

Maybe I'm just extremely nitpicky, but I'd consider that a failure, as the prompt is asking for SVG, not HTML.

Bit like asking for CSS and then getting a HTML file back with the CSS embedded, that was not what I was asking for!

PunchyHamster•about 22 hours ago

...the animate wheels button makes sun start to spin

zone411•1 day ago

I’ve tested this model on four of my benchmarks:

https://github.com/lechmazur/buyout_game 10th out 36.

https://github.com/lechmazur/pact/ 14th out 25.

https://github.com/lechmazur/nyt-connections/ 60th out 81.

https://github.com/lechmazur/debate 16th out of 29.

Sandworm5639•about 13 hours ago

oh, I love the connections benchmark.

Just curious, can you share what are those hardest puzzles that even the top models can't crack? sometimes when I find the puzzle absolutely undecipherable I like to ask LLMs to solve it, and I haven't seen them fail yet.

aorloff•about 12 hours ago

Ask your top model this question : I'm 100 feet away from the carwash, should I drive my car or walk ?

Tepix•about 9 hours ago

You messed up the question.

baxtr•about 21 hours ago

Good stuff!

Is there a reason you change the leaderboard graphs for the third and fourth one?

Also: would be great to have an overview page with a summary over all test, like a total score or similar.

CamperBob2•about 22 hours ago

Would be interesting to see the 27B dense Qwen 3.6 model thrown into the mix.

Aurornis•1 day ago

> Two new models are now beating LLM darling Claude in terms of token usage and by more than 50%?

Time for a reminder that OpenRouter leaderboards only show tokens sent through OpenRouter, which most Anthropic API users don’t use.

svantana•1 day ago

I would think that's true for all the models on OR. The data is skewed for sure, but it's interesting none the less.

9cb14c1ec0•about 21 hours ago

That doesn't mean it can't be used as a market signal. These 2 things can both be true at once.

TurdF3rguson•about 16 hours ago

I'm pretty sure the popularity came from being free at some point

smartbit•about 11 hours ago

Notice that Hy3 Preview usage didn’t go down after the free period was over https://openrouter.ai/tencent/

The list of apps using Hy3 Preview shows Hermes Agent causing 65% usage over the last 3 weeks https://openrouter.ai/tencent/hy3-preview/apps

  Hermes Agent      72B
  OpenClaw          10B
  OpenHands          9B
  Claude Code        8B
  Kilo Code          8B

killingtime74•1 day ago

Are you next going to say YouTube rankings don't take into account videos that aren't on YouTube and Spotify rankings don't take into account songs that aren't on Spotify?

simonw•1 day ago

OpenRouter rankings frustrate me, because they show the total number of tokens but they provide no indication of how many unique users a model has.

Which means if a surprise model tops the leaderboard one week we can never be sure if it was because a single whale user pushing billions of tokens a day switched to it, or if it represents a genuine community trend towards that model.

numlocked•about 22 hours ago

(openrouter co-founder here)

Yeah we should do something to indicate cardinality. I can share that there can often (I'm talking generally; not related to this model in particular) be e.g. a very large app that can be pushing a lot of volume. But in almost all cases that app has a large number of end users. Hypothetically, for instance, would Cursor be consider one user, or millions?

Will think about it! Thanks for the feedback.

simonw•about 21 hours ago

I'd consider Cursor one user because it's one entity that made an editorial decision about which model to make available to their own community.

If you treated Cursor as millions of users it might look like millions of people independently chose a new model when actually it was Cursor making the choice for them - and the thing I care most about is how many choices were made that selected a model and put it above the others.

dotancohen•about 11 hours ago

An alternative viewpoint is that the single choice made about switching the Cursor model was done after extensive testing by a competent and experienced team. Whereas my naive self choosing a model to play with this week is far less a signal to others that the model is fit for purpose.

minimaxir•about 21 hours ago

One idea I had was to count # of distinct API keys that have spent atleast $100 (number's flexible), which would be enough to provide guidance on if the traffic is from a single power-user.

In the Cursor case which is BYOK, that would count as distinct API keys.

martinald•about 22 hours ago

Hi! Big fan of OpenRouter and the data you provide. It'd be awesome if you would consider providing volume of tokens per hour, mostly for my own curiosity as to quite how peaky demand is.

Thanks!

svantana•1 day ago

Also, while we're pitching new features to openrouter, I'd like to see a "$ spent" chart, which would remove all these huge freebie spikes. It looks like it would be pretty much dominated by claude.

senordevnyc•1 day ago

Agreed. My little solo dev SaaS app’s production pipelines push almost two billion tokens a day.

senordevnyc•1 day ago

Haha, I never tire of the AI haters downvoting stuff like this.

Down with reality!!

daveguy•about 20 hours ago

Or, everyone finally realizes that token burn is not the same as productivity. Maybe they just down voted for the questionable spending brag.

andai•2 days ago

So basically, Hy3 is the cheapest decent model on OpenRouter, unless you use DeepSeek as the provider for DeepSeek V4 Flash, in which case DeepSeek's insane caching wins out. (And Hy3 is close-ish on the benchmarks.)

0xbadcafebee•1 day ago

You need to use DeepSeek API directly to gain the extra caching benefits. The DeepSeek provider on OpenRouter is only the 5th-cheapest for V4 Flash, so you have to specify DeepSeek provider when calling OpenRouter. But DeepSeek's API discounts on its models only applies if you call DeepSeek directly. So anyone using OpenRouter to call DeepSeek models is actually losing quite a bit of money.

NitpickLawyer•1 day ago

> The DeepSeek provider on OpenRouter is only the 5th-cheapest for V4 Flash

You might have the default settings on your account, which limit Deepseek as a provider. If you disable that feature you see them on openrouter as well (and they serve it at the same cost as their own API).

0xbadcafebee•1 day ago

I just checked my settings and I have everything enabled. https://openrouter.ai/deepseek/deepseek-v4-flash?sort=price (per-1M price) shows DeepSeek provider as #5. https://openrouter.ai/deepseek/deepseek-v4-flash/pricing?sor... (effective price) shows them as #3. The effective price will change your total cost since each provider has a different price for input vs output vs cache, so what's #1 and #5 for one person could be #5 and #1 for somebody else, depending on their workload.

However, I just double checked, and OpenRouter's pricing page for Flash v4 with DeepSeek provider shows a cache hit rate of $0.0028, which is the same as on DeepSeek's official API pricing page ($0.0028), so they do seem to be the same price, (assuming DeepSeek is able to pin your specific OpenRouter requests to the same DeepSeek server). OpenRouter adds 5% to that cost, but still it might be cheaper than the other providers.

Also just found out OpenRouter has a new feature "Response Caching" where they can cache identical requests and return them immediately with no billing. The entire request must be identical, though, not just a prefix, and you have to enable this feature. I don't know who would need to send multiple identical requests, but it's better than nothing?

beacon294•1 day ago

ZDR is also on by default and deepseek is not ZDR.

cicko•1 day ago

How is it a "mysterious" model? It's Tencent's Hy3?

theanonymousone•1 day ago

My question as well. Isn't Tencent a very well-known company? Maybe the mystery is in the model itself?

0xbadcafebee•1 day ago

> it makes sense that a cheaper model would prevail, but only if it offered similar quality

You're trying to think logically, which has no place in an AI discussion. :) People just jump to whatever the latest model is. Plenty of people also prefer price to "quality" (which is very subjective). It's new, it's cheap, so people use it. It's likely people will stop using it when something else is cheaper and/or newer.

olmo23•1 day ago

Since my employer pays for it, I just select the latest and greatest.

alecco•1 day ago

PSA: Don't use OpenRouter for DeepSeek V4 as it messes up you caching. Use DeepSeek API directly and you'll get 2x to 3x more cached tokens.

numlocked•about 22 hours ago

Can you share more? I'm with OpenRouter and we would love to address this! We don't see this in our own testing, I don't believe -- but will share this feedback and dig in.

alecco•about 1 hour ago

Just try. In a case last week it was ~3x and I tried multiple providers: deepseek, gmicloud/fp8, novita/fp8, and another one I can't remember. It was a large job where at least 2/3rds of the start of the prompts was exactly the same (literally a static string).

Then I read somewhere (I think X) that OpenRouter adds stuff and breaks caching (telemetry? headers? can't remember). So I stopped the job, switched to actual DeepSeek provider, and voilá, caching 3x more tokens per request (on average).

bwfan123•about 3 hours ago

Here is some data from my experience using both deepseek v4 flash directly, and deepseek v4 flash via openrouter.

Directly: 135M input tokens - $0.57 (134M cached)

Via OpenRouter 6M tokens - $0.81 (caching stats & inp/out not reported)

Caching is a huge win with using deepseek directly.

phainopepla2•about 1 hour ago

I am experiencing this using Opencode. Caching works fine via Deepseek API but not so good via Openrouter

jaggs•40 minutes ago

Yes, I definitely noticed a problem with openrouter and deepseek v4 pro. It's much more expensive.

SV_BubbleTime•about 16 hours ago

When you say Deepseek API, you mean servers in China? Or is it a copy of the model operated and run by OpenRouter?

sheepscreek•about 18 hours ago

FYI - DeepSeek has NOT announced its own coding platform. That app is an independent project. It says so in the footer as well:

“Independent open-source project · not affiliated with DeepSeek”

lithiumii•1 day ago

What's so mysterious? Isn't it from Tencent?

gmerc•1 day ago

Very mysterious: https://huggingface.co/tencent/Hy3-preview

vessenes•1 day ago

Since there’s only one inference provider it could be a recycling/ad experiment. The similar usage between trial and paid periods would be explained by this as well.

thot_experiment•1 day ago

Tried this extensively in OpenCode, never used it once since Gemma 4 came out, got into thought loops and did stupid edits I didn't ask for more often than the local 31b model. One of the worst "frontier" models I've ever tried.

BoorishBears•about 6 hours ago

This article got me messing with it, and I'm loving it as a post-training target.

Training on ~1B tokens on 8xB300 and the first checkpoint halfway in learned really well. Tencent might be struggling with agentic work, but the base knowledge is there.

segmondy•1 day ago

High token usage cuz it's free doesn't count

minimaxir•about 21 hours ago

The post goes into that issue. Throughly.

The numbers at the beginning of the post are weekly aggregate values well after the endpoint was paid-only.

segmondy•about 17 hours ago

The post is wrong, it's still free, see - https://openrouter.ai/tencent/hy3-preview:free it's free in kilo.ai https://kilo.ai/models/tencent-hy3-preview-free It's free in a lot of places.

minimaxir•about 2 hours ago

The first endpoint was closed. If you actually try and call it from the API you get this response:

> Hy3 preview is no longer available as a free model. It has transitioned to a paid model. Continue using it here: https://openrouter.ai/tencent/hy3-preview

The Kilo Code may have free traffic but if you check the numbers is still inconsequential relative to the trillions of tokens through OpenRouter.

freakynit•1 day ago

This was originally a 400+B param model which was later reduced to 295B considering it as the "optimal zone".

https://www.mdshare.online/s/uend0pj3og_A_rgcxzINf

bandrami•1 day ago

For the life of me I will never understand the thought process that leads you to say "we don't really know who developed this LLM but I'm going to feed all of my business's data to it"

WithinReason•1 day ago

It's from Tencent, says it in the article:

https://hy.tencent.com/research/hy3

bandrami•1 day ago

Right but Tencent is a massive half-state-controlled holding company so that's not really helpful.

throawayonthe•1 day ago

but we know who they are? how is this relevant

minraws•1 day ago

OpenAI & Anthropic are deeply in bed with US govt, and they need US govt approval before model releases, and all US Companies under various acts need to share data with the govt.

I mean sure there are investors and a little more open-ness, but with the example of Mythos we don't even know if public will get access to the "good" stuff because it's too dangerous.

If your only opinion on trusting these companies more than one based in China is, they are Chinese then good luck, all the best.

est•1 day ago

> I'm going to feed all of my business's data to it

Your business data is probably worthless, even considered harmful for the pretrain corpus.

Your interactions and decision making process are most valuable parts of the whole business.

bandrami•1 day ago

I assure you my business's data is not remotely worthless which is why there are pretty strict laws and regulations about what we can do with it

TZubiri•1 day ago

>Your business data is probably worthless

please tell me you are not in charge of the data of any business I'm a client of

est•1 day ago

to clarify, probably worthless to AI vendors, but might be useful for third-parties.

elpocko•1 day ago

Could be! Let's check. I just need your name and address, your SSN, a list of businesses you are a client of, and a DNA sample.

kirtivr•1 day ago

You don't need to know who developed the LLM - whether it was Google or OpenAI.

What you need to know is who is the provider for the LLM, and whether their endpoints are zero data retention enabled and opted out of training. OpenRouter gives you an easy way to control this.

lmf4lol•1 day ago

This is not entirely true and ignoring a couple of potential attack vectors like Data Poisoning: https://arxiv.org/abs/2408.12798

Its of course highly dependant on the use case and the environment, but simply saying that the only important part is to know where the data goes is too simple.

koiueo•1 day ago

How can openrouter control what LLM provider does with your data on their side?

kirtivr•1 day ago

OpenRouter and the provider sign a contract clearly specifying how input data is to be handled.

It's the same way we trust OpenAI to not train on our data if we've opted out although there is no control on whether they can retain the data indefinitely.

ddalex•1 day ago

what can it do ? it's just a big set of numbers, if you trust the host that's good enough

what266262•1 day ago

If you are ok with everything being fed into it being stored forever I guess it’s no problem. I don’t see how you trust them if you don’t know them.

Dylan16807•1 day ago

Who is "them" here? The developers and the hosts are not the same.

ddalex•1 day ago

where would it be stored ? it's just a big set of numbers.

Mashimo•1 day ago

If you Code open source projects anyway, might give it a spin.

st3fan•1 day ago

How do you “feed data into a model” ? Use the correct terminology and concepts please. It is important.