Cursor Introduces Composer 2.5

aasar 2 days ago 221 comments

https://twitter.com/cursor_ai/status/2056415413077233983

Discussion (221 Comments)

throwaw12•1 day ago

> Composer 2.5 is built on the same open-source checkpoint as Composer 2, Moonshot's Kimi K2.5.

Really nice to see they're giving credit to the company and I am optimistic Kimi K open models soon will outperform Opus models

vessenes•1 day ago

Sounds like it's the last Kimi-line model at Cursor? As expected they say they'll be training a larger model on the SpaceX infrastructure, or have already started most likely.

I'm very curious to read about the Composer 3 architecture when it comes out. More frontier coding models are a good thing, especially if they diversify into different strengths/weaknesses.

bfeynman•1 day ago

That only seems plausible if whatever corpse of xAI is around is giving them engineering time. I don't know if they hired a bunch of ex-frontier-lab staff, but it's unlikely they have the technical capability to train their own frontier models, especially the pretraining. The thing is, if it's not competitive with claude/codex it will be panned.

vessenes•about 23 hours ago

Hmm, I read the situation a little differently. Grok is not a slouchy model. It’s not the best, but it’s not the worst. X currently has one source of proprietary data, Twitter, and grok is by far the best at all the things you might imagine there - today’s zeitgeist, who’s saying what, current news, etc.

Cursor adds in a large corpus of proprietary coding data — I think this is actually fairly hard to acquire right now, because claude and codex are so good.

I bet there’s enough talent at the Grok team to work with the cursor team and data to get something good out the door.

That said, I don’t track Grok’s engineering leads — I’m not sure who’s currently around, and who is not.

scosman•1 day ago

> I am optimistic Kimi K open models soon will outperform Opus models

Hard to outperform the model you distill...

nl•1 day ago

Most of the performance on coding comes from RL, not distillation.

Distillation helps with world knowledge and things like that.

Bolwin•about 24 hours ago

They're not distilled. Stop spreading Anthropic's misuse of the term.

They do use it for synthetic data/judging though, so yes, hard to outperform.

Not that they need to, if they can basically match it for a fifth of the price.

intrasight•1 day ago

Is that true? If the distillation is not lossy and the model runs much faster due to less resource consumption, then it may outperform.

mwigdahl•1 day ago

One of those conditionals is a pretty huge assumption.

howdareme9•1 day ago

Only because last time they tried to hide it lol

trymas•1 day ago

Yes and if I remember the drama correctly - Kimi's license or terms of use says that for commercial use cases (or was it user count?) - you must declare credit to Moonshot and Kimi.

Lennie•1 day ago

It's important to mention: they were compliant, because they trained the model at an AI hosting provider that had a partnership with Moonshot AI, but Moonshot didn't know Cursor was a customer.

Aurornis•1 day ago

This was misinformed Twitter and Reddit drama.

They had properly licensed it and were complying with the terms of the license.

maxdo•1 day ago

How can a distilled Opus become better than the original? There are a number of reports, including from Anthropic, that the Kimi team was participating in fraudulent activities

goyozi•1 day ago

I kind of want to try it, to see if and how far they can take an open model and improve it but I really don’t miss the Cursor user experience. Constant UI changes, half-baked features, smaller and smaller limits, useless AI change attribution; I think I’ll wait for others to report if it’s any good.

whywhywhywhy•1 day ago

Noticed recently they keep opening their “Agents” window when the project was last opened in the VSCode fork window in the hopes I’ll just continue working in that when the UI is totally different and missing things I need.

For a professional tool it’s getting egregious how little respect they have for my workflows and flow state, the way they keep moving things, changing iconography and flipping switches in the UI.

It’s clearly being run by someone who comes from a social app or sales app growth hacking background.

dmix•1 day ago

I’ve personally never experienced that issue with Cursor. I never use the agents window and it always shows me the editor.

whywhywhywhy•1 day ago

You're not in the A/B test. I've never opened the agents window consensually.

SebastianKra•1 day ago

It seems obvious that they plan to eventually drop VSCode. I'd be willing to take them up on that offer. Their agent window is genuinely better as a starting point.

What annoys me is how little they want to integrate with ...anything. Wanna open a link in your default browser? Use our built-in chromium fork, we insist. Wanna open a location in Zed? No, please use our half-baked editor re-implementation. Wanna open a location in Cursor's own vscode-based editor? You can't. Managed to work around that somehow? We changed your files to "Worktree TS", disabling all your language servers. It's like programming on an iPhone.

znpy•about 11 hours ago

> It’s clearly being run by someone who comes from a social app or sales app growth hacking background.

I fixed that by using cursor the agent but not the UI.

I'm just running cursor in GNU Emacs via agent-shell (https://github.com/xenodium/agent-shell). Their cli client (aptly named "agent") supports ACP (agent client protocol) so the UI can be skipped altogether.

I know this sounds like a meme ("use x in emacs") but at this point at the very least i can keep my workflows and my UI all the same and focus on my work rather than "where did $company put $feature this month".

omederos•1 day ago

Use their cli?

https://cursor.com/docs/cli/installation

znpy•about 11 hours ago

I use it via the gnu emacs integration :P

https://github.com/xenodium/agent-shell

kilroy123•1 day ago

I 100% agree. It's soooo buggy.

I gave up, canceled my plan, and went back to boring old VSCode. It feels so much more stable, and my Mac no longer runs out of memory. With cursor I had to reboot my macbook several times a week and had to always be plugged in.

smnscu•1 day ago

That's me with Google Antigravity. Switching back to vscode was such a breath of fresh air. Porting over my (extensive) settings/extensions/keyboard shortcuts was extremely easy too (just ask the agent to do it), and now I can use both Copilot models and Claude Code easily. More to your point though, the speed and stability is incomparable. I can't remember having many issues with Cursor last year when I used it at my last job, but still, vscode has been surprisingly pleasant for agentic use.

tomasz-tomczyk•1 day ago

Yeah I have a soft spot for Cursor because it was my first tool that unlocked huge productivity with AI, but I avoid doing anything there now.

Should try their CLI!

Aurornis•1 day ago

I try it from time to time and feel the same way. Some people I know really like it but I can’t tell if that’s because it’s good or just because it’s what they’ve become familiar with and they don’t like to change tools. Cursor had a good head start and a lot of early PR.

fjdjshsh•1 day ago

I've had good experiences with Cursor so far and it's my main IDE. I've noticed some UI changes, but I've switched fast and they didn't bug me

indiantinker•1 day ago

I agree. I quit cursor and replaced it with conductor and a mix of Claude Code / Codex/ Copilot and i dont miss it as such. Maybe one day I will come back.

ttouch•1 day ago

you can use the cursor cli and/or the zed editor with cursor as the underlying provider via ACP (Agent Client Protocol)

presentation•1 day ago

Tried that, it just seemed way dumber this way unfortunately. And the zed UI provided 0 visibility whenever it was doing tool calls, and for some reason it kept running sleep 30 calls because it couldn’t figure out how to see the results of its own tool calls.

rubyn00bie•1 day ago

Damn do I feel the UI changes being a pain point.

It’s a near constant regression in my workflows. “Multiple agents” got destroyed recently, and the new interface for it (some sort of command) isn’t as good or reliable. Then you’ve got modals everywhere[1] and truncated bits (like long branch names) that make it insanely frustrating to use.

They’re constantly changing the UI without actually improving it at all. I’ll likely cancel it and use opencode for personal stuff with Deepseek and only use it at work because I have to. There was a time when I appreciated the harness but it’s becoming less useful, or at least noticeable, over time… all the while the actual UI becomes substantially more painful and awkward to use (like @ in the “agents” window being completely unable to find a file because it’s some sort of “global” scope).

One thing that surprises me about this whole segment is that JetBrains haven’t eaten these folks’ lunch. Their IDEs are leagues better than VSCode but their AI integration is awful by comparison (and the bar is low). I can’t even see how much of the context window I have left.

[1] It’s insane that I have to answer questions in a tiny input box I cannot resize, let alone the fact that the text area I type prompts into cannot be resized either. Truly feels like the UI/UX is done by people without any experience.

animuchan•1 day ago

> Truly feels like the UI/UX is done by people

To me it feels like it's done entirely by an LLM, starting from the product vision.

jstummbillig•1 day ago

Isn't there a cli version of cursor by now?

yourboirusty•1 day ago

It's a bit better than the VSCode fork, but still much worse than competition:

- lags constantly,

- if you type while it's generating you'll get missed inputs,

- 'plan mode' doesn't clear context before starting work,

- you can't directly edit the plan, you can only ask the bot to do it,

- you can't immediately whitelist commands, only accept once or allow all.

vorticalbox•1 day ago

Yes

https://cursor.com/cli

epolanski•1 day ago

Good point.

One of the things I've come to appreciate about the cli tools like Codex or Claude is that the interface is so limited that every feature they release is still constrained to the same UX limitations, whereas those "funkier" IDEs change from month to month, giving me further fatigue.

asar•2 days ago

The model is (like Composer 2) based on Kimi K2.5 and they claim SOTA performance for 1/10th of the cost. The tweet also mentions that they've started a new model from scratch on Colossus 2 (xAI/SpaceX Cluster). Really impressive how they've made this jump from being called the vscode fork with no moat just a couple of months ago.

onlyrealcuzzo•2 days ago

> Really impressive how they've made this jump from being called the vscode fork with no moat just a couple of months ago.

Impressive, yes. But they still don't have a moat...

infecto•2 days ago

I am not sure we should dismiss what they have today. Nobody has yet come close with a full-package IDE that works well for coding. Is that not a moat? It is easy for me to discount it in my head, thinking that I could build something myself, but between autocomplete and their workflow for agent use, it feels like they have some tangible moat emerging.

virgilp•1 day ago

If we ignore cost (which is kinda hard to ignore), I feel Codex kinda' does it for me. Sure it's not really an editor but I find I don't need that _that much_ and it's easy to launch an external editor (they actually have the feature).

The ironic thing is that half a year ago, after trying factory.ai I thought chat-first interface was a stupid idea that will never work.

chillfox•1 day ago

Have you tried Zed?

I haven’t tried Cursor, so don’t know how they compare, but I like Zed a lot.

Anyway, would love to see a comparison from someone who has used a recent version of each.

alach11•2 days ago

Isn't a large user base and the data collected from those users a moat of sorts?

onlyrealcuzzo•2 days ago

A moat is when you have something others can't easily get.

Every MAG 7 / FAANG company already has more users and more data...

That's not a moat.

That's traction.

AussieWog93•2 days ago

Honestly the data itself is probably worth heaps even if the company itself collapses. Early attention engineering when humans were still in the loop!!!

kkukshtel•2 days ago

And it's still just a vscode fork

icemelt8•1 day ago

Cursor 3 is a complete rewrite, it's no longer a fork.

antirez•1 day ago

How much the RL they are doing really improves Kimi K2.5 remains to be seen. So, right now, the ground truth is that they combined what they had with a strong open-weights model. The RL improvement may be marginal (since many folks report strong results with vanilla K2.6) and may mostly bias the model towards coding tasks: when a model like this is trained to be generalist, there is a tension between being good at one thing and the other, in terms of SFT and RL. You can see this in the DeepSeek v4 Flash training report for instance, but it is a known fact. So if you have the GPUs and a decent RL pipeline that does not ruin the model, you can indeed specialize it a bit more for a given task at the expense of tasks people will not do inside Cursor. But, so far, the measurable reality is that Cursor uses an open-weight model like most could do, and the RL story could be partially a marketing move to justify calling it Composer 2.5 more than a real strong gain, given that there is no way to verify and K2.5 was already strong. And we also know that they had to partner to do the training, which is also not good news.

Lionga•2 days ago

They are still a vscode fork with no moat? Like they lost about 70% of users in half a year which goes to show how there is not even the tiniest of moat.

GenerWork•2 days ago

I feel like they've been targeting enterprise pretty hard. I know my company uses them, and the companies that hire us also use Cursor.

Squarex•1 day ago

All enterprises I know use GitHub Copilot as they already have Office, Teams, … I wonder how it will change with the recent pricing changes

pjmlp•1 day ago

I can tell my company wants nothing to do with them.

kvetching•2 days ago

Cursor will definitely win the enterprise for coding. Enterprises aren't going to trust a TUI

kilroy123•1 day ago

I think it's going to be brutal for them to compete with OpenAI and Anthropic.

I switched to claude code because of usage. For $200 a month, I would run out of usage halfway through the month. Then be forced to use their composer model or whatever slow, dumb model they served up in their "auto" mode.

For that same $200 a month, I could use claude code and basically never hit usage limits.

I don't understand what people are doing who run into the limits on that max x20 plan. I NEVER have.

liuliu•2 days ago

Since the frontier is only 8 months ahead of DeepSeek, it is hard to see how model training can be a moat, as all the tricks are available from open labs in China. You really just need <100m to bootstrap at this point.

wg0•1 day ago

This was the only way forward.

the_duke•1 day ago

In my opinion cursor actually has one of the best harnesses again at the moment.

farco12•1 day ago

One would hope the vscode fork with a $50B valuation and no moat, would wisely spend the money they raised to build a moat.

whywhywhywhy•2 days ago

It's still a VsCode fork just now with a Kimi fine tune and still no moat...

I won't debate that it turns out none of this mattered when it came to being a successful company though, and it kinda makes anyone who tried to roll their own instead of forking look a little silly.

hkleppe•1 day ago

"No moat", well...

How I see this is that it's so important to bundle the model with the right tooling.

Like a racecar, having the best engine doesn't help if the rest of the car lacks other winning properties (reliability, aerodynamics, etc.).

So Cursor, IMO, has put itself in a strong position by having both a solid IDE __and__ a solid+cost efficient model. Those two working great in combination for the task they are designed to solve (coding) is more important than benchmarks

DeathArrow•1 day ago

>Really impressive how they've made this jump from being called the vscode fork with no moat just a couple of months ago.

With so much money and computing from SpaceX, it's not so impressive.

make3•1 day ago

why is that part impressive specifically? they got purchased by SpaceX, they have access to infinite compute and cash now.

& now they're still losing all of their users to Claude Code and Codex.

DeathArrow•1 day ago

>& now they're still losing all of their users to Claude Code and Codex.

Why pay for Cursor when I can use GLM 5.1, Kimi K2.6, MiniMax M2.7, Xiaomi MiMo V2.5 Pro and Deepseek v4 for cheap and use whatever harness I want, including Claude Code.

It's not like Cursor harness is the best out there.

And even if I want to edit the code, I don't need to run the agent harness in an IDE.

make3•1 day ago

these are in the trillion parameters range, not sure it's actually that cheap to have at a reasonable speed without quality degradation & without like.. your own DGX B200

aurareturn•2 days ago

I doubt it's a brand new model. It's likely just Kimi K2.5 further trained on coding.

enraged_camel•2 days ago

They didn't say it's a new model... in fact they said exactly what you just said.

rcleveng•1 day ago

I have to say the new model is quite good at the basics, I've been handing over more and more tasks from Linear straight to it instead of the copy-paste into Claude dance lately.

At this point, more of my complaints are on the harness side, which is odd since originally they were by far the best harness out there.

Support - This is pretty much non-existent, it's community support or sales support.

Interacting with GitHub - this should work and be awesome, Claude code does this well (responding to lint errors and comments). Cursor you have to poke the agent to look at the comments or lint errors, and even then it's about 10% good. Even GitHub Copilot is better here.

Bugbot - I have it set up to trigger manually, but it still seems to wake up and burn 80-120k tokens just to notice it's configured to be manually invoked. When it does run, it tells me there are no issues (but claude or copilot both find real things)

App - When you have both agent window and the ide windows, it's hard to open up the code in the right directory. A simple "cursor ." from the terminal used to do it, now it'll often open the agent window, you have to try a few times for it to work.

I love that they are running super fast, it's just hard when many of the basics break or don't work.

khazhoux•1 day ago

> I've been handing over more and more tasks from Linear straight to it instead of the copy-paste into Claude dance lately

Tangent: we've been using Linear at work and I still don't understand why it claims to be "task tracking for agents". Is there anything at all that lends itself better to agentic workflows compared to JIRA or gitlab/github issues or whatever else?

Seems like Linear just hopped on the buzzword hype train at the exact right moment...

dbalatero•1 day ago

> Seems like Linear just hopped on the buzzword hype train at the exact right moment...

I think you nailed it. Provided an agent can connect and ingest the information in the ticket, that's basically what's needed. I guess it's nice to be able to nudge ticket status and post back to it, but all of those seem like wiring up existing APIs to an MCP and calling it good. I don't see why JIRA couldn't execute on that, despite being Atlassian.

rcleveng•1 day ago

Yup, honestly a google spreadsheet could probably do it as well.

I like the "copy prompt" feature, it's super simple but makes it just a few seconds to go from issue -> claude session.

Also assigning directly to cursor or codex, that's how I handle the easier tasks.

We also have scheduled tasks that elaborate existing tickets with information where needed, again that's just MCP but it works well enough

memoryleakgame•2 days ago

If these benches from their site hold up (they likely won't)

Wouldn't this compress ai revenue like 15x quickly

If they really have a 4.7 opus high equivalent at 1/16 the cost wouldn't this significantly affect all the current capex and planning

Maybe they are getting elon to cover cost

vessenes•1 day ago

It's worth being specific:

"Will this decrease Revenue?" -- only if demand for high quality tokens is inelastic. If demand is instead elastic (grows with cheaper pricing) then revenue will likely increase.

"Will this lower earnings?" -- they have a current inference margin for their old models, and with the Elon deal in place, they have a new inference margin. It might be better or worse than their old one. If it's worse, then they'd need to see a concomitant increase in usage. If they don't, then yes it might lower earnings.

"Will this lower corporate value?" -- no - not least because this company is going to be owned by SpaceX approximately 90 days after IPO -- so all the new owner will care about is being benchmark competitive with Anthropic and oAI for the first n quarters. If they can do that, it will massively increase the corporate value of SX; it's hard to build a frontier lab.

infecto•2 days ago

The way I have read their benchmark results is that they trained a model to work insanely well in their coding workflow. It’s not a general purpose model.

One of the surprisingly hardest problems to solve is to get a model to use the tools you give it access to.

zackify•2 days ago

this thing is so awesome on fast mode, so far i am impressed, some of its observations feel similar to opus.

i use gpt 5.5 and opus 4.7 a lot every day, if i can get good results at this speed, hopefully the usage level holds up on my team plan haha

2001zhaozhao•2 days ago

> compress ai revenue like 15x

that roughly just puts it on par with OpenAI and Anthropic subscriptions in terms of pricing per token

smallnamespace•1 day ago

AI revenue has been going up while the cost per token has been rapidly falling. The Jevons paradox applies here. The cheaper software is, the more software is written. There is not a finite demand for software.
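
The Jevons-paradox point above is really a claim about price elasticity, and the arithmetic is easy to sketch. This is only an illustration under a textbook assumption that is not from the thread: constant-elasticity demand, where quantity = k * price ** (-elasticity). The 1/16 price ratio is borrowed from the comment further up; the elasticity values are hypothetical.

```python
def revenue_ratio(price_ratio: float, elasticity: float) -> float:
    """New revenue divided by old revenue after a price change.

    Assumes constant-elasticity demand: quantity = k * price ** (-elasticity),
    so revenue = price * quantity = k * price ** (1 - elasticity).
    """
    return price_ratio ** (1.0 - elasticity)


# Price falls to 1/16 of the old level (figure quoted up-thread).
price_ratio = 1 / 16

# Hypothetical elasticities: inelastic, unit-elastic, elastic.
for elasticity in (0.5, 1.0, 1.5):
    ratio = revenue_ratio(price_ratio, elasticity)
    print(f"elasticity={elasticity}: revenue changes by x{ratio:.2f}")
```

Under this toy model, revenue only shrinks when elasticity is below 1. At an elasticity of 1.5 the same 16x price cut multiplies revenue by 4, which is the scenario the Jevons argument is betting on.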

rafaelmn•1 day ago

> AI revenue has been going up while the cost per token has been rapidly falling

Every model release now has been straight price increases since what GPT 4 ? When was the last time a new flagship model decreased prices compared to the previous one ?

jstummbillig•1 day ago

1. GPT 4 has gotten 6x cheaper over its evolution (from initial release to Turbo to 4o). Maybe you meant "Only since 4o and only since its final release". Alas.

2. We are not interested in how different model naming schemes relate to prices, we are interested in the capabilities. So if you want to learn something about price development you need comparative levels of capabilities, and then look at the prices. 4o is not comparable to 5.5 in the first regard. It is (according to the benchmarks) maybe more comparable to current 5 nano - which is 98% cheaper.

dktp•1 day ago

Opus 4.5 became significantly cheaper directly per token

baq•1 day ago

token efficiency

vb-8448•1 day ago

I, and I guess basically everyone here, don't have access to OAI or Anthropic books, and it's really difficult to disprove your statements but:

- AI revenue going up and cost/token are not related metrics, at least not in the way you are assuming.

- Basically all players (except OAI for the moment) are struggling with capacity and/or reducing or dismissing subscription-based solutions in favour of pay-per-use. If cost/token were falling, we would see quite the opposite.

lompad•1 day ago

This is conjecture. There is a reason both openai and anthropic refuse to comment on inference costs. If it were falling so much, they would use it to brag. I really don't understand why so many people keep repeating it without any actual data for the frontier models.

Apart from that, I'm not sure if focusing on tokens is even a good idea, because they are so different from model to model. I'd almost consider them a red herring now.

We could look at tasks instead. Is there anything even remotely suggesting that your typical task you give an LLM now costs less in inference than before?

romanovcode•1 day ago

The problem with this is that we do not know the actual cost. For all we know they might be pulling an Anthropic. Subsidizing costs to get users, then increasing them later on.

yorwba•1 day ago

They're offering a model based on Kimi K2.5 for $0.50/M input and $2.50/M output while the cheapest third-party provider on OpenRouter charges $0.40/M input and $1.90/M output https://openrouter.ai/moonshotai/kimi-k2.5 Those third-party providers have little incentive to subsidize their customers, so Cursor probably has a margin >20% on their inference cost.

The real money furnace is the training, not just of models that get released, but also experimental training runs that fail to move benchmarks and are quietly thrown away. E.g. Cursor claim that 85% of the compute for Composer 2.5 comes from additional training on top of Kimi K2.5, where I'm not sure how they determined that, but it can't have been cheap. Then they say "Together with SpaceXAI, we're training a significantly larger model from scratch, using 10x more total compute."

So yes, they're probably attempting to replicate the Anthropic playbook of paying a large upfront cost for a very good model, and then rapidly acquiring paying customers, hoping that the inference margin will be enough to cover the training cost.
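
The margin estimate in the comment above can be checked with a few lines. This is a rough sketch, assuming (as the comment does) that the cheapest third-party price on OpenRouter is a stand-in for the real serving cost; the prices are the ones quoted above, in USD per million tokens, and `gross_margin` is just an illustrative helper.

```python
def gross_margin(price: float, cost: float) -> float:
    """Margin as a fraction of the selling price."""
    return (price - cost) / price


# USD per million tokens, as quoted in the comment.
cursor_price = {"input": 0.50, "output": 2.50}
# Assumption: the cheapest third-party price approximates the serving cost.
third_party_price = {"input": 0.40, "output": 1.90}

margins = {
    kind: gross_margin(cursor_price[kind], third_party_price[kind])
    for kind in cursor_price
}
for kind, margin in margins.items():
    print(f"{kind}: {margin:.0%}")
```

That works out to about 20% on input tokens and 24% on output tokens. Since the third-party price presumably already includes that provider's own margin, the real figure would be somewhat higher, which fits the ">20%" reading.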

epolanski•1 day ago

I'm not sure that's the case, it seems like bringing capabilities up and costs down merely serves to induce more demand.

brunooliv•1 day ago

Any reason why they indexed on the Kimi K2.5 model? I have tried many open-source ones in Opencode, and, in my experience (standard backend development, Java, Python, Spring, etc) Qwen3.6 is SO MUCH BETTER that it's shocking. Kimi can't even get most tool calling arguments right.

CuriouslyC•1 day ago

There's a lead time on models, and there's some tuning gotchas they probably already figured out with Kimi, so they weren't ready to just drop everything and switch. I'm sure they will switch models eventually.

roflcopter69•1 day ago

I recommend reading the entire article

  Together with SpaceXAI, we're training a significantly larger model from scratch, using 10x more total compute.
  With Colossus 2's million H100-equivalents and our combined data and training techniques, we expect this to be a major leap in model capability.

grim_io•1 day ago

I guess this will largely decide if xai is going to pay 60 or 10 billion, depending on the success of the new coding model.

KaoruAoiShiho•1 day ago

Kimi 2.5 has the best long context. For raw coding benchmark scores you can just post train on top of it with more specialized data. 2.5 is kinda old, 2.6 is the current release which is exactly just that and catches up to the frontier in most aspects.

Bombthecat•1 day ago

Cheaper to run?

steviedotboston•1 day ago

It's very confusing that they use the same name as the very well known PHP package manager, composer

https://getcomposer.org/

wesammikhail•1 day ago

I don't know what it is with product names these days. Antigravity, Antimatter, Composer, Clay, Ramp, Bolt, etc.

You'd think the founders would Google for naming conflict before choosing a name.

varun_ch•1 day ago

I genuinely wonder if consulting LLMs for naming advice could be an explanation.

They certainly wouldn’t be great at coming up with new words for a product name.

dewey•1 day ago

Naming issues are as old as time. Apple Computer vs. Apple Records comes to mind as a popular example.

PUSH_AX•2 days ago

They set themselves up for flak when they use whatever these evals are… they did the same for composer 2, which was evaled in close competition with frontier models. Spoiler alert: it wasn’t even close in practice.

So now 2.5 is supposed to compete with opus 4.7? Sure…

jmcqk6•1 day ago

That does not match my experience. Composer 2 was fantastic for my uses, and I hit Composer 2.5 with some very difficult things last night, which it handled fast and effectively. I don't really care about benchmarks. I care about practice, and in practice, it's been very very good for me.

tuo-lei•2 days ago

they say it themselves in the post - behavior dimensions "not well captured by existing benchmarks". that was the exact problem with composer 2. not dumber on individual tasks, just bad at session-level decisions like when to stop editing, how much context to carry forward, when to re-read a file vs assume. you don't catch any of that in an isolated eval.

infecto•2 days ago

As I have said before in prior composer threads. The proof is in the usage. I am inclined to somewhat believe the results as I use composer and also take the results for the given context. It’s not a general purpose sota model. It’s a model that runs inexpensively in their coding workflow that is creating results similar to opus or gpt.

criemen•2 days ago

Well is that a statement about the quality of Opus 4.7 or about Composer 2.5? :P

chemex•1 day ago

I've been using Claude Code as my daily driver on a React Native + iOS codebase for the last few months. The thing that surprised me wasn't quality differences on individual edits — those are pretty close once you control for harness wiring — but how differently I'd ended up structuring my workflow around each style of tool.

Tab completion + chat-in-sidebar feels like an extension of my editing. An agentic harness feels more like delegating a 20-minute task and coming back to review. Different cognitive load, different bug profile. The "which is better" framing tends to skip over the fact that they reward different working styles.

Two things I'd watch on Composer 2.5 specifically:

1. How it handles long-running multi-file refactors that touch 10+ files. My experience with smaller models in that slot is they lose track of which files they've already edited around 30% of the way through. Frontier models keep the plan coherent for longer.

2. How it deals with non-obvious file boundaries. The thing that takes me out of "let it work" mode is the model deciding it needs to edit a config file I didn't think of. Usually that's right, but occasionally it's spelunking somewhere I don't want it to be.

The Kimi K2.5 base is interesting on its own. Open weights below frontier closed models is the thing worth watching from the harness side. If anyone's set up to fine-tune for a specific harness, this is the moment.

chis•1 day ago

AI slop detected, you're under arrest

jtwaleson•2 days ago

Ok this might be weird but I've moved everyone in my 4 person team to our team plan and costs seem to have sky rocketed compared to the individual plans. Where before most people spent 20-100 USD, now the total bill is more like 1k USD. I haven't gone into the details but it feels like I'm being scammed.
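
For scale, here is the back-of-the-envelope arithmetic on the figures in the comment above. It is only a sketch: it assumes all four people were inside the quoted 20-100 USD range before the move and that "more like 1k" means roughly 1000 USD per month for the whole team.

```python
team_size = 4
per_seat_before = (20, 100)  # USD per person per month, range quoted above
team_bill_now = 1000         # USD per month, assumed from "more like 1k"

# Total team spend before the move, at both ends of the quoted range.
before_low = team_size * per_seat_before[0]
before_high = team_size * per_seat_before[1]

# How many times larger the new bill is, best case and worst case.
multiple_low = team_bill_now / before_high
multiple_high = team_bill_now / before_low

print(f"before: ${before_low}-${before_high}/month, now: ${team_bill_now}/month")
print(f"increase: {multiple_low:.1f}x to {multiple_high:.1f}x")
```

So even if everyone was already at the top of the old range, the bill went up 2.5x; at the bottom of the range it would be 12.5x. The replies below suggest checking the default model and the plan's usage buffer before concluding anything.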

mohsen1•1 day ago

We moved off Cursor and onto Codex + Claude Code. Cost went from multiple thousand per engineer per month to about $500

zackify•1 day ago

Best deal currently:

Cursor team + Codex team + Claude team

Swap between the models when limited.

I am saving our company a lot of money vs Claude enterprise usage cost

DedlySnek•1 day ago

My company is shifting us from Cursor to Claude due to increased costs.

danbrooks•2 days ago

Check which model you're using.

The fast version of composer is the default now (which costs ~x3 as much).

infecto•2 days ago

Keep in mind I believe there is a larger buffer given to personal plans. If they have 50% extra with the personal plan you now only get 25%.

skeptic_ai•1 day ago

I did some monitoring. 15 accounts, 4 parallel tasks: 300 million input tokens and 200k output tokens took the 5h quota to 0 in 7 hours.

I think 300 million is too low. For reference, before I could do more than 1 billion under the same conditions.

PUSH_AX•2 days ago

My Cursor costs skyrocketed recently too

wunderlotus•1 day ago

I love Cursor as a tool, but I'm skeptical bc:

1/ CursorBench is so opaque [1] that it makes it hard to trust. Not to mention the v3.1 eval is a newer iteration and there's no insight into the tasks or if the model was just tuned to max it out. Composer 2 previously scored between 60-65% on the previous benchmark eval [2] but scores between 50-55% on CB v3.1[3].

2/ I've experienced Composer 2's performance and it leaves much to be desired as a daily driver for a knowledge worker. But KWs are obviously not the target users and I can see how it's cost-efficient for executing on clearly-defined, discrete coding tasks. Obviously that's their value proposition and they're figuring out how to communicate it well to the target customer. It just doesn't feel like CursorBench is that.

[1] https://cursor.com/blog/cursorbench#building-cursorbench

[2] https://cursor.com/blog/composer-2-technical-report#performa...

[3] https://cursor.com/blog/composer-2-5

everfrustrated•2 days ago

Full details https://cursor.com/blog/composer-2-5

dang•1 day ago

Thanks! Link belatedly changed above.

zurfer•1 day ago

Kudos to the team. Please consider making the model available via API!

bg24•1 day ago

They shipped an SDK recently. https://cursor.com/blog/typescript-sdk

sofumel•about 9 hours ago

I'm currently using Claude Code, but should I cancel it at the next renewal and switch to Composer 2.5?

granzymes•2 days ago

Surprised this got pushed off the front page so quickly! It’s exciting to see what the Cursor team has been able to do with significantly fewer resources than the frontier labs.

I do wish they weren’t joining xAI. Something tells me there will be a contingent of researchers that departs Cursor if that merger is consummated.

dang•1 day ago

It set off the flamewar detector, a.k.a. the overheated discussion detector. We'll turn that off.

granzymes•1 day ago

Thanks, dang! The blog post[1] might be a better source than the twitter thread. Also I regret my typo above (lab -> labs) but too late now!

[1] https://cursor.com/blog/composer-2-5

dang•1 day ago

Thanks! I had been just about to add that maybe the link wasn't the most informative. We've switched it now from https://twitter.com/cursor_ai/status/2056415413077233983.

As for the typo, s's are cheap and I've added one :)

m_mueller•1 day ago

It's a bit confusing to me why they'd make this 'fast' version the default, as it appears to be much more expensive than Composer 2. Wasn't it supposed to be a very cheap alternative to SOTA models?

mrklol•1 day ago

Isn’t it a really cheap alternative to sota models (according to benchmarks)?

ChrisArchitect•2 days ago

Non-x link: https://cursor.com/blog/composer-2-5 (https://news.ycombinator.com/item?id=48182126)

ryanshrott•about 23 hours ago

The cost claim is the easy part to sell. The real test is whether it stays useful in ugly codebases, long files, and repos with a bunch of half-broken conventions. That’s where these assistants usually fall apart, even when the benchmark numbers look great.

machiaweliczny•1 day ago

Tested it and it's good. The fast version is bad, though. I like that the planning model in Cursor works more like a human-written design doc than an overly detailed AI plan. That seems to be more responsible for the results than the model itself, but still: on fast it failed, while on normal I got good results.

luodaint•1 day ago

Benchmarks measure turn-level capabilities: you feed a task into the system and then grade the result. Capability for production-level usage concerns session-level decision making: does the agent know when to stop editing, retain the right amount of context, or go back and reread the file if the state has changed?

This is not a property of the model, but a property of the discipline; it can be operationalized by what you have documented before the session begins. Without "stop editing where you can no longer follow your changes to the spec" and "go back and read the migration file before changing the schema," there is nothing to halt the process until it fails integration.

Those teams who get consistent results independent of the model being used typically do so because they have operationalized their discipline first. Those switching out models monthly tend to expect the model to supply them.
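As a rough illustration of what "operationalized" could mean here, the two rules quoted above (stop when you can no longer follow your changes, and reread a file whose state has changed) can be turned into pre-edit checks. This is only a sketch under assumptions: `SessionGuard`, its edit budget, and the three return values are hypothetical, not taken from any real agent harness.

```python
import hashlib
from pathlib import Path


class SessionGuard:
    """Hypothetical pre-edit checks; names and thresholds are illustrative."""

    def __init__(self, max_edits: int):
        self.max_edits = max_edits          # crude stand-in for "can still follow the spec"
        self.edits = 0
        self._seen: dict[str, str] = {}     # path -> content hash at last read

    @staticmethod
    def _digest(path: Path) -> str:
        return hashlib.sha256(path.read_bytes()).hexdigest()

    def note_read(self, path: Path) -> None:
        """Record the file's content hash at the moment the agent reads it."""
        self._seen[str(path)] = self._digest(path)

    def check_edit(self, path: Path) -> str:
        """Return 'stop', 'reread', or 'ok' for a proposed edit to an existing file."""
        if self.edits >= self.max_edits:
            return "stop"
        if self._seen.get(str(path)) != self._digest(path):
            return "reread"                 # never read, or changed since last read
        self.edits += 1
        return "ok"
```

After an edit is applied the file's hash changes, so the harness would call `note_read` again before the next edit to the same file. The edit budget is a deliberately blunt proxy; a real rule would be tied to the spec rather than a counter.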

0fes911•1 day ago

I found composer 2 pretty good as a subagent delegating tasks like auditing for bugs after finishing implementation, but hopefully composer 2.5 will be more reliable so it can be used to implement and execute long running tasks.

WhitneyLand•1 day ago

Say what you want about Cursor but they don’t lack for ambition.

Forking VS Code, going big on bleeding edge features like cloud agents, and now they’ve thrown down the gauntlet directly challenging frontier labs by training their own model (“much larger” than Kimi 2.5’s 1T parameters) from scratch.

They’ve been highly successful so far. Raised $50B, $2B in revenue, forecast to end 2026 above $6B. But even at these heights, they’re just not in the same league as OpenAI/Anthropic/Google.

And if building a state of the art multitrillion parameter model is not challenging enough, it’s a mountain you don’t climb just once. Every few months you need to push it farther with a new release. Fall off for a couple cycles and like Facebook you may never catch up again.

Not for the faint of heart.

pdq•1 day ago

Why is this comment upvoted?

It is most likely AI generated with a nice "Raised $50B" hallucination and filled with cliches ("thrown down the gauntlet", "mountain you don’t climb just once", "not for the faint of heart").

Aurornis•1 day ago

Good catch. I didn’t even notice it at first, but the hallucinations on top of cliches give it away.

The account doesn’t have a history of other comments that have too much of an AI vibe, but this one does. Even if it wasn’t AI, it’s misinformation.

WhitneyLand•1 day ago

Please see reply to your other comment on this thread.

WhitneyLand•1 day ago

I wrote this 100% off the top of my head on my phone while eating a sandwich.

Ffs.

edit: removed cursing you out. Sorry but this is frustrating. I don’t leave AI generated comments here (or anywhere else).

Aurornis•1 day ago

EDIT: As others have pointed out, the comment above contains hallucinations (Like the $50 billion number) and a lot of AI tells. The account doesn’t have a history of AI-like comments but the hallucinations and structure in this one are suspicious. If anything, don’t trust the numbers it cites because they’re made up.

Cursor is a team that I want to see succeed. They have stacked their company with very smart people and they’re going hard at a highly competitive market. We all win when there is more competition and more innovation.

My problem is that every few months I look at Cursor’s product offerings and maybe retry it, but it never feels like something I want to use. Part is personal preference, the other part is the fact that my combination of other tools and services just does a better job. Their biggest advantage felt like first-mover advantage when they came out early and captured market share, but at in-person meetups I hear stories about companies switching away from Cursor or trying to convince their management to let them switch away. They need to come up with a compelling advantage fast, which is a hard thing to do against the other companies with their virtually unlimited budgets by comparison.

WhitneyLand•1 day ago

So, you’re wrong on two counts.

1. Evidently you’re no longer able to distinguish AI from people as the whole comment was written by a human off the cuff.

2. The numbers are not hallucinations. It’s word on the street reporting, so yes it’s speculative, but a model did not make it up unless that’s where TechCrunch got it which is not on me.

https://techcrunch.com/2026/04/17/sources-cursor-in-talks-to...

Aurornis•about 23 hours ago

Quoting directly from your comment:

> They’ve been highly successful so far. Raised $50B,

They have not raised $50B. The article you linked says they're raising $2B, not $50B.

The valuation is not the amount raised.

adamkeys•1 day ago

Same, I kick the tires on Cursor every several weeks wanting to find they've finally crossed some chasm I can't quite explain. But every time, I bounce off the ground-truth that they're forked off vscode, which just isn't for me. I think moving agents to the center of their experience and developing a model that focuses on speed/efficiency over maximum depth is a promising step away from being a spicy vscode fork.

whs•1 day ago

My company is heavy on Cursor and I still ask them to provide me GitHub Copilot, for the sole reason that Cursor is probably the reason Microsoft had to implement technical enforcement of their TOS on proprietary plugins. Previously, you could use PyLance on VSCodium but now those plugins do not work outside VSCode anymore.

If Cursor (and every other commercial VSCode fork) hadn't used the MS extension store in the beginning and violated the TOS, this might not have happened.

chrisrickard•1 day ago

Cursor 3 is a full rewrite. No VS Code

causal•1 day ago

Yeah I want them to do well. I find Cursor to be a much better tool for actually working with the code the agent writes than whatever the big vendors provide.

highfrequency•1 day ago

> now they’ve thrown down the gauntlet directly challenging frontier labs by training their own model (“much larger” than Kimi 2.5’s 1T parameters) from scratch.

To clarify, the model Composer 2.5 announced in this post is not that; it uses Kimi 2.5 as a strong starting point. This is not to discount Cursor's work or future ambitions, but one of the most striking things about the last 6 months is that multiple open-source models/labs are now within striking distance of the frontier closed-sourced labs.

See eg Kimi 2.6 benchmarks: https://www.kimi.com/blog/kimi-k2-6

didroe•1 day ago

They have no choice but to train their own model to try and survive. They're paying API pricing for the top tier models but competing against subsidized subscriptions.

worldsavior•1 day ago

Them raising this much money doesn't mean they're successful, it only means they know how to fool the investors well. A project that is basically an extension to VSCode only adding a chat interface, isn't really worth this much money. Obviously, it's the users, but people think it's something genius and revolutionary, but no.

infecto•1 day ago

This is rsync all over again. Go create it yourself if you think it’s just a simple extension.

worldsavior•about 14 hours ago

You're right, I regret I didn't have the sense to do the same as them at the time.

dtagames•1 day ago

As a heavy user, I don't think the model is their product. Cursor is primarily a harness and lately, a specialized agent dashboard.

Composer, their in house model, is dispatched by other models like Claude Opus for individual items on a task list. No one is suggesting you write your main prompt to Composer 2.

benmusch•1 day ago

they aren't "throwing down the gauntlet", they're trying to find ways to eke margin out of their product by owning a commodity-level coding model. it's an impressive engineering task but it's not particularly ambitious.

Survey8430•1 day ago

AI comment... BOO!

jorl17•1 day ago

I want to like composer, but I just can't.

- Its communication style is completely opposite to Anthropic models. It's not as bad as OpenAI's models, which are obsessed with "shapes", "wrinkles", hyphenated-words, and other cryptic formulations that make you feel like you're not on planet earth after a while talking to them. But it is nonetheless markedly "rude", "dry", "cold", gives off this "entitled I'm right, you're wrong" attitude. I once had composer2-fast accidentally run `rm -rf $HOME` (no harm done) as part of a bug in an install script it wrote and all it could say once it realized it was: "Running script with proper hardening". Qwen's models have clearly been distilled from Anthropic models because they have a much closer communication style and that's why I hope cursor will one day release a new family of composer models derived from that. A damn joy to use.

- It's just dumb. I don't know what they're doing with benchmarks, but for my work (python, bash, docker, whatever), cursor is just incredibly dumb. Always does in 10 lines what could be done in one. Doesn't know loads of internals of things that other models know. Never places things in the right files, constantly makes terrible edits (inline imports, edits without testing). Everything is so complicated when done by composer2, it's just a joke to me at this point. It clearly needs more handholding than Opus 4.x or GPT-5.x. I tried 2.5-fast and it seemed more of the same. And this would sort of be acceptable if it owned up to its incompetence, but it is so confidently incompetent that it's revolting.

I know that for many people the "tone" of the models is not relevant, or maybe they even prefer models like these. I simply cannot work like that.

Ever since Gemini started blowing benchmarks out of the water while being a clearly inferior model incapable of producing anything (and pretty much just doing tool calls without any feedback to the user), I gave up on benchmarks. Composer has been more of the same in that regard.

As a GPT model would say:

   "Small wrinkle: the production-ready benchmark results were tainted by real-world data points. I've assimilated the inconsistencies and added guardrails so that v2 has the right shape for future evaluations."
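On the `rm -rf $HOME` incident mentioned above: the comment doesn't say how the install script ended up running it, but a common cause of this class of bug is a path variable that is empty or points somewhere unexpected. A minimal sketch of the kind of guard a script could put in front of a recursive delete, assuming a hypothetical `safe_rmtree` helper and a caller-chosen sandbox directory:

```python
import shutil
from pathlib import Path


def safe_rmtree(target: str, sandbox: str) -> None:
    """Delete `target` only if it lies strictly inside `sandbox`.

    Hypothetical helper: refuses empty paths, the home directory,
    the filesystem root, and anything outside the sandbox.
    """
    if not target or not sandbox:
        # Path("") resolves to the current directory, so reject it early.
        raise ValueError("refusing to delete: empty path")
    target_path = Path(target).resolve()
    sandbox_path = Path(sandbox).resolve()
    protected = {Path.home().resolve(), Path(target_path.anchor)}
    if target_path in protected:
        raise ValueError(f"refusing to delete protected path: {target_path}")
    if sandbox_path not in target_path.parents:
        raise ValueError(f"refusing to delete outside sandbox: {target_path}")
    shutil.rmtree(target_path)
```

The checks run before anything is touched, so a bad argument fails loudly instead of deleting something. Note that the sandbox itself is never deleted, only directories strictly below it.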

bingud•1 day ago

Seems like a promising and useful model but it's probably scary how much customer data they fed into it to reach this performance

sergiotapia•2 days ago

Congratulations on the launch! I'm interested in trying Cursor but it's very confusing what I should buy. What does the Pro $20 plan get me in usage if I only use Composer 2.5? How fast is the model?

darkwi11ow•2 days ago

I've used the $20 plan on a daily basis for more than a year now, and have yet to exhaust that limit. The plan includes $20 in API costs for non-Cursor premium models and $20 for Composer and Auto models provided by Cursor themselves.

That said, I am a pretty old-fashioned coder and use LLMs mostly to overcome the blank page problem, which means I review and often rewrite LLM output by hand and avoid prompt loops for a single task.

People who are aiming to not read code any more might find this $20 plan lacking for their needs, however for my needs it fits perfectly.

kaizoku156•2 days ago

The limits are probably even higher than that. I seem to get about $100+ of usage on Composer and about 45-50 USD on non-Composer models

uf00lme•1 day ago

I wonder why they didn’t train off Kimi 2.6. I hope it is because they already had a good base and not that they messed up that relationship.

NitpickLawyer•1 day ago

> and not that they messed up that relationship.

There's nothing to mess up. The license is MIT w/ attribution, and the attribution clause can be easily sidestepped w/o any legal repercussions. The "drama" was simply content creators going nuts over some misunderstandings and poor comms from some kimi related devs.

re-thc•1 day ago

That's 3.0

Armonsrer•1 day ago

It looks like a massive update from Cursor, and I like their platform. Let's hope it's good

I_am_tiberius•1 day ago

I hope people soon wake up to the fact that they use user data for model fine tuning.

vanuatu•2 days ago

It's always great that more companies are throwing their hat in the ring, especially focusing on value (latency + intelligence + cost)

try-working•1 day ago

A lot of people saying Cursor have no moat. Sure. Neither do OpenAI or Anthropic.

svantana•1 day ago

You could say they have a sort of anti-moat (drawbridge?) since you can use their product to create a competitor. But that's true of most dev tools, in a sense.

big-chungus4•1 day ago

Can you please train Qwen 3.5 like 0.8B to 9B using the same training techniques

jdlyga•2 days ago

It's a bit odd that they're not comparing it against Sonnet

jjice•2 days ago

I don't think so. They're comparing it to the highest tier available models from Anthropic and OpenAI. Generally speaking, Opus is better than Sonnet in almost every way, so why have the redundancy?

3836293648•1 day ago

Price to performance?

jjice•1 day ago

I think comparing their benchmarks to Opus is a great way to show "look at similar benchmarks for a fraction of the cost". If it has Opus benchmarks (I don't actually take benchmarks seriously, but for their comparison purposes) and Sonnet is still more than half the price of Opus, I figure it's close enough where it doesn't matter.

CodingJeebus•2 days ago

The tweet specifies that the new model is geared towards long-running tasks, which is what you'd use a model like Opus for anyway.

lukebrichey•2 days ago

this feels super bullish on cursor/spacexai's ability to train a frontier level model. could be truly SOTA on coding given that their RL data is this powerful

XCSme•about 6 hours ago

Can we use Composer 2.5 via API/OpenRouter?

DeathArrow•1 day ago

I think anybody will be much better off getting a coding plan from Kimi.com and using Kimi K2.6, with whatever harness they like, including Claude Code, instead of paying more for Cursor's version of Kimi K2.5.

svclaws•2 days ago

Their previous Composer was already marketed as a cheap model capable of competing with SOTA on most tasks. The evals they shared back then backed this up but in my day-to-day usage it fell short across the board. Canceled my cursor subscription and switched to Claude Code a few weeks ago. It has its own shortcomings but in terms of model capability and UX quality Cursor will have a hard time competing in the long term. Elon Musk will be a very good way out for them.

Glohrischi•1 day ago

Hahah wtf? They are training on colossus 2? Their own model?

Dude what the hell happened to Musk's Grok? How incapable are they that they give away training compute to Cursor like this?

Weird that the genius Musk doesn't need his own compute; after all, shouldn't Macrohard (no joke) already be building the world's software from scratch?

mgambati•1 day ago

Word on the street is that xAI will buy Cursor.

Glohrischi•1 day ago

Yeah, for 10-60 BILLION, which again makes this even stupider.

For this amount of money you can rebuild cursor and everything else on the market, and with the rest of 9-59 Billion, you just hire experts in coding and let them code real high quality code examples.

And then you just use your existing grok pipeline and just add this functionality.

This xAI stuff has to be run by idiots

radu_floricica•1 day ago

Buy "Cursor", not "Cursor's IP". This means brand, users, and a shitton of data.

And if you combine a shitton of data with a lot of compute, large userbase and good engineers, you have a pretty good chance of doing something interesting.

timmmmmmay•1 day ago

it seems like they were trying that last year, it didn't work, so he flipped out and fired everyone and now plan B is to buy Cursor and run a quick rename of "Composer 3" to "Grok 5"

enraged_camel•1 day ago

I tested it yesterday. It is pretty bad. Just like with Composer 2, it's fast, but quality is nowhere near what Cursor claims with their benchmarks. It is not even at Opus 4.5 level.

I gave it a mix of refactoring tasks and new feature tasks. For each one, I had it write a plan, then I had Codex review it. Codex found major issues with every plan: patterns that don't match the rest of the code base, hallucinated variable/function names, and even outright bugs in the way the plan was written. I fed the feedback to Composer 2. After it made the changes and implemented the revised plan, I had Codex and Opus 4.7 do code reviews, and once again both of them found major bugs.

Overall it was a very frustrating experience. I feel like I wasted a whole day. Which is sad, as I have been looking for an excuse to come back to Cursor. But as things stand, Codex + CC combo cannot be beat, not just in terms of price but also quality.

polski-g•2 days ago

I don't know why their model isn't on Openrouter yet. They must not have enough capacity to offer it.

re-thc•2 days ago

Did they just upgrade Kimi 2.5 to 2.6?

lukebrichey•2 days ago

still uses 2.5

Dongyu_Jia•1 day ago

Will this be Cursor's last dance? LoL