Discussion (221 Comments) | Read original on Hacker News
Really nice to see they're giving credit to the company, and I am optimistic that Kimi K open models will soon outperform Opus models.
I'm very curious to read about the Composer 3 architecture when it comes out. More frontier coding models are a good thing, especially if they diversify into different strengths/weaknesses.
Cursor adds in a large corpus of proprietary coding data — I think this is actually fairly hard to acquire right now, because claude and codex are so good.
I bet there’s enough talent at the Grok team to work with the cursor team and data to get something good out the door.
That said, I don’t track Grok’s engineering leads — I’m not sure who’s currently around, and who is not.
Hard to outperform the model you distill...
Distillation helps with world knowledge and things like that.
They do use it for synthetic data/judging though, so yes, hard to outperform.
Not that they need to, if they can basically match it for a fifth of the price.
They had properly licensed it and were complying with the terms of the license.
For a professional tool it’s getting egregious how little respect they have for my workflows and flow state, the way they keep moving things, changing iconography and flipping switches in the UI.
It’s clearly being run by someone who comes from a social app or sales app growth hacking background.
What annoys me is how little they want to integrate with ...anything. Wanna open a link in your default browser? Use our built-in chromium fork, we insist. Wanna open a location in Zed? No, please use our half-baked editor re-implementation. Wanna open a location in Cursor's own vscode-based editor? You can't. Managed to work around that somehow? We changed your files to "Worktree TS", disabling all your language servers. It's like programming on an iPhone.
I fixed that by using cursor the agent but not the UI.
I'm just running cursor in GNU Emacs via agent-shell (https://github.com/xenodium/agent-shell). Their cli client (aptly named "agent") supports ACP (agent client protocol) so the UI can be skipped altogether.
I know this sounds like a meme ("use x in emacs") but at this point at the very least I can keep my workflows and my UI all the same and focus on my work rather than "where did $company put $feature this month".
https://cursor.com/docs/cli/installation
https://github.com/xenodium/agent-shell
I gave up, canceled my plan, and went back to boring old VSCode. It feels so much more stable, and my Mac no longer runs out of memory. With cursor I had to reboot my macbook several times a week and had to always be plugged in.
Should try their CLI!
It’s a near constant regression in my workflows. “Multiple agents” got destroyed recently, and the new interface for it, some sort of command, isn’t as good or reliable. Then you’ve got modals everywhere[1] and truncated bits (like long branch names) that make it insanely frustrating to use.
They’re constantly changing the UI without actually improving it at all. I’ll likely cancel it and use opencode for personal stuff with Deepseek and only use it at work because I have to. There was a time when I appreciated the harness but it’s becoming less useful, or at least noticeable, over time… all the while the actual UI becomes substantially more painful and awkward to use (like @ in the “agents” window being completely unable to find a file because it’s some sort of “global” scope).
One thing that surprises me about this whole segment is that JetBrains haven’t eaten these folks’ lunch. Their IDEs are leagues better than VSCode but their AI integration is awful by comparison (and the bar is low). I can’t even see how much of the context window I have left.
[1] it’s insane I have to answer questions in a tiny input box I cannot resize or adjust the size of. Let alone the fact the text area I input prompts into cannot be resized. Truly feels like the UI/UX is done by people without any experience.
To me it feels like it's done entirely by an LLM, starting from the product vision.
- lags constantly,
- if you type while it's generating you'll get missed inputs,
- 'plan mode' doesn't clear context before starting work,
- you can't directly edit the plan, you can only ask the bot to do it,
- you can't immediately whitelist commands, only accept once or allow all.
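The last bullet is essentially asking for a per-command allowlist, something between "accept once" and "allow all". A minimal sketch of how a harness could do that, in Python; `CommandPolicy`, `allow`, and `decide` are hypothetical names for illustration, not Cursor's API, and the policy of always prompting on shell metacharacters is an assumption chosen to be conservative:

```python
import shlex
from dataclasses import dataclass, field

# Characters that can chain or substitute commands; anything containing
# them is never auto-approved in this sketch.
SHELL_META = set(";&|`$<>()")


@dataclass
class CommandPolicy:
    """Hypothetical per-command allowlist for an agent harness."""

    allowed: set[str] = field(default_factory=set)

    def allow(self, program: str) -> None:
        # Remember a program name so later invocations skip the prompt.
        self.allowed.add(program)

    def decide(self, command_line: str) -> str:
        # Return "run" for allowlisted programs and "ask" for everything else.
        if any(ch in SHELL_META for ch in command_line):
            return "ask"
        try:
            tokens = shlex.split(command_line)
        except ValueError:
            return "ask"  # e.g. unbalanced quotes
        if not tokens:
            return "ask"
        return "run" if tokens[0] in self.allowed else "ask"
```

After `policy.allow("git")`, a call like `policy.decide("git status")` is approved without a prompt, while `rm -rf build` or anything that chains commands with `;` still asks.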
https://cursor.com/cli
One of the things I've come to appreciate about the cli tools like Codex or Claude is that the interface is so limited that every feature they release is still limited and constrained to the same UX limitations, whereas those "funkier" IDEs change from month to month giving me further fatigue.
Impressive, yes. But they still don't have a moat...
The ironic thing is that half a year ago, after trying factory.ai I thought chat-first interface was a stupid idea that will never work.
I haven’t tried Cursor, so don’t know how they compare, but I like Zed a lot.
Anyway, would love to see a comparison from someone who has used a recent version of each.
Every MAG 7 / FAANG company already has more users and more data...
That's not a moat.
That's traction.
I switched to claude code because of usage. For $200 a month, I would run out of usage halfway through the month. Then be forced to use their composer model or whatever slow, dumb model they served up in their "auto" mode.
For that same $200 a month, I could use claude code and basically never hit usage limits.
I don't understand what people are doing who run into the limits on that max x20 plan. I NEVER have.
I won't debate that it turns out none of this mattered when it came to being a successful company, though, and it kinda makes anyone who tried to roll their own instead of forking look a little silly.
How I see this is that it's so important to bundle the model with the right tooling.
Like a racecar, having the best engine doesn't help if the rest of the car lacks other winning properties (reliability, aerodynamics etc).
So for Cursor, IMO they have put themselves in a strong position by having both a solid IDE __and__ a solid, cost-efficient model. Those two working well in combination for the task they are designed to solve (coding) is more important than benchmarks.
With so much money and compute from SpaceX, it is not so impressive.
& now they're still losing all of their users to Claude Code and Codex.
Why pay for Cursor when I can use GLM 5.1, Kimi K2.6, MiniMax M2.7, Xiaomi MiMo V2.5 Pro and Deepseek v4 for cheap and use whatever harness I want, including Claude Code.
It's not like Cursor harness is the best out there.
And even if I want to edit the code, I don't need to run the agent harness in an IDE.
At this point, more of my complaints are on the harness side, which is odd since originally they were by far the best harness out there.
Support - This is pretty much non-existent; it's community support or sales support.
Interacting with GitHub - this should work and be awesome, Claude code does this well (responding to lint errors and comments). Cursor you have to poke the agent to look at the comments or lint errors, and even then it's about 10% good. Even GitHub Copilot is better here.
Bugbot - I have it set up to trigger manually, but it still seems to wake up and burn 80-120k tokens just to notice it's configured to be manually invoked. When it does run, it tells me there are no issues (but Claude or Copilot both find real things).
App - When you have both agent window and the ide windows, it's hard to open up the code in the right directory. A simple "cursor ." from the terminal used to do it, now it'll often open the agent window, you have to try a few times for it to work.
I love that they are running super fast, it's just hard when many of the basics break or don't work.
Tangent: we've been using Linear at work and I still don't understand why it claims to be "task tracking for agents". Is there anything at all that lends itself better to agentic workflows compared to JIRA or gitlab/github issues or whatever else?
Seems like Linear just hopped on the buzzword hype train at the exact right moment...
I think you nailed it. Provided an agent can connect and ingest the information in the ticket, that's basically what's needed. I guess it's nice to be able to nudge ticket status and post back to it, but all of those seem like wiring up existing APIs to an MCP and calling it good. I don't see why JIRA couldn't execute on that, despite being Atlassian.
I like the "copy prompt" feature, it's super simple but makes it just a few seconds to go from issue -> claude session.
Also assigning directly to cursor or codex, that's how I handle the easier tasks.
We also have scheduled tasks that elaborate existing tickets with information where needed, again that's just MCP but it works well enough
Wouldn't this compress ai revenue like 15x quickly
If they really have a 4.7 opus high equivalent at 1/16 the cost, wouldn't this significantly affect all the current capex and planning?
Maybe they are getting elon to cover cost
"Will this decrease Revenue?" -- only if demand for high quality tokens is inelastic. If demand is instead elastic (grows with cheaper pricing) then revenue will likely increase.
"Will this lower earnings?" -- they have a current inference margin for their old models, and with the Elon deal in place, they have a new inference margin. It might be better or worse than their old one. If it's worse, then they'd need to see a concomitant increase in usage. If they don't, then yes it might lower earnings.
"Will this lower corporate value?" -- no - not least because this company is going to be owned by SpaceX approximately 90 days after IPO -- so all the new owner will care about is being benchmark competitive with Anthropic and oAI for the first n quarters. If they can do that, it will massively increase the corporate value of SX; it's hard to build a frontier lab.
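To put rough numbers on the revenue and earnings questions above, here is a small sketch in Python. It assumes a constant-elasticity demand curve, which is a textbook simplification and not something the thread or Cursor has stated; both function names are made up for illustration.

```python
def revenue_ratio(price_ratio: float, elasticity: float) -> float:
    """New revenue divided by old revenue under constant-elasticity demand.

    Assumption: quantity scales as price_ratio ** (-elasticity), so revenue
    (price * quantity) scales as price_ratio ** (1 - elasticity).
    """
    return price_ratio ** (1.0 - elasticity)


def required_usage_growth(old_margin: float, new_margin: float) -> float:
    """Usage multiplier needed to keep gross profit flat after a margin change.

    Margins are gross profit per unit of usage, in any consistent unit.
    """
    if new_margin <= 0:
        raise ValueError("a non-positive margin cannot be recovered with volume")
    return old_margin / new_margin


if __name__ == "__main__":
    # Price drops to 1/16 of the old price, as in the question above.
    for e in (0.5, 1.0, 1.5):
        print(f"elasticity {e}: revenue x{revenue_ratio(1 / 16, e):.2f}")
```

Under that assumption, a 1/16 price with elasticity 0.5 leaves a quarter of the old revenue, elasticity 1 leaves it unchanged, and elasticity 1.5 quadruples it. Likewise, if the per-unit margin halves, usage has to double just to keep earnings flat.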
One of the surprisingly hardest problems to solve is to get a model to use the tools you give it access to.
i use gpt 5.5 and opus 4.7 a lot every day, if i can get good results at this speed, hopefully the usage level holds up on my team plan haha
that roughly just puts it on par with OpenAI and Anthropic subscriptions in terms of pricing per token
Every model release now has been a straight price increase since, what, GPT-4? When was the last time a new flagship model decreased prices compared to the previous one?
2. We are not interested in how different model naming schemes relate to prices, we are interested in the capabilities. So if you want to learn something about price development you need comparative levels of capabilities, and then look at the prices. 4o is not comparable to 5.5 in the first regard. It is (according to the benchmarks) maybe more comparable to current 5 nano - which is 98% cheaper.
- AI revenue going up & cost/token are not related metrics, at least not in the way you are assuming: basically all players (except OAI for the moment) are struggling with capacity and/or reducing or dismissing subscription-based solutions in favour of pay-per-use. If cost/token was falling, we would see quite the opposite.
Apart from that, I'm not sure if focusing on tokens is even a good idea, because they are so different from model to model. I'd almost consider them a red herring now.
We could look at tasks instead. Is there anything even remotely suggesting that your typical task you give an LLM now costs less in inference than before?
The real money furnace is the training, not just of models that get released, but also experimental training runs that fail to move benchmarks and are quietly thrown away. E.g. Cursor claim that 85% of the compute for Composer 2.5 comes from additional training on top of Kimi K2.5, where I'm not sure how they determined that, but it can't have been cheap. Then they say "Together with SpaceXAI, we're training a significantly larger model from scratch, using 10x more total compute."
So yes, they're probably attempting to replicate the Anthropic playbook of paying a large upfront cost for a very good model, and then rapidly acquiring paying customers, hoping that the inference margin will be enough to cover the training cost.
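The "inference margin covers the training cost" bet is a break-even calculation. A minimal sketch in Python, with a hypothetical function name and no real figures from Cursor, Anthropic, or anyone else; note that `training_cost` should include the discarded experimental runs mentioned above, not only the run that shipped:

```python
def breakeven_million_tokens(
    training_cost: float,
    price_per_mtok: float,
    serving_cost_per_mtok: float,
) -> float:
    """Millions of tokens to sell before inference margin repays training.

    training_cost: total spend on training, including failed experiments.
    price_per_mtok / serving_cost_per_mtok: revenue and inference cost per
    million tokens, in the same currency as training_cost.
    """
    margin = price_per_mtok - serving_cost_per_mtok
    if margin <= 0:
        raise ValueError("no positive inference margin; training is never repaid")
    return training_cost / margin
```

The useful property of writing it this way is that a thin margin blows the break-even volume up quickly: halving the margin doubles the tokens that must be sold, which is why the playbook depends on acquiring paying customers fast.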
https://getcomposer.org/
You'd think the founders would Google for naming conflict before choosing a name.
They certainly wouldn’t be great at coming up with new words for a product name.
So now 2.5 is supposed to compete with opus 4.7? Sure…
Tab completion + chat-in-sidebar feels like an extension of my editing. An agentic harness feels more like delegating a 20-minute task and coming back to review. Different cognitive load, different bug profile. The "which is better" framing tends to skip over the fact that they reward different working styles.
Two things I'd watch on Composer 2.5 specifically:
1. How it handles long-running multi-file refactors that touch 10+ files. My experience with smaller models in that slot is they lose track of which files they've already edited around 30% of the way through. Frontier models keep the plan coherent for longer.
2. How it deals with non-obvious file boundaries. The thing that takes me out of "let it work" mode is the model deciding it needs to edit a config file I didn't think of. Usually that's right, but occasionally it's spelunking somewhere I don't want it to be.
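Both failure modes above (losing track of which files were already edited, and wandering into files outside the expected boundary) are things a harness could track outside the model. A rough sketch in Python of such a ledger; `EditLedger` and its methods are hypothetical, and nothing here reflects how Cursor or any other harness actually works:

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase


@dataclass
class EditLedger:
    """Hypothetical harness-side record of a multi-file refactor."""

    planned: set[str]                    # files the agreed plan intends to touch
    protected: tuple[str, ...] = ()      # glob patterns the agent must not edit
    edited: set[str] = field(default_factory=set)

    def check(self, path: str) -> str:
        # Classify an edit before it happens.
        if any(fnmatchcase(path, pattern) for pattern in self.protected):
            return "blocked"
        return "ok" if path in self.planned else "needs-approval"

    def record(self, path: str) -> None:
        # Mark a file as done so the plan state survives a long session.
        self.edited.add(path)

    def remaining(self) -> list[str]:
        # Planned files that have not been edited yet, in stable order.
        return sorted(self.planned - self.edited)
```

Feeding `remaining()` back into the prompt at each step is one way to keep a smaller model oriented, and the "needs-approval" result covers the config-file case: usually the edit is right, but the user gets to say so.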
The Kimi K2.5 base is interesting on its own. Open weights below frontier closed models is the thing worth watching from the harness side. If anyone's set up to fine-tune for a specific harness, this is the moment.
Cursor team, Codex team, Claude team.
Swap between the models when limited.
I am saving our company a lot of money vs Claude enterprise usage cost
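Swapping between plans when one is limited can be automated with a simple ordered fallback. A sketch in Python, assuming each provider is wrapped in a plain function that raises a `RateLimited` error when its quota is exhausted; both the exception and `ask_with_fallback` are hypothetical, not part of any vendor SDK:

```python
from typing import Callable, Sequence


class RateLimited(Exception):
    """Hypothetical error a provider wrapper raises when its quota is used up."""


def ask_with_fallback(
    prompt: str,
    providers: Sequence[tuple[str, Callable[[str], str]]],
) -> tuple[str, str]:
    """Try each (name, call) pair in order.

    Returns (provider name, answer) from the first provider that is not
    rate limited, and raises RateLimited if every provider is exhausted.
    """
    for name, call in providers:
        try:
            return name, call(prompt)
        except RateLimited:
            continue  # this plan is out of quota; move to the next one
    raise RateLimited("all providers are rate limited")
```

The order of `providers` encodes preference, so the cheapest or favourite plan goes first and the others only get used when it is limited.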
The fast version of composer is the default now (which costs ~x3 as much).
I think 300 million is too low. For reference, before I could do more than 1 billion under the same conditions.
1/ CursorBench is so opaque [1] that it makes it hard to trust. Not to mention the v3.1 eval is a newer iteration and there's no insight into the tasks or if the model was just tuned to max it out. Composer 2 previously scored between 60-65% on the previous benchmark eval [2] but scores between 50-55% on CB v3.1[3].
2/ I've experienced Composer 2's performance and it leaves much to be desired as a daily driver for a knowledge worker. but KWs are obviously not the target users and I can see how it's cost-efficient for executing on clearly-defined, discrete coding tasks. Obviously that's their value proposition and they're figuring out how to communicate it well to the target customer. It just doesn't feel like CursorBench is that.
[1] https://cursor.com/blog/cursorbench#building-cursorbench
[2] https://cursor.com/blog/composer-2-technical-report#performa...
[3] https://cursor.com/blog/composer-2-5
I do wish they weren’t joining xAI. Something tells me there will be a contingent of researchers that departs Cursor if that merger is consummated.
[1] https://cursor.com/blog/composer-2-5
As for the typo, s's are cheap and I've added one :)
This is not a property of the model, but a property of the discipline; it can be operationalized by what you have documented before the session begins. Without "stop editing where you can no longer follow your changes to the spec" and "go back and read the migration file before changing the schema," there is nothing to halt the process until it fails integration.
Those teams who get consistent results independent of the model being used typically do so because they have operationalized their discipline first. Those switching out models monthly tend to expect the model to supply them.
Forking VS Code, going big on bleeding edge features like cloud agents, and now they’ve thrown down the gauntlet directly challenging frontier labs by training their own model (“much larger” than Kimi 2.5’s 1T parameters) from scratch.
They’ve been highly successful so far. Raised $50B, $2B in revenue, forecast to end 2026 above $6B. But even at these heights, they’re just not in the same league as OpenAI/Anthropic/Google.
And if building a state of the art multitrillion parameter model is not challenging enough, it’s a mountain you don’t climb just once. Every few months you need to push it farther with a new release. Fall off for a couple cycles and like Facebook you may never catch up again.
Not for the faint of heart.
It is most likely AI generated with a nice "Raised $50B" hallucination and filled with cliches ("thrown down the gauntlet", "mountain you don’t climb just once", "not for the faint of heart").
The account doesn’t have a history of other comments that have too much of an AI vibe, but this one does. Even if it wasn’t AI, it’s misinformation.
Ffs.
edit: removed cursing you out. Sorry but this is frustrating. I don’t leave AI generated comments here (or anywhere else).
Cursor is a team that I want to see succeed. They have stacked their company with very smart people and they’re going hard at a highly competitive market. We all win when there is more competition and more innovation.
My problem is that every few months I look at Cursor’s product offerings and maybe retry it, but it never feels like something I want to use. Part is personal preference, the other part is the fact that my combination of other tools and services just does a better job. Their biggest advantage felt like first-mover advantage when they came out early and captured market share, but at in-person meetups I hear stories about companies switching away from Cursor or trying to convince their management to let them switch away. They need to come up with a compelling advantage fast, which is a hard thing to do against the other companies with their virtually unlimited budgets by comparison.
1. Evidently you’re no longer able to distinguish AI from people as the whole comment was written by a human off the cuff.
2. The numbers are not hallucinations. It’s word on the street reporting, so yes it’s speculative, but a model did not make it up unless that’s where TechCrunch got it, which is not on me.
https://techcrunch.com/2026/04/17/sources-cursor-in-talks-to...
> They’ve been highly successful so far. Raised $50B,
They have not raised $50B. The article you linked says they're raising $2B, not $50B.
The valuation is not the amount raised.
If Cursor (and every other commercial VSCode fork) hadn't used the MS extension store in the beginning and violated the TOS, these might not have happened.
To clarify, the model Composer 2.5 announced in this post is not that; it uses Kimi 2.5 as a strong starting point. This is not to discount Cursor's work or future ambitions, but one of the most striking things about the last 6 months is that multiple open-source models/labs are now within striking distance of the frontier closed-sourced labs.
See eg Kimi 2.6 benchmarks: https://www.kimi.com/blog/kimi-k2-6
Composer, their in house model, is dispatched by other models like Claude Opus for individual items on a task list. No one is suggesting you write your main prompt to Composer 2.
- Its communication style is completely opposite to Anthropic models. It's not as bad as OpenAI's models, which are obsessed with "shapes", "wrinkles", hyphenated-words, and other cryptic formulations that make you feel like you're not on planet earth after a while talking to them. But it is nonetheless markedly "rude", "dry", "cold", gives off this "entitled I'm right, you're wrong" attitude. I once had composer2-fast accidentally run `rm -rf $HOME` (no harm done) as part of a bug in an install script it wrote and all it could say once it realized it was: "Running script with proper hardening". Qwen's models have clearly been distilled from Anthropic models because they have a much closer communication style and that's why I hope cursor will one day release a new family of composer models derived from that. A damn joy to use.
- It's just dumb. I don't know what they're doing with benchmarks, but for my work (python, bash, docker, whatever), cursor is just incredibly dumb. Always does in 10 lines what could be done in one. Doesn't know loads of internals of things that other models know. Never places things in the right files, constantly makes terrible edits (inline imports, edits without testing). Everything is so complicated when done by composer2, it's just a joke to me at this point. It clearly needs more handholding than Opus 4.x or GPT-5.x. I tried 2.5-fast and it seemed more of the same. And this would sort of be acceptable if it owned up to its incompetence, but it is so confidently incompetent that it's revolting.
I know that for many people the "tone" of the models is not relevant, or maybe they even prefer models like these. I simply cannot work like that.
Ever since Gemini started blowing benchmarks out of the water while being a clearly inferior model incapable of producing anything (and pretty much just doing tool calls without any feedback to the user), I gave up on benchmarks. Composer has been more of the same in that regard.
As a GPT model would say:
That said, I am a pretty old-fashioned coder and use LLMs mostly to overcome the blank page problem, which means I review and often rewrite LLM output by hand and avoid prompt loops for a single task.
People who are aiming to not read code any more might find this $20 plan lacking for their needs, however for my needs it fits perfectly.
There's nothing to mess up. The license is MIT w/ attribution, and the attribution clause can be easily sidestepped w/o any legal repercussions. The "drama" was simply content creators going nuts over some misunderstandings and poor comms from some kimi related devs.
Dude what the hell happened to Musk's Grok? How incapable are they that they give away training compute to Cursor like this?
Weird that the genius Musk doesn't need his own compute; after all, shouldn't Macrohard (no joke) already be building the world's software from scratch?
For this amount of money you can rebuild cursor and everything else on the market, and with the rest of 9-59 Billion, you just hire experts in coding and let them code real high quality code examples.
And then you just use your existing grok pipeline and just add this functionality.
This xAI stuff has to be run by idiots
And if you combine a shitton of data with a lot of compute, large userbase and good engineers, you have a pretty good chance of doing something interesting.
I gave it a mix of refactoring tasks and new feature tasks. For each one, I had it write a plan, then I had Codex review it. Codex found major issues with every plan: patterns that don't match the rest of the code base, hallucinated variable/function names, and even outright bugs in the way the plan was written. I fed the feedback to Composer 2. After it made the changes and implemented the revised plan, I had Codex and Opus 4.7 do code reviews, and once again both of them found major bugs.
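The workflow described here (draft a plan, have other models review it, feed the issues back, repeat) is a bounded review loop. A minimal sketch in Python with the models abstracted as plain callables; `review_loop`, `Reviewer`, and `Reviser` are hypothetical names, and a real setup would call whichever model APIs or CLIs are actually in use:

```python
from typing import Callable

# A reviewer returns a list of issues; an empty list means it is satisfied.
Reviewer = Callable[[str], list[str]]
# A reviser takes the current draft plus the issues and returns a new draft.
Reviser = Callable[[str, list[str]], str]


def review_loop(
    draft: str,
    revise: Reviser,
    reviewers: list[Reviewer],
    max_rounds: int = 3,
) -> tuple[str, list[str]]:
    """Revise a draft until every reviewer is satisfied or rounds run out.

    Returns the final draft and any issues still outstanding against it.
    """

    def collect(text: str) -> list[str]:
        return [issue for review in reviewers for issue in review(text)]

    issues: list[str] = []
    for _ in range(max_rounds):
        issues = collect(draft)
        if not issues:
            break
        draft = revise(draft, issues)
    else:
        # Rounds exhausted: re-check so the issues describe the returned draft.
        issues = collect(draft)
    return draft, issues
```

The `max_rounds` cap matters for the experience described above: if the drafting model keeps producing plans the reviewers reject, the loop stops and hands back the outstanding issues instead of burning a whole day.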
Overall it was a very frustrating experience. I feel like I wasted a whole day. Which is sad, as I have been looking for an excuse to come back to Cursor. But as things stand, Codex + CC combo cannot be beat, not just in terms of price but also quality.