Claude Fable 5 vs. GPT-5.5: Better Planning, Similar Execution

lubujackson•1 day ago

My approach lately, in a corporate env, has been to use Opus 4.8 for planning and GPT 5.5 for execution. I think this is a good balance, as Opus (and Fable) tend to have a wider aperture and seem better at realizing downstream effects (like updating tests, docs, etc) that GPT sometimes misses.

On the other hand, GPT feels much more consistent and direct with execution, where Opus might fade or time out because Anthropic's servers are on fire at 2pm on a Monday, or take longer than necessary burning tokens for the same result. GPT also dots all the i's.

I was trying Fable for execution and noticed a fair bit of what looked like thrashing or farting around rewriting tests that it just made which were failing, which didn't give me a lot of confidence. But the final result was clean, just a longer path to get there.

I then like to have GPT or Opus review my PR for any issues before I spend time reading the output. This usually surfaces some stuff to tweak, but with Fable it was coming back clean. Again, this was a small window of normal usage for a few days, but some interesting takeaways.

If Fable doesn't come back it's not the end of the world for me, and in some ways I prefer a bit more of an antagonistic relationship. It makes a nice inroad to reasoning about the code and how I might want to restructure things. This is a bit harder when the code is "bug free" except for subtle or architectural decisions you can overlook, but I find if I sweat the architecture early on, anything beneath that is compartmentalized and stays trivial to fix.

ricardobeat•1 day ago

I guess it depends a lot on what kind of programming you are doing. Claude Opus, even Sonnet, are significantly better at frontend development than Codex.

Claude seems to write the exact code that you expect, about 90% of the time, and consistently follows project standards; while Codex goes on wild goose chases creating unnecessary indirection and abstractions – they work correctly, but add cruft. I can spot both with decent confidence in the project I’m currently working on.

bicepjai•1 day ago

Yes, this is the way. I have been shouting the same. https://news.ycombinator.com/item?id=48388550 Opus is yellow and GPT is green, if you have read the book Surrounded by Idiots.

justiceforsaas•1 day ago

Do you have an in-between step when you go from planning to execution? Eg. have a PLAN.md file you then ask the execution model to implement.

bicepjai•about 16 hours ago

Opus does a great job explaining a plan and writing the first plan. But GPT is the one that gives a more honest review and does good rewrites. Opus can explain with ASCII diagrams really well, but GPT can’t do that.

sutterd•1 day ago

Fable was a big improvement in planning for me over Opus. I usually do a bit of work preparing tasks before handing them off to Opus or else I get bad results. I didn't plan on writing software this week because I was working on other things but changed my mind to test out Fable. I didn't have any work prepared. Fable was able to write the high level plans that later turned into coding tasks. Of course any model could write plans like that, but I had confidence in these plans, similar to how Opus 4.5 gave me a huge jump in confidence in the code it wrote. (Honest, I am not paid to write this.)

justiceforsaas•1 day ago

Honestly the code gen part has been “good enough” for a while now, especially with models like Opus. The broader point this post is making is that newer SOTA models are improving at the "planning layer", and this is the part a senior developer would usually handle (identifying edge cases, thinking ahead, thinking about tradeoffs, etc.)

dpbrinkm•1 day ago

Was it really that bad when you would use skills like the superpowers pack?

dmzxnico•1 day ago

I felt Opus went straight to the point, though less so than GPT 5.5. Fable was clearly above that. Just went to work like a workaholic. Shame it's already off the list.
