Show HN: Command Center, the AI coding env for people who care about quality

Darmani•1 day ago•29 comments (cc.dev)

Hi HN! We’re Jimmy and Ray. Jimmy is a Thiel Fellow with a Ph.D. from MIT who has worked on programming tools for 15 years; Ray became VP of Sales at a $2B company when he was 19 and has built side businesses by vibe-coding.

Last year, we set out to answer the question “If AI can write code 100x faster, then why aren’t you shipping 100x faster?” What we learned shocked us: even fairly nontechnical people and solo founders told us they were spending more than half of their development time reading the AI-written code. And much of the rest of the time was spent either de-slopping it or wishing they had done so.

As luck would have it, our last two products were a tool that quickly onboards people to large codebases ( https://x.com/0xjimmyk/status/1873357324229984677 ) and trainings that taught deep concepts of code quality to CEOs, YC founders, and engineers at top companies ( mirdin.com ), so we were extremely well-positioned to solve these problems.

Command Center is an agentic coding environment focused on quality. With a few keypresses, you can start building 3 features at once and soon have 3 diffs ready, each consisting of 2000 changed lines across 50 files….

This is normally the point where you think “Crap, what now?”

With Command Center, at this point you simply click “Refactor” and watch the vibed slop turn into readable robustness. Then you click “Generate Walkthrough,” and suddenly, to read a 2000-line diff, instead of scrolling up and down trying to make sense of it, you just press the right arrow key 200 times. See something you don’t like? Click on line 37, type “Do this and all other network fetches in the background,” press Cmd+Enter, and you have a few more agents getting your code into final shape. Click or type “Commit,” “Push,” “Create PR,” and you have just shipped a high-quality, non-slop feature.

We’re striving to be the best at every step of the pipeline, but you can just try Command Center in pieces wherever you feel your current workflow is weakest. We have users who do all their coding in Zed or the Codex app, and then jump over to Command Center for a walkthrough when it finishes running. There’s even a skill that will pop open a Command Center walkthrough from the environment of your choice. Or you can just keep Command Center running while you do your work elsewhere, and if your AI deletes anything, you have Command Center’s snapshots to the rescue.

We launched quietly last year and have been refining since. The quality and usability have kept going up, and Command Center is now ready for a lot more attention.

Since our quiet launch, we’ve seen at least a dozen other agentic coding environments appear, approximately all of which have the same feature set: focused on the part that is already easy (generating the first version of the code), with at best a shoddy answer to the hard part (everything that comes after). Command Center’s focus is making the hard parts easy.

Here’s what our users have to say:

“[The refactorings] give your LLM taste. I’ve never seen an LLM write code this good before.” - Doug Slater, Staff Engineer, Climavision

“With Command Center walkthroughs, I can get through a 400-line diff in less than half the time.” - Prateek Kumar, Platform Engineer, Sumo Logic

This product is not for everyone. If you’re someone who preaches “the prompt is the source, the code is the compiler output,” then you probably won’t enjoy Command Center.

But if you want to uphold traditional engineering discipline while also shipping 20 PRs a day, then this is the environment for you.

Discussion (29 Comments)

poetril•about 6 hours ago

I do think this style of working is where software engineering work is heading. This style is essentially exactly what my workflow is today using _insert agent harness here_ + plannotator[0]. Linear also recently rolled out something very similar for reviews[1]. The working style of spec-driven dev, followed by fast code reviews using tools like plannotator/linear/command center, seems to be where we are headed, and more and more tools like it are popping up nowadays.

0: https://plannotator.ai/

1: https://linear.app/docs/diffs

Fizzadar•about 16 hours ago

Somewhat ironic that the site is completely broken on mobile: the text doesn’t render until you have scrolled nearly past it. Production code, eh?

Darmani•about 12 hours ago

Hi Fizzadar,

On the one hand, it is true that the website code was pushed 0 minutes before this announcement went up.

On the other hand, I tested just now on two different phones and didn't see any issues. Can you say in more detail what you expected vs. what actually happened?

There was an occlusion issue on some smaller screens, but it's been fixed now.

jpease•about 21 hours ago

“Ray became VP of Sales at a $2B company when he was 19”

I guess that’s OK, but I was skateboarding at 19.

Can you even kick flip?

egamirorrim•about 17 hours ago

How can I have any confidence in the security of your product?

It's extremely hard to convince myself to use a product for the huge variety of often sensitive agent tasks when it's not open source. I understand the business reasons for that, but it's unusual in this space at the moment.

Instead: Can you post any independent security assessments perhaps? Fundamental things like SOC2?

Darmani•about 15 hours ago

Hi egamirorrim,

The basic answer is that it runs locally. If you turn telemetry off and don't use our free Gemini credits, it's trivial to verify that no traffic goes to our servers other than a tiny subscription check. For our enterprise customers, we offer a version that doesn't even do that. Everything stays between you and your model providers (and we support custom and local models).

SOC2 is still a work in progress. I'm a former security researcher with work featured in the New York Times, and I know that doing it right (and not going through Delve) takes time. I can tell you that we have passed a compliance check for a company in a highly-regulated space.

I didn't find your contact info, but I'm available at jimmy@cc.dev, and happy to discuss your needs.

sltr•about 23 hours ago

I'm Doug, quoted above. I took Jimmy's excellent course, and when I learned about Command Center, I subbed immediately. I wasn't disappointed. It's a bit like turning your LLM into a graduate of that course.

pooploop64•about 22 hours ago

Not trying to accuse anyone of anything but this sounds exactly like one of those scam courses that turns out to be a pyramid scheme centered around selling the course to other people.

Darmani•about 22 hours ago

We have a referrer program.

Doug has not signed up for it.

sltr•about 13 hours ago

Sorry about the infelicitous timbre. Nope, I'm just a happy customer.

eltonlin•about 22 hours ago

Code walkthroughs are underrated

mklifelife•about 17 hours ago

It's interesting how AI is making software development dramatically faster, but quality is becoming an even bigger differentiator. Building is no longer the bottleneck for many founders. Knowing what to build and maintaining quality are becoming more important.

billehunt•about 21 hours ago

Command Center is really cool. I worked with Jimmy at Thiel Fellowship - wicked smart guy.

Zyros111•about 8 hours ago

Seems like an interesting idea, any tips for how a junior dev can get the most out of using the app with the goal of programming skills growth?

yegemberdin•1 day ago

How do you guys ensure that the refactoring improves the existing code?

i_eat_rocks•about 16 hours ago

The answer to "how do you ensure refactoring improves code?" is embedded in the binary as a system prompt. It's his own blog post about the Embedded Design Principle. The binary contains 9 system prompts, all instruction templates for the LLM. None contain any code for measuring code quality (unfortunately).

The pipeline is three steps:

suggest-data-unifications - prompts the LLM with the blog post. The prompt starts literally with "For each data structure in the specified code, do the following."

suggest-code-unifications - same agent, different prompt. Starts with "Now look at the file and apply the above guidelines."

execute-refactoring - runs the LLM's suggestions through a coding agent.

No verification between steps. No quality gate. No baseline comparison. The refactoring agent's entire context is the blog post, literally. Read it. Find duplication. Merge it.

The closest thing to a "guardrail" is a function which calls eval() on arbitrary user-defined JavaScript. And AutoAcceptDecorator which intercepts LLM messages matching /proceed|go ahead|make|implement|apply/ and auto-replies "Yes, please proceed with the changes."

So when you ask "how do you ensure it improves code?" the answer is: we ask an LLM to read a blog post about code quality and then we trust it. And we built a regex that auto-accepts its own changes.

The binary also has a separate class for fiber-based refactoring execution, and a full walkthrough generation pipeline that auto-generates code walkthroughs from git diffs. There's a separate workflow for file organization that reads Jimmy Koppel's rule ("Make the design apparent in the code") and applies section headers to changed files. Completely independent from the deduplication agent but uses the same pipeline: read prompt, LLM, apply changes.

And the DoItAll workflow chains everything together. DeDuplicate runs in parallel, then embedded-design and organize-file run on every changed file with concurrency:2.

It's a full refactoring pipeline... but every single step is just: read a blog post, LLM, apply. The entire product is two blog posts, a concurrency manager, and a regex.
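
To see how permissive the quoted auto-accept pattern is, here is a minimal Python sketch. Only the regex, the canned reply, and the three step names come from the comment above; `auto_reply`, `run_pipeline`, and `fake_llm` are hypothetical stand-ins written for illustration, not code taken from the Command Center binary, and the original may apply the pattern differently (for example with other flags).

```python
import re
from typing import Callable, Optional

# Pattern and reply as quoted in the comment; no flags are assumed.
AUTO_ACCEPT = re.compile(r"proceed|go ahead|make|implement|apply")
AUTO_REPLY = "Yes, please proceed with the changes."

# Step names as listed in the comment.
STEPS = [
    "suggest-data-unifications",
    "suggest-code-unifications",
    "execute-refactoring",
]


def auto_reply(llm_message: str) -> Optional[str]:
    """Return the canned approval if the message matches anywhere, else None."""
    return AUTO_REPLY if AUTO_ACCEPT.search(llm_message) else None


def run_pipeline(source: str, llm: Callable[[str, str], str]) -> str:
    """Chain prompt -> LLM -> apply, with no check between steps."""
    for step in STEPS:
        source = llm(step, source)
    return source


def fake_llm(step: str, source: str) -> str:
    """Hypothetical stand-in for a model call: just records the step."""
    return source + "\n# applied: " + step


messages = [
    "Shall I go ahead and merge these two structs?",
    "This would make the API slower; I recommend against it.",
    "I found no duplication in this file.",
]
accepted = [auto_reply(m) is not None for m in messages]
print(accepted)
print(run_pipeline("x = 1", fake_llm))
```

The second message is a recommendation against the change, yet it is approved because it contains the word "make". A substring match on common verbs cannot tell a request for permission from a warning.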

Darmani•about 24 hours ago

Ooh. The answer is probably more interesting and philosophical than you expected

I can tell you that we do extensive testing, that we figured out how to objectively measure the code quality on certain benchmark problems, and that empirically it's extremely helpful nearly all the time.

But in the general case: it is not actually possible to guarantee this.

That's because whether a change improves the code often depends on information which is literally not present in the codebase.

Some of these are more trite. E.g.: whether a comment is helpful or redundant slop depends on the audience.

Some are deeper. E.g.: whether a piece of duplication is good or bad depends on the intent, and that is often impossible to recover from the source. https://www.pathsensitive.com/2018/01/the-design-of-software...

A simpler example: There's a function that's never called. Should it be deleted?

There are a number of factors outside the codebase that determine the answer, including the obvious one: "Not if your next prompt is going to start using it."
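
The uncalled-function example can be made concrete with a small Python sketch. It uses the standard `ast` module to list functions that are defined but never called within one file; `uncalled_functions` and `SAMPLE` are hypothetical names for illustration, and this is not a description of how Command Center analyzes code.

```python
import ast


def uncalled_functions(source: str) -> set:
    """Names of functions defined in `source` that no call in `source` refers to."""
    tree = ast.parse(source)
    defined = {
        node.name
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }
    called = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            if isinstance(node.func, ast.Name):
                called.add(node.func.id)      # plain call: f()
            elif isinstance(node.func, ast.Attribute):
                called.add(node.func.attr)    # method-style call: obj.f()
    return defined - called


SAMPLE = '''
def fetch(url):
    return url

def retry(fn):
    return fn()

def main():
    fetch("https://example.com")
'''

print(sorted(uncalled_functions(SAMPLE)))
```

The analysis flags both `retry` and `main`. Nothing in the file says that `main` is an entry point invoked from outside, or that `retry` is about to be used by the next change, so the tool can report the fact but cannot decide whether deletion is an improvement.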

foecalfork•about 23 hours ago

You found a way to objectively measure code quality?? Sell that! Why even sell this course when you have the ability to literally beat every software company?

Darmani•about 23 hours ago

In honesty, that's not a bad idea, and we hadn't thought of that.

It's pretty expensive to measure even for small programs. It's also more of a relative than an absolute measure, i.e.: it scores two variants of the same codebase, but the raw scores aren't very meaningful on their own. So our goal had been to use this in the benchmark set we're working on when we release a standalone refactoring product.

But the more I think about this suggestion, the more I think: "Hmmm, why not?"
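
As a rough illustration of "relative rather than absolute", the Python sketch below compares two variants of the same snippet. The metric is a deliberately crude stand-in (a count of repeated non-blank lines) chosen only for this example; the measure Darmani refers to is not described in the thread, and `duplicated_lines` and `compare` are hypothetical names.

```python
from collections import Counter


def duplicated_lines(source: str) -> int:
    """Stand-in metric: how many non-blank lines repeat an earlier line."""
    counts = Counter(
        line.strip() for line in source.splitlines() if line.strip()
    )
    return sum(count - 1 for count in counts.values())


def compare(before: str, after: str) -> int:
    """Difference in the stand-in metric; negative means less duplication."""
    return duplicated_lines(after) - duplicated_lines(before)


BEFORE = """\
total = price * qty
print(total)
total = price * qty
print(total)
"""

AFTER = """\
def show():
    total = price * qty
    print(total)
show()
show()
"""

print(duplicated_lines(BEFORE), duplicated_lines(AFTER), compare(BEFORE, AFTER))
```

The raw numbers say little by themselves, since they depend on file size and style. Only the comparison between two variants of the same code carries information, which matches the limitation described in the comment.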

embedding-shape•about 12 hours ago

> But if you want to uphold traditional engineering discipline while also shipping 20 PRs a day, then this is the environment for you.

It seems like an interesting tool, curious about trying it out once it's been out for a while. But who in holy hell, with AI assistance or not, could possibly "ship" (merged?) 20 PRs a day and still know what they're doing?

You talk a lot about quality and making sure to avoid slop, but there is no way in heaven you can ship 20 PRs and still ship quality design/architecture/code and avoid slop.

I'd be curious to see some of those PRs if you're saying you've essentially solved the holy paradox of "ship fast = shit code" or "ship slow = good code".

Darmani•about 12 hours ago

Most days I don't ship 20 PRs. But I think my record is 30.

Three things made that possible.

The first, obviously, is having Command Center.

The second is that a lot of those were fixes or UX improvements under 100 lines.

The third is, no joke, not sleeping. I've had quite a few 20+ hour days in the last 6 months. Some of that is work pressure, but also I've considered getting evaluated for a broken circadian rhythm.

> I'd be curious to see some of those PRs if you're saying you've essentially solved the holy paradox of "ship fast = shit code" or "ship slow = good code".

If you're serious, I'll be happy to get on a call and show you.

embedding-shape•about 10 hours ago

Thanks for explaining. I still have my doubts about the actual quality, but I'm also very open to being proven wrong! It has happened before, bound to happen again at some point or another :)

> but also I've considered getting evaluated for a broken circadian rhythm.

Heh, personally I fixed this by just adopting the sleep cycle my body wants of going to bed at 04:00/05:00 and going up at 11:00/12:00, life is much better now when I just accept it. One approach if your life can allow it :)

> If you're serious, I'll be happy to get on a call and show you.

Very much so, obviously prefer something async if possible, just a .patch file could suffice I suppose, but could do a call to have a look if that's the only way :) Reach out to my email from my profile and we can coordinate :)

Darmani•about 5 hours ago

> Heh, personally I fixed this by just adopting the sleep cycle my body wants of going to bed at 04:00/05:00 and going up at 11:00/12:00, life is much better now when I just accept it. One approach if your life can allow it :)

That was my life in my mid-late 20's.

But as I've gotten older, my sleep schedule has only gotten more messed up. Now I consider it a victory if I manage to go to sleep before the dawn.

> Very much so, obviously prefer something async if possible, just a .patch file could suffice I suppose, but could do a call to have a look if that's the only way :) Reach out to my email from my profile and we can coordinate :)

Cool, let's chat async then. Contacting you now.

csunoser•about 23 hours ago

Oh hey, this is the jj workshop person!

Darmani•about 23 hours ago

And indeed, I think we're the only agentic coding environment with jj support.

The most difficult code in the 1.0 release is some gymnastics to avoid the appearance of a concurrency conflict with a user running their own jj commands, made at the request of the person who introduced me to jj.

plastic041•about 16 hours ago

Header layout breaks on ipad. haha...

Darmani•about 15 hours ago

Thanks!

The final moments before this launch announcement consisted of me twiddling my thumbs while waiting for our designer to upload any version he could get ready in time that was better than the previous version of our website. So we knew we'd be launching with a lot of imperfections in the visuals. We did test on mobile, but not on iPad.

android521•about 16 hours ago

"even fairly nontechnical people and solo founders told us they were spending more than half of their development time reading the AI-written code." ~ Is this even true? I haven't read code for at least 6 months and I have many who are in the same boat.

Darmani•about 16 hours ago

In fairness, I did most of these interviews last summer, and I know some people have changed. And while I did go a fair bit outside my network to interview people, there are all sorts of hard-to-understand selection effects that come from me being me. A 21-year-old frat boy who tried doing the same kind of interviewing with the people he could find to interview would probably get different results.

But yes, that is indeed what happened. Multiple times, I'd talk to someone that I'd expect to not be reading the code at all (solo founder, mostly nontechnical), then I'd interview him in detail about his workflow and think "Huh, there was absolutely no point in there where he was reading stuff," and then I'd ask "So how much of your time is reading code?" "60, maybe 70%"