My automated doubt development process

aself101 • 2 days ago • 26 comments • Read Article on alexself.dev

Discussion (26 Comments) • Read Original on HackerNews

docheinestages•2 days ago

Most of the writing I see about spec-driven development starts with a product requirements document (PRD) that is assumed to be valid. But I doubt that's the case. If it were, you would've written about it, and probably would've involved agents in the research that goes into it. My gut feeling tells me there's much more emphasis on implementing the feature than on questioning whether it's relevant, feasible, and based on valid assumptions.

HPsquared•2 days ago

Yes, it's called "questioning attitude", one of the traits of a healthy nuclear safety culture (and a good thing to apply in other fields!)

https://www.nrc.gov/docs/ML1433/ML14338A739.pdf

ErroneousBosh•1 day ago

Good article.

A long time ago I dealt with the on-site radio systems at a major oil refinery. Over the ten years or so that I worked for the company that provided them, it was interesting to see how their safety policies changed, and how other companies with a similar risk profile (like distilleries and whisky bonds) just plain didn't.

For example, they drastically changed how they view Permits to Work. Now, a rigidly enforced PtW would have prevented the Piper Alpha explosion: the permit was returned to the Permit Office, not actually looked at, and then someone assumed that since it had been handed back and the work was *supposed* to be complete, then the work *was* complete. Had they looked at the permit they'd have seen there was more to do and the isolations should remain in place.

Anyway, when I started doing stuff at that site, every permit required a rigorous Method Statement and Risk Assessment. Your RA would get rejected if you failed to mention every single item of PPE that you were required to wear on site, and your Method Statement would be rejected if it didn't describe the function and use of every last tool you planned to bring on site.

This was, frankly, fucking *stupid*. It took longer to write the RAMS and apply for the permit than it did to carry out most jobs, and if there was any deviation no matter how small (often because of other work you'd no way of knowing about) the whole thing would have to be stopped and relogged, with a new RAMS taking into account whatever was in your way. Someone's put scaffolding up near the aerial you want to replace? Well tough, you're not getting on site today with that permit!

They changed this about halfway through my time with my previous employer, to a "Risk-Assessed Permit", where you'd describe the risks around the specific tasks you needed to carry out and how you'd mitigate them.

Now your RAP would get rejected if you *did* put on lists of PPE. You're expected to use the correct PPE, you're expected to use your tools correctly and the correct tool for the job. Don't tell me that, just do that.

Hour-long meeting to go over the RAMS? No, five-minute "Toolbox Talk" - look up "Take Five" for some helpful guidelines - and if there are no screaming blockers, crack on, get the job done, get off site, get home. Safely.

Oh now there's some scaffolding right near the aerial you want to work on? Okay, ask the operator of that job if you're going to affect their work. Oh, they're letting you use the scaff to access the roof instead of bringing a boom lift on? Excellent, cross that off the RAP, bring it up at the Toolbox Talk, far safer that way isn't it?

I still do Take Five at work even though I'm predominantly working from home managing network equipment. If there are big complex changes to make or a major piece of work, we'll discuss it together. Anyone got any questions? Anyone see something that's going to blow it all up and cause a major outage? Okay, well, you know where I am if you need help. Crack on.

It's a great way to eliminate the kind of mistakes that lead to "I wish I was still bored" days.

burticlies•2 days ago

Biased 'cause I work there, but that's where software like Tactiq shines. We just added an MCP server, and now the agent has access to the meetings when writing the plan.

Last week I had three meetings with three stakeholders, and the agent was able to gather everyone’s ideas and make sure they are all working together in the feature.

Waterluvian•2 days ago

Add some cron jobs to run a prompt that reviews work being done against the plan and baby, you’ve got a middle manager going.

ben30•2 days ago

Most of my energy goes into refining a PRD these days.

docheinestages•2 days ago

Then how come that process is not agentic and not well-described?

ben30•2 days ago

Personally it's well-defined and agentic - just not circulated.

/understand - agents interrogate the problem
/huddle - a thinking panel turns it into a PRD and attacks the premise; PRDs regularly die here
/tm - claude-task-master breaks the survivor into a dependency graph

Nobody writes this half up because "agent talked me out of building it" demos worse than "agent built it".
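A minimal sketch of the gated pipeline described above, assuming the premise attack can be reduced to a list of unresolved objections and that the surviving PRD's tasks declare their dependencies. The stage names mirror the slash commands, but the `Prd` type, the `huddle`/`task_master` functions and the kill rule are hypothetical; this is not how /huddle or claude-task-master actually work.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # stdlib, Python 3.9+


@dataclass
class Prd:
    title: str
    # Objections left standing after the premise is attacked (hypothetical /huddle output).
    unresolved_objections: list[str] = field(default_factory=list)
    # Task name -> names of the tasks it depends on (hypothetical /tm input).
    tasks: dict[str, set[str]] = field(default_factory=dict)


def huddle(prd: Prd) -> bool:
    """Premise gate: a PRD survives only if no objection is left unresolved."""
    return not prd.unresolved_objections


def task_master(prd: Prd) -> list[str]:
    """Turn a surviving PRD into an execution order that respects dependencies."""
    # static_order() raises graphlib.CycleError if the task graph has a cycle.
    return list(TopologicalSorter(prd.tasks).static_order())


def run_pipeline(prd: Prd) -> list[str] | None:
    if not huddle(prd):
        return None  # The PRD died at the premise stage; nothing gets built.
    return task_master(prd)


doomed = Prd("sync everything", unresolved_objections=["no user asked for this"])
survivor = Prd(
    "export CSV",
    tasks={"endpoint": {"schema"}, "ui button": {"endpoint"}, "schema": set()},
)
print(run_pipeline(doomed))    # None
print(run_pipeline(survivor))  # ['schema', 'endpoint', 'ui button']
```

Returning `None` for a killed PRD makes "the agent talked me out of building it" an explicit, testable outcome rather than a silent non-event.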

watersb•2 days ago

Strangely reminiscent of an Electric Monk:

The Electric Monk was a labor-saving device, like a dishwasher or a video recorder. Dishwashers washed tedious dishes for you, thus saving you the bother of washing them yourself, video recorders watched tedious television for you, thus saving you the bother of looking at it yourself; Electric Monks believed things for you, thus saving you what was becoming an increasingly onerous task, that of believing all the things the world expected you to believe.

-- Douglas Adams, "Dirk Gently's Holistic Detective Agency"

m12k•2 days ago

I've stumbled on the same workflow, except for one thing: if I just do as OP does, Claude Code tends to overengineer. For example, it'll build complex solutions to super rare race conditions that have trivial fallout. But I've found that all it takes is a "skeptical pass". Here's how it goes: after a bunch of specialist subagents review the (plan/implementation) and their findings are deduplicated and synthesized, the main agent buckets them into A) trivial/obvious fix, B) multiple possible resolutions, but the LLM had a strong lean, so it went with it on its own, C) genuine ambiguity, where it asks me what to do (and presents its lean), and D) wontfix. Crucially, after doing this, I have it run a "skeptical pass" where it takes a hard look at these findings and sees whether some of them deserve to be downgraded. Generally, a lot of things make their way into wontfix this way. I find I don't need to push back against overengineering myself; I can have the LLM do it, and it'll actually do a decent job of it.
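A rough sketch of the bucketing and skeptical pass described above, assuming each finding carries a rough likelihood and impact score. The `Finding` type, the 1-to-5 scales and the downgrade rule (rare, low-impact findings go to wontfix, echoing the race-condition example) are illustrative assumptions; in the described workflow the LLM makes these judgment calls itself.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from enum import Enum


class Bucket(Enum):
    TRIVIAL = "A"    # obvious fix, just apply it
    LEANED = "B"     # several resolutions, agent goes with its strong lean
    AMBIGUOUS = "C"  # genuine ambiguity, ask the human
    WONTFIX = "D"


@dataclass(frozen=True)
class Finding:
    summary: str
    bucket: Bucket
    likelihood: int  # hypothetical scale: 1 (very rare) .. 5 (happens constantly)
    impact: int      # hypothetical scale: 1 (trivial fallout) .. 5 (outage, data loss)


def deduplicate(findings: list[Finding]) -> list[Finding]:
    """Merge findings that several reviewer subagents reported with the same summary."""
    seen: dict[str, Finding] = {}
    for f in findings:
        key = f.summary.strip().lower()
        # Keep the most severe copy when reviewers disagree on impact.
        if key not in seen or f.impact > seen[key].impact:
            seen[key] = f
    return list(seen.values())


def skeptical_pass(findings: list[Finding], max_risk: int = 4) -> list[Finding]:
    """Downgrade to wontfix any finding whose likelihood x impact is at most max_risk."""
    result = []
    for f in findings:
        if f.bucket is not Bucket.WONTFIX and f.likelihood * f.impact <= max_risk:
            f = replace(f, bucket=Bucket.WONTFIX)
        result.append(f)
    return result


raw = [
    Finding("Race on cache refresh", Bucket.LEANED, likelihood=1, impact=2),
    Finding("race on cache refresh ", Bucket.LEANED, likelihood=1, impact=1),
    Finding("Unvalidated user input in export", Bucket.TRIVIAL, likelihood=4, impact=4),
]
triaged = skeptical_pass(deduplicate(raw))
print([(f.summary, f.bucket.value) for f in triaged])
# [('Race on cache refresh', 'D'), ('Unvalidated user input in export', 'A')]
```

A fixed numeric threshold is much blunter than an LLM's judgment; the point is only that the downgrade step runs after triage and can move findings into wontfix but never out of it.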

ErroneousBosh•2 days ago

This sounds harder than just writing the code.

m12k•1 day ago

If I was doing all this ad hoc, it might just be. But I've had Claude save this as three "skills" (standard workflows) that chain together (review-branch, triage-findings, apply-fixes), so all I need to do is say "review the branch", make judgment calls on the truly ambiguous decisions, and then say "apply the fixes". It's effortful, just not for me.

Vachyas•2 days ago

When you put it like that, it really does, lol.

aself101•2 days ago

This has been my attempt at wrangling the new AI-assisted development that seems to be overtaking the software engineering profession. I jumped in headfirst after observing the trends of the last year, and this process might be a viable path forward.

ben30•2 days ago

I've had similar feelings: how can I trust this if I no longer write the code directly?

I wrote an /assess tool. I designed it to be token-light, but it assesses everything I could do to regain trust and help the AI improve my codebase, not by adding features but by adding discipline.

jnewton_dev•2 days ago

There's a selection bias here that nobody's mentioned. The people who had bad experiences with this approach probably aren't commenting.

marcus_holmes•2 days ago

I have a similar skill that assesses project drift (how far the current project state is from the original spec and brief) and looks for artifacts of that - orphaned code or features that are no longer relevant, design decisions that made sense at the time but are now dubious, or parts of the design that no longer fit the current project state.

I've found it really useful for taming the spaghetti that Claude tends to generate on its own.
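One cheap, non-LLM signal for the orphaned-code half of a drift check, assuming (hypothetically) that the spec lists requirement IDs and each module names the IDs it implements with a tag like `Implements: REQ-3`. A module whose tag points only at requirements the spec no longer contains, or that has no tag at all, is a candidate orphan; requirements nobody implements show drift in the other direction. The tag convention and `find_drift` are illustrative, not the skill described above.

```python
import re

# Hypothetical convention: a module declares "Implements: REQ-1, REQ-7" somewhere in its source.
TAG = re.compile(r"Implements:\s*((?:REQ-\d+)(?:\s*,\s*REQ-\d+)*)")


def find_drift(spec_ids: set[str], modules: dict[str, str]) -> dict[str, list[str]]:
    """Compare requirement IDs in the spec against the IDs each module claims."""
    implemented: set[str] = set()
    orphans: list[str] = []
    for name, source in modules.items():
        match = TAG.search(source)
        claimed = set(re.findall(r"REQ-\d+", match.group(1))) if match else set()
        implemented |= claimed & spec_ids
        # No tag at all, or every claimed requirement has been dropped from the spec.
        if not claimed & spec_ids:
            orphans.append(name)
    return {
        "orphaned_modules": sorted(orphans),
        "unimplemented_requirements": sorted(spec_ids - implemented),
    }


spec = {"REQ-1", "REQ-2", "REQ-3"}
modules = {
    "export.py": '"""CSV export. Implements: REQ-1, REQ-2"""',
    "legacy_sync.py": '"""Old sync job. Implements: REQ-9"""',
    "helpers.py": '"""Misc utilities."""',
}
print(find_drift(spec, modules))
# {'orphaned_modules': ['helpers.py', 'legacy_sync.py'], 'unimplemented_requirements': ['REQ-3']}
```

Untagged shared utilities like `helpers.py` get flagged too, so the output is a list of candidates for review rather than code to delete.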

HappySweeney•2 days ago

I think it's common to develop an adversarial-collaborative approach to getting some semblance of quality out of AI. I personally favour using multiple models for different roles, maintaining a bunch of continuity documentation, and having the plan surface human-verifiable deliverables as soon as feasible. It probably involves more attention than most people would tolerate.

tuo-lei•1 day ago

Review agents have the same training biases as the one writing the code. You get 30 findings about error handling and edge cases, but wrong domain assumptions slip right through.

dbgrman•2 days ago

Enjoyed it until the first emdash (was half expecting it to arrive anyway). Sorry.

hexasquid•2 days ago

I'm coming around to liking them; they're like a sort of anti-shibboleth.

"Ah, an em-dash", I think: "now I know".

hlieberman•1 day ago

“They inhabit the fulcrum of the process” is straight up AI psychosis talk.