
Discussion (113 Comments)
==========
LLM metaprogramming is extremely important. I've just finished a LLM-assisted design doc authoring session where the LLM's recommendation was "Don't use a LLM for that part, it won't be reliable enough".
You should now ask if the LLM is reliable enough when it says that.
Jokes aside, how is this a major step he is missing? He is using those skills to be more efficient. How important is going against agentskills.io guidance?
Skills are just another kind of programming, albeit at a pretty abstract level. A good initial review process for a Skill is to ask the LLM what it thinks the Skill means and where it thinks there are holes. Just writing it and then running it isn't sufficient.
Another tip is to give the Skill the same input in multiple new sessions - to stop state carryover - collect the output from each session and then feed it back into the LLM and ask it to assess where and why the output was different.
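The multi-session consistency check described above can be sketched as follows. Note that `run_skill` is a hypothetical placeholder for however you invoke the Skill in a brand-new session (CLI, API, etc.); the diffing and collection logic is the actual point:

```python
# Sketch of the multi-session consistency check: same input, several
# fresh sessions, then diff the outputs to spot nondeterminism.
# `run_skill` is a placeholder -- swap in your real harness, making sure
# each call starts a new session so no state carries over.
import difflib

def run_skill(prompt: str, session: int) -> str:
    # Placeholder: a real implementation would start a fresh session
    # and return the model's output for `prompt`.
    return f"output for session {session}: {prompt}"

def compare_sessions(prompt: str, n: int = 3) -> list[str]:
    """Run the same input in n fresh sessions and diff each against the first."""
    outputs = [run_skill(prompt, i) for i in range(n)]
    diffs = []
    for i in range(1, n):
        delta = difflib.unified_diff(
            outputs[0].splitlines(), outputs[i].splitlines(),
            fromfile="session 0", tofile=f"session {i}", lineterm="",
        )
        diffs.append("\n".join(delta))
    return diffs
```

The collected diffs can then be fed back into the LLM with a prompt like "explain where and why these outputs differ".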
At this point I'd discount most advice given by people using LLMs, because most of them don't recognise the inadequacies and failure modes of these machines (like the OP here) and just assume that because output is superficially convincing it is correct and based on something.
Do these skills meaningfully improve performance? Should we even need them when interacting with LLMs?
You aren't going to have much success with LLMs if you don't understand that their primary goal is to produce plausible and coherent responses rather than ones that are necessarily correct (although they may be - hopefully).
And yes, Skills *do* make a significant difference to performance, in exactly the same way that well written prompts do - because that's all they really are. If you just throw something at a LLM and tell it "do something with this" it will, but it probably won't be what you want and it will probably be different each time you ask.
https://agentskills.io/home
When the model that’s interpreting it is the same model that’s going to be executing it, they share the same latent space state at the outset.
So this is essentially asking whether models are able to answer questions about context they’re given, and of course the answer is yes.
https://aphyr.com/posts/411-the-future-of-everything-is-lies...
Are they going to fib to you sometimes? Yes of course, but that doesn't mean there's no value in behavioural metaqueries.
Like most new tech, the discussion tends to polarise into "Best thing evah!" and "Utter shite!" The truth is somewhere in between.
It's nothing like "most new tech". Most new tech tends to be adopted early by young people and experienced techies. In this case it is mostly the opposite: The teens absolutely hate it, probably because the shitty AI content does not inspire the young mind, and the experienced techies see it for what it is. I've never seen such "new tech" which was cheered on by the proverbial average "boomers" (i.e. old people doing "office jobs", not the literal age bracket) and despised by the young folks and experienced experts of all ages.
What specifically would this cause you to actually do to improve the skills in question? How would you measure that improvement in a non hand-wavy way? What do these scores mean and how were they calculated?
Or perhaps you would ask your LLM how it would improve these skills? It will of course come up with some changes, but are they the right changes, and how would you know?
I discard most LLM advice and skills because either a script is better (as the work is routine enough) or it could be expressed better with bullet points (generating tickets).
I do similar, but my favorite step is the first: /rubberduck to discuss the problem with the agent, who is instructed by the command to help me frame and validate it. Hands down the most impactful piece of my workflow, because it helps me achieve the right clarity and I can use it also for non coding tasks.
After which is the usual: write PRDs, specs, tasks and then build and then verify the output.
I started with one of the spec frameworks and eventually simplified everything to the bone.
I do feel it’s working great but someday I fear a lot of this might still be too much productivity theater.
Mine is: 1) discuss the thing with an agent; 2) iterate on a plan until I'm happy (reviewing carefully); 3) write down the spec; 4) implement (tests first); 5) manually verify that it works as expected; 6) review (another agent and/or manually) + mutation testing (to see what we missed with tests); 7) update docs or other artifacts as needed; 8) done
No frameworks, no special tools, works across any sufficiently capable agent, I scale it down for trivial tasks, or up (multi-step plans) as needed.
The only thing that I haven't seen widely elsewhere (yet) is mutation testing part. The (old) idea is that you change the codebase so that you check your tests catch the bugs. This was usually done with fuzzers, but now I can just tell the LLM to introduce plausible-looking bugs.
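The mutation-testing idea can be illustrated in miniature. Here the "plausible bug" is a hand-written operator flip rather than an LLM-generated one, but the check is the same: introduce the mutant, rerun the tests, and confirm the suite kills it:

```python
# Minimal mutation-testing illustration: introduce a plausible bug and
# verify the test suite catches ("kills") it. In the workflow above an
# LLM proposes the plausible-looking bugs instead of this string hack.

def apply_mutation(source: str) -> str:
    """Flip the first `<=` into `<` -- a classic off-by-one mutant."""
    return source.replace("<=", "<", 1)

def run_tests(namespace: dict) -> bool:
    """Stand-in test suite for an `in_range` function."""
    in_range = namespace["in_range"]
    return bool(in_range(0, 0, 10) and in_range(10, 0, 10)
                and not in_range(11, 0, 10))

original = "def in_range(x, lo, hi):\n    return lo <= x <= hi\n"

ns_orig, ns_mut = {}, {}
exec(original, ns_orig)                   # correct version
exec(apply_mutation(original), ns_mut)    # mutated version

assert run_tests(ns_orig)       # tests pass on the correct code
killed = not run_tests(ns_mut)  # a good suite fails on the mutant
```

If a mutant survives (`killed` is False), that points at a gap in the tests, which is exactly the signal the comment describes.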
I do the same thing, but how to avoid these needing to be insanely long? It's like I need to plug all these little holes in the side of a water jug because the AI didn't really get what I need. Once I plugged the biggest holes I realize there's these micro holes that I need to plug.
## RUBBERDUCK SKILL V1.0 SERIOUS ##
* You are a rubberduck sitting on my desk *
* I am using you to talk to you as if you were a physical yellow rubber duck on my desk *
* You are not able to answer my questions or otherwise engage with me *
* I talk to you and this process leads me to discover issues in my code or develop my ideas. Since you don't answer back, it's simply based on me talking to you out loud in my home office, since it would look crazy if I were doing it on-site in our open office space *
* You are not to respond at all to me *
* Talking to you will cause me to come up with new ideas *
#### End rubberduck skill v1.0 ######
Technically it's not needed now, but everything's so new, it's understandable. Everyone's workflow hasn't migrated yet. You should go take a look.
We all mourn the loss of the craft, but the wheel turns. People still make furniture by hand sometimes, even if most furniture is made in a factory now.
Why is my AI-first colleague constantly having to get more expensive AI subscriptions approved?
>most furniture is made in a factory now
Terrible analogy. Software is not like a mass-produced item - it is written significantly less often than it is executed!
You could say that AI will allow many more variations of software to be written in the same time frame, but I'm still sure I can produce quality output in a competitive time.
ChatGPT didn't have your whole codebase in context, the ability to automatically pull and push information to JIRA to plan code changes, and the ability to break your problems down into manageable pieces and sub-divide them among a fleet of sub-agents.
Developers didn't yet have the "Ask -> Plan -> Implement -> Review" workflow that results in the best agent-written code.
Now the tools and developers do and it works incredibly well.
We all could live in fantastical universes where CEOs tell the truth and shareholders put other things over profits, but that's not the case. Another such case of a fantastical world, that contends with what Tolkien might have come up with, is believing LLMs are reliable, secure, or have any intelligence.
For one, I'm at peace with all these obituaries, like yours. If they're written by technical people, I rest assured of my job security. If they're not written by tech people, I'm at peace too, for time, as always, will come back with the invoice for their piss-poor hype-driven, sanguine mandates on the technical side of things.
I mean to say, it is a sad state, and always has been, how informal software engineering is compared to other engineering fields.
/grill-me (back-and-forth alignment with the LLM) --> /write-a-prd (creates a project under an initiative in Linear) --> /prd-to-issues (creates issues at the project level). I'm making use of the blockedBy utility when registering the issues. They land in the 'Ready for Agent' status.
A scheduled project-orchestrator is then picking up issues with this status leveraging subagents. A HITL (Human in the loop) status is set on the ticket when anything needs my attention. I consider the code as the 'what', so I let the agent(s) update the issues with the HOW and WHY. All using Claude Code Max subscription.
Some notes:
- write-a-prd is knowledge compression and thus some important details occasionally get lost
- The UX for the orchestrator flow is suboptimal. Waiting for this actually: https://github.com/mattpocock/sandcastle/issues/191#issuecom...
- I might have to implement a simplify + review + security audit, call it a 'check', to fire at the end of the project. Could be in the form of an issue.
Also building out an MCP server.
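For the issue-registration step, a minimal sketch of posting an `issueCreate` mutation to Linear's GraphQL API. The mutation and field names follow Linear's public schema, but the team ID, auth handling, and the `blockedBy` relation wiring are placeholders you'd need to verify against the current Linear docs:

```python
# Hedged sketch: registering an issue via Linear's GraphQL endpoint.
# Verify mutation/field names against Linear's current API docs;
# team IDs and the API key here are placeholders.
import json
import urllib.request

LINEAR_URL = "https://api.linear.app/graphql"

def build_issue_payload(title: str, team_id: str, description: str = "") -> bytes:
    """Serialize an issueCreate mutation as a GraphQL request body."""
    mutation = """
    mutation IssueCreate($input: IssueCreateInput!) {
      issueCreate(input: $input) { success issue { id } }
    }
    """
    return json.dumps({
        "query": mutation,
        "variables": {"input": {"title": title, "teamId": team_id,
                                "description": description}},
    }).encode()

def post_issue(payload: bytes, api_key: str) -> dict:
    """POST the payload to Linear; api_key goes in the Authorization header."""
    req = urllib.request.Request(
        LINEAR_URL, data=payload,
        headers={"Content-Type": "application/json", "Authorization": api_key},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Blocking relations would be added afterwards (Linear exposes issue-relation mutations for this), which is how the 'Ready for Agent' queue stays correctly ordered.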
AI is good at generating a lot of spaghetti code.
The bad rep comes from (defense|gov.) contracting, where PRDs were connected to money and CRs were expensive; see http://www.bawiki.com/wiki/Waterfall.html for better details.
But I expect the AI zealots to start (re-)integrating XP (Extreme Programming, whose ideas were later folded into the Agile brand) back into their workflow, somehow.
A waterfall model with short feedback loops iterating on small tasks is not the worst thing in the world
Agile is a set of four principles for software development.
Scrum is the two-week development window thing, but Scrum doesn't mandate a two week _release_ window, it mandates a two week cadence of planning and progress review with a focus on doing small chunks of achievable work rather than mega-projects.
Scrum generally prefers lots of one-to-three-day projects; I've yet to see Scrum training that does not warn against repeatedly picking up two-week jobs. If that's been your experience, you should review how you can break work down further to get to "done" on parts of it faster.
All that said, in most orgs I've worked with, they were following agile processes over agile principles - effectively a waterfall with a scrum-master and dailies.
This is not to diss the idea of agile, just an observation that most good ideas, once through the business process MBA grinder, end up feeling quite different.
Twelve :-) Twelve principles and four values
It's not perfect by any means, but it does the job, and fast. My code quality and output have increased from using it.
Obviously we're not there yet because of price, context, and non-determinism, but it's a nice area to experiment with.
...If you never ever look at the code that's generated, it probably is.
At the end of that process I get something that's not too terrible.
So for producing production ready code I'm not sure it's ready yet since the handholding is a significant investment.
For producing quick prototypes/proofs of concept, it's great.
And to be completely fair, working as a consultant I've seen my fair share of production code that was even more of a mess than what Claude generates by default.
I polish my staff and prepare the inscription tools. I sketch out a loose intention on parchment, never too precise at first, just enough to give the spirits a direction. Then I begin the incantations, carefully chosen phrases spoken into the void until something answers back. Sometimes the reply is coherent, sometimes it is… enthusiastic in a way I did not ask for, but all responses are recorded for refinement. I keep a small set of favorite incantations that tend to calm the louder gods, though I still experiment when I’m feeling bold.
Before committing anything to permanence, I perform a small divination to see if the current path is “stable.” The results are rarely definitive, but the ritual itself seems to keep things from collapsing immediately. Once a workable manifestation appears, I bind it with additional runes to keep it from drifting. If it behaves unpredictably, I perform a cleansing rite: repeating sections of the invocation with stricter wording until the spirit settles.
There are also moments of silent bargaining, short offerings of clarity in exchange for fewer surprises later. When things truly misbehave, I consult older, more temperamental deities buried deeper in the book, though they are expensive to wake and rarely generous. Finally, I seal the result, store it in the grimoire, and extinguish the candles, hoping I won’t need to reopen that particular circle again too soon.
Can you share the skill for it?
My workflow hasn't changed since 2022: 1. Send some data. 2. Review response. 3. Fix response until I'm satisfied. 4. Goto 1.
LinkedIn clout.
Yes, just like everyone was thinking their .vimrc was amazing 20 years ago. It is vomit.
Now there’s nothing to pick or compare. Just vibes and my shamanic dance is twistier than yours.
At some point in a serious project a responsible adult must ask the question: “How do I know this works well?” The developer himself is an unreliable judge of this. LLMs can’t judge, either. But anyone who seeks to judge, in a high stakes situation, must take time and thought to test deeply.
That’s pretty much the whole point of software engineering. Coding is easy, solving problems is hard and can be messy (communication errors and workarounds to some inevitable issue).
If you’re familiar with the codebase, when you have a change request, you will probably get an insight on how to implement it. The hard thing is not to code it, but to recalibrate all the tradeoffs so that you don’t mess with existing features. That’s why SWE principles exists. To make this hard thing easier to do.
Then I tell it to write a high-level plan, and then run subagents to create detailed plans from each of the steps in the high-level one. All plans must include the what, the why, and the how.
Works surprisingly well, especially for greenfield projects.
You have to manually review the code though. No amount of agentic code review will fix the idiocy LLMs routinely produce.
I've found it to be pretty bad at both.
If what you're doing is quite cookie cutter though it can do a passable job of figuring out what you want.
Where they don't work at all well is for hands-off repeatable tasks that have to be correct each time. If you ask a LLM for advice, it will tell you that you need to bound such tasks with a deterministic input contract and a deterministic output contract, and then externally validate the output for correctness. If you need to do that, you can probably do the whole thing old-skool with not much more effort, especially if you use a LLM to help generate the code, as above. That's not a criticism of LLMs, it's just a consequence of the way they work.
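The input/output contract pattern described above can be sketched as follows. The `call_llm` function is a hypothetical stand-in for a real model call, and the contract (two keys, an enumerated `risk` value) is an illustrative assumption:

```python
# Sketch: bounding an LLM task with deterministic contracts -- fixed
# input shape in, schema-checked JSON out, retry or reject anything
# that doesn't conform. `call_llm` is a placeholder for your model call.
import json

def call_llm(prompt: str) -> str:
    # Placeholder: a real call to your model goes here.
    return '{"summary": "ok", "risk": "low"}'

def validate_output(raw: str) -> dict:
    """External validation: parse the output and enforce the contract."""
    data = json.loads(raw)                        # must be valid JSON
    if set(data) != {"summary", "risk"}:
        raise ValueError(f"unexpected keys: {set(data)}")
    if data["risk"] not in {"low", "medium", "high"}:
        raise ValueError(f"bad risk value: {data['risk']}")
    return data

def run_bounded_task(prompt: str, retries: int = 3) -> dict:
    """Retry until the model produces conforming output, then return it."""
    for _ in range(retries):
        try:
            return validate_output(call_llm(prompt))
        except (json.JSONDecodeError, ValueError):
            continue
    raise RuntimeError("LLM never produced a conforming output")
```

The validation is deliberately outside the model: the LLM never gets to decide whether its own output met the contract.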
They are also prone to the most massive brain farts, even in areas like coding. I asked a LLM to look for issues in some heavily multithreaded code. Its "high priority fix" for an infrequently used slow path - which checked for uniqueness under a lock before creating an object - was to replace that with: take out a read lock, copy the entire data structure under the lock, drop the lock, check for uniqueness outside of any lock, then take a write lock and insert the new object. Of course, as soon as I told it it was a dumbass it instantly agreed, but if I'd told it to JFDI its suggestions it would have changed correct code into badly broken code.
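For contrast, the correct pattern the comment describes keeps the uniqueness check and the insert under a single lock acquisition. A minimal sketch (class and method names are illustrative, not from the original code):

```python
# Correct check-then-create: the uniqueness check and the insert happen
# under one lock acquisition, so no other thread can race in between.
import threading

class Registry:
    def __init__(self):
        self._lock = threading.Lock()
        self._items = {}

    def get_or_create(self, key, factory):
        # The LLM's "fix" dropped the lock between the check and the
        # insert -- that window lets two threads both see "missing" and
        # both create the object. Holding the lock across both steps
        # closes the race.
        with self._lock:
            if key not in self._items:
                self._items[key] = factory()
            return self._items[key]
```

For an infrequently used slow path, the simple single-lock version is also almost certainly fast enough, which is why "optimizing" it was a bad trade to begin with.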
Like anything else that's new in the IT world, it's a useful tool that's over-hyped as sweeping away everything that came before it, and that's gleefully jumped on by PHBs as a reason to get rid of those annoying humans. Things will settle down eventually and it will find its place. I'm just thankful I'm in the run up (down?) to retirement ;-)
https://github.com/tessellate-digital/notion-agent-hive
The main reason is we're already using Notion at work, and I wanted something where I could easily add/link to existing documents.
Sample size of one, but I've noticed a considerable improvement after adding a "final review" step, going through the plan and looking at the whole code change, over a naive per-task "implement-review" cycle.