I've tried these larger agent skillsets in the past and felt it was a waste of time because it was just doing too much. Just like vim it's often better to pick and choose from the community instead of installing skills like they are an IDE. Skills are way too personal because every dev and dev team is different. So better to treat these as a reference for your own config rather than bulk install someone else's config.
ai_fry_ur_brain•about 1 hour ago
Can't wait for everyone to realize they've wasted a year+ messing with agents and experiencing a feeling of pseudo-productivity.
_sharp•5 minutes ago
Right, just like all the productivity lost when people stopped using paper ledgers to mess around with these so-called 'databases'
wahnfrieden•21 minutes ago
You haven’t made money from their use yet?
nothinkjustai•about 1 hour ago
You’ll get downvoted for this hearsay!
footy•13 minutes ago
I think you mean heresy. But maybe I don't get the reference you're making when you say hearsay
konaraddi•7 minutes ago
There are so many ways, many of them redundant, to set up agents for software development that, beyond personal/team/org needs and tastes, I need to look into setting up some benchmarks to evaluate which setup is optimal, or whether the differences are even worth it.
thatmf•9 minutes ago
Why are people so excited to put themselves out of a job?
Not that these or any "skills" will do that, but just- in principle. This is like alienation from labor at scale.
turlockmike•about 3 hours ago
The best way to prompt an LLM is to describe the outcome you want, that's it. They are trained as task completers. A clear outcome is way better than a process.
If the LLM fails, either you didn't describe your outcome sufficiently, or it misinterpreted what you said, or it couldn't do it (rare).
Common errors should be encoded as context for future similar tasks, don't bloat skills with stuff that isn't shown to be necessary.
stingraycharles•about 2 hours ago
> The best way to prompt an LLM is to describe the outcome you want, that's it. They are trained as task completers. A clear outcome is way better than a process.
This is not true for anything complex. They’re instruction followers, of which task completion is just one facet.
They’re also extremely eager to complete tasks without enough information, and do it wrongly. In the case of just describing task completion, despite your best efforts, there are always some oversights or things you didn’t even realize were underspecified.
So it helps a lot to add some process around it, eg “look up relevant project conventions and information. think through how to complete the task. ask me clarifying questions to resolve ambiguities. blah blah”. This type of prompt will also help with the new Opus 4.7 adaptive thinking to ensure it thinks through the task properly.
stult•about 2 hours ago
Agreed, and further, I'd argue the OP's division of LLM instructions into either process or outcome specification is a false dichotomy. My agentic process specification is about automatically specifying the outcomes that I would otherwise repeatedly have to tell the LLM to consider, like making sure test coverage is maintained, or that decisions are documented on the original Github issue. Or it's about correcting common failure modes, like when the agent spends an enormous amount of time running repo-wide tests while debugging a focused change, because the agent doesn't consistently optimize around the time-to-implement as an outcome. Arguably part of addressing those failure modes boils down to pure process in the sense that I specify a logical order for achieving the outcomes, e.g. creating a plan before implementing. But that is mostly to organize approval gates for my convenience, rather than structuring the agent's work per se.
tecoholic•about 2 hours ago
If there is anything we have learned in decades of software engineering, it's that "a clear outcome" is not easy to describe. In many cases, it's impossible unless people from 4 different domains collaborate. That's why process matters. It allows software to be built in a "semi-standardized" way that lets iterations get us closer to the expected outcome, which might only emerge over time.
Yes, not everything I use LLMs for is going to have the same level of ambiguity or complex requirements. Optimizing by choosing to skip over parts of the process is exactly what Addy is talking about in this article.
alexjurkiewicz•about 2 hours ago
I agree that many skills are overblown and unnecessary. But there's a lot of value in giving AI the right process. See how much more effective Claude can be for moderate or large changes when using the superpowers skill.
tmaly•about 2 hours ago
Sometimes people don't know what they want.
I prefer the start small and iterate approach to arrive at a result.
Then I ask it to summarize. Sometimes after that I ask it to generalize.
peab•about 2 hours ago
A skill is just reusable/shareable context. It's just text, really. It's useful for things like documentation on how to use an API (this works better than MCP in my opinion), or a non-consensus way of doing something. For example, you can use remotion to generate video. There are useful remotion skills that allow you to reliably generate specific types of videos. Captions of a certain style, for example.
markbao•about 2 hours ago
That seems a bit reductive. Even with humans, there’s a range of interpretations and ways that something can be built or a task completed. Engineers remember stuff so you don’t have to keep repeating yourself. Skills are a way to describe your outcome without similar repetition.
CharlesW•about 3 hours ago
From an SEO/LLMO perspective, the discoverability of these skills will be difficult without a rename: https://agentskills.io/
This is like creating a React framework called ReactJS to compete with NextJS
consumer451•about 3 hours ago
I would love to know how many people are actually using superpowers.
I showed up on the agentic dev scene prior to superpowers, and I am getting concerned that >50% of my self-rolled processes are now covered by superpowers.
I no longer trust gh stars, can anyone chime in? Is superpowers now truly adopted?
If it is truly valuable, why hasn't Boris integrated the concepts yet?
marcus_holmes•about 2 hours ago
I adopted superpowers, but then adapted it. I've changed some things, added some things. I suspect that my set of agent skills is probably overlapping with OP's by quite a lot now.
I also found that I have different skills for different tasks; at work security is a huge concern and I over-emphasise security in the skills. At play I'm less bothered about security and so the skills I've written to help me build stupid one-shot exploratory websites are less about security and more about refactoring and exploring concepts.
RideOnTime22•about 1 hour ago
It's just the new thing.
People were hyping up Oh My Opencode. When they realized it didn't lead to any significant gains in performance they hopped on the next thing.
And when the same thing happens to Superpowers it'll be something else they cling to because "this time it's different"
nullstyle•about 3 hours ago
I just removed superpowers from my own setup. In my opinion, given the quality of the planning modes in both claude code and codex, superpowers was really just slowing things down and burning more tokens than vanilla.
consumer451•about 3 hours ago
Thank you for the data point.
To give back as much as I can, I use the two built-in CC review processes when appropriate. But, those only do "is this PR good code?"
Far too late did I finally roll my own custom review skill that tests: "does this PR accomplish what the specs required?"
If I could ask for one more vanilla CC skill, it might be that. However, maybe rolling your own repo-aware skill via prompt is better?
horsawlarway•about 1 hour ago
anecdata, but I ended up in the same spot.
I used superpowers - but it burns waay more tokens for basically the same outcome as a single line that states
"Please do planning and ask any required questions before implementing.
[my prompt]"
On the latest models and with a decent harness, the planning modes are quite good, and the single sentence telling it to ask you questions lets the model pick the right thing to ask about, instead of wasting a bunch of time/tokens on predefined skills that try to force basically the same result.
It does introduce a second set of required interactions, but you can have another agent be your "questions answerer" if you need it (result quality goes down a bit vs answering myself, but still quite good, especially if you spend a bit of time on the answerer prompt)
Basically - things are moving fast enough I'm not convinced buying into superpowers/agentskills/[daily prompt magic beans]/etc tooling really makes sense.
I'd stick to the defaults in the harness for most cases, and then work on being clear with the ask.
esafak•about 3 hours ago
Looks like a bunch of canned skills served through a plugin?
ricardobeat•about 3 hours ago
Does superpowers actually work? The main skill file doesn't inspire much confidence:
"If you think there is even a 1% chance a skill might apply to what you are doing, you ABSOLUTELY MUST invoke the skill."
CharlesW•about 2 hours ago
This kind of "overprompting" is one technique that even the best skills/agents use to compensate for under-invocation, which happens when more demure advisory language tends to be rationalized away by LLMs.
It shouldn't be your default, but should absolutely be tried when your skill/agent test suite displays evidence that it's not being reliably invoked without it.
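One way to get that evidence, as a minimal sketch: log which skill each test prompt was supposed to trigger and which skills the agent actually invoked, then compute a per-skill invocation rate. The record layout (`expected`, `invoked`) and the skill names below are hypothetical, not taken from any particular harness.

```python
from typing import Optional


def invocation_rate(runs: list[dict], skill: str) -> Optional[float]:
    """Share of test runs expecting `skill` in which the agent invoked it.

    Each run is a dict with (hypothetical) keys:
      'expected': name of the skill the prompt should trigger
      'invoked':  set of skill names the agent actually invoked
    Returns None when no run expects the skill.
    """
    relevant = [r for r in runs if r["expected"] == skill]
    if not relevant:
        return None
    hits = sum(1 for r in relevant if skill in r["invoked"])
    return hits / len(relevant)


# Toy log; in practice this would be parsed from agent transcripts.
runs = [
    {"expected": "api-design", "invoked": {"api-design"}},
    {"expected": "api-design", "invoked": set()},
    {"expected": "ui-testing", "invoked": {"ui-testing"}},
    {"expected": "api-design", "invoked": {"api-design", "ui-testing"}},
]

api_rate = invocation_rate(runs, "api-design")  # 2 of 3 runs invoked it
ui_rate = invocation_rate(runs, "ui-testing")   # 1 of 1 runs invoked it
```

If the rate with advisory wording is clearly lower than with the stronger wording on the same prompts, that is the kind of evidence that would justify the louder phrasing.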
zmmmmm•about 3 hours ago
I was surprised how long some of these skills are. They are pages and pages long with tables and checkbox lists and code examples, etc.
Curious how normal that is - it would only take a couple of these to really fill the context a lot.
gwerbin•about 1 hour ago
I quickly skimmed and it looks like at least a few of them are intended to be more like system prompts for a tightly scoped sub agent than a skill as such. I agree, I wouldn't want to use a lot of these in a longer-running work session.
I have been successful with short and focused skills so far. I treat them as a reusable snippet of context, but small ones. For example a couple of paragraphs at most about how to use Python in my project and how to run unit tests. I also have several short "info" skills that don't actually provide the agent instructions, they merely contain useful contextual information that the agent can choose to pull in if needed.
Even having too many skills can be an issue because the list of skill names and their descriptions all end up in the context at some point.
tecoholic•about 2 hours ago
I have written zero skills, so not sure how normal it is. I counted the words in a couple of them and they seem to be around the 2k range. So 5 skills would be around 10K. Even at a small LLM context of 128k, that's still around 10%. And for a 1M context window like the big ones, it barely registers.
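The arithmetic can be written out as a rough sketch. Context windows are measured in tokens, not words, so this assumes about 1.3 tokens per English word, which is only a rule of thumb (real tokenizers vary). The function name is made up for illustration.

```python
# ASSUMPTION: ~1.3 tokens per English word; actual tokenizers differ.
TOKENS_PER_WORD = 1.3


def skill_budget_share(words_per_skill: int, num_skills: int, context_tokens: int) -> float:
    """Fraction of the context window consumed if all skills are fully loaded."""
    skill_tokens = words_per_skill * num_skills * TOKENS_PER_WORD
    return skill_tokens / context_tokens


small = skill_budget_share(2_000, 5, 128_000)    # five 2k-word skills, 128k window
large = skill_budget_share(2_000, 5, 1_000_000)  # same skills, 1M window

print(f"128k window: {small:.1%}")  # 128k window: 10.2%
print(f"1M window:   {large:.1%}")  # 1M window:   1.3%
```

Under that assumption the 10% figure holds for a 128k window, and the share drops to roughly 1% at 1M.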
sergiotapia•about 1 hour ago
I reviewed the line counts of my own project skill files, and the top 3 I have are:
805 lines
660 lines
511 lines
Maybe I am _too_ conservative here. Lots to explore.
mohamedkoubaa•about 1 hour ago
No, you aren't.
codemog•about 1 hour ago
Everyone who writes this kind of stuff skips the boring parts: science and engineering.
Yep, benchmarks, comparisons of with/without, samples of generated code with/without. This kind of stuff matters, and you may be making your agent stupider or getting worse results without real analysis.
Also this prose reads like the author has drunk the Google kool-aid and not much else.
ElijahLynn•about 4 hours ago
I've been using Agent Skills on a new side project and I'm really impressed so far! It really holds my hand a lot of the way and really lets me focus on developing a product instead of figuring out how to build it. I get to focus much more energy on high level architecture and product design.
Very grateful for this repository and everyone who contributed to it!
gavmor•about 3 hours ago
Naming things is such a hard problem that many devs don't even bother trying.
That being said, this post is full of reasonable assertions, so I'm looking forward to experimenting with this... whatever it is.
fragmede•about 2 hours ago
Wait, shit, are people using LLMs to name things now? I'm definitely out of a job then!
y-curious•about 4 hours ago
Thanks for this, going to steal a lot of this. I would install your plugin, but I worry about being able to delete it later. I also think that each one of these is better served customized to a developer. That said, I'm still going to grab some of these, thanks!
bvirkler•about 2 hours ago
A plugin is just a set of files, right? why wouldn't you be able to delete it later?
senko•about 3 hours ago
> This isn’t a coincidence. It’s the same SDLC every functioning engineering organisation runs, just in different vocabulary. [...] Amazon calls it the working-backwards memo and the bar raiser. Every healthy team has some version of this loop.
This (sdlc == working backwards & bar raiser) is so horribly wrong, that I hope this was an LLM hallucination.
In general, I'm starting to see these agent scaffolding systems as an anti-pattern: people obsess over systems for guiding agents and construct elaborate rube-goldberg machines and then others cargo-cult them wholesale, in an effort to optimize and control a random process and minimize human involvement.
yks•about 3 hours ago
The problem is it’s so rarely A/B tested, definitely not at scale. An engineer, who writes all these my-workflow-but-for-agents skills, proceeds to get the good outcome, while also seeing affirmations that the agent did follow the prescribed processes - that is considered a victory. In reality the outcome could’ve been just as good if they fed Claude a spec + acceptance criteria, or even a basic prompt for the simpler tasks.
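A blind with/without comparison does not need much machinery. Here is a minimal sketch, assuming you already have one output per task from each arm (skills vs. plain spec) and a reviewer who picks the better of two unlabeled outputs. The function names and the `"with"`/`"without"` labels are made up for illustration.

```python
import random


def blind_pairs(with_skill: list[str], without_skill: list[str], seed: int = 0):
    """Pair outputs per task and shuffle sides so the reviewer can't tell the arms apart."""
    rng = random.Random(seed)
    pairs = []
    for a, b in zip(with_skill, without_skill):
        sides = [("with", a), ("without", b)]
        rng.shuffle(sides)  # randomize which arm appears first
        pairs.append(sides)
    return pairs


def win_rate(pairs, picks: list[int]) -> float:
    """picks[i] is 0 or 1: the side the reviewer preferred for task i.

    Returns the share of tasks where the preferred side was the 'with' arm.
    """
    wins = sum(1 for sides, pick in zip(pairs, picks) if sides[pick][0] == "with")
    return wins / len(pairs)


# Toy outputs; real ones would be diffs or PRs produced by each setup.
pairs = blind_pairs(["skill-out-1", "skill-out-2"], ["plain-out-1", "plain-out-2"])
# The reviewer only ever sees the texts: [text for _, text in sides]
```

A win rate close to 0.5 over enough tasks would suggest the skills are not changing the outcome much, which is the scenario described above.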
AndyNemmity•about 1 hour ago
Yeah, I Blind A/B test everything, and a lot.
But I don't expect anyone to ever use my stuff. It's complicated as hell. But it's for me, and it works without me having to remotely think about the complexity.
I love that.
BOOSTERHIDROGEN•about 2 hours ago
This is similar to how we collectively approached Taylorism, isn't it? However, the world favors capitalism, for which Taylorism becomes a handy scaffolding.
gosukiwi•about 3 hours ago
I wonder how this compares to superpowers
AndyNemmity•about 2 hours ago
This is why I created the /do router, to route to all skills. I also have anti rationalization, progressive context discovery etc.
I only make it for me, so it's a bit complex and targeted towards me, and what I do, but it's pretty easy to adjust things.
Working on reading through Agent Skills, it seems we've converged on a lot of the same points, and I've never seen it, so trying to get an understanding of it.
Edit 1: I don't like all the commands. I just rely on a single router to automatically decide what I want, and that feels like the most reasonable way to me to communicate with it.
I don't want to remember things. And that's the way for me to scale the number of skills and activities. I don't have to think about them.
I personally wouldn't call theirs an intelligent router. They are dancing between a few different skills. We have extremely different setups there.
But of course, I'm using way more context to get it done. I'm even sending it out to Haiku to build the route choices.
I choose to use tokens to make things better for myself, not everyone would make the same choice, so I certainly see why they are using a few skills, and composing them.
Edit 3: This is much easier for a user to wrap their head around because there's much less.
I am only focused on the best improvements I can make that show value for my use cases. This is straightforward to reason about.
This seems like a nice way to get the best concepts for people trying to understand them. I commend them for a clean, simple approach.
Edit 4: Yeah, I think there are some things I can learn from them which is always good.
I especially like simple decisions like collapsing the install details for each harness in the readme.
I'm going to read over the entire thing and look for opportunities to improve my stuff.
We are all working together, learning, testing, building, trying to find the best way to implement things.
encoderer•about 4 hours ago
I adopted a couple of these, the api design and ui testing ones have been particularly helpful.
Discussion (45 Comments) · Read Original on HackerNews
If Addy reads this, how do you pitch this vs. Superpowers? https://github.com/obra/superpowers
https://github.com/notque/vexjoy-agent
Edit 2: We have very different routers.
https://github.com/addyosmani/agent-skills/blob/f504276d8e07...
vs
https://github.com/notque/vexjoy-agent/blob/main/skills/do/S...