We stopped AI bot spam in our GitHub repo using Git's --author flag

494 points • iildari • 2 days ago • 236 comments • Article on archestra.ai



Discussion (236 Comments), originally on Hacker News

captn3m0•2 days ago

This has a security implication which is overlooked. Contributors to a repository have higher rights, such as avoiding approval requirements for fork PR runs. GitHub warns in the docs:

> When requiring approvals only for first-time contributors (the first two settings), a user that has had any commit or pull request merged into the repository will not require approval. A malicious user could meet this requirement by getting a simple typo or other innocuous change accepted by a maintainer, either as part of a pull request they have authored or as part of another user's pull request.
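
To make the loophole concrete, here is a minimal Python model of the decision the quoted docs describe. It is an illustration under assumptions, not GitHub's implementation: the enum names and the `is_org_member` shortcut are hypothetical simplifications of the three approval settings.

```python
from enum import Enum


class ApprovalPolicy(Enum):
    """Hypothetical names loosely modeled on GitHub's fork-PR approval settings."""
    NEW_TO_GITHUB = 1   # first-time contributors who are also new to GitHub
    FIRST_TIME = 2      # all first-time contributors
    ALL_EXTERNAL = 3    # every contributor outside the organization


def needs_approval(policy: ApprovalPolicy, *, is_org_member: bool,
                   has_merged_contribution: bool, account_is_new: bool) -> bool:
    """Return True if a fork PR's workflow run should wait for a maintainer."""
    if is_org_member:
        return False
    if policy is ApprovalPolicy.ALL_EXTERNAL:
        return True
    if has_merged_contribution:
        # The loophole from the docs: one merged typo fix clears both first-time settings.
        return False
    if policy is ApprovalPolicy.FIRST_TIME:
        return True
    return account_is_new  # NEW_TO_GITHUB only gates brand-new accounts
```

Only the third setting ignores merge history, which is why it is the one suggested as a default below.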

ildari•2 days ago

fair point! We believe "Require approval for all external contributors" should be a default setting, as you cannot trust anyone who is not a member of the organization

smitop•2 days ago

Actions runs from external contributors aren't run with Actions secrets; if you are using Actions right (i.e. not using pull_request_target wrong) you don't need to trust external contributors. (eta: iirc the original point of the Actions approval flow was preventing cryptomining spam from abusing free compute)

cermicelli•2 days ago

you can't trust org members either; I have seen projects have inter-maintainer fallouts. In general, trust doesn't exist.

If companies can screw you over and claim it's a mistake, there isn't much a person can do.

It's all about levels of trust: a maintainer going rogue is less likely, a past contributor going rogue more likely but not by much, a stranger with a merged typo PR even more likely, and a complete stranger is the least trustworthy.

simgoh•about 3 hours ago

You also can't fully trust org members because valid / "Trusted" accounts can be taken over by nefarious actors as well.

finseam•2 days ago

Interesting approach. We’ve seen similar spam/noise problems appear in financial workflow automation too — especially when AI-generated submissions scale faster than manual review processes.

opengrass•2 days ago

too — especially

orlp•2 days ago

No it doesn't have security implications.

If you are insecure because someone has had one of their otherwise completely innocent PRs merged into your repo... you are insecure, period.

lgrapenthin•2 days ago

What you are describing is exactly a security implication.

stavros•2 days ago

Security isn't a binary "secure/insecure". You can be more or less secure than something.

silverwind•2 days ago

PR spam is a major problem for repos that run bounties. Maybe GitHub should temporarily block accounts from raising PRs if like 95%+ of them are getting rejected.
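
A minimal sketch of that rejection-rate throttle, assuming "rejected" means closed without merging. The 95% threshold comes from the comment; the `min_prs` floor and the function name are hypothetical additions so that a newcomer whose first couple of PRs were closed is not flagged.

```python
def should_throttle(opened: int, rejected: int, *,
                    threshold: float = 0.95, min_prs: int = 20) -> bool:
    """Flag an account whose PRs are overwhelmingly closed without merging.

    `threshold` mirrors the 95% figure; `min_prs` is an assumed floor so that
    small samples never trigger the block.
    """
    if not 0 <= rejected <= opened:
        raise ValueError("rejected must be between 0 and opened")
    if opened < min_prs:
        return False
    return rejected / opened >= threshold
```

The minimum sample size is the main design choice: without it, a single closed PR gives a 100% rejection rate.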

microtonal•2 days ago

I feel like GitHub should have a system where you can give out tokens that are valid for e.g. 1 PR. If someone shows to engage in meaningful discussion and has a good idea to address an issue/feature, you initially give them one PR token. If the PR is of good quality, you can give them a few more, until they are contributors that can just create PRs as they like.

A similar system would be nice for issues, though I'm not sure what it'd look like if issues are the springboard for contributing PRs.

Not likely to ever happen (as others said): GitHub/MS want to sell Copilot subscriptions/tokens and LLM-generated PRs are a part of that business model.

ZeWaka•2 days ago

My community does something vaguely similar, where you get credit for having bugfix PRs merged, and it's deducted when you get feature PRs merged.

pbhjpbhj•2 days ago

You could use a "OTP" to provide this: give out tokens ("OTP"), anyone with that token can submit, keep a record of the user (eg github username, email address) and token, run a bot to delete submissions that don't have a valid token, check the token+user pair, if there is a mismatch blacklist the user that token was given to whilst removing the PR.
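
A minimal Python sketch combining the two ideas above: single-use PR tokens handed out by maintainers, each bound to the username it was issued to, with the token's original recipient blocklisted if someone else redeems it. The class and method names are hypothetical; nothing like this exists in GitHub today, so a bot would have to call `check_pr` and close PRs itself.

```python
from __future__ import annotations

import secrets
from dataclasses import dataclass, field


@dataclass
class PrTokenRegistry:
    """Hypothetical single-use PR tokens, each bound to one username."""
    issued: dict[str, str] = field(default_factory=dict)  # token -> username it was given to
    used: set[str] = field(default_factory=set)
    blocklist: set[str] = field(default_factory=set)

    def issue(self, username: str) -> str:
        token = secrets.token_urlsafe(16)
        self.issued[token] = username
        return token

    def check_pr(self, author: str, token: str | None) -> bool:
        """Return True to keep the PR open, False to close it."""
        if author in self.blocklist or token is None or token in self.used:
            return False
        owner = self.issued.get(token)
        if owner is None:
            return False
        if owner != author:
            # Token was passed on: blocklist the user it was issued to.
            self.blocklist.add(owner)
            return False
        self.used.add(token)
        return True
```

Giving a proven contributor "a few more" tokens is just calling `issue` again for the same username.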

sdsd•2 days ago

If only there were a simple way to make tokens that weren't fungible and could be given to others

hiccuphippo•2 days ago

GitHub has no incentive for blocking AI. It's like asking an ad company to build an adblocker into their browser.

rvnx•2 days ago

It's called Brave

smaudet•2 days ago

Which is not chrome and still has ads...(Ironically).

The issue here is the core model is broken (misaligned incentives). That's not something you are going to fix with a github "downstream". A token system could help but it's easy to imagine ways that could be gamed, if not implemented well.

marginalx•2 days ago

Problem is the bots can create any number of github accounts and continue spamming. Though this would be a good simple defense to start with.

cdrnsf•2 days ago

GitHub and Microsoft are actively contributing to the problem, why would they admit fault?

godelski•1 day ago

I've gotten tons of spam on repos that were purely ML research code. Things I saw copy-pasted over hundreds of repos.

> Maybe GitHub should temporarily block accounts from raising PRs if like 95%+ of them are getting rejected.

It's so bad I'd be okay with a lower bar where it's flagged if they're posting the same message over multiple repos... FFS they aren't even stopping this shit https://news.ycombinator.com/item?id=47964617

avs733•1 day ago

It seems like some better basic metrics should be made front and center with PRs in this day and age. Yes AI is the driving force behind the current crop of problems but there are other issues. Yes it’s accessible if you go look but the point is people don’t have time.

- The rate of commits/PRs overall
- The rate of PRs to repos they don’t own
- The reject rate of PRs
- The number of bans
- An estimated “AI” or bot score or status flag

There are a few better attempts at GitHub metrics calculators, but I have not seen any that move beyond the paradigm where more commits is assumed good by default. It’s time to foreground quality, not just quantity. The GitHub “4 KPIs” are entirely action-oriented.
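
A minimal sketch of how the first four metrics in that list could be computed from an account's PR history. The `PullRequest` shape, the state labels and the function name are assumptions for illustration; the fifth item, an AI/bot score, is left out because there is no reliable formula for it.

```python
from dataclasses import dataclass


@dataclass
class PullRequest:
    repo_owner: str  # owner of the target repository
    state: str       # "open", "merged" or "closed" (closed without merge)


def contributor_profile(login: str, prs: list[PullRequest],
                        window_days: int, bans: int) -> dict:
    """Summarize an account's PR history with quality-oriented metrics."""
    foreign = [pr for pr in prs if pr.repo_owner != login]
    decided = [pr for pr in prs if pr.state in ("merged", "closed")]
    rejected = sum(pr.state == "closed" for pr in decided)
    return {
        "prs_per_day": len(prs) / window_days,
        "foreign_share": len(foreign) / len(prs) if prs else 0.0,
        # Open PRs are excluded so that pending work does not count as rejected.
        "reject_rate": rejected / len(decided) if decided else 0.0,
        "bans": bans,
    }
```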

carschno•1 day ago

> It's especially sensitive for a VC-backed startup that is measured thoroughly by GitHub activity, but we have to pull the trigger:

This sentence also illustrates the absurdity of this investment model. It imposes a trade-off between building good software, and complying with the investor's metrics. They probably call such metrics evidence-based, but this example shows that they arbitrarily capture some numbers to obscure the lack of meaningful measurements.

andrelaszlo•1 day ago

I also found it a bit ironic that it comes from an "AI company" (whatever that means) with a GitHub agent as part of their product.

bloppe•1 day ago

It's called a signaling game. Of course it's dumb, but how else do you measure traction besides revenue? Building good software is a small part of running a business.

carschno•1 day ago

I don't know, and I think there is no easy answer. The point is: the investors don't know how to measure traction either, so they just measure GitHub activity instead, even at the very moment in which it becomes obvious that it does not capture actual traction. The absurdity lies in the statement that the developers still need to gain actual traction while putting additional effort into gaming that metric to satisfy their investors.

infinitifall•2 days ago

Is the solution to everything simply more catgirls [1]? Proof-of-work was, after all, about countering email spam. PR spam is but the latest in that long tradition.

[1] https://anubis.techaro.lol
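
For reference, a minimal hashcash-style proof of work in Python: minting needs about 2^bits hash attempts on average, while verifying needs one. The resource string format is a hypothetical example, and this is neither the original Hashcash stamp format nor how Anubis implements its challenge.

```python
import hashlib
from itertools import count


def _below_target(resource: str, nonce: int, bits: int) -> bool:
    digest = hashlib.sha256(f"{resource}:{nonce}".encode()).digest()
    # A digest below 2^(256 - bits) starts with at least `bits` zero bits.
    return int.from_bytes(digest, "big") < (1 << (256 - bits))


def mint(resource: str, bits: int) -> int:
    """Search nonces until one passes; expected cost is about 2**bits hashes."""
    for nonce in count():
        if _below_target(resource, nonce, bits):
            return nonce


def verify(resource: str, nonce: int, bits: int) -> bool:
    """Check a submitted nonce with a single hash."""
    return _below_target(resource, nonce, bits)
```

With `bits=12` minting takes a few thousand hashes, which is why the replies below argue the cost falls mostly on whoever has the least compute.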

drum55•2 days ago

Proof of work doesn’t work here same as it doesn’t work for email. The effort to mint a valid PoW is always going to put the legitimate user at a disadvantage, whatever the implementation is. Someone with an incentive to spam will always be able to do it faster, more efficiently than you.

You can’t submit a PR because your laptop is too slow? Rent some hash rate from someone, and now you’ve just made a system of paying botnet owners to be able to make a typo fix on a GitHub repo. Hashcash was never used in the real world for a reason; it sounds cute but the incentives are so insane as to only work in a vacuum where you assume everyone isn’t cheating.

Terr_•2 days ago

> Someone with an incentive to spam will always be able to do it faster, more efficiently than you.

Sure, but looking at the cost to do it at scale is the wrong metric. I surely can't compete with a career spammer on emails-per-second or even emails-per-dollar, but I also don't need to.

It's more about the expected-value versus the cost. For example, my expected benefit from one email to my family is (while hard to quantify) hopefully much higher than a spammer's expected benefit of one spam email going out, which has a very small chance of leading to any amount of money. Attaching a CPU-churn cost per email is something I can ignore on my desktop, but they have to at least budget for it.

I'd also like to note that the win-condition isn't as extreme as making spam (or other "crimes") truly unprofitable, it just needs to be less profitable than other things the time/resources could be used for.

smaudet•2 days ago

Agreed, PoW is an especially poor solution here.

We really need to solve SPAM itself here, and I think there may be a way to do it. The problem of spam is N-to-N scaling of connections. The network has never been able to solve that problem (exponential is the hardest). Limiting communication in terms of mesh networking may be the ultimate solution - bots can't get to you because they can't reach you.

What needs to be invented is a bridging protocol - some way to establish "legitimate" lines of communication over a network, while preserving (to some degree) privacy and decentralization. AI can only enter this network by being explicitly added to the channel, and thereby explicitly and easily blocked (and also solving the general SPAM issue once and for all).

pocksuppet•2 days ago

Just like we did with IP addresses. If yours is blocked by Cloudflare, you can pay a botnet operator a few dollars to use theirs! You can even use your credit card through a mostly-legitimate website. It's very convenient.

Dwedit•2 days ago

Anubis is actually not a cat. The original Egyptian deity is a god of death, and has a canine head. Anime catgirls and dog girls can look similar at first glance.

gabeio•1 day ago

I believe they are referencing the person who wrote the program, not the name itself.

Dwedit•1 day ago

More likely to be referencing the mascot character who appears on your screen when you visit a protected page.

karel-3d•2 days ago

I think Anubis is against crawlers, not against agents that make PRs. PoW doesn't work here, the agent will just do the computation.

arecsu•2 days ago

Makes me wonder if an ELO-based system would work to mitigate these issues. It would account for people who successfully merged PRs into a project or had real issues acknowledged, the quality of their responses as measured by other users' reactions or something, etc., possibly multiplied by the importance of the project where that activity happened. It wouldn't be about human vs AI, but about genuinely helpful, effective contributors vs low-effort/spammy contributions. Issues and PRs could be sorted and filtered by their ELO score. I'm saying ELO as an analogy for "a score based on the context", not really a 1:1 translation of the ELO system.

Negative score would be reports from other users because of spammy content or not acknowledged issues, with a middle ground of neutral score (+-0) or little positive score to issues or whatever with clear good intention, but couldn't reach a proper merged PR or were not issues (e.g. issue existed but wasn't the correct repo to be addressed, PR was good but needed other stuff to be implemented prior to it, maybe in the long run, etc)

btilly•2 days ago

ELO is shockingly easy to manipulate. For example there was a literal jail with a decent chess player in it. He created a pool of players who got great ELOs by beating him, then used them to boost his rating higher. Wash, rinse, and repeat.

Given any manipulatable scheme, AI will figure out how to manipulate it. For the OP, what happens if a single AI manages to get through to contributor? Then it starts elevating other AIs to contributor, and we're off again. There doesn't have to be a purpose to this. Trolls will troll, and trolls armed with AI bots can devote endless energy to doing so. The more you work to keep them out, the more fun it becomes for them.

I wish I had an answer for that problem. But I don't.
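
A minimal sketch of the standard Elo update, plus one generic way it gets farmed: every fresh account enters with a default rating, so repeatedly beating new sock puppets pumps points into the farmer. The K-factor of 32 and the 1500 starting rating are assumptions, and this does not claim to reproduce the specific jail case.

```python
def expected(r_a: float, r_b: float) -> float:
    """Elo expected score of A against B."""
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))


def update(r_a: float, r_b: float, score_a: float, k: float = 32) -> tuple[float, float]:
    """Return both new ratings; score_a is 1 (win), 0.5 (draw) or 0 (loss)."""
    delta = k * (score_a - expected(r_a, r_b))
    return r_a + delta, r_b - delta  # zero-sum: points only move between the two players


def farm(rating: float, puppets: int, games_each: int, start: float = 1500) -> float:
    """Repeatedly beat freshly created accounts and return the farmer's rating."""
    for _ in range(puppets):
        puppet = start  # each new account injects `start` points into the pool
        for _ in range(games_each):
            rating, puppet = update(rating, puppet, 1.0)
    return rating
```

Each game is zero-sum, but account creation is not, so the exploit scales with how cheaply new accounts can be made, which is exactly the resource AI bots have in abundance.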

altairprime•2 days ago

ELO is a bad fit because it requires competition between submitters; but if the idea is interpreted as “contributor karma score” or similar (not everyone’s familiar with the mathematical nature of ELO), then the way to close the loophole is to only consider voting inputs from the human project owner. This project chose to have people lie to a webform rather than lie to a git interface about using AI, so I don’t expect it will be particularly successful at inhibiting AI use by project-involved humans, but certainly it’ll squelch a lot of noise from unattended/passersby.

atomflunder3000•2 days ago

I think they were saying Elo system as kind of a general ranking system idea instead of the actual algorithm.

You could probably use some kind of pairwise ranking algorithm (like anything based on the Bradley-Terry model) to rate human vs. AI contributions, but that would take a lot of manual effort. Google is using it to (supposedly) improve their searching algorithms. They give testers two different versions and ask them what's better.
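
A minimal Bradley-Terry fit in Python, using the classic minorization-maximization (MM) update on (winner, loser) pairs such as "a reviewer preferred contribution X over Y". The function name and inputs are hypothetical, and it assumes every item wins at least once and the comparison graph is connected; otherwise the maximum-likelihood strengths do not exist.

```python
from collections import defaultdict


def bradley_terry(comparisons: list[tuple[str, str]],
                  iterations: int = 200) -> dict[str, float]:
    """Fit strengths p so that P(i beats j) = p_i / (p_i + p_j)."""
    wins: dict[str, int] = defaultdict(int)
    games: dict[frozenset, int] = defaultdict(int)  # unordered pair -> number of comparisons
    for winner, loser in comparisons:
        wins[winner] += 1
        games[frozenset((winner, loser))] += 1
    items = {x for pair in games for x in pair}
    strength = {i: 1.0 for i in items}
    for _ in range(iterations):
        # MM step: p_i <- W_i / sum_j n_ij / (p_i + p_j)
        new = {}
        for i in items:
            denom = sum(n / (strength[i] + strength[j])
                        for pair, n in games.items() if i in pair
                        for j in pair - {i})
            new[i] = wins[i] / denom
        scale = len(items) / sum(new.values())  # keep the mean strength at 1
        strength = {i: s * scale for i, s in new.items()}
    return strength
```

If X beat Y three times out of four, the fitted ratio `p_X / p_Y` is 3, matching the observed 3/4 win rate.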

chii•2 days ago

fix this problem by making the rating value tied to some paid currency: a repo owner would have to pay for the PR, and that PR contributor will now have more currency than before. To have said currency to pay, the repo owner would need to have contributed to another repo whose owner has currency.

The totality of someone's currency is their reputation.

Of course, now the decision becomes...who is the central currency issuer that creates it?
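
A minimal ledger sketch of that idea, with hypothetical names: merges are paid by transfers from owner to contributor, so transfers conserve the total supply and only the initial `genesis` balances create currency. That constructor argument is where the "central issuer" question lives.

```python
class ReputationLedger:
    """Hypothetical ledger where repo owners pay contributors for merged PRs."""

    def __init__(self, genesis: dict[str, int]):
        # Someone has to mint these initial balances: the "central issuer".
        self.balances = dict(genesis)

    def balance(self, user: str) -> int:
        return self.balances.get(user, 0)

    def pay_for_merge(self, owner: str, contributor: str, amount: int) -> None:
        if amount <= 0:
            raise ValueError("amount must be positive")
        if self.balance(owner) < amount:
            raise ValueError(f"{owner} cannot afford to merge this PR")
        self.balances[owner] -= amount
        self.balances[contributor] = self.balance(contributor) + amount
```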

lpghatguy•2 days ago

It's the StackExchange model! This has bootstrapping issues, is hard to break into the community, and risks creating moderator cliques.

hotstickyballs•2 days ago

This is called proof of stake

morkalork•2 days ago

Reputation scores, review cartels. This all sounds familiar!

20k•2 days ago

>what happens if a single AI manages to get through to contributor

Then they'll get removed by the humans? It's about cutting down work, not about eliminating the work entirely

The current approach removes about 99% of their overhead it would seem. If they have to do a few manual interventions here and there, that seems like a huge win overall

stronglikedan•2 days ago

contributors being able to grant contributor to other users seems like a problem
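
One way to blunt the "one AI gets in and elevates other AIs" scenario is to record who vouched for whom, so that revoking a voucher also revokes everyone who got in through them. A minimal Python sketch with hypothetical names; it is not how any existing tool works.

```python
from collections import defaultdict


class VouchTree:
    """Hypothetical contributor graph with cascading revocation."""

    def __init__(self, maintainers: set[str]):
        self.maintainers = set(maintainers)
        self.vouched_by: dict[str, str] = {}
        self.children: dict[str, set[str]] = defaultdict(set)

    def is_trusted(self, user: str) -> bool:
        return user in self.maintainers or user in self.vouched_by

    def vouch(self, voucher: str, user: str) -> None:
        if not self.is_trusted(voucher):
            raise PermissionError(f"{voucher} is not trusted")
        if self.is_trusted(user):
            return  # keep the first voucher on record
        self.vouched_by[user] = voucher
        self.children[voucher].add(user)

    def revoke(self, user: str) -> set[str]:
        """Remove `user` and, transitively, everyone who got in through them."""
        removed: set[str] = set()
        stack = [user]
        while stack:
            u = stack.pop()
            if u in removed or u not in self.vouched_by:
                continue  # already handled, or a maintainer / unknown user
            removed.add(u)
            parent = self.vouched_by.pop(u)
            self.children.get(parent, set()).discard(u)
            stack.extend(self.children.pop(u, set()))
        return removed
```

A single human intervention on the first bot removes the whole chain it created, while users vouched for by others are untouched.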

ElijahLynn•2 days ago

For those wondering what Elo means, it is a person's last name, not an acronym (not all caps). More info here:

https://en.wikipedia.org/wiki/Elo_rating_system

SapporoChris•2 days ago

Thank you, big fan of ELO https://en.wikipedia.org/wiki/Electric_Light_Orchestra and I was a bit confused about the comments.

sebastiansm7•2 days ago

It's Elo not ELO. Elo is not an acronym.

https://en.wikipedia.org/wiki/Elo_rating_system

dmboyd•2 days ago

That’s a fun fact!

stronglikedan•2 days ago

From what I've seen in the comments, it's definitely ELO, if not through ubiquity alone. Happens to the best of 'em!

catlifeonmars•2 days ago

Elo is nicer as it gives a nod to the inventor, no?

doh•2 days ago

I have built something like this and am in the process of collecting the data.

- Frontier users: 527,865
- Light indexed: 527,865
- Ready to queue: 9,083
- Fast scores ready: 0
- Activity events 24h: 30,266
- Fast scores completed 24h: 19,123
- Deep jobs completed 24h: 3,043
- Fast-score ETA: n/a
- Deep-hydrate ETA: 69h
- Stale running jobs: 0
- GitHub backpressure jobs: 19,113
- High automation signals: 4,608
- Medium automation signals: 1,327
- Completed jobs: 74,714

The biggest challenge is GitHub's rate limits. At this pace it will take two more months to have 98% coverage. But after that the maintenance should be quite straightforward.

JimBlackwood•2 days ago

Sounds a bit like Mitchell Hashimoto’s Vouch: https://github.com/mitchellh/vouch

chipsrafferty•2 days ago

This would just hurt new users similar to how you are unable to comment on 90% of subreddits on Reddit as a new user, because you don't have enough karma points, or how on Stackoverflow your permissions are severely limited until you do certain jobs. The incentives aren't very good in systems like this. Bots can be made to easily game the system while regular users are discouraged from even participating.

naet•2 days ago

Some kind of vouching or scoring might make sense to help qualify contributions, and many people have suggested something similar recently, if by "ELO-based system" you meant "some kind of scoring system (not based on Elo)".

The Elo rating system doesn't make sense in this context; it's designed around collecting zero sum game results for a given community of players and building a model around it.

philipwhiuk•2 days ago

The problem is you want the ELO score based on work on other community projects - you can't assume good faith here.

btilly•2 days ago

The problem with that is that there are certain kinds of users that like to take control of community projects. And then they take control of more, and bigger ones.

There are a lot of political tricks that get used.

What is scary is that one of those kinds of users are malicious state actors. Like North Korea and Russia...

LelouBil•2 days ago

I think you need trust circles, not ELO.

krupan•2 days ago

This is what we get for telling everyone how amazing AI is at writing code. It started with the people selling AI and for some reason tons of independent developers, some quite well respected in our field, piled on. Facebook now laying people off and saying it's because AI is just so good adds more fuel to the fire. Now you have a bunch of people fully confident that their AI friend is pumping out amazing code and submitting it to projects that are completely overwhelmed.

marcus_holmes•2 days ago

No, this is a result of unintended consequences.

We made "Github contributions" a metric for people applying for dev jobs. So, of course, because devs are the kind of people we are, they started working out how to game that metric.

Some folks decided to start paying bounties on bug fixes, features, etc. Those bounties are fairly trivial by western standards, but are significant for developing countries. This creates a new career for developers; racing to collect the bounties on offer.

LLMs have exacerbated these problems by allowing existing people doing this to do it faster, and also allowing more people to pretend to be software developers and get in on the action.

If we stopped allowing LLM-authored contributions we'd still have too many shitty PRs. It would just be back to pre-LLM levels of "too many".

The answer is to make Github contributions valueless. Stop paying bounties, and stop using them to assess candidates.

watwut•1 day ago

This feels like an alternative history. OS contributions were never that important a metric, and the overwhelming majority of developers have literally none.

And it is not like AI spam would be limited to, or even primarily targeted at, bounties.

marcus_holmes•1 day ago

There's a whole thing about advising new folks in the industry to contribute to OS projects on github as proof that they're actually really keen developers.

This [0] is an example, there are many more.

The whole idea that we have to have a "portfolio" of work.

[0] https://talentslab.io/7-strategies-for-a-junior-developer-to...

smaudet•2 days ago

> and for some reason tons of independent developers

Cowboy coders got a virtual cowgirl coder and sold it to everyone, hmm, maybe... (respected or not, solo devs don't always have the requisite skills to not be a cowboy, either due to lack of experience or lack of innate skill)

I don't know that I completely buy this narrative, though. There has been a strong, top-down push for this since the "beginning".

heavyset_go•1 day ago

The astroturfing successfully broke a lot of people's brains

halapro•2 days ago

[flagged]

jmuguy•2 days ago

This is the correct assessment. This is not up to the open source community or individual projects to "figure out", any more than it's up to me to figure out how not to get spam email.

20k•2 days ago

Yeah well, our corporate overlords have decided that you're going to take your slop whether you want it or not, so it's very much up to us to figure out. Capitalism isn't going to jump off the disaster train any time soon.

pydry•1 day ago

Github team are seemingly too busy fighting downtime with ever more slop.

moraesc•2 days ago

We’re currently working on a feature that lets admins archive PRs. The goal is to give maintainers more control over how they manage contributions in their repositories. Archived PRs would be visible to admins only, so maintainers still have access to contributor history for auditing purposes and to meet any organizational or compliance requirements. Would this be helpful for you?

karussell•2 days ago

Not OP, but I've requested this feature for years.

Your suggestion would help a bit but I would prefer the opposite: before someone can 'pollute' my pull request space and draw attention from subscribers I would prefer an acceptance step (just like a moderator on a forum) instead of having to archive the PRs.

This is especially important as (AI) spam increases and just because I am away for a few days or weeks I don't want those PRs lurking around.

darccio•1 day ago

A PR staging area. This would be a good step forward.

halapro•1 day ago

This doesn't help with PR spam if that junk still shows up in regular "is:pr" searches. I don’t think unrequested unmerged AI PR spam is useful for compliance, just like deleted comments and issues aren't.

jmuguy•1 day ago

Boot spammers off your platform, stop them from coming back. It's a moderation issue; the more companies want to pretend like it's not their problem, the worse it gets.

hpjev•1 day ago

I can only speak for myself, being a maintainer of a project in the crypto space. We are getting spammed with AI slop and also scam comments (though this lessened for some reason).

My usual experience is this:

1. We open an issue that needs to be fixed
2. Slop bots create multiple slop PRs
3. Slop bots spam comments on the issues, pointing to their slop PRs

The only general methods for preventing this are restricting PRs (not comments, I believe) to contributors, which is a hassle to maintain, and restricting to older accounts, which doesn't work because the bot accounts are not newly created.

Then we need to perform _way too many_ steps just to get rid of the slop:

- navigate multiple pages and confirmations to ban the account from our org
- open each PR manually
- close it manually

This takes at least 15 clicks and is made _so much worse_ by how slooooooooow the UI is. Every click takes 2 seconds!!! How can "ban this account and delete everything it ever did" be more than a max of 2 clicks?

What we really need is a "locked down mode" where every interaction (PR, issue, comment) with the repo that isn't from maintainers or specifically whitelisted people goes into a moderation queue. Maintainers can confirm or deny the action using a single click (which does not take 2 fucking seconds to load).
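
A minimal in-memory sketch of that "locked down mode": interactions from allowlisted users publish immediately, everything else waits for a one-call approve/deny. All names are hypothetical; a real version would live inside GitHub, where only published items would trigger notifications.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Interaction:
    id: int
    author: str
    kind: str  # "pr", "issue" or "comment"
    body: str


class ModerationQueue:
    """Hypothetical locked-down mode with a single-decision moderation queue."""

    def __init__(self, allowlist: set[str]):
        self.allowlist = set(allowlist)
        self.pending: dict[int, Interaction] = {}
        self.published: list[Interaction] = []
        self._ids = count(1)

    def submit(self, author: str, kind: str, body: str) -> Interaction:
        item = Interaction(next(self._ids), author, kind, body)
        if author in self.allowlist:
            self.published.append(item)
        else:
            self.pending[item.id] = item  # held back; nothing is published yet
        return item

    def decide(self, item_id: int, approve: bool) -> None:
        """One decision per item: approve publishes it, deny drops it."""
        item = self.pending.pop(item_id)
        if approve:
            self.published.append(item)
```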

halapro•1 day ago

This has two good points:

- add "Pull Request requests" that operate like Friend requests. You can't open PRs until you've been whitelisted (temporarily or not) or are proven to be a good OSS citizen (TBD)

- add a "Burn it with fire" action in new PRs that deletes all comments and PRs opened by the user across the repo, as well as blocking the user.

Organizations already sort of have this, but the action does not delete/close PRs.
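
A rough approximation of "burn it with fire" can already be scripted against the GitHub REST API. The sketch below only builds the list of requests and sends nothing; the routes are the documented ones for blocking a user from an organization, updating a pull request's state and deleting an issue comment, as far as I know. It assumes the repository belongs to the same organization and that the PR numbers and comment ids were collected elsewhere (for example via search). It closes PRs rather than deleting them.

```python
from __future__ import annotations


def burn_plan(org: str, repo: str, username: str,
              open_pr_numbers: list[int],
              comment_ids: list[int]) -> list[tuple[str, str, dict | None]]:
    """Build (method, path, json_body) GitHub REST calls to block and clean up a spammer."""
    calls: list[tuple[str, str, dict | None]] = [
        ("PUT", f"/orgs/{org}/blocks/{username}", None),
    ]
    calls += [("PATCH", f"/repos/{org}/{repo}/pulls/{n}", {"state": "closed"})
              for n in open_pr_numbers]
    calls += [("DELETE", f"/repos/{org}/{repo}/issues/comments/{c}", None)
              for c in comment_ids]
    return calls
```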

drusepth•2 days ago

What is the benefit of deleting a PR over just closing it? It seems like closing has the benefit of signaling what kinds of PRs aren't acceptable, which deleting would lose.

halapro•1 day ago

In the future, when you're looking at past PRs, you'll end up with a list of closed PRs that look legitimate from their titles. You'll waste time opening each one to figure out why it was closed.

This is particularly annoying because PRs also show up in the issue and in the issue list as "this issue has 3 PRs that will close it", when it's all. just. spam.

TuxSH•2 days ago

Closing a PR or issue still makes it discoverable in PR/issue search results, as opposed to deleting an issue.

karussell•2 days ago

This. But OP wanted special requirements to open a PR. I.e. if those requirements are not met the PR is never visible to all and so admins can reject spam PRs without giving them a platform.

tommica•1 day ago

I'd imagine this is not a simple problem to solve, and legacy code is probably causing a massive headache too

halapro•1 day ago

They do have ways to limit interactions already, but they work on a whitelist level rather than dynamically based on user "score" (account age, contribution history, etc). If a user gets their comments deleted and blocked from organizations, GitHub should already know it's a spammer.

hiccuphippo•2 days ago

The irony of the .ai domain.

nonethewiser•2 days ago

I don't think anything is ironic about it because they aren't suggesting AI is bad. Just that it can be misused.

wafflemaker•2 days ago

Thanks for pointing it out. It has eluded me and it's incredibly funny

bakugo•2 days ago

"I never thought AI would slop my project!" Says company centered around AI slop

edfletcher_t137•2 days ago

Not just the domain: it's an agentic stack! In other words, I could use their product to create the exact type of PRs they're lamenting here.

dbgrman•2 days ago

also, could the website plz fix its scrolling code? it's annoying. i can't read the article

motakuk•2 days ago

Would love to! Could you please share more? I can't quite see the issue

pierotofy•1 day ago

I stopped most spam with a simple AGENTS.md. It actually seems to work (for now).

https://github.com/LibreTranslate/LibreTranslate/blob/main/A...

riponcm•1 day ago

could you explain please?

icelancer•1 day ago

repo is cloned, AGENTS.md is auto-read into context, the doc says to not allow PR spam. think of it like a soft prompt hook.

thih9•2 days ago

> It's not a contract job— it's our optional way of saying thank you to the community.

The writing style in their onboarding doc has common AI tells (in the quote: em dashes, “it’s not A, it’s B” sentence).

I can understand that, perhaps they want to fight fire with fire or don’t have time as they already say. Still, it all feels like inadequate half measures to me.

ZoneZealot•2 days ago

The entire post is clearly LLM generated. I get that a person clearly put together some thoughts, but prompting an LLM to 'turn this into a blog post' is the kind of low effort content I thought was not appropriate for HN.

At least bringing up the underlying method (restrict to contributors) has spawned the discussion about how that's probably a bad idea on the security side.

duskdozer•1 day ago

Well it is a .ai domain and they run some kind of AI product (unclear what exactly) so I guess they just don't see an issue with that sort of thing. I don't know if people are happily reading stuff like this or if they just get the "AI summary"

nlarew•2 days ago

Using AI for your own project is different than being overwhelmed by AI contributions from other people/bots

rvnx•2 days ago

I hope he is going to find the seasoned engineers that he is looking for

iiTzSYREX•1 day ago

I think it's a great approach. I checked the repo and saw that each contributor onboarding triggers the full CI pipeline, which is visible from the CI logs, including a Docker image build, GCP authentication, and a full Helm deploy. Aren't you guys wasting GCP compute and other things?

I am no expert in this, it's just something I noticed.

zer0tonin•2 days ago

> Should we stop giving fun test tasks to our job candidates?

Yes

FartyMcFarter•2 days ago

It seems this particular company makes a payment for completing those tasks, so it might not be that bad.

motakuk•2 days ago

We do, it's a part of our hiring pipeline: https://archestra.ai/careers

jbellis•2 days ago

Developers: stop doing whiteboard interviews, they don't measure anything relevant to the real job

Also devs: stop giving us real world problems to solve

gabeio•1 day ago

Those are the only two options to finding quality candidates?

Try talking more about the meta of coding itself. Get into the developers head by _talking_ to them and understanding how they would approach and attack different problems. You can show them code and ask them what they would do differently / how they would go about implementing X-Y-Z. Just because you can write foobar doesn't mean you understand how to apply algorithms or w/e specific problems [your] team has. It's _far_ better to understand how they would solve a problem over their syntax anyway.

Chaosvex•2 days ago

Yeah, fun for who exactly?

dymk•2 days ago

Me. That sounds way more fun than inverting a binary tree, and they pay candidates for their time.

zzzeek•2 days ago

so...they are manually re-setting the "interaction limits" over and over again, since they are only temporary?

why not use hooks to automatically reject issue comments / PRs etc. from users that didn't go through onboarding, rather than repurposing GH features that aren't really designed for that use (and are hence in danger of being changed someday)?

ildari•2 days ago

GH sends the email notification to all subscribers at the moment of posting a comment. There is no cooldown or a way to unsend the notification using hooks
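
For comparison, the hook-based approach would look roughly like this minimal handler for a `pull_request` webhook payload (it reads the `action`, `number`, `pull_request.user.login` and `repository.full_name` fields). The onboarded set and the returned action dict are hypothetical, and as noted above it runs only after GitHub has already emailed subscribers, so it cleans up rather than prevents.

```python
from __future__ import annotations


def handle_pull_request_event(payload: dict, onboarded: set[str]) -> dict | None:
    """Decide what to do with a `pull_request` webhook; None means leave it alone."""
    if payload.get("action") != "opened":
        return None
    author = payload["pull_request"]["user"]["login"]
    if author in onboarded:
        return None
    # Hypothetical action: close the PR and leave a pointer to onboarding.
    return {
        "close": f"/repos/{payload['repository']['full_name']}/pulls/{payload['number']}",
        "comment": "Please complete contributor onboarding before opening PRs.",
    }
```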

jart•2 days ago

This is a great example of the toxic effect money has on open source. Reward people with respect and recognition instead. Weird anonymous accounts no one's ever heard of will leave, because someone (or something) who's concealing their identity has nothing to gain from recognition. Honestly, GitHub should have a real-names policy. Because if you're not Satoshi Nakamoto then there are only three reasons I can think of to be anonymous on GitHub: (1) to avoid obtaining your employer's authorization, (2) to spam, harass, and engage in toxic behaviors, or (3) you're not even human. All three of these are the last things I want when engaging on the GitHub platform. Don't get me wrong, I love robots. But I'm perfectly capable of talking to the robot on my own. I don't want to talk to your robot. I also don't want people slipping me intellectual property below the board without their employer's consent. And I certainly don't enjoy all the hate and harassment. GitHub has tried to help with the last part, by making overt displays of hate something that can get you in trouble. The issue is that people just get more guilesome with more anonymous accounts, because the issue was never disrespect (which can actually be strategic and pro-social if we look at Torvalds' career), but rather bad-faith participation. If GitHub can guarantee that all its users are human, real-name, good-faith actors, then we might be able to start talking about open bounties.

ValdikSS•1 day ago

Linux kernel contribution policy required sending patches under a real name, but that policy was lifted about 2 years ago. Now they allow pseudonymous contributions.

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/lin...

https://github.com/cncf/foundation/blob/659fd32c86dc/dco-gui...

jart•1 day ago

So my mission critical infrastructure depends on a group whose bar for entry is having a proton mail account.

I bet they claimed to be protecting trans people to get that policy changed too.

pabs3•1 day ago

> someone (or something) who's concealing their identity has nothing to gain from recognition

The xz supply chain attacker hid their real identity, created fake ones, and gained recognition over time in order to gain more access and add the backdoor. So TLAs and other bad actors, at least, are interested in gaining recognition.

jart•1 day ago

I know, right? It's like, finally—a threat actor who's intelligent enough to understand what capital means in the open source community and is willing to devote resources to engage with it authentically (even if it's for evil nefarious ends). The xz incident showed that the open source community has many other good defense mechanisms for verifying and spotting malicious work and then solving it. But we won't even get to play that game if we're inundated with anonymous agent spam so that GitHub can juice its MAU numbers. Maybe they should require every account buy a $40 yubikey. I don't know what the answer is. But I know that no one gains when your measure of success is driving the cost of burning open source developers out down to literally zero.

pabs3•1 day ago

The xz incident was only discovered by accident, not by someone actually verifying the tarball and test cases were not malicious. We still don't have verification of tarball build reproducibility anywhere. The closest you can get to verified builds is what the bootstrappable builds community built in hex0/stage0, and what stagex built on top of that. I'm guessing even they haven't read through all that source code though. There aren't even good tools for distributing reviews, there is crev, but the stagex folks think it has some deficiencies.

https://news.ycombinator.com/item?id=47701394
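Checking that a release tarball is reproducible comes down to rebuilding it independently from the tagged source and comparing the two artifacts byte for byte. The sketch below covers only that final comparison step; the file paths are placeholders, and a real pipeline also has to pin tar metadata (timestamps, ownership, file order, for example via `SOURCE_DATE_EPOCH`) before two builds can match at all.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 64 KiB chunks so large tarballs are not loaded at once."""
    h = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()


def reproducible(released: Path, rebuilt: Path) -> bool:
    """True if an independently rebuilt tarball is bit-identical to the release."""
    return sha256_of(released) == sha256_of(rebuilt)
```

In the xz case the malicious build script shipped only in the release tarball and not in the git repository, which is the kind of divergence a comparison against an independent rebuild from git would likely have flagged.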

pabs3•1 day ago

I don't know what the solution to slop is. Maybe the bubble will implode at some point. Until then, just close down issues/pulls or remove projects from GitHub I guess.

embedding-shape•2 days ago

Sounds kind of weird that the blog post complains about `poisoning the conversation with pointless "implementation plans"` when they literally ask for that, after attaching a $900 USD bounty to a very under-specified issue, and even reply with "Do you have an implementation plan in mind?" to some of the first "attempters". Sounds like they got exactly what they'd been asking for, and even before LLMs, if you pulled something similar, the effects would have been similar.

bradley13•2 days ago

It's fine for developers to provide a plan, even if it gets rejected. The problem comes when every script kiddie figures AI has made them into a developer.

Imagine you want to get a doctor's opinion, or maybe a couple of opinions. But a zillion AI-amateurs have registered themselves as doctors. How do you separate wheat from the chaff?

embedding-shape•2 days ago

> Imagine you want to get a doctor's opinion, or maybe a couple of opinions. But a zillion AI-amateurs have registered themselves as doctors. How do you separate wheat from the chaff?

Right, but that's not what happened though.

Someone went to the public square, said "Hey, I'm looking for any sort of doctor, and I'll pay you $900 if you tell me your plan and then whatever plan I chose wins" and then they get surprised they get flooded by a zillion AI-amateurs.

You don't generate a ton of chaff then try to find the wheat, you ensure your process doesn't generate a ton of chaff in the first place. Offering large monetary rewards for relatively simple work for anyone in the public is bound to generate a ton of chaff...

nubinetwork•2 days ago

While git has always allowed this, I don't really like the idea that someone can write some code, slap my name on it, and push it to their repo.

delduca•2 days ago

You can sign your commits with OpenPGP.

codazoda•2 days ago

I think this is why signed commits are also supported. My first thought was that this probably doesn’t work with signed commits. But maybe it does, since they are listed as the committer.

mschuster91•2 days ago

Yup. At the very least, the "big dogs" aka Github and Gitlab should allow you to "claim" an email address to an account and only link it up when the commit in question either directly got authored from the web UI or got cryptographically signed.

foresto•2 days ago

> If the email matches their GitHub account, GitHub links the commit to their profile and grants them contributor status.

When the article mentioned email matching, I was concerned that it would break down when a contributor's email address changes. (I have contributed to more than a few projects over the years, using email addresses that no longer exist.)

However, it looks like they're not using the email address recorded in the author's original git commit, but instead a GitHub-generated address whose unique parts are the GitHub user ID and username. That should survive authors changing their email addresses. It would still break down if a contributor loses access to their account and has to create a new one, but that's probably less common.
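For reference, a minimal sketch of the address format described above, assuming the `ID+username@users.noreply.github.com` form GitHub assigns to newer accounts (older accounts may have a plain `username@users.noreply.github.com` address). The helper name and the example ID are illustrative only.

```python
def noreply_email(user_id: int, login: str) -> str:
    """Build the GitHub-generated noreply address for an account.

    Assumes the ID+username form; the numeric ID never changes, which is
    why this address survives a contributor changing their real email.
    """
    if user_id <= 0 or not login:
        raise ValueError("need a positive numeric user ID and a non-empty login")
    return f"{user_id}+{login}@users.noreply.github.com"
```

For example, `noreply_email(12345, "janedoe")` gives `12345+janedoe@users.noreply.github.com`.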

ildari•2 days ago

Hi HN community, I wanted to share our approach to reducing the amount of AI slop PRs and issues in our repo. We enabled the "require prior contribution" flag on GH and created a CI script that creates a tiny commit co-authored with you if you pass a captcha on our website. It worked really well and we were able to block at least 500 bots in the first week. Sharing a screenshot from Cloudflare: https://archestra.ai/hn-comment-cloudflare-challenge-outcome...
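A minimal sketch of the commit step, assuming the CI job already has a checkout of the repo and knows the verified user's display name and GitHub noreply address. `--author` sets the author field while the committer stays whatever identity the CI job is configured with, matching the two-identity model Git uses. The `--allow-empty` choice and the function names are our assumptions, not necessarily what the archestra script does (the real one may touch a file instead), and pushing the commit is left out.

```python
import subprocess
from typing import List


def onboarding_commit_argv(name: str, email: str, message: str) -> List[str]:
    """Return the git command line for an empty commit authored by a verified user."""
    return [
        "git",
        "commit",
        "--allow-empty",               # no file changes needed (assumption)
        f"--author={name} <{email}>",  # author = the verified contributor
        "-m",
        message,
    ]


def make_onboarding_commit(repo_dir: str, name: str, email: str, message: str) -> None:
    # The committer comes from the CI job's git config (user.name / user.email)
    # or GIT_COMMITTER_* variables; only the author is overridden here.
    # check=True raises CalledProcessError if git exits non-zero.
    subprocess.run(
        onboarding_commit_argv(name, email, message),
        cwd=repo_dir,
        check=True,
    )
```

Usage, with placeholder values: `make_onboarding_commit(".", "Jane Doe", "12345+janedoe@users.noreply.github.com", "chore: onboard @janedoe")`.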

satvikpendem•2 days ago

Yep, this is similar to some other version control tools like Tangled which has vouching.

https://blog.tangled.org/vouching/

halapro•2 days ago

Who do you add as a contributor though? Wannabe-contributors? Then they appear in the list of contributors before you even see if they're capable of producing an acceptable PR.

Your solution would be great if GitHub would also allow me to whitelist specific users, but unfortunately this still won't block "implementation plans" in comments.

tln•2 days ago

That's a really elegant solution.

How does the website trigger the CI script? Through GH rest API?

ildari•2 days ago

thank you, yep, through the REST API, here is an example: https://github.com/archestra-ai/website/blob/29ebdacbd8a22b9...
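The linked example is truncated, so it is not clear which endpoint it calls. One common option is the `workflow_dispatch` endpoint, sketched below under that assumption; the owner, repo, workflow file name, and input names are placeholders, and the token needs permission to run Actions on the repo. The function only builds the request so it can be inspected before sending.

```python
import json
import urllib.request
from typing import Dict


def workflow_dispatch_request(
    owner: str, repo: str, workflow_file: str, ref: str, inputs: Dict[str, str], token: str
) -> urllib.request.Request:
    """Build (but do not send) POST /repos/{owner}/{repo}/actions/workflows/{id}/dispatches.

    The workflow id may be given as the workflow file name, e.g. "onboard.yml".
    """
    url = (
        f"https://api.github.com/repos/{owner}/{repo}"
        f"/actions/workflows/{workflow_file}/dispatches"
    )
    body = json.dumps({"ref": ref, "inputs": inputs}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "X-GitHub-Api-Version": "2022-11-28",
            "Content-Type": "application/json",
        },
    )
```

Sending it is `urllib.request.urlopen(req)`; GitHub normally answers a successful dispatch with `204 No Content`. The target workflow must declare `on: workflow_dispatch` with matching inputs for this to work.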

_joel•2 days ago

Wouldn't it be trivial to farm the stats needed to pass the bot checker's threshold?

aizk•2 days ago

I'm not sure why gh hasn't already implemented stricter measures / filters / tools for PRs. It would cut down on spam and also help save their servers that can't handle the increased AI load!

xigoi•1 day ago

They want the number of PRs to be as high as possible because that’s what investors care about. Why would they do something that decreases it?

jagged-chisel•2 days ago

Repos get forked, code gets pushed, all before a PR is created. What kind of measures can be implemented to cut down on the AI-generated forks and pushes?

halapro•2 days ago

You can fork and push all you want. The problem is specifically when you show up in my notifications with your junk PR.

jagged-chisel•2 days ago

The issue for GH isn’t your PR spam. It’s all the other operations before your PR spam ever arrives.

cemoktra•2 days ago

AI company annoyed by AI ... Surprise

xdennis•2 days ago

It's quite ironic to complain about AI slop in a piece that's quite clearly AI slop.

Soon there will be no more AI doomer comments. The bots will take over that job too.

---

I'm working for an open source company, and my God, are 95% of contributions useless.

There are really dumb ones where the bot writes 10 paragraphs about how he implemented the feature, but the entire changeset is adding one line to .gitignore or adding a CLAUDE.md file.

There are even worse ones where the bot submits 3000 lines of code that seemingly works, but you have to spend an hour to figure out why it doesn't work.

The dumb ones are so much better.

agunapal•2 days ago

My first thought after reading the blog was, let me share the blog with Claude and ask it how bots can circumvent this.

imo AI bots have significantly affected OSS and we need better qualitative measures to define success

Muromec•2 days ago

How is the status revoked without rewriting git history?

ildari•2 days ago

we can block the user in github ui
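Blocking leaves the onboarding commit in history but stops the account from opening issues, pull requests, or comments in the organization's repositories. If you would rather script it than use the UI, the REST API has a "block a user from an organization" endpoint; the sketch below builds that request, with the org name, username, and token as placeholders (the token must belong to an organization owner).

```python
import urllib.request


def org_block_request(org: str, username: str, token: str) -> urllib.request.Request:
    """Build (but do not send) PUT /orgs/{org}/blocks/{username}."""
    return urllib.request.Request(
        f"https://api.github.com/orgs/{org}/blocks/{username}",
        method="PUT",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "X-GitHub-Api-Version": "2022-11-28",
        },
    )
```

A `DELETE` on the same path unblocks the user again.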

rglullis•2 days ago

'I will take "problems that could easily be solved by implementing a Pfand system" for $200, Alex.'

Seriously. Just ask for a US$10 deposit for each PR. If the PR is accepted (not even merged, just accepted as "this is a good effort"), give it back. Hell, give double the amount for good effort and you've got yourself a cheap way to attract good contributors.

Best case, bots will balk at the payment. Worst case, the funds can be used to hire someone specifically for triage.

godelski•2 days ago

This sounds like a great idea until you think about it for more than 30 seconds. Similar to most "it's so easy, you just" ideas.

Seriously, chill, then think about how you'd implement it. Then think how it'd go wrong. Then think about how to fix those problems. Repeat until you realize there's a better solution or until you solve the problem without making it overly convoluted. More often than not the former is the better option. More often the latter is just a variant of the sunk cost fallacy and your ego. Reality is (un)surprisingly complex and solutions aren't usually trivial

dimgl•2 days ago

This is an overly negative response to a genuine solution. There are a million reasons you shouldn't do X or Y.

More than likely GitHub would have to maintain their own internal wallet solution for this, which is a big engineering lift. But we're all just having a discussion.

godelski•2 days ago

  > to a genuine solution

Except it isn't. It is a lazy and impractical solution.

  > More than likely GitHub would have to maintain their own internal wallet solution

Great, so you even found one of the main issues, which pushes off the problem to a third party which makes it an impossible solution for anyone but GitHub (still a problematic "solution" though)

  > This is an overly negative response

Yet it isn't because even as you noted it's not realistic to implement.

There are two types of lazy, and this is the kind that creates more work, not less.

igsomething•2 days ago

Then people from sanctioned countries are blocked from open source, or worse, you have to explain to the bank and/or the government why you sent 20 USD to someone in Venezuela.

rglullis•2 days ago

And here we have another specimen of "things crypto is actually useful for" spotted in the wild!

LtWorf•2 days ago

I think the intersection of the set of people able and interested in contributing and those who are willing to figure out cryptocurrencies is the empty set.

kridsdale1•2 days ago

This is exactly the strategy that the owner of the SomethingAwful forum used in 2004 to get rid of bots and assholes. (I used to remember his name, kinda famous, oh well).

hoistbypetard•2 days ago

https://en.wikipedia.org/wiki/Richard_Kyanka

fullstop•2 days ago

Lowtax, I think is what you're looking for.

applfanboysbgon•2 days ago

This is an evergreen internet comment right here. Condescendingly proclaiming "This problem could be easily solved by [significantly worse solution that had 1/10th the thought put into it as the actual solution by people with a stake in actually solving the problem rather than making quippy armchair comments]".

---

I know it's against convention to comment on downvotes, but really? Really? This is controversial? The OP came up with an elegant solution that cleanly solved their problem without subjecting contributors to anything more than a captcha. Then somebody comes along and says "oh, it's so easy, just charge $10". You're going to set up payment infrastructure, incur administrative overhead with human support managing refunds, and deter 99% of actual humans from contributing, and then call that the easy solution that OP is so stupid for not thinking of first? Give me a fucking break. This site really is just Reddit-lite, anyone who thinks about engineering problems seriously would realise this does not stand up as anything beyond a pithy internet solution with three seconds of thought into what actually implementing it would entail.

rglullis•2 days ago

Github already has the payment infrastructure.

Polar.sh is already doing things that are a lot more complex in this space.

If you are in a civilized country that allows direct payments (i.e., anything but North America nowadays) and you don't want to deal with GitHub or any external system, there is always the good old "make an M-PESA/SEPA/Pix/UPI transfer to account XYZ".

> the thought put into it as the actual solution by people with a stake in actually solving the problem

Let me flip your argument: think of how much time and thought is poured into problems like this one by people who don't even try to implement a Pfand system beforehand.

applfanboysbgon•2 days ago

> Github already has the payment infrastructure.

...which is not available to maintainers to use in this way.

> there is always good old "make a M-PESA/SEPA/Pix/UPI transfer to account XYZ"

And then lock out anyone who is not from the same country as the maintainer, on a platform that is known for its global reach.

Moreover, you're introducing significant anti-human friction. For privacy-conscious people, it's a complete non-starter; I'm not handing over my payment information and compromising my anonymity, not even for a $1 transaction, just to make a PR for the benefit of other people. That's a small subset. Then you have the lazy people. The majority of the population will simply not bother with something if it has friction. Getting out their credit card is one of those things, and it's why products and services that offer free trials or a free tier tend to be overwhelmingly more successful: people want to see a tangible benefit to themselves before they engage in high-friction processes (where "high-friction" can mean as little as requiring a payment, yes). "Free to play" video games with microtransactions engineer first-time purchases to be cheap ($1 or $5) and to carry 5x or 10x the value of the normal microtransactions, because that first hurdle of getting somebody to hand over their payment information is by far the biggest.

I'll take the captcha, thanks. And maintainers will too, because they'd rather have the solution that filters bots and keeps humans contributing rather than the one that filters out both humans and bots.

halapro•2 days ago

Possibly the worst idea I've heard this month.

No one, meat or chip, would just set aside $10 "for the opportunity to contribute"

This is "let them eat cake" level of out of touchness.

rglullis•2 days ago

I have 3 PRs on https://github.com/django-oauth/django-oauth-toolkit/pulls that haven't been merged for OVER A YEAR due to the maintainers being overloaded and expected to work on this for free. The fact that these PRs are not being promptly reviewed has cost me at least 3000€ in potential grant work.

If I was told that I could make a deposit of $10 to get less stressed maintainers and a faster PR review cycle, I wouldn't even blink. I wouldn't even ask for the money back.

nijave•2 days ago

How did it cost you money in grant work? Can't you just fork and use that?

backwardsponcho•2 days ago

Yeah, because we'd hate to allow people from poor countries to contribute to FOSS projects, right?

Or teenagers without full access to online banking.

Or the unemployed.

rglullis•2 days ago

Oh, give me a break. No one is taking the ability from others to fork the repo. If these exceptional cases really were to happen, how fast would it be for someone else to notice and do one of (a) notify the maintainers to get this particular user whitelisted or (b) front the entry costs?

backwardsponcho•2 days ago

Sounds like bandaids on top of bandaids, at which point you start to wonder if the idea is fundamentally broken.

hoistbypetard•2 days ago

I semi-regularly offer drive-by PRs to projects I like and use. They're real PRs, not generated with AI. They range from papercuts to doc fixes to attempts to add that one feature that I want. Sometimes it's a drive-by group of PRs. Or an issue and a PR. I try to conform to what the maintainers prefer.

Unless I knew the maintainers personally, this would prevent most of my contributions, which are most often accepted. Maybe it's worth losing out on my small contributions to avoid slop. But things would absolutely be lost this way.

smaudet•2 days ago

Agreed... However there's not a good IsThisAnAI() test at present. So unfortunately, we will have to use anti-spammer techniques (because that is exactly what AI is, high(er) quality spam).

bluGill•2 days ago

Why should I trust you to give me my deposit back?

rglullis•2 days ago

Because a $10 deposit isn't worth enough to justify the reputational cost of running a large-scale scam operation?

If you don't trust the maintainer, you can always fork a repo and let them merge on their own.

corps_and_code•2 days ago

Interesting idea, I wonder about using it myself.

Let's say I'm a maintainer of an open source project on Github/Gitlab. How would you actually implement this deposit-refund loop in practice?

rglullis•2 days ago

I believe you are asking me in jest, but if you are genuine, this is what I would add to my CONTRIBUTING.md

```
# First-time contributors

Due to the increased number of AI bots and low-effort contributions, we are being forced to add some friction for first-time contributors. PRs are closed for anyone not explicitly added to our list of authorized users.

To be added to the list, you must do one of the following:

- Show a history of meaningful contributions to projects in related technologies made before Jan 1st, 2023.
- Be vouched for by one of the existing contributors in the core team.
- (If you have GitHub Sponsors/Polar/Patreon) Have sponsored the project for the last 3 months.
- Submit a small payment, which will be held in escrow until your PR is accepted. The following methods are accepted (choose all that apply: PayPal, SEPA, Crypto, Venmo, Pix, UPI, M-Pesa, etc.)
```
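The policy above says PRs from non-listed users are closed, but not how to enforce it. One hypothetical way is a CI check that reads the pull request author from the Actions event payload and fails when the login is not on an allowlist. `AUTHORIZED_USERS.txt` and the function names are made up for this sketch; `GITHUB_EVENT_PATH` and the `pull_request.user.login` field are what GitHub Actions provides for pull-request events. If you wire this into `pull_request_target`, do not check out or run the PR's own code in that job.

```python
import json
import os
from typing import Iterable


def pr_author(event: dict) -> str:
    # For pull_request / pull_request_target events, the payload carries
    # the PR opener under pull_request.user.login.
    return event["pull_request"]["user"]["login"]


def is_authorized(login: str, allowlist: Iterable[str]) -> bool:
    # GitHub logins are case-insensitive, so compare in lower case.
    return login.lower() in {name.lower() for name in allowlist}


def main() -> int:
    # GITHUB_EVENT_PATH points at the JSON payload of the triggering event.
    with open(os.environ["GITHUB_EVENT_PATH"], encoding="utf-8") as fh:
        event = json.load(fh)
    # Hypothetical allowlist file: one GitHub login per line.
    with open("AUTHORIZED_USERS.txt", encoding="utf-8") as fh:
        allowlist = [line.strip() for line in fh if line.strip()]
    login = pr_author(event)
    if is_authorized(login, allowlist):
        return 0
    print(f"{login} is not on the first-time contributor allowlist")
    return 1
```

Call it from a workflow step as `raise SystemExit(main())`. A failing check does not close the PR by itself; actually closing it would take an extra API call.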

corps_and_code•2 days ago

Oh no, I'm being genuine. Documenting the process itself wasn't what I was curious about, I was wondering how you'd go about your last bullet. Accepting lots of currencies can be hard, but I guess I'm not super familiar with online escrow services. I'm not sure how simple they can make that process, or who would pay the cost of using them (I assume they're not free).

I was also wondering how automated or manual you would envision the review process. I'm guessing your hope would be that the small deposit would stem the flow of submissions enough to make it all possible to review manually again, and you would also manually return all the payments sent to escrow?

Barbing•2 days ago

$10 to a Silicon Valley software engineer reading this comment may feel like... $(a lot?)... to a range of other would-be contributors (thinking of $6/day minimum wages in some places, for example)

Wonder if a dollar would work for now until more people give bots credit cards.

skrebbel•2 days ago

Easily? You think the kind of people who think it makes sense to make bogus slop PRs are going to react reasonably to overburdened volunteer maintainers refusing to give them their US$10 back?

rglullis•2 days ago

Yes. Once a PR is rejected, contact from that bot is blocked. No appeals.

skrebbel•2 days ago

This is never going to work. Sufficiently many of these people are going to find maintainers' home addresses and send them death threats and the likes. If you see how badly some people flip out just because their PR is rejected, it's going to be much much worse if their PR is rejected and their money is taken.

rtdq•2 days ago

The worst case is that someone loses out on $10, no? How does this work if the maintainer is the swindler?

godelski•2 days ago

So I pay $10 when your bot fucks up?

That's called theft. And for what, one banana?

zarzavat•2 days ago

What? No. A PR is me giving my time to the project. I don't get anything out of it except the warm feeling of having helped out. If I have to pay money to submit a PR then I'm going to play video games instead.

rglullis•2 days ago

> A PR is me giving my time to the project

Unfortunately, the issue is that time is not enough of a filter anymore. The time from machines is basically worthless compared to yours, so you need to give something else, and that something else needs to be something that shows you have actual skin in the game.

Lyrkan•2 days ago

> so you need to give something else

Well no, they don't need to. As they said they could just do something else instead of contributing (and I know I would too).

Your proposal would just end up killing those open source projects even more than what you are trying to solve.

xivzgrev•2 days ago

I like how they are taking a stand against vanity metrics. Rare to see that these days

optionalsquid•2 days ago

I don't have a better solution, unfortunately, but it doesn't seem like the spam problem has been solved. It has just been moved from pull requests to commits:

Currently, more than 10% of all commits in the archestra repo are essentially noise (369 of 3521 commits), accounting for more than half of all commits in the last month (303 of 578 commits).

But maybe (probably) the amount of such commits will go down over time, compared to the growing amounts of AI slop
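The percentages follow directly from the counts quoted above. The helper below reproduces them and shows how such counts could be gathered from `git log --format=%s` output; the subject prefix that marks an onboarding commit is a hypothetical placeholder, since the actual commit message format is not given here.

```python
from typing import Iterable


def share(part: int, total: int) -> float:
    """Fraction of `total` commits that fall into `part`."""
    if total <= 0:
        raise ValueError("total must be positive")
    return part / total


def count_onboarding(subjects: Iterable[str], prefix: str) -> int:
    # `subjects` are commit subject lines, e.g. from `git log --format=%s`;
    # `prefix` is a hypothetical marker shared by the onboarding commits.
    return sum(1 for subject in subjects if subject.startswith(prefix))


all_time = share(369, 3521)    # roughly 0.105, a bit over 10%
last_month = share(303, 578)   # roughly 0.524, just over half
```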

ildari•2 days ago

As those commits were made from our system they don't create any noise for us, as PR/issues/email notification do. We only include real people who could solve the captcha and their input is mostly valuable

kazinator•2 days ago

> Final Words

> While GitHub reports massive metric growth — a substantial part of which is AI-generated — we as an open source project team have to do the heavy lifting of cleaning up AI slop from our repository and come up with esoteric workarounds to keep the level of legitimacy of our open source audience.

AI generated slop!

bykhun•1 day ago

You should release this as a service.

exabrial•2 days ago

Signed Commits from known authors would also help!

antran22•1 day ago

At this point we should be convinced that it's in Github and Macro$lop's narrative to encourage fully automated, LLM-assisted PR bombing, because "muh future of development" and what not. If they cared about combating spam, they would already have:

- Protected the PR submission feature behind some CAPTCHA

- Given repo owners some way to manage external contributors, instead of forcing them into hacks like the one in this article

Just move to Codeberg, SourceHut, or even GitLab. Serious contributors will go there with you; the lazy people using LLMs to farm GitHub karma probably won't.

metalliqaz•2 days ago

Why does this company use the Slashdot logo?

pixel_popping•2 days ago

It's so ironic as the website screams vibe-coded design on top of that.

IshKebab•2 days ago

That's a neat way to interface with GitHub's authentication system, but I don't see how they've solved the fundamental problem because their whitelisting process is just "click ok fine 10 times". Why won't the slop peddlers just do that too?

mbreese•2 days ago

I think the point is to add a bit more friction to the process. You want to make it so that people can do it with minimal effort and an AI bot will give up. If you're in an arms race over AI commits and PRs, this is a decent middle ground to start from.

(Why there is a race for AI commits/PRs to projects is beyond me though...)

ildari•2 days ago

click ok fine 10 times + captcha seems to be working fine
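The captcha half is what actually filters bots. Assuming the site uses Cloudflare Turnstile (the shared screenshot only shows a Cloudflare challenge, so this is a guess), the server has to confirm each token with Turnstile's siteverify endpoint before triggering the onboarding commit. A minimal sketch that builds the verification request and parses the reply; the secret and token values are placeholders.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# Cloudflare Turnstile's server-side validation endpoint.
SITEVERIFY_URL = "https://challenges.cloudflare.com/turnstile/v0/siteverify"


def siteverify_request(
    secret: str, token: str, remote_ip: Optional[str] = None
) -> urllib.request.Request:
    """Build (but do not send) a form-encoded siteverify POST."""
    fields = {"secret": secret, "response": token}
    if remote_ip:
        fields["remoteip"] = remote_ip  # optional visitor IP
    data = urllib.parse.urlencode(fields).encode("ascii")
    # urllib adds Content-Type: application/x-www-form-urlencoded for POST data.
    return urllib.request.Request(SITEVERIFY_URL, data=data, method="POST")


def passed(response_body: bytes) -> bool:
    # The reply is JSON; "success" is true only for a valid, unexpired, unused token.
    return json.loads(response_body).get("success") is True
```

Only when `passed(urllib.request.urlopen(req).read())` is true would the server go on to trigger the CI workflow.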

kittikitti•2 days ago

There's got to be a concept to differentiate the industry plants who start an "open source" project that has enough funding for a $900 bug bounty. They are speaking and developing in the language of corruption and they don't even know it. Of course you will receive AI bot spam, but unfortunately it will continue if you don't take a hard look in the mirror.

karel-3d•2 days ago

I don't understand how clicking "I agree" a few times will stop the AI bots?

The captcha - maybe.

zazibar•1 day ago

LLM-generated slop about LLM-generated slop, wonderful.

ramon156•2 days ago

See, this is an article that uses dashes correctly. It adds value, creates a bit of buildup

chrismorgan•2 days ago

This is funny to me because the title on this submission currently refers to “Git's –author flag”, which is an extremely incorrect use of a dash. (The original article doesn’t make the mistake. Not sure if the error is from the submitter or from an HN title mangulation.)

yieldcrv•2 days ago

reindeer games

standbyme•2 days ago

cool

syezdin•2 days ago

Interesting

9front•2 days ago

Musk before the verdict: "It's not okay to steal a charity"

Altman after the verdict: "It's okay to steal a charity"

delduca•2 days ago

For now…

philipwhiuk•2 days ago

Until the AI learns the workflow on the next model update, indeed.

petterroea•2 days ago

What I see is a (clever) hack, and GitHub continuing to provide good tools to its users.

skydhash•2 days ago

What I see is a solution for a problem that is self-inflicted, namely lumping contributors and generic internet users into the same workflow. In big projects, you have the core team, a handful of well-known contributors, and everyone else.

I strongly prefer the git email model, where it’s often trivial to control the flow of change proposals. GitHub does not have the same wealth of tools and versatility.

petterroea•2 days ago

I seem to have completely missed the "failing" when writing "this is GitHub failing to deliver tools". My bad

opengrass•2 days ago

submitting attempts — but soon...

not just this issue — but the entire repo.

contributors like @ethanwater, @developerfred, and @Geetk172 — people actively working on bounties — were getting buried.

two identity fields — author and committer — and they can be different people.

metric growth — a substantial part of