The problem is really the behaviour, not the tooling. People who do not understand, test, and document their decision-making in their PRs should not be submitting them, regardless of what tooling (AI or otherwise) they used to create them.
This problem existed before AI, but it is now just worse due to the spamming nature of these "contributors". It's another form of endless September where people unfamiliar with the norms of team software development are overwhelming existing project maintainers faster than maintainers can teach them the norms of behaviour.
In the end, some sort of gatekeeping mechanism is needed to avoid overwhelming maintainers, whether it's a reputation system, membership in an in-group, or something else.
You wouldn't hold that opinion if you maintained a popular open-source repo or interacted with AI "PR review" tools at a serious level. Even the most SOTA models are willing to accept/merge absolute trash PRs so long as the submitter can convince the model that the PR addressed its review comments.
It’s starting to feel like we may need to go back to the model where you need to be invited to be able to submit code or PRs. The barrier is just too low now for popular projects.
It’s not just popular projects. On a small utility I maintain, I received a PR with more lines than the project itself. I’m happy to be a good maintainer, but something that’s effectively an AI rewrite isn’t something I care to review, and since I can’t vet it, I can’t blindly accept it.
Logistically & brand-wise, they're messy to deal with, but they result in a "filter" of sorts that the original project can pick & choose to upstream back into their code.
Perhaps something where you can build a graph of who invited whom, so you could prune entire sections that act maliciously. One might even consider it to be a web of connections which are built on (or torn down by the loss of) trust.
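As a rough sketch of the pruning idea (Python purely for illustration; the names and the contributor-to-inviter mapping are made up):

    # Hypothetical invitation graph: each contributor maps to whoever vouched for them.
    invited_by = {
        "alice": None,      # founder / root of trust
        "bob": "alice",
        "carol": "bob",
        "dave": "carol",
        "erin": "alice",
    }

    def invitees_of(user: str) -> set[str]:
        """Everyone whose chain of trust passes through `user`."""
        direct = {c for c, inviter in invited_by.items() if inviter == user}
        return direct | {g for c in direct for g in invitees_of(c)}

    def prune(user: str) -> set[str]:
        """Revoke `user` plus every account they vouched for, directly or transitively."""
        return {user} | invitees_of(user)

    print(prune("bob"))  # bob, carol, dave -- alice and erin keep their standing

The hard part, of course, is the social layer (who vouches, and what it costs them), not the traversal.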
Sounds futuristic. Maybe it's an NFT on an agentic blockchain for deep-sea solar farm mining?
I recently started using Claude/ChatGPT/Chinese models for some PS3 homebrew work.
Every model seemingly falls flat in this area of programming. The PS3 is very complex and the tooling is fairly undocumented in a lot of instances. It doesn't surprise me that most of these AI PRs are nonsense.
If anyone else has attempted writing PS3 homebrew apps using AI and has refined their tooling/systems/automation please let me know how you got the agents to work for you (:
I've been working on a project myself over the last few weeks where the documentation is quite minimal. To no surprise the LLMs fell flat at being able to generate any sort of meaningful code. However, I realized that if I focused first on building out documentation and coding tools (linters, parsers, formatters, etc...), LLMs can do a decent job at solving fundamental problems.
I like to send Claude Code or Codex on max settings off to try a problem in parallel while I work on it.
In a complex codebase it’s funny how often they’ll come back with gigantic commits that just make everything worse or accomplish the goal but have 1000 lines of unnecessary complexity.
Every time, they present it with a confident summary. I can see how a junior or just lazy dev would think this is their ticket to becoming a contributor to a repo, with some big thing to put on their resume.
This is like when everyone started opening code-of-conduct addition PRs against every open-source repo, except now a lot of these AI ones take actual effort to recognize as bad faith.
Or maybe it's worse, because a lot of them aren't in bad faith: they are well-meaning people who just don't know or understand enough to realize they aren't being helpful.
I'm curious what percentage of PRs are just the AI blindly writing code and submitting a PR without testing, and which have at least been locally tested to some degree. Any open-source maintainers have insights on this?
That's the thing: what if the codebases had CLAUDE.md / AGENTS.md files which clearly dictated that
A) tests need to pass
B) anything you write needs tests
C) the code quality must adhere to these standards
etc., helping the LLMs that people vibe-code with produce better-quality results.
By not having these in place, people who want to help out can't, because they don't understand what's going on.
Adding stuff to these files would allow developers to give guidelines/guardrails for development using these agents.
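For instance, a minimal AGENTS.md along those lines might look something like this (entirely hypothetical, not any real project's file):

    # AGENTS.md (hypothetical example)
    - The project must build cleanly; do not open a PR for code you have not compiled and run.
    - Every behavioural change needs a test, and the existing test suite must pass.
    - Match the existing code style; no unrelated reformatting or "cleanup" in the same PR.
    - State in the PR description whether AI was used, which tool, and what you verified by hand.

None of that is binding, but it gives both the agent and the person driving it something concrete to check against.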
Should the barrier of entry be someone who knows how to code? Or should the barrier of entry be someone who is motivated to help with open-source software?
> Should the barrier of entry be someone who knows how to code? Or should the barrier of entry be someone who is motivated to help with open-source software?
The motivation to help the OSS project should also come with the obligation to learn how the software operates, at least on a conceptual level. The desire to help does not grant people a pass to sledgehammer their way into adding a feature.
> Should the barrier of entry be someone who knows how to code? Or should the barrier of entry be someone who is motivated to help with open-source software?
Probably yes: submitting slop PRs is not helping. If "helping" is sticking it through an LLM, the developers can do that themselves with better insight and guidance. If you must help via an LLM, donate cash for tokens.
If you can't code and can't donate cash/machine time, help by confirming issue reproductions, design, wikis, documentation, whatever.
What motivation? Is it motivation to start Claude Code and let it run when you have no idea what’s going on? Is motivation the same as token spend?
Yes, the barrier should definitely be someone who knows how to code when submitting, well, code.
And since the training data seems to be very lacking, no amount of markdown would fix that.
I agree, and yet I think even with a well-engineered agent harness, there are a lot of unknown unknowns out there.
I imagine the problem will persist if users continue to submit PRs that pass the harness without being able to validate for themselves that it actually works.
We've seen a few takes on this kind of issue, but the solution I liked the best was the Linux "developers take full responsibility" approach. The "Assisted-by:" tag was a pretty nice touch too (sketched below).
The article unfortunately feels more like a rant than a good exploration of the problem space.
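For reference, that kind of attribution is just a commit-message trailer, so a patch might carry something like this (wording purely illustrative; check the kernel docs for the exact convention):

    subsystem: short description of the change

    Body of the commit message, written by the human who takes
    responsibility for the change, explaining what it does and why.

    Assisted-by: <AI tool and version>
    Signed-off-by: Your Name <you@example.com>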
I've struggled with this "responsibility" take. What does it mean in the context of an open source project? As far as I understand it, the original contributors of bugs are often not the ones fixing them (though they can be). Is it that if you write enough buggy code you get banned as a contributor? Is it that you're not allowed to say Claude ate my homework?
> Is it that if you write enough buggy code you get banned as a contributor?
If this is a consistent issue, your contribution would (ideally) be continuously put into a backlog until someone else with no connection to you verifies that it's as bug-free as it appears to be. (Excluding non-obvious security & performance issues)
> Is it that you're not allowed to say Claude ate my homework?
Yes. As the contributor, you should be the first one to look over the code, not someone else.
> the solution I liked the best was the linux "developers take full responsibility" approach.
The people who can realistically submit a Linux patch that will ever get looked at are already a super select group, through who-you-know network effects.
You can't apply the same system to random open-source projects. The best option for people who run small to medium-sized open-source projects is just to ban all unsolicited PRs; otherwise you're going to spend way too much effort sorting through the slop.
I don't think that is true at all. I'm just a random FOSS dev with no connection to the Linux kernel community, and I have gotten two small commits into the Linux kernel.
If you look into the arcane architecture of the PlayStation 3 console, you quickly gain an appreciation for just how impressive the RPCS3 emulator is. The PS3 is definitely one of the hardest emulation targets, so it's wild they have the majority of the library working (with enhancements like upscaling and higher frame rates on many of the titles).
I guess it's nice people want to help, and AI-assisted coding can be fine, but I can't imagine submitting a PR to a high-profile, much-revered project like that without reviewing and thoroughly testing it myself.
what is the appeal of blindly blasting open source projects with high-volume PRs? If you're trying to help the project to accomplish something, it doesn't follow that a firehose approach is tenable, if only for the fact that reviewing the code takes time.
At this point these could just be gamers who want to play a game and are being annoyed by something not being right.
Maybe they use Claude or whatever and tell it to fix the problem and then just blindly submit it.
I could see people doing that without knowing enough to be able to compile and test the code, ignoring whether it’s good or not. So they just submit it and hope it gets merged to “fix” the problem, having no understanding of what’s involved or how much of a burden that is.
Now imagine a whole bunch of people doing that for a whole bunch of really complex bugs in 75 different games. It’s not like the PlayStation 3 was a simple system.
> what is the appeal of blindly blasting open source projects with high-volume PRs?
The prestige of being "the one that added feature X to OSS project Y". The things that would've been actually useful (bug diagnostics/troubleshooting, merging duplicate issues & PRs) do not offer the same level of prestige.
At some point it was about having things you could show in job interviews that got merged into popular public projects, but I'm not sure that is the case anymore, since some of these people have no intention (as far as I can tell) of finding an SE job.
The emulation space is particularly bad about this because there are a lot of semi-technical and "well meaning" users who will do anything to get their games to play better, and AI gives them a way to make it seem like they are doing something useful without being able to judge the quality of the output they are producing.
One of the projects I work on recently had a guy drop by and explain that he wanted to use Claude to clean up our backlog and he absolutely could not fathom why I kept bringing up that we would only accept PRs that reduced our work instead of increasing it. "Do you know what Opus 4.7 is?" "Why are you so close-minded?". Unfortunately it is very hard for these users to understand that the thing they are using has a bar for quality and the bugs that still slip through cannot be solved by waving a magic wand at it.
A good argument to use could be: I can use Claude myself, so I will if I need to, but you using Claude on my behalf doesn’t save me any work; it just introduces another layer of noise into the mix. (Yes, calling the guy “noise”, haha.)
Over the last month, I've been using Claude to assist in some things that were at the edge of my ability (or maybe just a hair's breadth beyond it). I've added features to open source projects that everyone's been waiting years for. I always fork it telling myself that I want to be able to submit PRs, but really I'm just making the changes for myself, since I don't even have the nerve to show it off.
If these people can make changes to the emulators that will actually make the games more playable for them, the changes don't have to go back into the official project. It works for them and makes things better.
Right now, I've been working on some changes to the mkv container spec to have embedded scripting capable of doing Black Mirror: Bandersnatch in interactive mode, in VLC and mpv. I've already added mutable torrent support to Transmission, and it works. But yeah, if someone took a look at it who really knew the code, they'd see it was AI slop and do a hard pass.
I’ve read so many stories like this that I’ve actually gotten scared of making PRs to open source projects.
There’s one in particular where a feature I really wanted didn’t exist, so I forked and had Codex 5.5 assist with building the feature on my local version. It works perfectly. My life has been improved in being able to have this feature now.
Normally I’d want to share it back with the community so others can benefit as well (presumably if I wanted this feature, others probably want it too). But…I am not pretending this is perfect, great, or even good code. I spent about an hour total on it; it works, and I haven’t had any issues with it, but it’s probably slop by any hard-core engineering standard. And since I neither want to get attacked for submitting slop nor have the time to properly engineer it by hand, the net result is that it lives on my machine alone.
Is this the right outcome? I feel guilty that I’m getting a better version of this software and others aren’t. I want to help make others’ lives easier too, but I don’t want to burden the project maintainers or get yelled at for submitting slop. What’s the future look like here?
First, you don't have to feel guilty of anything, since forking open source projects to make changes tailored to your use case is as old as open source itself. It is, in fact, the primary benefit of open source.
Second, it is not a given that your change would be accepted regardless of who wrote it. Maybe the feature is too niche for its complexity; maybe it is better implemented with more generality or extensibility that does not make sense for your own use. In those cases, your change might have been rejected upstream anyway, so having it only locally is a perfectly fine solution.
Third, if you believe it is actually useful for broader users, open an issue requesting that feature, and say an LLM implemented it in an hour. Then the maintainers can prompt their own LLM to implement it with ease, or do whatever they want with their project.
I did this recently too, didn't really care about the code quality of a small tool, just asked Claude to add in the features I wanted and it produced something that worked.
I just pushed the changes to my fork of the project and left it at that. Leaves the feature around for me and anyone that stumbles across my fork, without wasting the original dev's time looking at code I didn't care to look at.
Even before AI coding I think it was relatively common to fork some code and edit it to have something you want, then to either leave it as a personal version, or to never actually get a response on the PR.
It isn't clear that AI-generated code is copyrightable, so the license couldn't be enforced against violators for that portion of the code, and so the authors wouldn't accept such code. Of course, if it's permissively licensed, the authors probably don't care to enforce the license, so they might be fine including the code.
To submit the code, at minimum, you should review and fix the code diff, run the appropriate static analysis tools against it, write the pull request description and commit messages yourself, read the contribution guidelines, make sure everything matches them, disclose that you used AI and for what, and include the prompts used.
Just go for it. Do it enough, and over time you'll either find yourself resilient enough, or conclude that people do not actually deserve it (or rather that you do not deserve the struggle), and you'll be cured of this compulsion. The only way to go is forward.
As a maintainer, discovering that a PR is AI-generated just absolutely saps any motivation I have to actually review it. I've never been a great reviewer, and AI means I have to watch out for really different kinds of errors. There's also the potential for extra friction with interactions with the "author": some people try to pull a "I'm just a smol bean, not a programmer, how dare you ask me to do anything" in response to changes, while others just play a middleman role in between you and the AI they're using.
If you're actually motivated to get a working fix upstream, and you're willing to do more than be a passive player, then it's not necessarily a problem to submit it (subject to responsible disclosure, of course)... but you also say that you don't have the time to properly engineer it, which makes me think you don't have the time to be sufficiently engaged in the upstreaming process anyways.
AI has inverted the effort - in the past a PR meant someone had to come in, read your ticket, documentation, code and tests to successfully author a PR. Subsequently reviewing that PR would typically take less time than authoring it and you would receive fewer PRs.
Now it is the opposite: maintainers are flooded with low-effort PRs that take more effort to review than to author, but the author is unable to see why this is problematic for the maintainer and the project.
Excuse me, I've been doing drive-by manual slop PRs for at least a decade.
I certainly didn't read a ticket; I ran into the problem myself. I probably didn't read documentation or write tests either. I just fixed my problem and tried to help others a bit.
If you don't have time to properly engineer it, then you can't submit it. Why would you feel guilty? Others can throw a coin in the laundromat, too, if they are so inclined.
I'm glad it works for you, but please do not submit low-effort stuff like this, if you're not willing to do the rest of the work to make it maintainable.
I get the desire to help -- that's fine -- but AI code is abundant and of low value. Don't sandbag maintainers with more work and increase their maintenance burden with stuff they could easily vibe-code themselves.
Yes, if you can't vouch for the quality of the code, that is the correct outcome. The long-term health and maintainability of an open source project takes precedence over adding another feature. This was the case before repos were flooded with AI slop as well. Virtually no project would have accepted a random code dump if the person submitting it did not understand it, because that just means the burden falls on someone else, which would very quickly get any software project into big trouble.
I just took a look at the RPCS3 PR history, and it doesn't look that bad. (Certainly worse than "no slop", but not what I'd call a flood.)
I went 10 pages back on GitHub, and the overwhelming number of PRs look like good PRs that have been merged. There's really only a single handful of rejected slop-looking PRs. (And another handful from a single user who seemingly didn't know how to use Git/GitHub and was turning local non-compiling commits into PRs somehow.)
I’m hopeful that in a year or so the models will be good enough to help productively with emulator development, and that you will see a shift in these PRs similar to the one you saw with security reports this spring.
Aww, PRs no longer open/welcome? Whatever will the usual suspects parrot now?
My personal schadenfreude aside, I wonder if this will follow a similar trajectory as security bug reports did recently. I'd be surprised, for a number of reasons, but the overall shape is looking awfully similar.
Something like a big emulator is very complex and has a LOT of motivated users who aren’t going to be able to make quality submissions.
So they get it in volume, to the point where it may be nearly impossible to deal with.
Logistically & brand-wise, they're messy to deal with, but they result in a "filter" of sorts that the original project can pick & choose to upstream back into their code.
Sounds futuristic. Maybe it's an NFT on an agentic blockchain for deep-sea solar farm mining?
Why are they doing that? Who knows.
There’s no need to test the PR when you already asked the AI to not make any mistakes.
This strikes me as the ideal first LLM contribution/PR: a file explaining the project's standards, testing, and structure.
I feel like the issue is people contributing code they don't understand and presenting it as if they do.
For practically no effort, you were able to customise free software to your liking.
That's a surprising and really cool dynamic.
Is your "about an hour of ... using Codex 5.5" really something others can't do for themselves, that it's worth communicating the change?
TL;DR: PR review has always been hard.
Why? None of what you did is special. What stops anyone else from asking their AI to implement the same feature you did, if they need it?
But in such a niche area, where the documentation or other solutions often flat-out don't exist, how are they supposed to get better through training?