Evaluating and mitigating the growing risk of LLM-discovered 0-days

(red.anthropic.com)

52 points | by lebovic 1 day ago

10 comments

lebovic 9 hours ago
The post is light on details, and I agree with the sentiment that it reads like marketing. That said, Opus 4.6 is actually a legitimate step up in capability for security research, and the red team at Anthropic – who wrote this post – are sincere in their efforts to demonstrate frontier risks.
Opus 4.6 is a very eager model that doesn't give up easily. Yesterday, Opus 4.6 took the initiative to aggressively fuzz a public API of a frontier lab I was investigating, and it found a real vulnerability after 100+ uninterrupted tool calls. That would have required lots of prodding with previous models.
If you want to experience this directly, I'd recommend recording network traffic while using a web app, and then pointing Claude Code at the results (in Chrome, this is Dev Tools > Network > Export HAR). It makes for hours of fun, but it's also a bit scary.
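For the HAR workflow described above, here is a minimal pre-processing sketch in Python (standard library only). It collapses a HAR export into a count of distinct method/host/path endpoints, so the agent starts from a compact overview instead of the raw export. The function name `summarize_har` and the script name are hypothetical, and this is not how Claude Code itself reads HAR files. The only assumption is the standard HAR layout, where each item in `log.entries` has a `request` object with `method` and `url`.

```python
import json
import sys
from collections import Counter
from urllib.parse import urlsplit


def summarize_har(har_text: str) -> Counter:
    """Count (method, host, path) triples in a HAR export, ignoring query strings."""
    har = json.loads(har_text)
    counts: Counter = Counter()
    for entry in har.get("log", {}).get("entries", []):
        request = entry.get("request", {})
        parts = urlsplit(request.get("url", ""))
        # Dropping the query string groups /users?id=1 and /users?id=2 together.
        counts[(request.get("method", "?"), parts.netloc, parts.path)] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name): python summarize_har.py export.har
    # "utf-8-sig" also accepts files that start with a byte-order mark.
    with open(sys.argv[1], encoding="utf-8-sig") as f:
        for (method, host, path), n in summarize_har(f.read()).most_common():
            print(f"{n:5d}  {method:7s} {host}{path}")
```

Note that HAR exports can contain cookies and auth headers, so the summary is also a safer thing to share than the raw file.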
samfundev 20 hours ago
Glad to see that they brought in humans to validate and patch vulnerabilities. That said, I really wish they had linked to the actual patches. Here's what I could find:
https://cgit.ghostscript.com/cgi-bin/cgit.cgi/ghostpdl.git/c...
https://github.com/OpenSC/OpenSC/pull/3554
https://github.com/dloebl/cgif/pull/84
- shoo 11 hours ago
  Yeah, having a layer of human experts to sanity check and weed out hallucinated false positive issues seems like an important part of this process:
  > To ensure that Claude hadn’t hallucinated bugs (i.e., invented problems that don’t exist, a problem that increasingly is placing an undue burden on open source developers), we validated every bug extensively before reporting it. [...] for our initial round of findings, our own security researchers validated each vulnerability and wrote patches by hand. As the volume of findings grew, we brought in external (human) security researchers to help with validation and patch development.
  Based on the experiences curl's maintainers have shared over the last couple of years, which led them to end their bug bounty program [1] [2] [3], I'd suggest the "growing risk of LLM-discovered [security issues]" is primarily maintainers being buried under a deluge of low-effort, zero-value, LLM-hallucinated false-positive security reports, where the reporter copy-pastes LLM output without validating it.
  [1] https://daniel.haxx.se/blog/2026/02/03/open-source-security-...
  [2] https://daniel.haxx.se/blog/2026/01/26/the-end-of-the-curl-b...
  [3] https://daniel.haxx.se/blog/2025/07/14/death-by-a-thousand-s...
  - sublinear 5 hours ago
    Ending a bug bounty program seems like a mistake.
    Why not just change the incentives? Don't pay for patches. Move the money over to human review of the infinite cesspool, with an emphasis on how the findings are presented. Maintainers rank and filter by how concise the reviews are and how critical the bugs are. Stop accepting wide-open pull requests for bugs and make that its own workflow.
    Bugs rarely happen in isolation, and many are regressions. Many are related to added features or refactors. Fixing bugs should be more about understanding the nature of the project than just playing whack-a-mole. LLMs don't have as good a memory as humans, and much of the meta discussion happens out-of-band, where LLMs can't see it. We shouldn't be paying for monkey work. We should be paying the humans who deeply understand "the lore" of the project and can apply it in a meaningful way.
    It's also long overdue that some maintainers feel pressure to take the direction of their projects more seriously, and in some cases let others step up. So many open source projects need to stop being the stereotype of lone-genius pet projects or cultish power grabs. When people whine about open source not getting paid, this is the real reason why. It's not that the money or value isn't there, but a lack of confidence in the maintainers.
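    The ranking idea in the comment above can be sketched as a simple triage pass, under stated assumptions: drop reports that no human reviewer has reproduced, then sort the rest by severity (highest first) and by length of the write-up (shortest first). The `Report` fields, the 0 to 4 severity scale, and the `triage` function are all hypothetical illustrations, not a process curl or any other project actually uses.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Report:
    title: str
    severity: int     # hypothetical scale: 0 = informational ... 4 = critical
    body: str         # the write-up as submitted
    reproduced: bool  # did a human reviewer confirm the issue?


def triage(reports: List[Report]) -> List[Report]:
    """Keep human-confirmed reports, ordered by severity (desc), then concision (asc)."""
    confirmed = [r for r in reports if r.reproduced]
    return sorted(confirmed, key=lambda r: (-r.severity, len(r.body)))
```

    Sorting on the tuple `(-severity, len(body))` makes severity dominate, and length only breaks ties among reports of equal severity.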
jsnell 2 hours ago
This exact topic was discussed in https://news.ycombinator.com/item?id=46902909
throwa356262 2 hours ago
I just tested this using Claude, and at least with 4.5 this does not seem to be possible. The context grows very quickly, and the LLM gets lost and starts hallucinating. Maybe I am missing some key ingredient here?
Of course, if you have a large team of AI and security experts and an unlimited token budget, things can look different.
tznoer 12 hours ago
Grepping for strcat() is at the "forefront of cybersecurity"? The other one that applied a GitHub comment to a different location does not look too difficult either.
Everything that comes out of Anthropic is just noise, but their marketing team is unparalleled.
- blackqueeriroh 11 hours ago
  Did they discover a vulnerability or not?
  - dmbche 11 hours ago
    Not
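For a sense of what "grepping for strcat()" amounts to in practice, here is a naive scanner sketch in Python. It flags calls to a few unbounded C string functions (`strcat`, `strcpy`, `sprintf`, `gets`; this list is an assumption for illustration, not the one from the post) across `.c` and `.h` files. The helper name `find_risky_calls` is hypothetical. The scan is line-based and will also match calls inside comments and string literals, so a hit is a lead for manual review, not a confirmed vulnerability.

```python
import re
import sys
from pathlib import Path

# Word boundary before the name, so strncat, vsprintf, and fgets are NOT matched.
RISKY = re.compile(r"\b(strcat|strcpy|sprintf|gets)\s*\(")


def find_risky_calls(source: str):
    """Return (line_number, function_name, stripped_line) for each risky call."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in RISKY.finditer(line):
            hits.append((lineno, match.group(1), line.strip()))
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python find_risky.py path/to/project
    for path in Path(sys.argv[1]).rglob("*.[ch]"):
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, name, code in find_risky_calls(text):
            print(f"{path}:{lineno}: {name}: {code}")
```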
nielsbot 4 hours ago
Wondering how many of these memory errors would be caught by running the Clang Static Analyzer (or similar) on them.
https://clang-analyzer.llvm.org
Alternatively, testing these projects with ASan enabled:
https://clang.llvm.org/docs/AddressSanitizer.html
octoberfranklin 11 hours ago
This reads like an advertisement for Anthropic, not a technical article.
- blackqueeriroh 11 hours ago
  Okay, so if that’s the case, what do you have that’s constructive to say about it?
  - irishcoffee 10 hours ago
    Their comment was constructive for me, now I’m not going to read the article.
cyanydeez 11 hours ago
Is there a Polymarket on the first billion-dollar AI company to go to $0 because of its own insecure model deployment?
username223 9 hours ago
"Evaluating and mitigating the growing risk of LLM-developed 0-days" would be much more interesting and useful. Try harder, guys.
catlifeonmars 6 hours ago
> Our view is this is a moment to move quickly—to empower defenders and secure as much code as possible while the window exists.
Yawn.