

It does seem advantageous to the defender.
Another factor Mozilla didn’t mention (and that Anthropic wouldn’t like to emphasize) is that major LLMs are pretty similar. And their development is way more conservative than you’d think. They use similar architectures and formats, train from the same data, distill each other, further pollute the internet with the same output and so on. So if (for example) Mozilla red teams with Mythos, I’d posit it’s likely that attacker LLMs would find the same already-patched bugs, instead of something new.
…So yeah. I’d wager Mozilla’s sentiment is correct.




Eh, I don’t totally agree. AI can discover novel exploits that aren’t already in some database, and likely have in this case.
I’m just saying the operating patterns between different LLMs are more similar than you’d expect, like similar tools from the same factory.