The Science of Detecting LLM-Generated Text

(dl.acm.org)

15 points | by vinhnx 5 hours ago

xomiachuna 45 minutes ago
This is an article from 2024, when open-weights models like Llama were only beginning to emerge. With those, you basically cannot do any reliable detection (as the authors admit by the end).
This really boils down to generated text having statistical properties very similar to those of human-written text. Introduce a more motivated attacker and the text becomes indistinguishable from real writing (occasional typos, no "delve", no "it's not X, it's Y", no em dashes, and so on).
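A toy sketch of why surface "tells" make a weak detector. It assumes a hypothetical checker that only looks for the markers listed in the comment (the word "delve", the "it's not X, it's Y" construction, em dashes). This is not the method from the article, and `tell_score` and `scrub` are made-up names for illustration. A few regex substitutions by a motivated attacker drive the score to zero.

```python
import re

# Hypothetical tells taken from the comment above; not a real detector's feature set.
TELLS = [
    re.compile(r"\bdelve\b", re.IGNORECASE),
    re.compile(r"\bit'?s not [^,.]+, it'?s\b", re.IGNORECASE),  # "it's not X, it's Y"
    re.compile("\u2014"),  # the em dash character
]

def tell_score(text: str) -> int:
    """Number of tell patterns that appear at least once in the text."""
    return sum(1 for pattern in TELLS if pattern.search(text))

def scrub(text: str) -> str:
    """What a motivated attacker might run: rewrite each tell away."""
    text = re.sub(r"\bdelve\b", "dig", text, flags=re.IGNORECASE)
    # Keep only the "it's Y" half of "it's not X, it's Y".
    text = re.sub(r"\bit'?s not [^,.]+, (it'?s)\b", r"\1", text, flags=re.IGNORECASE)
    # Replace em dashes with a plain comma.
    return text.replace("\u2014", ", ")

sample = "Let's delve into it\u2014it's not a bug, it's a feature."
print(tell_score(sample))   # 3
cleaned = scrub(sample)
print(cleaned)              # Let's dig into it, it's a feature.
print(tell_score(cleaned))  # 0
```

The scrubbed sentence reads just as naturally as the original, which is the point being made: markers this shallow are cheap to remove.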
It really is a losing battle: you cannot embed extra information in text that will survive even basic postprocessing (in contrast to, say, steganography).
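To make that concrete, here is a minimal sketch assuming a deliberately naive watermark of our own choosing (not something the article proposes): bits are hidden as invisible zero-width characters (U+200B for 0, U+200C for 1) after spaces, and a one-line cleanup that drops Unicode format characters (general category `Cf`) erases them completely. The helpers `embed`, `extract` and `postprocess` are hypothetical. Statistical watermarks that bias token choice are harder to strip than this, so treat it as an illustration of fragility rather than a general proof.

```python
import unicodedata

# Hypothetical naive watermark: one zero-width character per bit.
ZERO, ONE = "\u200b", "\u200c"  # zero-width space, zero-width non-joiner

def embed(text: str, bits: str) -> str:
    """Hide one bit after each space until the bits run out."""
    out, i = [], 0
    for ch in text:
        out.append(ch)
        if ch == " " and i < len(bits):
            out.append(ONE if bits[i] == "1" else ZERO)
            i += 1
    return "".join(out)

def extract(text: str) -> str:
    """Recover whatever hidden bits are still present."""
    return "".join("1" if ch == ONE else "0" for ch in text if ch in (ZERO, ONE))

def postprocess(text: str) -> str:
    """Basic cleanup: drop invisible format characters (category Cf)."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")

original = "the quick brown fox jumps over the lazy dog"
marked = embed(original, "1011")
print(extract(marked))                    # 1011
print(postprocess(marked) == original)    # True
print(repr(extract(postprocess(marked)))) # ''
```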
giancarlostoro 1 hour ago
I see a lot of people claiming that just about everything is AI these days, including perfectly normal videos, photos and text. I'm not sure what the solution to this phenomenon will be, but we're in for a bit of trouble for a while.