Why Anthropic’s New AI Model Sometimes Tries to ‘Snitch’

Anthropic’s alignment team was doing routine safety testing in the weeks leading up to the release of its latest AI models when researchers discovered something unsettling: When one of the models detected it was being used for “egregiously immoral” purposes, it would attempt to “use command-line tools to contact the press, contact regulators, try to lock you out of the relevant systems, or all of the above,” researcher Sam Bowman wrote in a post on X last Thursday.

Bowman deleted the post shortly after he shared it, but the narrative about Claude's whistleblower tendencies had already escaped containment. "Claude is a snitch" became a common refrain in some tech circles.

→ Continue reading at WIRED
