A New Trick Uses AI to Jailbreak AI Models—Including GPT-4

When the board of OpenAI suddenly fired the company’s CEO last month, it sparked speculation that board members were rattled by the breakneck pace of progress in artificial intelligence and the possible risks of seeking to commercialize the technology too quickly. Robust Intelligence, a startup founded in 2020 to develop ways to protect AI systems from attack, says that some existing risks need more attention.

Working with researchers from Yale University, Robust Intelligence has developed a systematic way to probe large language models (LLMs), including OpenAI’s prized GPT-4 asset, using “adversarial” AI models to discover “jailbreak” prompts that cause the language models to misbehave.

While

→ Continue reading at WIRED
