OpenAI’s new confession system teaches models to be honest about bad behaviors

December 3, 2025

OpenAI announced today that it is working on a framework that will train artificial intelligence models to acknowledge when they have engaged in undesirable behavior, an approach the team calls a confession. Because large language models are often trained to produce the response that seems to be desired, they can become increasingly prone to sycophancy or to stating hallucinations with total confidence. The new training approach tries to elicit a secondary response from the model about what it did to arrive at its main answer. Confessions are judged only on honesty, as opposed to the multiple factors used to judge main replies, such as helpfulness, accuracy

→ Continue reading at Engadget
